Hi Tatsuya,

On Thu, Jun 3, 2010 at 5:06 PM, Tatsuya Kawano <[email protected]> wrote:

> Hello,
>
> I remember Jon was talking the other day about trying a single HBase
> server with an existing HDFS cluster to serve map reduce (MR) results. I wonder
> if this went well or not.
>
> A couple of friends in Tokyo are considering HBase to do a similar thing.
> They want to serve MR results inside the clients' companies via HBase. They
> both have an existing MR/HDFS environment; one has a small (< 10) cluster and
> the other has a large (> 50) cluster.
>
> They'll use incremental loading into an existing table (HBASE-1923) to add
> the MR results to the HBase table, and only a few users will read and export
> (web CSV download) the results via HBase. So HBase will be lightly loaded.
> They probably won't even need the high availability (HA) option on HBase.
>
> So I'm thinking of recommending that they add just one server (non-HA) or two
> servers (HA) to their Hadoop cluster, and run only HMaster and Region Server
> processes on the server(s). The HBase cluster will utilize the existing
> (small or large) HDFS cluster and ZooKeeper ensemble.
>
>
If your "exported dataset" from the MR job is small enough to fit on one
server, you can certainly use a single HBase RS plus the bulk load
functionality. However, with a small dataset like that it might make more
sense to simply export TSV/CSV and then use a tool like Sqoop to export to a
relational database. That way you'd have better off-the-shelf integration
with various other tools or access methods.
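
As a rough illustration of the TSV/CSV step, here is a minimal Python sketch. It assumes the MR job wrote plain text with tab-separated fields (one record per line); the function name `tsv_to_csv` and the sample records are hypothetical, not part of any Hadoop or Sqoop API.

```python
import csv
import io


def tsv_to_csv(tsv_lines, out):
    """Write tab-separated records to `out` as CSV and return the row count."""
    # lineterminator="\n" keeps the output Unix-style instead of the csv default "\r\n".
    writer = csv.writer(out, lineterminator="\n")
    count = 0
    for line in tsv_lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # csv.writer quotes any field that itself contains a comma.
        writer.writerow(line.split("\t"))
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical sample records standing in for MR output lines.
    sample = ["user1\t42\n", "user2\t7, approx\n"]
    buf = io.StringIO()
    tsv_to_csv(sample, buf)
    print(buf.getvalue(), end="")
```

The resulting file can then be loaded into the relational database with whatever import path suits it.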


> The server spec will be 2 x 8-core processors and 8GB to 24GB RAM. The RAM
> size will change depending on the data volume and access pattern.
>
> Has anybody tried a similar configuration? If so, how did it go?
>
>
> Also, I saw Jon's slides for Hadoop World in NYC 2009, which said that
> I should have at least 5 Region Servers / Data Nodes in my cluster to
> get the typical performance. If I deploy RS and DN on separate servers,
> which one should be >= 5 nodes? DN? RS? or both?
>
>
It's better to colocate the DNs and RSs for most deployments. With an RS
reading from a DN on the same machine, you get significantly better random
read performance for uncached data.

-Todd


>
> Thanks,
> Tatsuya Kawano
> Tokyo, Japan
>
>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
