Hello,

I remember Jon mentioning the other day that he was trying a single HBase server on top of an existing HDFS cluster to serve MapReduce (MR) results. I wonder how that went.

A couple of friends in Tokyo are considering HBase for a similar use case. They want to serve MR results inside their clients' companies via HBase. Both have an existing MR/HDFS environment; one has a small cluster (< 10 nodes) and the other a large one (> 50 nodes).

They'll use incremental loading into an existing table (HBASE-1923) to add the MR results to the HBase table, and only a few users will read and export (web CSV download) the results via HBase. So HBase will be lightly loaded. They probably won't even need the high-availability (HA) option on HBase.
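For anyone curious, the flow we have in mind looks roughly like this. This is just a sketch; the jar, class, paths, and table name are hypothetical, but LoadIncrementalHFiles is the tool HBASE-1923 added:

```shell
# 1. Run the MR job with HFileOutputFormat so it writes HFiles
#    directly instead of issuing Puts (job/paths are placeholders):
hadoop jar my-results-job.jar MyResultsJob /user/mr/input /user/mr/hfiles

# 2. Atomically move the generated HFiles into the existing table:
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
    /user/mr/hfiles results_table
```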

So I'm thinking of recommending that they add just one server (non-HA) or two servers (HA) to their Hadoop cluster, and run only the HMaster and RegionServer processes on them. The HBase cluster would use the existing (small or large) HDFS cluster and ZooKeeper ensemble.
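As I understand it, the hbase-site.xml for such a setup would mainly point HBase at the existing HDFS and ZooKeeper; a minimal sketch (hostnames are placeholders):

```xml
<configuration>
  <!-- Store HBase data on the existing HDFS cluster -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.com:8020/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- Reuse the existing ZooKeeper ensemble rather than
       letting HBase manage its own -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
</configuration>
```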

The server spec will be 2 x 8-core processors and 8GB to 24GB RAM; the RAM size will vary depending on the data volume and access pattern.

Has anybody tried a similar configuration? How did it go?


Also, I saw Jon's slides from Hadoop World NYC 2009, which said I should have at least 5 RegionServers / DataNodes in my cluster to get typical performance. If I deploy RegionServers and DataNodes on separate servers, which one should be >= 5 nodes: the DataNodes, the RegionServers, or both?


Thanks,
Tatsuya Kawano
Tokyo, Japan


