Hello,
I remember Jon mentioning the other day that he was trying a single HBase
server on top of an existing HDFS cluster to serve MapReduce (MR)
results. I wonder how that went.
A couple of friends in Tokyo are considering HBase for a similar setup.
They want to serve MR results inside their clients' companies via
HBase. Both have existing MR/HDFS environments; one has a small cluster
(< 10 nodes) and the other a large one (> 50 nodes).
They'll use incremental bulk loading into an existing table
(HBASE-1923) to add the MR results to the HBase table, and only a few
users will read and export (web CSV download) the results via HBase, so
HBase will be lightly loaded. They probably won't even need a high
availability (HA) setup for HBase.
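
For reference, I expect the loading step to look roughly like the
untested sketch below, written against the HBASE-1923 / 0.90-era API
(exact class names may differ in their version). The table name
"mr_results", the column family "d", and the assumption that the MR
results are tab-separated "rowkey<TAB>value" text files are just my
guesses about their jobs:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ResultLoader {

  // Turns each "rowkey<TAB>value" line of the MR output into a KeyValue.
  static class ResultMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, KeyValue> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] fields = line.toString().split("\t", 2);
      byte[] row = Bytes.toBytes(fields[0]);
      KeyValue kv = new KeyValue(row, Bytes.toBytes("d"),
          Bytes.toBytes("value"), Bytes.toBytes(fields[1]));
      ctx.write(new ImmutableBytesWritable(row), kv);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "load MR results into HBase");
    job.setJarByClass(ResultLoader.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setMapperClass(ResultMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(KeyValue.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // MR results dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // temp HFile dir

    // Sets the reducer, total-order partitioner and reducer count to
    // match the existing region boundaries of the target table.
    HTable table = new HTable(conf, "mr_results");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    if (job.waitForCompletion(true)) {
      // Move the generated HFiles into the live table
      // (same thing the "completebulkload" tool does).
      new LoadIncrementalHFiles(conf).doBulkLoad(new Path(args[1]), table);
    }
  }
}

(If the results really are plain TSV, I think the importtsv and
completebulkload tools would cover the same ground without custom code.)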
So I'm thinking of recommending that they add just one server (non-HA)
or two servers (HA) to their Hadoop cluster, and run only the HMaster
and RegionServer processes on the server(s). The HBase cluster would
use the existing (small or large) HDFS cluster and ZooKeeper ensemble.
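
Concretely, I'm picturing something like the following hbase-site.xml
on the added server(s), plus HBASE_MANAGES_ZK=false in hbase-env.sh so
HBase reuses their existing ensemble. The host names and NameNode port
below are just placeholders:

<configuration>
  <!-- Store HBase data on the existing HDFS cluster -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://existing-namenode:8020/hbase</value>
  </property>
  <!-- Run in distributed mode even though there is only one (or two) HBase server(s) -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <!-- Point at the existing ZooKeeper ensemble -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
</configuration>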
The server spec will be 2 x 8-core processors and 8GB to 24GB of RAM;
the RAM size will vary depending on the data volume and access pattern.
Has anybody tried a similar configuration? How did it go?
Also, I saw Jon's slides from Hadoop World NYC 2009, which said I
should have at least 5 RegionServers / DataNodes in my cluster to get
typical performance. If I deploy the RegionServers and DataNodes on
separate servers, which one needs to be >= 5 nodes: the DataNodes, the
RegionServers, or both?
Thanks,
Tatsuya Kawano
Tokyo, Japan