Hi, I am running a 3-node cluster. An HDFS datanode and an HBase regionserver run on each node. The HBase master and the HDFS namenode run on different machines; both are still in the cluster, just not on the same box. Each machine is quad core with a 64-bit JVM, 32 GB RAM, and 4 disks.

We had a lot of trouble keeping the cluster alive when it was paired with an asymmetric (much bigger) MapReduce cluster that was writing into HBase. Ultimately, we achieved stability by disabling the WAL from code in our MapReduce jobs and by setting the HFile block size lower than the default (we do a lot of random reads in the map phase). Other tweaks are also needed, such as raising the OS open-file limit. I made a lot of posts about this in May, so you could look in the archive. At present, we're quite happy with the cluster.
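For illustration only, here is a minimal Java sketch of what those two changes could look like with the HBase client API of that era. It is not the actual job code: the class and method names are hypothetical, and the 16 KB block size is an assumed value (the message does not say which size was used; the default is 64 KB). Later HBase releases replaced `Put.setWriteToWAL(false)` with `Put.setDurability(Durability.SKIP_WAL)`.

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.Put;

// Hypothetical helper class, shown only to illustrate the two settings.
public class WalAndBlockSizeSketch {

    // Build a Put that skips the write-ahead log.
    static Put putWithoutWal(byte[] row, byte[] family, byte[] qualifier, byte[] value) {
        Put put = new Put(row);
        put.add(family, qualifier, value);
        put.setWriteToWAL(false); // this edit will not be written to the WAL
        return put;
    }

    // Describe a table whose column family uses a smaller HFile block size.
    static HTableDescriptor smallBlockTable(String tableName, String family) {
        HColumnDescriptor cf = new HColumnDescriptor(family);
        cf.setBlocksize(16 * 1024); // assumed value: 16 KB instead of the 64 KB default
        HTableDescriptor table = new HTableDescriptor(tableName);
        table.addFamily(cf);
        return table;
    }
}
```

Skipping the WAL reduces write load on the regionservers, but any edits still held in memory are lost if a regionserver dies before flushing, so it only makes sense when the MapReduce job can be re-run. Smaller blocks mean less data is read per random lookup, at the cost of a larger block index.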
-geoff

-----Original Message-----
From: Paul Smith [mailto:[email protected]]
Sent: Thursday, July 22, 2010 3:56 PM
To: [email protected]
Subject: Smallest production HBase cluster

anyone able to share their experience, thoughts on the 'smallest' production HBase cluster in operation? Thinking there may be some point in the # Nodes scale where one transitions from/to "that's silly" to "that's actually more like it".

Anyone out there with a small HBase cluster in operation with < 10 nodes able to share any information? I notice on http://wiki.apache.org/hadoop/Hbase/PoweredBy there are some who have even just a 3 node cluster, perhaps that's out of date, but curious to know from the community on where people think 'the line' needs to be drawn on usage of Hbase.

To take things to an extreme, is there anyone actually running a _single_ HBase node... ? (one would hope that machine is actually designed to be a bit more HA than normal) just to take advantage of a column-oriented store?

thanks,

Paul
