Hey Dan:

On Wed, Oct 20, 2010 at 2:09 AM, Dan Harvey <[email protected]> wrote:
> Hey,
>
> We're just looking into ways to run multiple instances/versions of HBase for
> testing/development and were wondering how other people have gone about
> doing this.
>
Development of the replication feature has made it so tests can now put up
multiple concurrent clusters.  See TestHBaseClusterUtility, which starts up
three clusters in the one JVM, each homed on its own directory in a single
zookeeper instance and each running its own hdfs (having them share an hdfs
should work too, though it might need some HBaseTestingUtility fixup).

At SU there are multiple clusters: a serving cluster for low-latency work
(replicating to a backup cluster) and then a cluster for MR jobs, dev
clusters, etc.  Generally these don't share hdfs, though again clusters with
like SLAs could.

> If we used just one hadoop cluster then we can have different paths / users
> for each hbase instance, and then have a set of zookeeper nodes for each
> instance (or run multiple zk's on each server binding to different hosts for
> each instance..).

You could do that.  Have all share the same zk ensemble (run one per
datacenter?).

> If we used multiple hadoop clusters then the only difference would be just
> using different hdfs for storing the data.
>
> Does anyone have experiences with problems or benefits to either of the
> above?
>
> I'm tempted to go towards the single cluster for more efficient use of
> hardware but I'm not sure if that's a good idea or not.
>

At SU the cluster serving the frontend is distinct from the cluster running
the heavy-duty MR jobs.  When a big MR job started up, the front-end latency
tended to suffer.  There might be some ratio of HDFS nodes to HBase nodes
that would make it so the low-latency and MR clusters could share an HDFS,
but I've not done the work to figure it out.

St.Ack
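
P.S. In case it is useful, below is the rough shape of what those tests do --
untested as pasted here, and the class name and znode parents ("/cluster1",
"/cluster2") are just throwaway names for illustration -- two mini clusters
in the one JVM sharing a single mini zookeeper, each with its own mini hdfs:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster;

public class TwoClustersOneJvm {
  public static void main(String[] args) throws Exception {
    // First cluster: its own znode parent, its own mini zk and mini hdfs.
    Configuration conf1 = HBaseConfiguration.create();
    conf1.set(HConstants.ZOOKEEPER_ZNODE_PARENT, "/cluster1");
    HBaseTestingUtility util1 = new HBaseTestingUtility(conf1);
    MiniZooKeeperCluster miniZK = util1.startMiniZKCluster();
    util1.startMiniCluster();

    // Second cluster: reuses the first cluster's mini zookeeper but is
    // homed on its own znode parent; startMiniCluster gives it its own hdfs.
    Configuration conf2 = HBaseConfiguration.create();
    conf2.set(HConstants.ZOOKEEPER_ZNODE_PARENT, "/cluster2");
    HBaseTestingUtility util2 = new HBaseTestingUtility(conf2);
    util2.setZkCluster(miniZK);
    util2.startMiniCluster();

    // ... point tests at util1.getConfiguration() / util2.getConfiguration() ...

    util2.shutdownMiniCluster();
    util1.shutdownMiniCluster();
  }
}

The production analogue of the above, if you go the single-hadoop-cluster
route, would be giving each hbase instance its own hbase.rootdir and
zookeeper.znode.parent in its hbase-site.xml so the instances keep out of
each other's way on the shared hdfs and zk ensemble.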
