Hey Dan:

On Wed, Oct 20, 2010 at 2:09 AM, Dan Harvey <[email protected]> wrote:
> Hey,
>
> We're just looking into ways to run multiple instances/versions of HBase for
> testing/development and were wondering how other people have gone about
> doing this.
>
Development of the replication feature has made it so tests can now put up
multiple concurrent clusters.  See TestHBaseClusterUtility, which starts up
three clusters in the one JVM, each homed on its own directory in a single
zookeeper instance and each running its own hdfs (having them share an hdfs
should work too, though it might need some HBaseTestingUtility fixup).

At SU there are multiple clusters: a serving cluster for low-latency work
(replicating to a backup cluster) and then a cluster for MR jobs, dev
clusters, etc.  Generally these don't share hdfs, though again clusters with
like SLAs could.

> If we used just one hadoop cluster then we can have different paths / users
> for each hbase instance, and then have a set of zookeeper nodes for each
> instance (or run multiple zk's on each server binding to different hosts for
> each instance..).

You could do that.  Have all share the same zk ensemble (run one per
datacenter?).

> If we used multiple hadoop clusters then the only difference would be just
> using different hdfs for storing the data.
>
> Does anyone have experiences with problems or benefits to either of the
> above?
>
> I'm tempted to go towards the single cluster for more efficient use of
> hardware but I'm not sure if that's a good idea or not.
>

At SU the cluster serving the frontend is distinct from the cluster running
the heavy-duty MR jobs.  When a big MR job started up, the front-end latency
tended to suffer.  There might be some ratio of HDFS nodes to HBase nodes
that would make it so the low-latency and MR clusters could share an HDFS,
but I've not done the work to figure it out.

St.Ack
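
P.S. In case it is useful, below is the rough shape of what those tests do --
untested as pasted here, and the class name and znode parents ("/cluster1",
"/cluster2") are just throwaway names for illustration -- two mini clusters
in the one JVM sharing a single mini zookeeper, each with its own mini hdfs:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster;

public class TwoClustersOneJvm {
  public static void main(String[] args) throws Exception {
    // First cluster: its own znode parent, its own mini zk and mini hdfs.
    Configuration conf1 = HBaseConfiguration.create();
    conf1.set(HConstants.ZOOKEEPER_ZNODE_PARENT, "/cluster1");
    HBaseTestingUtility util1 = new HBaseTestingUtility(conf1);
    MiniZooKeeperCluster miniZK = util1.startMiniZKCluster();
    util1.startMiniCluster();

    // Second cluster: reuses the first cluster's mini zookeeper but is
    // homed on its own znode parent; startMiniCluster gives it its own hdfs.
    Configuration conf2 = HBaseConfiguration.create();
    conf2.set(HConstants.ZOOKEEPER_ZNODE_PARENT, "/cluster2");
    HBaseTestingUtility util2 = new HBaseTestingUtility(conf2);
    util2.setZkCluster(miniZK);
    util2.startMiniCluster();

    // ... point tests at util1.getConfiguration() / util2.getConfiguration() ...

    util2.shutdownMiniCluster();
    util1.shutdownMiniCluster();
  }
}

The production analogue of the above, if you go the single-hadoop-cluster
route, would be giving each hbase instance its own hbase.rootdir and
zookeeper.znode.parent in its hbase-site.xml so the instances keep out of
each other's way on the shared hdfs and zk ensemble.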
