Re: Hbase cluster for serving real time site traffic

Varun Sharma Tue, 30 Oct 2012 11:04:18 -0700

Thanks for the tips.

So, yes, secondary NameNode is probably more critical than the secondary
master - since the master is only responsible for metadata changes/region
splits/table creation etc and not for writes/reads.


Regarding the keys question - I meant that the (row + column) length is
24-32 bytes and the value length is 0-1 bytes. Currently, we have a cluster
running with all the data loaded into HBase, but it all runs with default
settings.
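
To put rough numbers on that key/value shape, here is a minimal sketch in
Python. It assumes the classic HBase KeyValue layout (no tags), a 1-byte
column family name, and that the 24-32 bytes cover row plus qualifier only.
The function name and the family length are illustrative assumptions, and
the result is the unencoded, uncompressed size of one cell.

```python
# Fixed-size fields in the classic HBase KeyValue layout (no tags).
KEY_LEN_FIELD = 4      # int: length of the key part
VALUE_LEN_FIELD = 4    # int: length of the value part
ROW_LEN_FIELD = 2      # short: length of the row key
FAMILY_LEN_FIELD = 1   # byte: length of the column family name
TIMESTAMP = 8          # long: cell timestamp
KEY_TYPE = 1           # byte: Put/Delete marker

FIXED_OVERHEAD = (KEY_LEN_FIELD + VALUE_LEN_FIELD + ROW_LEN_FIELD
                  + FAMILY_LEN_FIELD + TIMESTAMP + KEY_TYPE)


def keyvalue_size(row_plus_qualifier_len, family_len, value_len):
    """Unencoded size in bytes of one cell (hypothetical helper)."""
    return FIXED_OVERHEAD + row_plus_qualifier_len + family_len + value_len


# Assumption: a 1-byte family name; adjust to the real schema.
for key_len in (24, 32):
    for value_len in (0, 1):
        total = keyvalue_size(key_len, 1, value_len)
        print(f"row+qualifier={key_len}B value={value_len}B -> {total}B per cell")
```

Under these assumptions nearly every stored byte is key material rather than
value, which is the kind of data where key encoding or compression is worth
measuring.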

Thanks
Varun

On Tue, Oct 30, 2012 at 10:53 AM, Jean-Marc Spaggiari <
[email protected]> wrote:

> My 2¢.
>
> 1) You need an odd number of ZooKeeper nodes, so 3 is the minimum
> recommended for production.
> 2) Yes, you have Master and SecondaryMaster, and it's also recommended
> to have one of each. And the master is critical. If you lose
> it, you lose your cluster.
> 3) NameNode is Hadoop, not HBase. You should follow the Hadoop
> recommendations. Just as you have a secondarymaster, you have a
> secondarynamenode. So I think you should have as many
> secondarynamenodes as you have secondarymasters (on the same machine?).
> 4) I'm not sure I understand this question. Keys are binary: arrays
> of bytes. So 32 0-1 bytes is a 3 bytes long array. It's not a lot.
> This will only give you 2^32 different rows. You will have to
> pre-split them, or you will end up with almost all of them on the same
> regionserver?
>
> JM
>
> 2012/10/30, Varun Sharma <[email protected]>:
> > Hi,
> >
> > We are planning to experiment with a cluster for serving production traffic
> > using hbase for pinterest. We are starting off with a 10 region server + 1
> > master cluster on Amazon EMR version 0.92. I had some very naive questions
> > (primarily around points of failure):
> >
> > 1) It seems hbase starts only one zookeeper on the master node - which is
> > critical for operation - how many zookeepers should I use and can I run
> > those on the region servers ?
> > 2) How many masters to use - does hbase support multiple masters (primary
> > and secondary) within the same cluster ? From my understanding, master
> > availability is not critical for operation.
> > 3) NameNode - We are running hadoop 0.8 - I have read that NameNode is a
> > single point of failure and we should really be running two name node(s) so
> > we can failover. Is it fine to run these on the region servers ?
> > 4) Our current application involves long row/column - 24-32 bytes with 0-1
> > bytes of values. Should we be using a different key encoding than the
> > default encoding ? What advantages could it buy us ?
> >
> > We are currently using amazon EMR for testing purposes which runs hbase
> > 0.92. If it works well, we would like to configure our own cluster with
> > probably the latest version of hbase which appears to be 0.94 at the
> > moment.
> >
> > Thanks
> > Varun
> >
>
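
As a side note on the ZooKeeper point in the thread: the odd-number advice
follows from majority-quorum arithmetic. The sketch below is plain
arithmetic, not a ZooKeeper API, and the function names are made up for
illustration.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many nodes can fail while a majority is still alive."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(1, 6):
    print(f"{n} node(s): quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Going from 3 to 4 nodes does not raise the number of failures the ensemble
can survive, which is why even sizes are usually avoided.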

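As a side note on the pre-splitting suggestion in the thread: one way to
picture it is to compute evenly spaced split keys over a fixed-width binary
prefix. This is only a sketch. It assumes row keys are roughly uniformly
distributed over their leading bytes (for example, hashed keys), and both
`split_points` and the 4-byte prefix width are hypothetical choices, not
anything taken from the cluster described above.

```python
def split_points(num_regions, key_width=4):
    """Return num_regions - 1 evenly spaced split keys as big-endian bytes.

    Assumes keys are uniformly distributed over the first `key_width` bytes.
    """
    if num_regions < 2:
        return []
    key_space = 256 ** key_width
    return [
        (key_space * i // num_regions).to_bytes(key_width, "big")
        for i in range(1, num_regions)
    ]


# Example: one initial region per region server in a 10-server cluster.
for key in split_points(10):
    print(key.hex())
```

The resulting byte strings are the sort of split keys that can be supplied
when the table is created, so that the initial regions are spread across
region servers rather than starting on a single one. If the real keys are
not uniformly distributed, the split points would need to come from the
observed key distribution instead.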