Thanks for the tips. So, yes, the secondary NameNode is probably more critical than the secondary master, since the master is only responsible for metadata changes, region splits, table creation, etc., and not for reads or writes.
Regarding the keys question: I meant that the (row + column) length is 24-32 bytes and the value length is 0-1 bytes. We currently have a cluster running with all the data loaded into HBase, but it runs entirely on default settings.

Thanks
Varun

On Tue, Oct 30, 2012 at 10:53 AM, Jean-Marc Spaggiari <[email protected]> wrote:

> My 2¢.
>
> 1) You need an odd number of ZooKeeper nodes, so 3 is the minimum
> recommended for production.
> 2) Yes, you have Master and SecondaryMaster, and it's also recommended
> to have one of each. The master is critical: if you lose it, you lose
> your cluster.
> 3) NameNode is Hadoop, not HBase, so you should follow the Hadoop
> recommendations. Just as you have a secondary master, you have a
> secondary NameNode, so I think you should have as many secondary
> NameNodes as you have secondary masters (on the same machine?).
> 4) I'm not sure I understand this question. Keys are binary: arrays
> of bytes. So 32 0-1 bytes is a 3-byte-long array. It's not a lot.
> This will only give you 2^32 different rows. You will have to
> pre-split them, or you will end up with almost all of them on the same
> regionserver?
>
> JM
>
> 2012/10/30, Varun Sharma <[email protected]>:
> > Hi,
> >
> > We are planning to experiment with a cluster for serving production traffic
> > using HBase for Pinterest. We are starting off with a 10 region server + 1
> > master cluster on Amazon EMR version 0.92. I had some very naive questions
> > (primarily around points of failure):
> >
> > 1) It seems HBase starts only one ZooKeeper on the master node, which is
> > critical for operation. How many ZooKeepers should I use, and can I run
> > those on the region servers?
> > 2) How many masters should we use? Does HBase support multiple masters
> > (primary and secondary) within the same cluster? From my understanding,
> > master availability is not critical for operation.
> > 3) NameNode: we are running Hadoop 0.8. I have read that the NameNode is a
> > single point of failure and that we should really be running two NameNodes
> > so we can fail over. Is it fine to run these on the region servers?
> > 4) Our current application involves long rows/columns (24-32 bytes) with
> > 0-1 bytes of values. Should we be using a different key encoding than the
> > default encoding? What advantages could it buy us?
> >
> > We are currently using Amazon EMR for testing purposes, which runs HBase
> > 0.92. If it works well, we would like to configure our own cluster with
> > probably the latest version of HBase, which appears to be 0.94 at the
> > moment.
> >
> > Thanks
> > Varun
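JM's point 1 (use an odd number of ZooKeeper servers) follows from ZooKeeper's majority rule: the ensemble stays available only while a strict majority of its servers is up. The short Python sketch below just does that arithmetic; the function names are made up for illustration and are not part of any ZooKeeper or HBase API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble: floor(n / 2) + 1."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare small ensemble sizes side by side.
for n in range(1, 7):
    print(f"{n} servers: quorum={quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 servers raises the quorum from 2 to 3 without tolerating any extra failure, which is why 3 (or 5) is the usual choice rather than an even number.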

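For question 4, a rough byte count shows why key encoding matters for this workload. Each cell is stored as a KeyValue that repeats the full row, family and qualifier next to fixed-width length, timestamp and type fields. The sketch below assumes the 0.92/0.94 KeyValue layout (20 bytes of fixed fields per cell) and ignores HFile block headers, index entries and compression; the 12/12 and 16/16 row/qualifier splits and the 1-byte family name are hypothetical.

```python
# Fixed-width fields of one HBase KeyValue, assuming the 0.92/0.94 layout:
# <key length:int><value length:int><row length:short><row>
# <family length:byte><family><qualifier><timestamp:long><type:byte><value>
KEY_LENGTH_FIELD = 4
VALUE_LENGTH_FIELD = 4
ROW_LENGTH_FIELD = 2
FAMILY_LENGTH_FIELD = 1
TIMESTAMP_FIELD = 8
TYPE_FIELD = 1

FIXED_OVERHEAD = (KEY_LENGTH_FIELD + VALUE_LENGTH_FIELD + ROW_LENGTH_FIELD
                  + FAMILY_LENGTH_FIELD + TIMESTAMP_FIELD + TYPE_FIELD)


def keyvalue_size(row: int, family: int, qualifier: int, value: int) -> int:
    """Bytes for one cell before any block encoding or compression."""
    return FIXED_OVERHEAD + row + family + qualifier + value


# Hypothetical splits of a 24-32 byte (row + column) with a 1-byte family name.
for row, qualifier, value in [(12, 12, 0), (16, 16, 1)]:
    size = keyvalue_size(row, 1, qualifier, value)
    print(f"row={row} qualifier={qualifier} value={value}: "
          f"{size} bytes per cell, {value / size:.1%} of it is value")
```

With 0-1 byte values, nearly every stored byte is key. That is the case the data block encodings introduced in 0.94 (PREFIX, DIFF, FAST_DIFF, set per column family via DATA_BLOCK_ENCODING) are aimed at, since adjacent sorted keys tend to share long prefixes; they are not available on the 0.92 EMR build.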