Thanks all for the helpful comments. I read up on HA and was wondering whether there are good tools for quickly setting up an HA HDFS + HBase cluster on EC2. From my reading, it appears that tools like Whirr still have issues with bringing up the secondary NN on a different machine, etc. Also, for availability, would Master-Slave or Master-Master replication be a substitute for having the secondary NN?

For ZooKeeper, should the servers run ZK only, or is it fine to share them with other services like the master? Also, is it better to have a dedicated ZooKeeper cluster per HBase cluster?

Thanks
Varun

On Tue, Oct 30, 2012 at 1:20 PM, Marcos Ortiz wrote:

> Regards, Varun, answers in line
>
> On 10/30/2012 01:03 PM, Varun Sharma wrote:
>
> > Thanks for the tips.
> >
> > So, yes, secondary NameNode is probably more critical than the
> > secondary master, since the master is only responsible for metadata
> > changes/region splits/table creation etc. and not for writes/reads.
>
> Exactly, you have to create a good HA strategy for these nodes (Master
> and Secondary Master)
>
> > Regarding the keys question - I meant that the (row + column) length
> > is 24-32 bytes and the value length is 0-1 bytes. Currently, we have
> > a cluster running with all the data loaded into HBase, but it all
> > runs with default settings.
>
> There are many areas that you can optimize in an HBase cluster:
> - Write operations
> - Compaction and split optimization
> - Region server size
> - Snappy compression
> - Schema design
> - Use of block caching for scan optimization
> - Use of asynchronous clients for HBase operations (asynchbase, for
>   example [1])
> etc.
>
> Lars George's excellent book "HBase: The Definitive Guide" has a
> complete chapter on this tricky topic (Chapter 11).
>
> Some additional resources:
>
> [1] https://github.com/stumbleupon/asynchbase
> https://github.com/twitter/finagle
> http://gbif.blogspot.com/2012/02/performance-evaluation-of-hbase.html
> http://gbif.blogspot.com/2012/02/monitoring-hadoop-and-hbase.html
> http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/
>
> Look at all the tagged presentations on SlideShare from the last
> HBaseCon, for example Benoit's talk "Lessons learned from OpenTSDB"
> and Lars Hofhansl's "HBase Schema Design":
> http://www.slideshare.net/cloudera/tag/hbasecon-2012
>
> Best wishes
>
> > Thanks
> > Varun
> >
> > On Tue, Oct 30, 2012 at 10:53 AM, Jean-Marc Spaggiari wrote:
> >
> > > My 2¢.
> > >
> > > 1) You need an odd number of ZooKeeper nodes. So 3 is the minimum
> > > recommended for production.
> > > 2) Yes, you have Master and SecondaryMaster. And it's also
> > > recommended to have one of each. And the master is critical. If
> > > you lose it, you lose your cluster.
> > > 3) NameNode is Hadoop, not HBase. You should follow the Hadoop
> > > recommendations. Just as you have a secondarymaster, you have a
> > > secondarynamenode. So I think you should have as many
> > > secondarynamenodes as you have secondarymasters (on the same
> > > machine?).
> > > 4) I'm not sure I understand this question. Keys are binary:
> > > arrays of bytes. So 32 0-1 bytes is a 3 bytes long array. It's not
> > > a lot. This will only give you 2^32 different rows. You will have
> > > to pre-split them, or you will end up with almost all of them on
> > > the same regionserver?
> > >
> > > JM
> > >
> > > 2012/10/30, Varun Sharma:
> > >
> > > > Hi,
> > > >
> > > > We are planning to experiment with a cluster for serving
> > > > production traffic using HBase for Pinterest. We are starting
> > > > off with a 10 region server + 1 master cluster on Amazon EMR
> > > > version 0.92. I had some very naive questions (primarily around
> > > > points of failure):
> > > >
> > > > 1) It seems HBase starts only one ZooKeeper on the master node,
> > > > which is critical for operation - how many ZooKeepers should I
> > > > use, and can I run those on the region servers?
> > > > 2) How many masters to use - does HBase support multiple masters
> > > > (primary and secondary) within the same cluster? From my
> > > > understanding, master availability is not critical for
> > > > operation.
> > > > 3) NameNode - We are running hadoop 0.8 - I have read that the
> > > > NameNode is a single point of failure and we should really be
> > > > running two NameNodes so we can fail over. Is it fine to run
> > > > these on the region servers?
> > > > 4) Our current application involves long row/column keys - 24-32
> > > > bytes with 0-1 bytes of values. Should we be using a different
> > > > key encoding than the default encoding? What advantages could
> > > > it buy us?
> > > >
> > > > We are currently using Amazon EMR for testing purposes, which
> > > > runs HBase 0.92. If it works well, we would like to configure
> > > > our own cluster with probably the latest version of HBase, which
> > > > appears to be 0.94 at the moment.
> > > >
> > > > Thanks
> > > > Varun
>
> --
> Marcos Luis Ortíz Valmaseda
> about.me/marcosortiz
> @marcosluis2186 <http://twitter.com/marcosluis2186>
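
On the ZooKeeper sizing point quoted in the thread (an odd number of nodes, 3 as the production minimum): as I understand it, the reasoning is majority-quorum arithmetic, since an ensemble stays available only while more than half of its servers are up. A minimal Python sketch of that arithmetic follows; the function names are made up for illustration and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print ensemble size, quorum size, and tolerated failures side by side.
    for n in range(1, 8):
        print(n, quorum_size(n), tolerated_failures(n))
```

Going from 3 to 4 servers raises the quorum from 2 to 3 but still tolerates only one failure, which is why an even-sized ensemble adds no availability over the next smaller odd one.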

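On the pre-splitting suggestion in the thread: one possible way to choose region boundaries, assuming the leading bytes of our row keys are roughly uniformly distributed (an assumption; the real key distribution may differ), is to cut the key space into equal ranges. The sketch below is plain Python; `split_points` and its `key_width` parameter are hypothetical names, not an HBase API. The resulting byte strings would be the kind of values supplied as split keys when a table is created.

```python
from typing import List


def split_points(num_regions: int, key_width: int = 4) -> List[bytes]:
    """Return num_regions - 1 evenly spaced boundaries over all
    big-endian byte strings of length key_width."""
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    if key_width < 1:
        raise ValueError("key_width must be at least 1")
    space = 256 ** key_width  # number of distinct prefixes of this width
    if num_regions > space:
        raise ValueError("more regions than distinct key prefixes")
    return [
        (i * space // num_regions).to_bytes(key_width, "big")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # Boundaries for 10 regions over a 4-byte key prefix, shown as hex.
    for boundary in split_points(10):
        print(boundary.hex())
```

If the keys are not uniform (for example, if they share a common prefix), boundaries sampled from the actual data would likely balance the regions better than equal ranges.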