On Thu, Nov 1, 2012 at 1:09 PM, Leonid Fedotov <[email protected]> wrote:
> Varun,
> for HA NameNode you may want to look at the Hortonworks HDP 1.1 release. It
> is supported on vSphere and on RedHat HA cluster.
> HDP 1.1 is based on Hadoop 1.0.3 and fully certified for production
> environments.
> Do not forget, Hadoop 2.0 is still in alpha testing stage and cannot be
> recommended for production systems.
>

HA NameNode is actually running in a number of HBase production systems.

> As for ZK nodes:
> depending on the amount of ZK traffic, you may not need to put them on
> separate nodes; they could easily coexist with a DN.
>

This is a very bad idea. You should never co-locate ZK on a worker node, as it can be starved of CPU or IOPS and time out (thereby causing cascading failures). This can happen, for example, when someone submits an MR job.

> However, it is better to split NN and HB Master onto separate nodes. Like NN
> on one node and HB Master and JT on another node.
>

Why? The HMaster exerts very little load on the host. If you have three masters and want HA, you can have the following config:

Host 1: Primary NN, HMaster1, ZK1
Host 2: Standby NN, HMaster2, ZK2
Host 3: JT, HMaster3, ZK3

>
> Thank you!
>
> Sincerely,
> Leonid Fedotov
> Technical Support Engineer
> [email protected]
> office: +1 855 846 7866 ext 292
> mobile: +1 650 430 1673
>
> On Nov 1, 2012, at 4:17 AM, Marcos Ortiz Valmaseda wrote:
>
> > Regards, Varun.
> > 1- I think that you should take a look at Cloudera Manager for CDH 4.1 to
> > create an HA HDFS environment. Remember that version 2.0.x is not ready for
> > production yet. The stable version is Hadoop 1.0.4 with HBase 0.94.2.
> >
> > 2- Yes, a recommended practice is to have a separate Zookeeper ensemble
> > (three, five or seven are good numbers for the ensemble) from your NN, HB
> > Master. For example:
> > - 1 NN/HB Master, JT
> > - 5 DN, HR Servers, TT
> > - 3 nodes for the Zookeeper quorum.
> >
> > Best wishes.
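On the ensemble sizes mentioned above (three, five or seven): ZooKeeper stays available only while a strict majority of its servers is up, which is why odd sizes are the usual choice. A minimal Python sketch of that arithmetic follows. The function names are mine, for illustration only, and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of an ensemble with the given size."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare odd and even ensemble sizes side by side.
    for n in (1, 3, 4, 5, 6, 7):
        print(f"{n} servers: quorum {quorum_size(n)}, "
              f"survives {tolerated_failures(n)} failure(s)")
```

An even-sized ensemble survives no more failures than the odd size just below it (four servers still tolerate only one failure), so the extra node adds nothing for availability.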
> >
> > ----- Original message -----
> > From: Varun Sharma <[email protected]>
> > To: Marcos Ortiz <[email protected]>, kevin odell <[email protected]>
> > CC: [email protected]
> > Sent: Thu, 01 Nov 2012 03:01:55 -0500 (CST)
> > Subject: Re: Hbase cluster for serving real time site traffic
> >
> > Thanks all for the helpful comments. I read up on HA and was wondering if
> > there are good tools for setting up a HA HDFS + Hbase cluster on EC2
> > quickly. From my reading, it appears that tools like Whirr still have
> > issues with bringing up the secondary NN on a different machine etc. Also
> > for availability, would Master-Slave replication or Master-Master
> > replication be a substitute for having the secondary NN.
> >
> > For zookeeper, should the servers be running ZK only or is it fine to share
> > with other services like the master ? Also, is it better to have a
> > dedicated zookeeper cluster per hbase cluster ?
> >
> > Thanks
> > Varun
> >
> > On Tue, Oct 30, 2012 at 1:20 PM, Marcos Ortiz <[email protected]> wrote:
> >
> >> Regards, Varun, answers in line
> >>
> >> On 10/30/2012 01:03 PM, Varun Sharma wrote:
> >>
> >> Thanks for the tips.
> >>
> >> So, yes, secondary NameNode is probably more critical than the secondary
> >> master - since the master is only responsible for metadata changes/region
> >> splits/table creation etc and not for writes/reads.
> >>
> >> Exactly, you have to create a good HA strategy for these nodes (Master
> >> and Secondary Master)
> >>
> >>
> >> Regarding the keys question - i meant that the (row + column) length is
> >> 24-32 bytes and the value length is 0-1 bytes. Currently, we have a cluster
> >> running with all the data loaded into hbase but it all runs with default
> >> settings.
> >>
> >> There are many areas that you can optimize in an HBase cluster:
> >> - Write operations
> >> - Compaction and split optimization
> >> - Region server size
> >> - Snappy compression
> >> - Schema design
> >> - Use of block caching for scan optimization
> >> - Use of asynchronous clients for HBase operations (asynchbase, for
> >> example [1])
> >> etc.
> >>
> >> Lars's excellent book, "HBase: The Definitive Guide", has a complete
> >> chapter on this tricky topic (Chapter 11).
> >>
> >> Some additional resources:
> >>
> >> [1] https://github.com/stumbleupon/asynchbase
> >> https://github.com/twitter/finagle
> >> http://gbif.blogspot.com/2012/02/performance-evaluation-of-hbase.html
> >> http://gbif.blogspot.com/2012/02/monitoring-hadoop-and-hbase.html
> >> http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/
> >>
> >> Look at all the tagged presentations on SlideShare from the last HBaseCon,
> >> for example Benoit's talk "Lessons learned from OpenTSDB" and Lars
> >> Hofhansl's "HBase Schema Design":
> >> http://www.slideshare.net/cloudera/tag/hbasecon-2012
> >>
> >> Best wishes
> >>
> >> Thanks
> >> Varun
> >>
> >> On Tue, Oct 30, 2012 at 10:53 AM, Jean-Marc Spaggiari <[email protected]> wrote:
> >>
> >>
> >> My 2¢.
> >>
> >> 1) You need an odd number of ZooKeeper nodes. So 3 is the minimum
> >> recommended for production.
> >> 2) Yes, you have Master and SecondaryMaster. And it's also recommended
> >> to have one of each. And the master is critical. If you lose
> >> it, you lose your cluster.
> >> 3) NameNode is hadoop, not hbase. You should follow hadoop
> >> recommendations. Like you have secondarymaster, you have
> >> secondarynamenode. So I think you should have as many
> >> secondarynamenode as you have secondarymaster (on the same machine?).
> >> 4) I'm not sure I understand this question. Keys are binary: arrays
> >> of bytes. So 32 0-1 bytes is a 3 bytes long array. It's not a lot.
> >> This will only give you 2^32 different rows. You will have to
> >> pre-split them, or you will end up with almost all of them on the same
> >> regionserver?
> >>
> >> JM
> >>
> >> 2012/10/30, Varun Sharma <[email protected]> <[email protected]>:
> >>
> >> Hi,
> >>
> >> We are planning to experiment with a cluster for serving production traffic
> >> using hbase for pinterest. We are starting off with a 10 region server + 1
> >> master cluster on Amazon EMR version 0.92. I had some very naive questions
> >> (primarily around points of failure):
> >>
> >> 1) It seems hbase starts only one zookeeper on the master node - which is
> >> critical for operation - how many zookeepers should I use and can I run
> >> those on the region servers ?
> >> 2) How many masters to use - does hbase support multiple masters (primary
> >> and secondary) within the same cluster ? From my understanding, master
> >> availability is not critical for operation.
> >> 3) NameNode - We are running hadoop 0.8 - I have read that NameNode is a
> >> single point of failure and we should really be running two name node(s) so
> >> we can failover. Is it fine to run these on the region servers ?
> >> 4) Our current application involves long row/column - 24-32 bytes with 0-1
> >> bytes of values. Should we be using a different key encoding than the
> >> default encoding ? What advantages could it buy us ?
> >>
> >> We are currently using amazon EMR for testing purposes which runs hbase
> >> 0.92. If it works well, we would like to configure our own cluster with
> >> probably the latest version of hbase which appears to be 0.94 at the
> >> moment.
> >>
> >> Thanks
> >> Varun
> >>
> >>
> >> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
> >> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
> >> http://www.uci.cu
> >> http://www.facebook.com/universidad.uci
> >> http://www.flickr.com/photos/universidad_uci
> >>
> >> --
> >> Marcos Luis Ortíz Valmaseda
> >> about.me/marcosortiz
> >> @marcosluis2186 <http://twitter.com/marcosluis2186>
> >>
> >> <http://www.uci.cu/>
