I should have added that if you have one host for all the master roles (NN, JT, HMaster), then you may as well go with a single ZK node (quorum = 1) on that same server.
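
For reference, here is a minimal Python sketch of the majority arithmetic behind the ensemble sizes discussed in this thread. It assumes ZooKeeper's default majority quorums; the function names `majority` and `tolerated_failures` are illustrative only and are not part of any ZooKeeper API.

```python
def majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a majority of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be lost while a majority is still reachable."""
    return ensemble_size - majority(ensemble_size)


if __name__ == "__main__":
    # Compare the sizes mentioned in the thread, plus two even sizes.
    for n in (1, 2, 3, 4, 5, 7):
        print(f"{n} server(s): majority = {majority(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

A single node tolerates no failures, which only makes sense when the one master host is already a single point of failure. An even-sized ensemble tolerates no more failures than the odd size just below it, which is why 3, 5 and 7 are the usual choices.
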

On Thu, Nov 1, 2012 at 3:11 PM, Patrick Angeles <[email protected]> wrote:

> On Thu, Nov 1, 2012 at 1:09 PM, Leonid Fedotov <[email protected]> wrote:
>
>> Varun,
>> for HA NameNode you may want to look at the Hortonworks HDP 1.1 release.
>> It is supported on vSphere and on Red Hat HA clusters.
>> HDP 1.1 is based on Hadoop 1.0.3 and is fully certified for production
>> environments.
>> Do not forget, Hadoop 2.0 is still in the alpha testing stage and cannot
>> be recommended for production systems.
>
> HA NameNode is actually running in a number of HBase production systems.
>
>> As for ZK nodes:
>> depending on the amount of ZK traffic, you may not need to put them on
>> separate nodes; they could easily coexist with the DNs.
>
> This is a very bad idea. You should never co-locate ZK on a worker node,
> as it can be starved of CPU or IOPS and time out (thereby causing cascading
> failures). This can happen, for example, when someone submits an MR job.
>
>> However, it is better to split the NN and HBase Master onto separate
>> nodes, e.g. the NN on one node and the HBase Master and JT on another.
>
> Why? The HMaster exerts very little load on the host. If you have three
> masters and want HA, you can have the following config:
>
> Host 1: Primary NN, HMaster1, ZK1
> Host 2: Standby NN, HMaster2, ZK2
> Host 3: JT, HMaster3, ZK3
>
>> Thank you!
>>
>> Sincerely,
>> Leonid Fedotov
>> Technical Support Engineer
>> [email protected]
>> office: +1 855 846 7866 ext 292
>> mobile: +1 650 430 1673
>>
>> On Nov 1, 2012, at 4:17 AM, Marcos Ortiz Valmaseda wrote:
>>
>> > Regards, Varun.
>> > 1- I think that you should take a look at Cloudera Manager for CDH 4.1
>> > to create an HA HDFS environment. Remember that version 2.0.x is not
>> > ready for production yet. The stable version is Hadoop 1.0.4 with
>> > HBase 0.94.2.
>> >
>> > 2- Yes, a recommended practice is to have a separate ZooKeeper ensemble
>> > (three, five or seven are good numbers for the ensemble) from your NN and
>> > HBase Master.
>> > For example:
>> > - 1 NN/HB Master, JT
>> > - 5 DN, HR Servers, TT
>> > - 3 nodes for the ZooKeeper quorum.
>> >
>> > Best wishes.
>> >
>> > ----- Original message -----
>> > From: Varun Sharma <[email protected]>
>> > To: Marcos Ortiz <[email protected]>, kevin odell <[email protected]>
>> > CC: [email protected]
>> > Sent: Thu, 01 Nov 2012 03:01:55 -0500 (CST)
>> > Subject: Re: Hbase cluster for serving real time site traffic
>> >
>> > Thanks all for the helpful comments. I read up on HA and was wondering
>> > if there are good tools for setting up an HA HDFS + HBase cluster on EC2
>> > quickly. From my reading, it appears that tools like Whirr still have
>> > issues with bringing up the secondary NN on a different machine, etc.
>> > Also, for availability, would Master-Slave replication or Master-Master
>> > replication be a substitute for having the secondary NN?
>> >
>> > For ZooKeeper, should the servers be running ZK only, or is it fine to
>> > share them with other services like the master? Also, is it better to
>> > have a dedicated ZooKeeper cluster per HBase cluster?
>> >
>> > Thanks
>> > Varun
>> >
>> > On Tue, Oct 30, 2012 at 1:20 PM, Marcos Ortiz <[email protected]> wrote:
>> >
>> >> Regards, Varun, answers inline.
>> >>
>> >> On 10/30/2012 01:03 PM, Varun Sharma wrote:
>> >>
>> >> Thanks for the tips.
>> >>
>> >> So, yes, the secondary NameNode is probably more critical than the
>> >> secondary master, since the master is only responsible for metadata
>> >> changes/region splits/table creation etc. and not for writes/reads.
>> >>
>> >> Exactly, you have to create a good HA strategy for these nodes (Master
>> >> and Secondary Master).
>> >>
>> >> Regarding the keys question - I meant that the (row + column) length is
>> >> 24-32 bytes and the value length is 0-1 bytes. Currently, we have a
>> >> cluster running with all the data loaded into HBase, but it all runs
>> >> with default settings.
>> >>
>> >> There are many areas that you can optimize in an HBase cluster:
>> >> - Write operations
>> >> - Compaction and split optimization
>> >> - Region server size
>> >> - Snappy compression
>> >> - Schema design
>> >> - Use of block caching for scan optimization
>> >> - Use of asynchronous clients for HBase operations (asynchbase, for
>> >>   example [1])
>> >> etc.
>> >>
>> >> Lars's excellent book, "HBase: The Definitive Guide", has a complete
>> >> chapter on this tricky topic (Chapter 11).
>> >>
>> >> Some additional resources:
>> >>
>> >> [1] https://github.com/stumbleupon/asynchbase
>> >> https://github.com/twitter/finagle
>> >> http://gbif.blogspot.com/2012/02/performance-evaluation-of-hbase.html
>> >> http://gbif.blogspot.com/2012/02/monitoring-hadoop-and-hbase.html
>> >> http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/
>> >>
>> >> Look on SlideShare at all the tagged presentations from the last
>> >> HBaseCon, for example Benoit's talk "Lessons learned from OpenTSDB"
>> >> and Lars Hofhansl's "HBase Schema Design":
>> >> http://www.slideshare.net/cloudera/tag/hbasecon-2012
>> >>
>> >> Best wishes
>> >>
>> >> Thanks
>> >> Varun
>> >>
>> >> On Tue, Oct 30, 2012 at 10:53 AM, Jean-Marc Spaggiari
>> >> <[email protected]> wrote:
>> >>
>> >> My 2¢.
>> >>
>> >> 1) You need an odd number of ZooKeeper nodes. So 3 is the minimum
>> >> recommended for production.
>> >> 2) Yes, you have Master and SecondaryMaster. And it's also recommended
>> >> to have one of each. And the master is critical. If you lose it, you
>> >> lose your cluster.
>> >> 3) NameNode is Hadoop, not HBase. You should follow the Hadoop
>> >> recommendations. Just as you have a secondarymaster, you have a
>> >> secondarynamenode. So I think you should have as many
>> >> secondarynamenodes as you have secondarymasters (on the same machine?).
>> >> 4) I'm not sure I understand this question. Keys are binary: arrays
>> >> of bytes. So 32 0-1 bytes is a 3 bytes long array.
>> >> It's not a lot.
>> >> This will only give you 2^32 different rows. You will have to
>> >> pre-split them, or you will end up with almost all of them on the same
>> >> regionserver?
>> >>
>> >> JM
>> >>
>> >> 2012/10/30, Varun Sharma <[email protected]>:
>> >>
>> >> Hi,
>> >>
>> >> We are planning to experiment with a cluster for serving production
>> >> traffic using HBase for Pinterest. We are starting off with a 10 region
>> >> server + 1 master cluster on Amazon EMR version 0.92. I had some very
>> >> naive questions (primarily around points of failure):
>> >>
>> >> 1) It seems HBase starts only one ZooKeeper on the master node, which
>> >> is critical for operation. How many ZooKeepers should I use, and can I
>> >> run those on the region servers?
>> >> 2) How many masters to use - does HBase support multiple masters
>> >> (primary and secondary) within the same cluster? From my understanding,
>> >> master availability is not critical for operation.
>> >> 3) NameNode - We are running hadoop 0.8 - I have read that the NameNode
>> >> is a single point of failure and we should really be running two
>> >> NameNodes so we can fail over. Is it fine to run these on the region
>> >> servers?
>> >> 4) Our current application involves long row/column keys - 24-32 bytes
>> >> with 0-1 bytes of values. Should we be using a different key encoding
>> >> than the default encoding? What advantages could it buy us?
>> >>
>> >> We are currently using Amazon EMR for testing purposes, which runs HBase
>> >> 0.92. If it works well, we would like to configure our own cluster with
>> >> probably the latest version of HBase, which appears to be 0.94 at the
>> >> moment.
>> >>
>> >> Thanks
>> >> Varun
>> >>
>> >>
>> >> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
>> >> INFORMATICAS...
>> >> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>> >>
>> >> http://www.uci.cu
>> >> http://www.facebook.com/universidad.uci
>> >> http://www.flickr.com/photos/universidad_uci
>> >>
>> >> --
>> >> Marcos Luis Ortíz Valmaseda
>> >> about.me/marcosortiz
>> >> @marcosluis2186 <http://twitter.com/marcosluis2186>
>> >>
>> >> <http://www.uci.cu/>
