Re: Hbase cluster for serving real time site traffic

Patrick Angeles Thu, 01 Nov 2012 12:21:23 -0700

I should have added that if you have one host for all the master roles
(NN, JT, HMaster), then you may as well go with a single ZK node (quorum =
1) on that same server.
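To make the placement argument concrete, here is a minimal Python sketch. The function name `host_failure_tolerance` and the host names are illustrative only, not part of any HBase or ZooKeeper API. It assumes ZooKeeper needs a strict majority of its configured servers to be up, and asks how many whole-host failures a given placement survives. Putting several ZK servers on the one master host buys nothing, because that host going down takes the whole ensemble with it.

```python
from itertools import combinations
from typing import List


def host_failure_tolerance(zk_hosts: List[str]) -> int:
    """zk_hosts lists the host of each ZooKeeper server; repeated names mean
    co-located servers. Returns the largest k such that losing ANY k hosts
    still leaves a strict majority of the ensemble running."""
    hosts = sorted(set(zk_hosts))
    majority = len(zk_hosts) // 2 + 1
    tolerance = 0
    for k in range(1, len(hosts) + 1):
        # Try every combination of k failed hosts.
        for failed in combinations(hosts, k):
            alive = sum(1 for h in zk_hosts if h not in failed)
            if alive < majority:
                return tolerance
        tolerance = k
    return tolerance


# Illustrative placements (host names are made up).
print(host_failure_tolerance(["master"]))                      # 0: one ZK on the master host
print(host_failure_tolerance(["master", "master", "master"]))  # 0: three ZKs, same host
print(host_failure_tolerance(["zk1", "zk2", "zk3"]))           # 1: three ZKs, three hosts
```

With a single master host, both one ZK and three co-located ZKs tolerate zero host failures, so the extra servers only add moving parts.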


On Thu, Nov 1, 2012 at 3:11 PM, Patrick Angeles <[email protected]> wrote:

>
>
> On Thu, Nov 1, 2012 at 1:09 PM, Leonid Fedotov <[email protected]> wrote:
>
>> Varun,
>> for HA NameNode you may want to look at the Hortonworks HDP 1.1 release. It
>> is supported on vSphere and on Red Hat HA clusters.
>> HDP 1.1 is based on Hadoop 1.0.3 and is fully certified for production
>> environments.
>> Do not forget that Hadoop 2.0 is still in the alpha testing stage and cannot
>> be recommended for production systems.
>>
>
> HA Namenode is actually running in a number of HBase production systems.
>
>
>> As for ZK nodes:
>> depending on the amount of ZK traffic, you may not need to put them on
>> separate nodes; they could easily coexist with the DNs.
>>
>
> This is a very bad idea. You should never co-locate ZK on a worker node,
> as it can be starved of CPU or IOPS and time out (thereby causing cascading
> failures). This can happen, for example, when someone submits an MR job.
>
>
>> However, it is better to put the NN and the HBase Master on separate nodes,
>> e.g. the NN on one node and the HBase Master and JT on another.
>>
>
> Why? The HMaster exerts very little load on the host. If you have three
> master hosts and want HA, you can use the following configuration:
>
> Host 1: Primary NN, HMaster1, ZK1
> Host 2: Standby NN, HMaster2, ZK2
> Host 3: JT, HMaster3, ZK3
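A quick way to sanity-check a layout like this is to enumerate single-host failures. The sketch below uses a hypothetical helper, `survives`, which is not part of HBase. It assumes the standby NN and the backup HMasters take over automatically (in practice that depends on how NameNode HA and failover are configured), and it deliberately ignores the JT, since HBase reads and writes do not go through the JobTracker.

```python
from typing import Dict, Set

# Roles per host, copied from the three-host layout above (labels are informal).
LAYOUT: Dict[str, Set[str]] = {
    "host1": {"NN-primary", "HMaster1", "ZK1"},
    "host2": {"NN-standby", "HMaster2", "ZK2"},
    "host3": {"JT", "HMaster3", "ZK3"},
}


def survives(layout: Dict[str, Set[str]], failed_hosts: Set[str]) -> bool:
    """True if the remaining hosts still provide a NameNode, an HMaster and a
    ZooKeeper majority. Assumes standbys take over automatically."""
    alive: Set[str] = set()
    for host, roles in layout.items():
        if host not in failed_hosts:
            alive |= roles
    all_zk = {r for roles in layout.values() for r in roles if r.startswith("ZK")}
    live_zk = {r for r in alive if r.startswith("ZK")}
    has_nn = any(r.startswith("NN") for r in alive)
    has_master = any(r.startswith("HMaster") for r in alive)
    has_quorum = len(live_zk) > len(all_zk) // 2
    return has_nn and has_master and has_quorum


for host in LAYOUT:
    print(host, "down ->", "OK" if survives(LAYOUT, {host}) else "OUTAGE")
```

Every single-host failure in this layout leaves a NameNode, an HMaster and two of the three ZK servers; losing two hosts at once is not covered.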
>
>
>>
>> Thank you!
>>
>> Sincerely,
>> Leonid Fedotov
>> Technical Support Engineer
>>
>> On Nov 1, 2012, at 4:17 AM, Marcos Ortiz Valmaseda wrote:
>>
>> > Greetings, Varun.
>> > 1- I think you should take a look at Cloudera Manager for CDH 4.1 to
>> > create an HA HDFS environment. Remember that version 2.0.x is not ready
>> > for production yet. The stable versions are Hadoop 1.0.4 and HBase 0.94.2.
>> >
>> > 2- Yes, a recommended practice is to run a ZooKeeper ensemble separate
>> > from your NN and HBase Master (three, five, or seven servers are good
>> > ensemble sizes). For example:
>> > - 1 node: NN, HBase Master, JT
>> > - 5 nodes: DN, RegionServer, TT
>> > - 3 nodes: ZooKeeper quorum
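The reason odd ensemble sizes are recommended falls out of the majority rule: a ZooKeeper ensemble keeps serving only while a strict majority of its servers is up. A minimal Python sketch of that arithmetic (the helper names are illustrative):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can be lost while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


for n in range(1, 8):
    print(f"{n} servers: quorum {quorum_size(n)}, survives {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 servers raises the quorum from 2 to 3 without tolerating any extra failure, so an even-sized ensemble adds hardware but no availability.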
>> >
>> > Best wishes.
>> >
>> > ----- Original Message -----
>> > From: Varun Sharma <[email protected]>
>> > To: Marcos Ortiz <[email protected]>, kevin odell <[email protected]>
>> > CC: [email protected]
>> > Sent: Thu, 01 Nov 2012 03:01:55 -0500 (CST)
>> > Subject: Re: Hbase cluster for serving real time site traffic
>> >
>> > Thanks all for the helpful comments. I read up on HA and was wondering if
>> > there are good tools for quickly setting up an HA HDFS + HBase cluster on
>> > EC2. From my reading, it appears that tools like Whirr still have issues
>> > with bringing up the secondary NN on a different machine, etc. Also, for
>> > availability, would master-slave or master-master replication be a
>> > substitute for having the secondary NN?
>> >
>> > For ZooKeeper, should the servers be running ZK only, or is it fine to
>> > share them with other services like the master? Also, is it better to have
>> > a dedicated ZooKeeper cluster per HBase cluster?
>> >
>> > Thanks
>> > Varun
>> >
>> > On Tue, Oct 30, 2012 at 1:20 PM, Marcos Ortiz <[email protected]> wrote:
>> >
>> >> Greetings, Varun; answers inline.
>> >>
>> >> On 10/30/2012 01:03 PM, Varun Sharma wrote:
>> >>
>> >> Thanks for the tips.
>> >>
>> >> So, yes, the secondary NameNode is probably more critical than the
>> >> secondary master, since the master is only responsible for metadata
>> >> changes, region splits, table creation, etc., and not for reads/writes.
>> >>
>> >> Exactly; you have to create a good HA strategy for these nodes (Master
>> >> and Secondary Master).
>> >>
>> >>
>> >> Regarding the keys question: I meant that the (row + column) length is
>> >> 24-32 bytes and the value length is 0-1 bytes. Currently, we have a
>> >> cluster running with all the data loaded into HBase, but it all runs with
>> >> default settings.
>> >>
>> >> There are many areas that you can optimize in an HBase cluster:
>> >> - Write operations
>> >> - Compactions and split optimization
>> >> - Region server sizing
>> >> - Snappy compression
>> >> - Schema design
>> >> - Use of block caching for scan optimization
>> >> - Use of asynchronous clients for HBase operations (asynchbase, for
>> >> example [1])
>> >> - etc.
>> >>
>> >> Lars George's excellent book "HBase: The Definitive Guide" has a complete
>> >> chapter on this tricky topic (Chapter 11).
>> >>
>> >> Some additional resources:
>> >>
>> >> [1] https://github.com/stumbleupon/asynchbase
>> >> https://github.com/twitter/finagle
>> >> http://gbif.blogspot.com/2012/02/performance-evaluation-of-hbase.html
>> >> http://gbif.blogspot.com/2012/02/monitoring-hadoop-and-hbase.html
>> >> http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/
>> >>
>> >> Look at all the SlideShare presentations tagged from the last HBaseCon,
>> >> for example Benoit's talk "Lessons learned from OpenTSDB" and Lars
>> >> Hofhansl's "HBase Schema Design":
>> >> http://www.slideshare.net/cloudera/tag/hbasecon-2012
>> >>
>> >> Best wishes
>> >>
>> >> Thanks
>> >> Varun
>> >>
>> >> On Tue, Oct 30, 2012 at 10:53 AM, Jean-Marc Spaggiari <[email protected]> wrote:
>> >>
>> >>
>> >> My 2¢.
>> >>
>> >> 1) You need an odd number of ZooKeeper nodes, so 3 is the minimum
>> >> recommended for production.
>> >> 2) Yes, you have a Master and a SecondaryMaster, and it's also recommended
>> >> to have one of each. The master is critical: if you lose it, you lose
>> >> your cluster.
>> >> 3) The NameNode is Hadoop, not HBase, so you should follow the Hadoop
>> >> recommendations. Just as you have a secondary master, you have a
>> >> secondary NameNode. So I think you should have as many secondary
>> >> NameNodes as you have secondary masters (on the same machine?).
>> >> 4) I'm not sure I understand this question. Keys are binary: arrays
>> >> of bytes. So 32 0-1 bytes is a 3-byte-long array. It's not a lot.
>> >> This will only give you 2^32 different rows. You will have to
>> >> pre-split them, or you will end up with almost all of them on the same
>> >> regionserver.
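Pre-splitting means creating the table with region boundaries up front instead of starting from a single region. As a minimal sketch, assuming row keys begin with uniformly distributed bytes (for example a hash or salt prefix; sequential or skewed keys need split points taken from the real key distribution), evenly spaced boundaries over the first few key bytes can be computed as below. The function name `uniform_split_keys` is hypothetical. In the HBase Java client of that era, split points like these can be passed as the `byte[][] splitKeys` argument of `HBaseAdmin.createTable(HTableDescriptor, byte[][])`.

```python
from typing import List


def uniform_split_keys(num_regions: int, prefix_len: int = 1) -> List[bytes]:
    """Return num_regions - 1 split points that divide the space of
    prefix_len-byte key prefixes into roughly equal ranges.
    Assumes row keys start with uniformly distributed bytes."""
    if num_regions < 2:
        return []
    space = 256 ** prefix_len
    if num_regions > space:
        raise ValueError("more regions than distinct prefixes; increase prefix_len")
    return [
        (space * i // num_regions).to_bytes(prefix_len, "big")
        for i in range(1, num_regions)
    ]


# Hypothetical example: one initial region per server on a 10-RegionServer cluster.
for key in uniform_split_keys(10):
    print(key.hex())
```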
>> >>
>> >> JM
>> >>
>> >> 2012/10/30, Varun Sharma <[email protected]> <[email protected]>:
>> >>
>> >> Hi,
>> >>
>> >> We are planning to experiment with a cluster for serving production
>> >> traffic using HBase for Pinterest. We are starting off with a cluster of
>> >> 10 region servers + 1 master on Amazon EMR (HBase 0.92). I had some very
>> >> naive questions (primarily around points of failure):
>> >>
>> >> 1) It seems HBase starts only one ZooKeeper on the master node, which is
>> >> critical for operation. How many ZooKeepers should I use, and can I run
>> >> them on the region servers?
>> >> 2) How many masters should I use? Does HBase support multiple masters
>> >> (primary and secondary) within the same cluster? From my understanding,
>> >> master availability is not critical for operation.
>> >> 3) NameNode: we are running Hadoop 0.8. I have read that the NameNode is
>> >> a single point of failure and that we should really be running two
>> >> NameNodes so we can fail over. Is it fine to run these on the region
>> >> servers?
>> >> 4) Our current application involves long row/column keys (24-32 bytes)
>> >> with 0-1 bytes of values. Should we be using a different key encoding
>> >> than the default? What advantages could it buy us?
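With 24-32 byte keys and values of 0-1 bytes, almost every stored byte is key, and neighboring keys in a sorted block tend to share long prefixes. HBase 0.94 added per-column-family data block encodings (the `DATA_BLOCK_ENCODING` attribute, with options such as `PREFIX`, `DIFF` and `FAST_DIFF`) that exploit this. The Python sketch below only illustrates the idea behind prefix encoding (store how many bytes a key shares with the previous one, plus the remaining suffix). It is not HBase's on-disk format, the key layout is made up, and it assumes a one-byte length field.

```python
from typing import List, Tuple


def prefix_encode(sorted_keys: List[bytes]) -> List[Tuple[int, bytes]]:
    """Encode each key as (bytes shared with the previous key, remaining suffix)."""
    encoded = []
    prev = b""
    for key in sorted_keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decode(encoded: List[Tuple[int, bytes]]) -> List[bytes]:
    """Rebuild the original keys by reusing the previous key's prefix."""
    keys = []
    prev = b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


def encoded_size(encoded: List[Tuple[int, bytes]]) -> int:
    # Assumption: 1 byte stores the shared-prefix length (keys shorter than 256 bytes).
    return sum(1 + len(suffix) for _, suffix in encoded)


# Made-up keys with a long common prefix, in sorted order.
keys = sorted([
    b"user0000000042:pin:0001",
    b"user0000000042:pin:0002",
    b"user0000000042:pin:0003",
])
encoded = prefix_encode(keys)
assert prefix_decode(encoded) == keys
print(sum(len(k) for k in keys), encoded_size(encoded))  # prints 69 28
```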
>> >>
>> >> We are currently using Amazon EMR for testing purposes, which runs HBase
>> >> 0.92. If it works well, we would like to configure our own cluster with
>> >> probably the latest version of HBase, which appears to be 0.94 at the
>> >> moment.
>> >>
>> >> Thanks
>> >> Varun
>> >>
>> >>
>> >> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
>> >> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
>> >> http://www.uci.cu
>> >> http://www.facebook.com/universidad.uci
>> >> http://www.flickr.com/photos/universidad_uci
>> >>
>> >>
>> >> --
>> >> Marcos Luis Ortíz Valmaseda
>> >> about.me/marcosortiz
>> >> @marcosluis2186 <http://twitter.com/marcosluis2186>
>> >>
>> >> <http://www.uci.cu/>
>> >>
>> >>
>> >
>> >
>>
>>
>
