Re: Hbase cluster for serving real time site traffic

Varun Sharma Thu, 01 Nov 2012 01:02:27 -0700

Thanks all for the helpful comments. I read up on HA and was wondering if
there are good tools for quickly setting up an HA HDFS + HBase cluster on
EC2. From my reading, it appears that tools like Whirr still have issues
with bringing up the secondary NN on a different machine, etc. Also, for
availability, would master-slave or master-master replication be a
substitute for having the secondary NN?


For ZooKeeper, should the servers be running ZK only, or is it fine to share
them with other services like the master? Also, is it better to have a
dedicated ZooKeeper cluster per HBase cluster?

Thanks
Varun

On Tue, Oct 30, 2012 at 1:20 PM, Marcos Ortiz <[email protected]> wrote:

>  Regards, Varun, answers in line
>
> On 10/30/2012 01:03 PM, Varun Sharma wrote:
>
> Thanks for the tips.
>
> So, yes, secondary NameNode is probably more critical than the secondary
> master - since the master is only responsible for metadata changes/region
> splits/table creation etc and not for writes/reads.
>
>  Exactly, you have to create a good HA strategy for these nodes (Master
> and Secondary Master)
>
>
>  Regarding the keys question - I meant that the (row + column) length is
> 24-32 bytes and the value length is 0-1 bytes. Currently, we have a cluster
> running with all the data loaded into HBase, but it all runs with default
> settings.
>
>  There are many areas that you can optimize in an HBase cluster:
> - Write operations
> - Compaction and split optimization
> - Region server sizing
> - Snappy compression
> - Schema design
> - Use of block caching for scan optimization
> - Use of asynchronous clients for HBase operations (asynchbase, for
> example [1])
> etc.
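
To put the schema design and compression items in context for cells this small, here is a rough sketch of per-cell storage arithmetic. It assumes the classic HBase KeyValue layout (4-byte key length, 4-byte value length, 2-byte row length, 1-byte family length, 8-byte timestamp, 1-byte type) and ignores HFile-level extras such as block indexes. The 24/8 split between row and qualifier and the 1-byte family name are assumptions for illustration only; the thread gives just the combined 24-32 byte figure.

```python
# Fixed per-cell fields in the classic KeyValue layout (sizes in bytes).
KEY_LEN_FIELD = 4      # total key length
VALUE_LEN_FIELD = 4    # value length
ROW_LEN_FIELD = 2      # row key length
FAMILY_LEN_FIELD = 1   # column family name length
TIMESTAMP_FIELD = 8    # cell timestamp
KEY_TYPE_FIELD = 1     # Put / Delete marker

FIXED_OVERHEAD = (KEY_LEN_FIELD + VALUE_LEN_FIELD + ROW_LEN_FIELD
                  + FAMILY_LEN_FIELD + TIMESTAMP_FIELD + KEY_TYPE_FIELD)


def keyvalue_size(row_len, family_len, qualifier_len, value_len):
    """Approximate serialized size of one cell, before any compression."""
    return FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


if __name__ == "__main__":
    # Hypothetical cell: 24-byte row, 1-byte family, 8-byte qualifier, 1-byte value.
    total = keyvalue_size(row_len=24, family_len=1, qualifier_len=8, value_len=1)
    print(f"bytes per cell: {total}, of which value bytes: 1")
```

Under these assumptions almost all of each stored cell is key material rather than value, which is one reason short family names and block compression tend to matter for this kind of schema.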
>
> Lars George's excellent book, "HBase: The Definitive Guide", has a
> complete chapter on this tricky topic (Chapter 11).
>
> Some additional resources:
>
> [1] https://github.com/stumbleupon/asynchbase
> https://github.com/twitter/finagle
> http://gbif.blogspot.com/2012/02/performance-evaluation-of-hbase.html
> http://gbif.blogspot.com/2012/02/monitoring-hadoop-and-hbase.html
> http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/
>
> Look at all the tagged presentations from the last HBaseCon on SlideShare,
> for example Benoit's talk "Lessons learned from OpenTSDB" and
> Lars Hofhansl's "HBase Schema Design":
> http://www.slideshare.net/cloudera/tag/hbasecon-2012
>
> Best wishes
>
> Thanks
> Varun
>
> On Tue, Oct 30, 2012 at 10:53 AM, Jean-Marc Spaggiari 
> <[email protected]> wrote:
>
>
>  My 2¢.
>
> 1) You need an odd number of ZooKeeper nodes. So 3 is the minimum
> recommended for production.
> 2) Yes, you have Master and SecondaryMaster. And it's also recommended
> to have one of each. And the master is critical. If you lose
> it, you lose your cluster.
> 3) NameNode is Hadoop, not HBase. You should follow the Hadoop
> recommendations. Like you have secondarymaster, you have
> secondarynamenode. So I think you should have as many
> secondarynamenodes as you have secondarymasters (on the same machine?).
> 4) I'm not sure I understand this question. Keys are binary: arrays
> of bytes. So 32 0-1 bytes is a 3 bytes long array. It's not a lot.
> This will only give you 2^32 different rows. You will have to
> pre-split them, or you will end up with almost all of them on the same
> regionserver?
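
To make the odd-number rule in point 1 concrete, here is a minimal sketch of the majority-quorum arithmetic ZooKeeper relies on: the ensemble stays available only while a strict majority of its servers can talk to each other. The function names are hypothetical and chosen for illustration.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of an ensemble with this many servers."""
    return ensemble_size // 2 + 1


def failures_tolerated(ensemble_size):
    """How many servers can be lost while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerates {failures_tolerated(n)} failure(s)")
```

An even-sized ensemble tolerates no more failures than the odd size just below it (4 servers tolerate 1 failure, the same as 3), so the extra server adds cost without adding availability.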
>
> JM
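
On the pre-splitting remark in point 4, a small sketch of how evenly spaced split points could be computed. It assumes row keys start with a fixed-width prefix that is roughly uniformly distributed (for example a hash), which the thread does not state; `split_points` and its parameters are hypothetical names for illustration.

```python
from typing import List


def split_points(num_regions: int, prefix_bytes: int = 4) -> List[bytes]:
    """Return num_regions - 1 evenly spaced split keys over a byte prefix space."""
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    keyspace = 256 ** prefix_bytes
    if num_regions > keyspace:
        raise ValueError("more regions than distinct prefixes")
    # Region i covers [i * keyspace / n, (i + 1) * keyspace / n); the first
    # region starts at the empty key, so only the inner boundaries are needed.
    return [
        (i * keyspace // num_regions).to_bytes(prefix_bytes, "big")
        for i in range(1, num_regions)
    ]


if __name__ == "__main__":
    # One region per region server in the 10-node cluster described in the thread.
    for key in split_points(10):
        print(key.hex())
```

The resulting byte arrays are the kind of input that could be supplied as split keys at table creation time, for instance through the `createTable(desc, splitKeys)` overload of the Java `HBaseAdmin` class. If the real keys are not uniformly distributed, boundaries sampled from the actual data would be a better fit than evenly spaced ones.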
>
> 2012/10/30, Varun Sharma <[email protected]>:
>
>  Hi,
>
> We are planning to experiment with a cluster for serving production
> traffic using HBase for Pinterest. We are starting off with a 10 region
> server + 1 master cluster on Amazon EMR version 0.92. I had some very
> naive questions (primarily around points of failure):
>
> 1) It seems HBase starts only one ZooKeeper on the master node - which is
> critical for operation - how many ZooKeepers should I use, and can I run
> those on the region servers?
> 2) How many masters to use - does HBase support multiple masters (primary
> and secondary) within the same cluster? From my understanding, master
> availability is not critical for operation.
> 3) NameNode - We are running hadoop 0.8 - I have read that NameNode is a
> single point of failure and we should really be running two name node(s)
> so we can fail over. Is it fine to run these on the region servers?
> 4) Our current application involves long row/column - 24-32 bytes with
> 0-1 bytes of values. Should we be using a different key encoding than the
> default encoding? What advantages could it buy us?
>
> We are currently using Amazon EMR for testing purposes, which runs HBase
> 0.92. If it works well, we would like to configure our own cluster with
> probably the latest version of HBase, which appears to be 0.94 at the
> moment.
>
> Thanks
> Varun
>
>
>
> --
> Marcos Luis Ortíz Valmaseda
> about.me/marcosortiz
> @marcosluis2186 <http://twitter.com/marcosluis2186>
