Regards, Varun. My answers are inline.
On 10/30/2012 01:03 PM, Varun Sharma wrote:
Thanks for the tips.
So, yes, secondary NameNode is probably more critical than the secondary
master - since the master is only responsible for metadata changes/region
splits/table creation etc and not for writes/reads.
Exactly. You need a good HA strategy for these nodes (Master and
Secondary Master).
Regarding the keys question - i meant that the (row + column) length is
24-32 bytes and the value length is 0-1 bytes. Currently, we have a cluster
running with all the data loaded into hbase but it all runs with default
settings.
There are many areas you can optimize in an HBase cluster:
- Write operations
- Compactions and Split optimization
- Region Servers size
- Snappy compression
- Schema design
- Use of block caching for scan optimization
- Use of asynchronous clients for HBase operations (asynchbase for
example[1])
etc
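
To make the schema design and compression points concrete for the key sizes you mention, here is a rough sketch of how many bytes one cell takes before compression. It assumes the classic KeyValue layout as I understand it for 0.92/0.94 (length prefixes, timestamp and type byte stored with every cell) and a one-byte column family name, which is my assumption and not something from your schema. The class and method names are made up for illustration.

```java
final class CellSizeEstimate {
    // Fixed per-cell framing in the classic KeyValue layout:
    // 4 (key length) + 4 (value length) + 2 (row length)
    // + 1 (family length) + 8 (timestamp) + 1 (key type) = 20 bytes.
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    // rowPlusQualifierBytes is the combined (row + column) length.
    static int cellSize(int rowPlusQualifierBytes, int familyBytes, int valueBytes) {
        return FIXED_OVERHEAD + rowPlusQualifierBytes + familyBytes + valueBytes;
    }

    public static void main(String[] args) {
        int familyBytes = 1; // assumption: a one-character column family name
        for (int keyBytes : new int[] {24, 32}) {
            for (int valueBytes : new int[] {0, 1}) {
                System.out.printf("row+column=%d value=%d -> %d bytes per cell%n",
                        keyBytes, valueBytes, cellSize(keyBytes, familyBytes, valueBytes));
            }
        }
    }
}
```

With values this small, almost all of the stored bytes are key and framing, which is why short family names, compact keys and compression tend to pay off in a table like yours.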
Lars George's excellent book, "HBase: The Definitive Guide", has a complete
chapter on this tricky topic (Chapter 11).
Some additional resources:
[1] https://github.com/stumbleupon/asynchbase
https://github.com/twitter/finagle
http://gbif.blogspot.com/2012/02/performance-evaluation-of-hbase.html
http://gbif.blogspot.com/2012/02/monitoring-hadoop-and-hbase.html
http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/
Look at all the tagged presentations on SlideShare from the last HBaseCon,
for example Benoit's talk "Lessons learned from OpenTSDB" and Lars
Hofhansl's "HBase Schema Design":
http://www.slideshare.net/cloudera/tag/hbasecon-2012
Best wishes
Thanks
Varun
On Tue, Oct 30, 2012 at 10:53 AM, Jean-Marc Spaggiari <
[email protected]> wrote:
My 2¢.
1) You need an odd number of ZooKeeper nodes. So 3 is the minimum
recommended for production.
2) Yes, you have Master and SecondaryMaster. And it's also recommended
to have one of each. And the master is critical. If you lose it, you
lose your cluster.
3) NameNode is hadoop, not hbase. You should follow the hadoop
recommendations. Just as you have a secondarymaster, you have a
secondarynamenode. So I think you should have as many
secondarynamenodes as you have secondarymasters (on the same machine?).
4) I'm not sure I understand this question. Keys are binary: arrays of
bytes. So 32 0-1 values (bits) make a 4-byte array. That's not a lot.
It will only give you 2^32 different rows. You will have to
pre-split them, or you will end up with almost all of them on the same
regionserver.
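
A minimal sketch of the pre-splitting idea in point 4, assuming fixed-width binary keys that are spread uniformly over the key space (for example hashed keys). That is an assumption, and if the real keys are skewed the split points should come from sampling the data instead. The class and method names are hypothetical. The code only computes the boundaries, and the resulting keys would then be handed to table creation, for instance through the `createTable(desc, splitKeys)` variant of HBaseAdmin.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

final class SplitPoints {
    // Returns numRegions - 1 split keys that evenly divide the space of
    // unsigned, big-endian keys that are keyLength bytes wide.
    static List<byte[]> uniformSplits(int keyLength, int numRegions) {
        if (keyLength < 1 || numRegions < 2) {
            throw new IllegalArgumentException("need keyLength >= 1 and numRegions >= 2");
        }
        BigInteger keySpace = BigInteger.ONE.shiftLeft(8 * keyLength); // 256^keyLength
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            BigInteger boundary = keySpace.multiply(BigInteger.valueOf(i))
                                          .divide(BigInteger.valueOf(numRegions));
            splits.add(toFixedWidth(boundary, keyLength));
        }
        return splits;
    }

    // BigInteger.toByteArray() may add a leading sign byte or return fewer
    // bytes than needed, so copy the low-order bytes into a fixed-width array.
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int copy = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - copy, out, width - copy, copy);
        return out;
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 4-byte keys split into 10 regions, one per region server in the question.
        for (byte[] split : uniformSplits(4, 10)) {
            System.out.println(hex(split));
        }
    }
}
```

One region per region server is just a starting point here. More regions per server can be chosen the same way by raising numRegions.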
JM
2012/10/30, Varun Sharma <[email protected]>:
Hi,
We are planning to experiment with a cluster for serving production traffic
using hbase for pinterest. We are starting off with a 10 region server + 1
master cluster on Amazon EMR version 0.92. I had some very naive questions
(primarily around points of failure):
1) It seems hbase starts only one zookeeper on the master node - which is
critical for operation - how many zookeepers should I use and can I run
those on the region servers ?
2) How many masters to use - does hbase support multiple masters (primary
and secondary) within the same cluster ? From my understanding, master
availability is not critical for operation.
3) NameNode - We are running hadoop 0.8 - I have read that NameNode is a
single point of failure and we should really be running two name node(s) so
we can failover. Is it fine to run these on the region servers ?
4) Our current application involves long row/column - 24-32 bytes with 0-1
bytes of values. Should we be using a different key encoding than the
default encoding ? What advantages could it buy us ?
We are currently using amazon EMR for testing purposes which runs hbase
0.92. If it works well, we would like to configure our own cluster with
probably the latest version of hbase which appears to be 0.94 at the
moment.
Thanks
Varun
10th ANNIVERSARY OF THE CREATION OF THE UNIVERSIDAD DE LAS CIENCIAS
INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci
--
Marcos Luis Ortíz Valmaseda
about.me/marcosortiz <http://about.me/marcosortiz>
@marcosluis2186 <http://twitter.com/marcosluis2186>