Sorry, I also forgot: do not run your NameNode (NN) and its failover node on machines shared with other services.
On Tue, Oct 30, 2012 at 2:15 PM, Kevin O'dell wrote:
> Varun,
>
> I will take a shot at answering this:
>
> 1) It seems hbase starts only one zookeeper on the master node - which is
> critical for operation - how many zookeepers should I use and can I run
> those on the region servers? <-- 3, and they should be on dedicated
> servers for a real production environment.
>
> 2) How many masters to use - does hbase support multiple masters (primary
> and secondary) within the same cluster? From my understanding, master
> availability is not critical for operation. <-- 2. If you lose the master,
> you lose HBase. The Master is VERY critical.
>
> 3) NameNode - We are running hadoop 0.8 - I have read that NameNode is a
> single point of failure and we should really be running two name node(s) so
> we can failover. Is it fine to run these on the region servers? <-- 2. You
> will want to use HA for a real production workload. The SNN (Secondary
> NameNode) is a very misleading name.
>
> So, yes, secondary NameNode is probably more critical than the secondary
> master - since the master is only responsible for metadata changes/region
> splits/table creation etc and not for writes/reads. <-- This is not
> correct. The Secondary NameNode is not a failover node. You will want to
> use a release that has HA to guarantee availability at the NN level. The
> Master is in charge of metadata operations, but without the Master the
> RegionServers will not simply keep working either. It is very important to
> have two masters.
>
> I will defer to Jean-Marc on the schema design.
>
> On Tue, Oct 30, 2012 at 1:03 PM, Varun Sharma wrote:
>
>> Thanks for the tips.
>>
>> So, yes, secondary NameNode is probably more critical than the secondary
>> master - since the master is only responsible for metadata changes/region
>> splits/table creation etc and not for writes/reads.
>>
>> Regarding the keys question - I meant that the (row + column) length is
>> 24-32 bytes and the value length is 0-1 bytes. Currently, we have a
>> cluster running with all the data loaded into hbase, but it all runs with
>> default settings.
>>
>> Thanks
>> Varun
>>
>> On Tue, Oct 30, 2012 at 10:53 AM, Jean-Marc Spaggiari wrote:
>>
>> > My 2¢.
>> >
>> > 1) You need an odd number of ZooKeeper nodes. So 3 is the minimum
>> > recommended for production.
>> > 2) Yes, you have Master and SecondaryMaster. And it's also recommended
>> > to have one of each. And the master is critical. If you lose it, you
>> > lose your cluster.
>> > 3) NameNode is hadoop, not hbase. You should follow hadoop
>> > recommendations. Like you have secondarymaster, you have
>> > secondarynamenode. So I think you should have as many
>> > secondarynamenodes as you have secondarymasters (on the same machine?).
>> > 4) I'm not sure I understand this question. Keys are binary: arrays
>> > of bytes. So 32 bits of 0/1 is a 4-byte array. It's not a lot.
>> > This will only give you 2^32 different rows. You will have to
>> > pre-split them, or you will end up with almost all of them on the same
>> > regionserver.
>> >
>> > JM
>> >
>> > 2012/10/30, Varun Sharma:
>> > > Hi,
>> > >
>> > > We are planning to experiment with a cluster for serving production
>> > > traffic using hbase for Pinterest. We are starting off with a 10 region
>> > > server + 1 master cluster on Amazon EMR version 0.92. I had some very
>> > > naive questions (primarily around points of failure):
>> > >
>> > > 1) It seems hbase starts only one zookeeper on the master node - which is
>> > > critical for operation - how many zookeepers should I use and can I run
>> > > those on the region servers?
>> > > 2) How many masters to use - does hbase support multiple masters (primary
>> > > and secondary) within the same cluster?
>> > > From my understanding, master availability is not critical for
>> > > operation.
>> > > 3) NameNode - We are running hadoop 0.8 - I have read that NameNode is a
>> > > single point of failure and we should really be running two name node(s)
>> > > so we can failover. Is it fine to run these on the region servers?
>> > > 4) Our current application involves long row/column - 24-32 bytes with
>> > > 0-1 bytes of values. Should we be using a different key encoding than
>> > > the default encoding? What advantages could it buy us?
>> > >
>> > > We are currently using Amazon EMR for testing purposes, which runs hbase
>> > > 0.92. If it works well, we would like to configure our own cluster with
>> > > probably the latest version of hbase, which appears to be 0.94 at the
>> > > moment.
>> > >
>> > > Thanks
>> > > Varun
>> > >
>> >
>>
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
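Jean-Marc's advice to run an odd number of ZooKeeper nodes (3 as the production minimum) follows from ZooKeeper needing a strict majority of the ensemble to stay available. The short Python sketch below only does that arithmetic; the function names are illustrative and are not part of any ZooKeeper or HBase API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a majority can still be formed."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} node(s): quorum of {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Running it shows that a 4-node ensemble survives only one failure, the same as a 3-node ensemble, which is why odd sizes are the usual choice.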

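On question 4, Jean-Marc notes that a key space of 2^32 rows has to be pre-split, or almost everything lands on one RegionServer. As a hedged illustration, assuming the leading 4 bytes of the row key are roughly uniformly distributed (for example, a hash prefix), the sketch below computes evenly spaced split points and shows which region a key would fall into. The helper names are hypothetical; HBase's admin API does accept an array of split keys when creating a table, but the exact call depends on the HBase version, so that part is left out.

```python
import bisect
import random
from typing import List


def uniform_split_keys(num_regions: int, key_bytes: int = 4) -> List[bytes]:
    """Return num_regions - 1 split points spaced evenly over a fixed-width,
    big-endian key space of key_bytes bytes (2**(8 * key_bytes) keys)."""
    if num_regions < 2:
        return []
    space = 1 << (8 * key_bytes)  # 2**32 for a 4-byte prefix
    step = space // num_regions
    return [(i * step).to_bytes(key_bytes, "big") for i in range(1, num_regions)]


def region_for(key: bytes, splits: List[bytes]) -> int:
    """Index of the region holding `key`. Region i covers [splits[i-1], splits[i]),
    start key inclusive. Python compares bytes lexicographically by unsigned
    byte value, the same ordering HBase uses for row keys."""
    return bisect.bisect_right(splits, key)


if __name__ == "__main__":
    random.seed(0)
    splits = uniform_split_keys(10)  # e.g. one region per RegionServer in a 10-node cluster
    counts = [0] * (len(splits) + 1)
    for _ in range(100_000):
        # Hypothetical hashed 4-byte key prefix, uniformly distributed.
        key = random.getrandbits(32).to_bytes(4, "big")
        counts[region_for(key, splits)] += 1
    print(counts)  # each bucket should be near 10,000
```

With sequential (non-hashed) keys, uniform split points would not help: new rows would still concentrate in whichever region covers the current key range, so the split strategy has to match the actual key distribution.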