Re: Hbase cluster for serving real time site traffic

Kevin O'dell Tue, 30 Oct 2012 12:16:47 -0700

Sorry, I also forgot: do not run your NN and its failover node on machines
that host other services.
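
On the ZooKeeper count in question 1 below: an ensemble only stays available
while a strict majority of its servers is up, which is why odd sizes are the
usual recommendation. A small Python sketch of that arithmetic (the function
names are mine, purely for illustration, and are not part of any ZooKeeper or
HBase API):

```python
def zk_majority(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority (quorum)."""
    return ensemble_size // 2 + 1


def zk_tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still available."""
    return ensemble_size - zk_majority(ensemble_size)


if __name__ == "__main__":
    # Compare a few ensemble sizes; illustrative values only.
    for n in (1, 3, 4, 5):
        print(f"{n} servers: quorum={zk_majority(n)}, "
              f"tolerates {zk_tolerated_failures(n)} failure(s)")
```

So going from 3 to 4 servers does not let you survive any additional failure;
you need 5 to tolerate two.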


On Tue, Oct 30, 2012 at 2:15 PM, Kevin O'dell <[email protected]> wrote:

> Varun,
>
>   I will take a shot at answering this:
>
> 1) It seems hbase starts only one zookeeper on the master node - which is
> critical for operation - how many zookeepers should I use and can I run
> those on the region servers ? <-- 3 and they should be on dedicated
> servers for a real production environment.
>
> 2) How many masters to use - does hbase support multiple masters (primary
> and secondary) within the same cluster ? From my understanding, master
> availability is not critical for operation. <-- 2. If you lose the master,
> you lose HBase.  The Master is VERY critical.
>
> 3) NameNode - We are running hadoop 0.8 - I have read that NameNode is a
> single point of failure and we should really be running two name node(s) so
> we can failover. Is it fine to run these on the region servers ? <-- 2; you
> will want to use HA for a real production workload.  The SNN (Secondary
> NameNode) is a very misleading name.
>
> So, yes, secondary NameNode is probably more critical than the secondary
> master - since the master is only responsible for metadata changes/region
> splits/table creation etc and not for writes/reads. <--- This is not
> correct.  The Secondary NameNode is not a failover node.  You will want to
> use a release that has HA to guarantee availability at the NN level.  The
> master is in charge of metadata operations, but without the Master the RS
> will also not just continue to work.  It is very important to have two
> masters.
>
>  I will defer to Jean-Marc on the schema design.
>
>
>
> On Tue, Oct 30, 2012 at 1:03 PM, Varun Sharma <[email protected]> wrote:
>
>> Thanks for the tips.
>>
>> So, yes, secondary NameNode is probably more critical than the secondary
>> master - since the master is only responsible for metadata changes/region
>> splits/table creation etc and not for writes/reads.
>>
>> Regarding the keys question - I meant that the (row + column) length is
>> 24-32 bytes and the value length is 0-1 bytes. Currently, we have a cluster
>> running with all the data loaded into hbase but it all runs with default
>> settings.
>>
>> Thanks
>> Varun
>>
>> On Tue, Oct 30, 2012 at 10:53 AM, Jean-Marc Spaggiari <
>> [email protected]> wrote:
>>
>> > My 2¢.
>> >
>> > 1) You need an odd number of ZooKeeper nodes. So 3 is the minimum
>> > recommended for production.
>> > 2) Yes, you have Master and SecondaryMaster. And it's also recommended
>> > to have one of each. And the master is critical. If you lose
>> > it, you lose your cluster.
>> > 3) NameNode is hadoop, not hbase. You should follow hadoop
>> > recommendations. Like you have secondarymaster, you have
>> > secondarynamenode. So I think you should have as many
>> > secondarynamenode as you have secondarymaster (on the same machine?).
>> > 4) I'm not sure I understand this question. Keys are binary: arrays
>> > of bytes. So 32 0-1 bytes is a 3 bytes long array. It's not a lot.
>> > This will only give you 2^32 different rows. You will have to
>> > pre-split them, or you will end up with almost all of them on the same
>> > regionserver?
>> >
>> > JM
>> >
>> > 2012/10/30, Varun Sharma <[email protected]>:
>> > > Hi,
>> > >
>> > > We are planning to experiment with a cluster for serving production
>> > > traffic using hbase for pinterest. We are starting off with a 10 region
>> > > server + 1 master cluster on Amazon EMR version 0.92. I had some very
>> > > naive questions (primarily around points of failure):
>> > >
>> > > 1) It seems hbase starts only one zookeeper on the master node - which
>> > > is critical for operation - how many zookeepers should I use and can I
>> > > run those on the region servers ?
>> > > 2) How many masters to use - does hbase support multiple masters
>> > > (primary and secondary) within the same cluster ? From my understanding,
>> > > master availability is not critical for operation.
>> > > 3) NameNode - We are running hadoop 0.8 - I have read that NameNode is a
>> > > single point of failure and we should really be running two name node(s)
>> > > so we can failover. Is it fine to run these on the region servers ?
>> > > 4) Our current application involves long row/column - 24-32 bytes with
>> > > 0-1 bytes of values. Should we be using a different key encoding than the
>> > > default encoding ? What advantages could it buy us ?
>> > >
>> > > We are currently using amazon EMR for testing purposes which runs hbase
>> > > 0.92. If it works well, we would like to configure our own cluster with
>> > > probably the latest version of hbase which appears to be 0.94 at the
>> > > moment.
>> > >
>> > > Thanks
>> > > Varun
>> > >
>> >
>>
>
>
>
> --
> Kevin O'Dell
> Customer Operations Engineer, Cloudera
>



-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera
