Re: multi-data center support

Jean-Daniel Cryans Thu, 03 May 2012 15:00:32 -0700

Well, if you need an MR job to aggregate the counters, maybe you have too
many data centers? :)


What I meant is that each physical counter should be tagged with the
data center it's in. You want to count page_views, so you'd have:

page_views_CA => 5
page_views_NY => 10
page_views_FL => 2

So on read you get those three and treat them as one, e.g. 17. No need to
do massive rollups unless you're really planning on having thousands
of data centers.
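
As a rough illustration of the tagging idea above, here is a minimal
Python sketch. It uses an in-memory dict as a stand-in for an HBase table
and does not call the HBase client API; the names ShardedCounter,
increment and read are hypothetical, made up for this example. The
assumption is that each data center only ever increments its own shard,
so replicating a shard's value to the other clusters never clobbers
anyone else's count.

```python
from collections import defaultdict


class ShardedCounter:
    """In-memory stand-in for a table holding one counter cell per data center."""

    def __init__(self):
        # Key is (counter name, data center), e.g. ("page_views", "CA").
        # In a real table this could be a qualifier such as page_views_CA.
        self._cells = defaultdict(int)

    def increment(self, name, data_center, amount=1):
        """Bump only the shard owned by the given data center."""
        self._cells[(name, data_center)] += amount
        return self._cells[(name, data_center)]

    def read(self, name):
        """Sum every data center's shard for this counter at read time."""
        return sum(
            value
            for (counter, _dc), value in self._cells.items()
            if counter == name
        )


counter = ShardedCounter()
counter.increment("page_views", "CA", 5)
counter.increment("page_views", "NY", 10)
counter.increment("page_views", "FL", 2)

print(counter.read("page_views"))  # 17
```

The read cost grows with the number of data centers, not with the number
of increments, which is why a handful of shards needs no rollup job.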

J-D

On Thu, May 3, 2012 at 2:39 PM, Marco Villalobos
<[email protected]> wrote:
> Hence a counter should be local to a data-center, and perhaps a map
> reduce job can aggregate them later, then replicate?
>
> I hope something like that works.
>
> On Thu, May 3, 2012 at 1:23 PM, Jean-Daniel Cryans <[email protected]> 
> wrote:
>> Since 0.92 you can replicate in a Master-Master fashion if you want,
>> just set each cluster to be the slave of the other, but it won't work
>> for counters. The reason is that an increment ends up replicated as a
>> "Put" with a specific value, so each cluster overwrites the other's
>> count instead of adding to it.
>>
>> This issue is described here: 
>> https://issues.apache.org/jira/browse/HBASE-2804
>>
>> One way to solve it is to shard your counters; on read you just sum them up.
>>
>> J-D
>>
>> On Thu, May 3, 2012 at 1:14 PM, Marco Villalobos
>> <[email protected]> wrote:
>>> I'm fine with replication.
>>>
>>> But does that mean I can only write from one data-center?
>>>
>>> Ideally I would want counters to work across data centers, with the
>>> correct increments eventually merging.
>>>
>>> On Thu, May 3, 2012 at 11:26 AM, Jean-Daniel Cryans <[email protected]> 
>>> wrote:
>>>> A single HBase instance doesn't work across datacenters, maybe that's
>>>> why you haven't found any documentation.
>>>>
>>>> HBase does have replication between clusters, see
>>>> http://hbase.apache.org/replication.html
>>>>
>>>> J-D
>>>>
>>>> On Thu, May 3, 2012 at 11:10 AM, Marco Villalobos
>>>> <[email protected]> wrote:
>>>>> I have not found any documentation on how hbase would work across
>>>>> multiple data-centers.
>>>>>
>>>>> In fact, I am concerned about how a centralized zookeeper would make
>>>>> multi-data center support impossible.
>>>>>
>>>>> How is this handled?  What if somebody needs to read and write from
>>>>> multiple data-centers?
>>>>>
>>>>> Any advice?
