I'm glad you don't view that MapReduce job as massive.  Eventually, though,
there will be many rows.

Take your example:

> rowkey_1_CA => 5
> rowkey_1_NY => 10
> rowkey_1_FL => 2

These will have to be combined into

rowkey_1 => 17

for this to be useful in my application.

Since there will be many rows, I think a MapReduce job might be appropriate.
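
To make the rollup concrete, here is a minimal Python sketch of the aggregation I have in mind. It is not HBase code: the plain dictionary stands in for the per-data-center counter rows, the `roll_up` helper is a hypothetical name, and it assumes the data-center tag is whatever follows the last underscore in the row key.

```python
from collections import defaultdict


def roll_up(sharded_counters):
    """Sum per-data-center counters into one total per base row key.

    Assumption: each key looks like "<base_key>_<data_center>", so the
    data-center tag is the part after the last underscore.
    """
    totals = defaultdict(int)
    for key, count in sharded_counters.items():
        # Split off the trailing data-center tag, keep the base row key.
        base_key, _, _data_center = key.rpartition("_")
        totals[base_key] += count
    return dict(totals)


# Stand-in for the three sharded counters from the example above.
counters = {"rowkey_1_CA": 5, "rowkey_1_NY": 10, "rowkey_1_FL": 2}
print(roll_up(counters))  # {'rowkey_1': 17}
```

The same summation could be done at read time for a single row, or in the reduce step of a MapReduce job when there are many rows.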




On Thu, May 3, 2012 at 3:00 PM, Jean-Daniel Cryans <[email protected]> wrote:
> Well if you need a MR job to aggregate the counters maybe you have too
> many data centers? :)
>
> What I meant is that each physical counter should be tagged with the
> data center it's in. You want to count page_views so you'd have:
>
> page_views_CA => 5
> page_views_NY => 10
> page_views_FL => 2
>
> So on read you get those three and treat them as one, e.g. 17. No need to
> do massive rollups unless you're really planning on having thousands
> of data centers.
>
> J-D
>
> On Thu, May 3, 2012 at 2:39 PM, Marco Villalobos
> <[email protected]> wrote:
>> Hence a counter should be local to a data-center, and perhaps a map
>> reduce job can aggregate them later, then replicate?
>>
>> I hope something like that works.
>>
>> On Thu, May 3, 2012 at 1:23 PM, Jean-Daniel Cryans <[email protected]> 
>> wrote:
>>> Since 0.92 you can replicate in a Master-Master fashion if you want,
>>> just set each cluster to be the slave of the other, but it won't work
>>> for counters. The reason is that a counter is a "Put" in the end with
>>> a specific value.
>>>
>>> This issue is described here: 
>>> https://issues.apache.org/jira/browse/HBASE-2804
>>>
>>> One way to solve it is to shard your counters; on read you just sum them up.
>>>
>>> J-D
>>>
>>> On Thu, May 3, 2012 at 1:14 PM, Marco Villalobos
>>> <[email protected]> wrote:
>>>> I'm fine with replication.
>>>>
>>>> But does that mean I can only write from one data-center?
>>>>
>>>> Ideally I would want counters to work across data-centers, with the
>>>> correct increments eventually merging.
>>>>
>>>> On Thu, May 3, 2012 at 11:26 AM, Jean-Daniel Cryans <[email protected]> 
>>>> wrote:
>>>>> A single HBase instance doesn't work across datacenters, maybe that's
>>>>> why you haven't found any documentation.
>>>>>
>>>>> HBase does have replication between clusters, see
>>>>> http://hbase.apache.org/replication.html
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, May 3, 2012 at 11:10 AM, Marco Villalobos
>>>>> <[email protected]> wrote:
>>>>>> I have not found any documentation on how HBase would work across
>>>>>> multiple data-centers.
>>>>>>
>>>>>> In fact, I am concerned that a centralized ZooKeeper would make
>>>>>> multi-data-center support impossible.
>>>>>>
>>>>>> How is this handled?  What if somebody needs to read and write from
>>>>>> multiple data-centers?
>>>>>>
>>>>>> Any advice?
