I'm glad you don't view that map reduce as massive. Eventually, there would be many rows though.
Take your example:

> rowkey_1_CA => 5
> rowkey_1_NY => 10
> rowkey_1_FL => 2

These will have to be rolled up into rowkey_1 => 17 for this to be useful in my application. But there will be many rows. So yes, I think map reduce might be appropriate.

On Thu, May 3, 2012 at 3:00 PM, Jean-Daniel Cryans <[email protected]> wrote:
> Well if you need a MR job to aggregate the counters maybe you have too
> many data centers? :)
>
> What I meant is that each physical counter should be tagged with the
> data center it's in. You want to count page_views so you'd have:
>
> page_views_CA => 5
> page_views_NY => 10
> page_views_FL => 2
>
> So on read you get those three and treat them as one, e.g. 17. No need to
> do massive rollups unless you're really planning on having thousands
> of data centers.
>
> J-D
>
> On Thu, May 3, 2012 at 2:39 PM, Marco Villalobos
> <[email protected]> wrote:
>> Hence a counter should be local to a data-center, and perhaps a map
>> reduce job can aggregate them later, then replicate?
>>
>> I hope something like that works.
>>
>> On Thu, May 3, 2012 at 1:23 PM, Jean-Daniel Cryans <[email protected]>
>> wrote:
>>> Since 0.92 you can replicate in a Master-Master fashion if you want,
>>> just set each cluster to be the slave of the other, but it won't work
>>> for counters. The reason is that a counter is a "Put" in the end with
>>> a specific value.
>>>
>>> This issue is described here:
>>> https://issues.apache.org/jira/browse/HBASE-2804
>>>
>>> One way to solve it is to shard your counters; on read you just sum them up.
>>>
>>> J-D
>>>
>>> On Thu, May 3, 2012 at 1:14 PM, Marco Villalobos
>>> <[email protected]> wrote:
>>>> I'm fine with replication.
>>>>
>>>> But does that mean I can only write from one data-center?
>>>>
>>>> Ideally I would want counters to work across data-centers, with the
>>>> correct increment eventually merging.
>>>>
>>>> On Thu, May 3, 2012 at 11:26 AM, Jean-Daniel Cryans <[email protected]>
>>>> wrote:
>>>>> A single HBase instance doesn't work across datacenters, maybe that's
>>>>> why you haven't found any documentation.
>>>>>
>>>>> HBase does have replication between clusters, see
>>>>> http://hbase.apache.org/replication.html
>>>>>
>>>>> J-D
>>>>>
>>>>> On Thu, May 3, 2012 at 11:10 AM, Marco Villalobos
>>>>> <[email protected]> wrote:
>>>>>> I have not found any documentation on how hbase would work across
>>>>>> multiple data-centers.
>>>>>>
>>>>>> In fact, I am concerned about how a centralized zookeeper would make
>>>>>> multi-data center support impossible.
>>>>>>
>>>>>> How is this handled? What if somebody needs to read and write from
>>>>>> multiple data-centers?
>>>>>>
>>>>>> Any advice?
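
For reference, here is a rough sketch of the "shard the counters, sum on read" idea from the thread. It is only an illustration, not HBase client code: a plain Python dict stands in for whatever the read returns, the function name sum_sharded_counters is made up, and it assumes every key ends in "_<data center tag>" as in the examples above.

```python
from collections import defaultdict


def sum_sharded_counters(shards):
    """Collapse per-data-center counter shards into one total per base key.

    `shards` maps keys of the assumed form "<base>_<DC>" (for example
    "rowkey_1_CA") to counts. This key layout is an assumption taken
    from the examples in the thread, not an HBase convention.
    """
    totals = defaultdict(int)
    for key, count in shards.items():
        # Split on the LAST underscore so "rowkey_1_CA" -> "rowkey_1", "CA".
        base, sep, _dc = key.rpartition("_")
        if not sep:
            raise ValueError(f"key {key!r} has no data-center suffix")
        totals[base] += count
    return dict(totals)


# Hypothetical shard values, copied from the examples in the thread.
shards = {
    "rowkey_1_CA": 5,
    "rowkey_1_NY": 10,
    "rowkey_1_FL": 2,
}
totals = sum_sharded_counters(shards)
```

With only a handful of data centers this is a few additions per read, which is why a rollup job may not be needed. A batch job only becomes interesting if the summed value itself has to be stored and replicated.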
