Well, if you need an MR job to aggregate the counters, maybe you have too many data centers? :)
What I meant is that each physical counter should be tagged with the
data center it's in. You want to count page_views so you'd have:

page_views_CA => 5
page_views_NY => 10
page_views_FL => 2

So on read you get those three and treat them as one, e.g. 17. No need to
do massive rollups unless you're really planning on having thousands
of data centers.

J-D

On Thu, May 3, 2012 at 2:39 PM, Marco Villalobos <[email protected]> wrote:
> Hence a counter should be local to a data-center, and perhaps a map
> reduce job can aggregate them later, then replicate?
>
> I hope something like that works.
>
> On Thu, May 3, 2012 at 1:23 PM, Jean-Daniel Cryans <[email protected]>
> wrote:
>> Since 0.92 you can replicate in a Master-Master fashion if you want,
>> just set each cluster to be the slave of the other, but it won't work
>> for counters. The reason is that a counter is a "Put" in the end with
>> a specific value.
>>
>> This issue is described here:
>> https://issues.apache.org/jira/browse/HBASE-2804
>>
>> One way to solve it is to shard your counters, on read you just sum them up.
>>
>> J-D
>>
>> On Thu, May 3, 2012 at 1:14 PM, Marco Villalobos
>> <[email protected]> wrote:
>>> I'm fine with replication.
>>>
>>> But does that mean I can only write from one data-center?
>>>
>>> Ideally I would want counters to work across data-center, with the
>>> correct increment eventually merging.
>>>
>>> On Thu, May 3, 2012 at 11:26 AM, Jean-Daniel Cryans <[email protected]>
>>> wrote:
>>>> A single HBase instance doesn't work across datacenters, maybe that's
>>>> why you haven't found any documentation.
>>>>
>>>> HBase does have replication between clusters, see
>>>> http://hbase.apache.org/replication.html
>>>>
>>>> J-D
>>>>
>>>> On Thu, May 3, 2012 at 11:10 AM, Marco Villalobos
>>>> <[email protected]> wrote:
>>>>> I have not found any documentation on how hbase would work across
>>>>> multiple data-centers.
>>>>>
>>>>> In fact, I am concerned about how a centralized zookeeper would make
>>>>> multi-data center support impossible.
>>>>>
>>>>> How is this handled? What if somebody needs to read and write from
>>>>> multiple data-centers?
>>>>>
>>>>> Any advice?
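
For illustration, here is a minimal sketch of the per-data-center counters described at the top of this reply: each data center increments only its own shard, and a read sums the shards. The ShardedCounter class and its increment/read methods are hypothetical, and the dict is an in-memory stand-in for the table; none of this is the HBase client API.

```python
from collections import defaultdict


class ShardedCounter:
    """In-memory stand-in for a counter table (hypothetical, not HBase)."""

    def __init__(self):
        # One cell per "<counter>_<data center>", e.g. "page_views_CA".
        self._cells = defaultdict(int)

    def increment(self, name, data_center, amount=1):
        # A data center only ever writes to its own shard, so replicating
        # the shard's value to other clusters never clobbers their writes.
        self._cells[f"{name}_{data_center}"] += amount

    def read(self, name):
        # Sum every shard whose key starts with "<name>_".
        prefix = f"{name}_"
        return sum(v for k, v in self._cells.items() if k.startswith(prefix))


counter = ShardedCounter()
counter.increment("page_views", "CA", 5)
counter.increment("page_views", "NY", 10)
counter.increment("page_views", "FL", 2)

total = counter.read("page_views")
print(total)  # 17
```

The prefix match assumes no other counter name begins with "page_views_"; a real schema would need a key layout that keeps the counter name and the data-center tag unambiguous.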
