Yep, it just occurred to me while answering you :) I'm the only dev who worked on the replication stuff, so any contribution or just testing out the software is really appreciated.

J-D

On Thu, Mar 3, 2011 at 12:10 PM, Otis Gospodnetic <[email protected]> wrote:
> Aha, so the fact that the age doesn't change when replication keeps retrying
> is really a bug?
>
> Otis
>
> ----- Original Message ----
>> From: Jean-Daniel Cryans <[email protected]>
>> To: [email protected]
>> Sent: Thu, March 3, 2011 2:17:08 PM
>> Subject: Re: Questions about HBase Cluster Replication
>>
>> No it's the age in ms:
>>
>> ageOfLastAppliedOp.set(System.currentTimeMillis() - timestamp);
>>
>> And the timestamp is the one given to the HLogEdit, not the timestamp
>> of the cell.
>>
>> J-D
>>
>> On Thu, Mar 3, 2011 at 11:13 AM, Otis Gospodnetic
>> <[email protected]> wrote:
>> > Is that really the *age* or really the *timestamp* of last successful log
>> > shipment?
>> > If so, one could calculate the real age with age = now() -
>> > ageOfLastShippedOnWhichIsReallyTimestamp. And that would be useful to
>> > have.
>> >
>> > Otis
>> > ----
>> > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> > Lucene ecosystem search :: http://search-lucene.com/
>> >
>> > ----- Original Message ----
>> >> From: Jean-Daniel Cryans <[email protected]>
>> >> To: [email protected]
>> >> Sent: Thu, March 3, 2011 12:21:09 PM
>> >> Subject: Re: Questions about HBase Cluster Replication
>> >>
>> >> It's a work in progress, that information is currently published by
>> >> every region server in the master cluster (since it's push
>> >> replication, not pull) through JMX under the name
>> >> "ageOfLastShippedOp". It's really not perfect though, since if it
>> >> fails to replicate and starts retrying then the age won't change but
>> >> the actual lag will go up. Also it will have to be revisited when we
>> >> add multiple slaves since you don't really want to publish the same
>> >> metric for multiple slaves... it really wouldn't work.
>> >>
>> >> J-D
>> >>
>> >> On Thu, Mar 3, 2011 at 9:10 AM, Bill Graham <[email protected]> wrote:
>> >> > Actually, how far behind replication is w.r.t. edit logs is different
>> >> > than how out of sync they are, but you get the idea.
>> >> >
>> >> > On Thu, Mar 3, 2011 at 9:07 AM, Bill Graham <[email protected]> wrote:
>> >> >> One more question for the FAQ:
>> >> >>
>> >> >> 6. Is it possible for an admin to tell just how out of sync the two
>> >> >> clusters are? Something like Seconds_Behind_Master in MySQL's SHOW
>> >> >> SLAVE STATUS?
>> >> >>
>> >> >> On Wed, Mar 2, 2011 at 9:32 PM, Jean-Daniel Cryans <[email protected]> wrote:
>> >> >>> Although, I would add that this feature is still experimental so who
>> >> >>> knows :)
>> >> >>>
>> >> >>> I think the worst that happened to us was that replication was broken
>> >> >>> (see the jira where if the master loses its zk session with the slave
>> >> >>> zk ensemble, it requires an HBase restart on the master side) for a few
>> >> >>> days because of maintenance of the link between the two datacenters
>> >> >>> which took more than a minute. When we finally did restart the master
>> >> >>> cluster, it had to process about 2 TB of HLogs... those ICVs can
>> >> >>> really generate a lot of data!
>> >> >>>
>> >> >>> J-D
>> >> >>>
>> >> >>> On Wed, Mar 2, 2011 at 9:25 PM, Jean-Daniel Cryans <[email protected]> wrote:
>> >> >>>>> 5. If one is adding replication on the *production* Master cluster, what's the
>> >> >>>>> worst thing that can happen to this Master cluster? Nothing scary other than
>> >> >>>>> changing configs + interruption during a restart? (which is currently still bad
>> >> >>>>> because of region assignments?)
>> >> >>>>>
>> >> >>>>
>> >> >>>> The replication code is pretty much encapsulated from the rest of the
>> >> >>>> region server code, it won't mess with your Puts or change your
>> >> >>>> birthday date.
>> >> >>>>
>> >> >>>> With 0.90 the regions are reassigned where they were before, so it's
>> >> >>>> really just the block cache that gets screwed.
>> >> >>>>
>> >> >>>> J-D
>> >> >>>>
>> >> >>>
>> >> >>
>> >> >
>> >>
>> >
>>
>
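
To make the stale-age problem from this thread concrete, here is a minimal sketch in Java. It is not the HBase replication code: the class, the `lastShippedEditTimestamp` field, and the `onShipped` and `currentLag` methods are hypothetical names invented for illustration, and the sketch assumes the shipped-op metric is set the same way as the `ageOfLastAppliedOp` line quoted above (current time minus the edit's timestamp). It shows why a value that is only set on a successful shipment stays frozen during retries, while a value recomputed from the clock keeps growing.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ReplicationAgeSketch {
    // Hypothetical stand-in for the metric published through JMX.
    static final AtomicLong ageOfLastShippedOp = new AtomicLong();

    // Hypothetical field: write time (ms) of the last edit that was shipped.
    static long lastShippedEditTimestamp;

    /** Called only after a batch has shipped successfully. */
    static void onShipped(long editTimestamp, long now) {
        lastShippedEditTimestamp = editTimestamp;
        // Same shape as the quoted line: age = now - timestamp of the edit.
        ageOfLastShippedOp.set(now - editTimestamp);
    }

    /** Recomputed from the clock on every read, so it grows while retries fail. */
    static long currentLag(long now) {
        return now - lastShippedEditTimestamp;
    }

    public static void main(String[] args) {
        // An edit written at t=1000 ms is shipped at t=1250 ms.
        onShipped(1_000L, 1_250L);

        // One minute of failed retries follows; onShipped is never called.
        long later = 61_250L;

        System.out.println("published age (ms): " + ageOfLastShippedOp.get());
        System.out.println("actual lag (ms):    " + currentLag(later));
    }
}
```

In this sketch the published age stays at 250 ms for as long as shipping keeps failing, while the recomputed lag reaches 60250 ms. Publishing the timestamp of the last shipped edit and letting the reader subtract it from the current time, as suggested earlier in the thread, avoids the frozen value.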
