So, we have a separate thread doing the recovered logs; that is good to know. I was mostly concerned about potential races between the master renaming the log files, running the distributed log split, and doing a lease recovery over the final file when the DN also dies. Apart from that, it seemed to me that since the master has an authoritative view of the cluster, it could do the log assignment better in the wake of failures (possibly also using block placements, etc.). However, I don't have data to show that this is a must-have, but it looked like a somewhat cleaner solution and would only require each region server to care about its own replication znode.
One thing I am seeing in the region server logs, though, is that the deletion from ZooKeeper happens 30 minutes to an hour after the whole WAL is replicated. There are two outstanding WALs and only the newer one is being replicated: only the znode of the newer WAL is getting updated, while the older WAL is just lying around. Still digging into it... (this is on 0.94.7)

Thanks
Varun

On Mon, May 20, 2013 at 4:14 PM, Jean-Daniel Cryans <[email protected]> wrote:
> > Yes, but the region server now has 2X the number of WALs to replicate and
> > could suffer higher replication lag as a result...
>
> In my experience this hasn't been an issue. Keep in mind that the RS
> will only replicate what's in the queue when it was recovered and
> nothing more. It means you have one more thread reading from a likely
> remote disk (low penalty), then it has to build its own set of edits
> to replicate (unless you are already severely CPU contended, that won't
> be an issue), then you send those edits to the other cluster (unless
> you are already filling that machine's pipe, it won't be an issue).
>
> Was there anything you were thinking about? You'd rather spread those
> logs to a bunch of machines?
>
> J-D
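For anyone chasing a similar lingering-queue symptom: in 0.94 the replication queue state lives directly in ZooKeeper, so it can be inspected with zkCli.sh. A rough sketch, assuming the default zookeeper.znode.parent of /hbase; the server address and the <regionserver>/<peer-id>/<wal-name> placeholders are illustrative, not taken from this cluster:

```shell
# Replication queue layout in 0.94 (default parent znode /hbase):
#   /hbase/replication/rs/<regionserver>/<peer-id>/<wal-name>
# The data stored in each WAL znode is the byte offset the source
# has replicated up to.

# Which region servers currently own replication queues?
zkCli.sh -server zk1:2181 ls /hbase/replication/rs

# WALs queued for a given peer on one server. A recovered queue
# shows up under a znode with the dead server's name appended.
zkCli.sh -server zk1:2181 ls /hbase/replication/rs/<regionserver>/<peer-id>

# Check the stored offset of the older WAL: if it never advances and
# the znode never disappears, that queue is the one that is stuck.
zkCli.sh -server zk1:2181 get /hbase/replication/rs/<regionserver>/<peer-id>/<wal-name>
```

Comparing the offset znodes of the two outstanding WALs over time would show whether the older one is being drained slowly or not touched at all.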
