So, we have a separate thread doing the recovered logs; that is good to know. I was mostly concerned about potential races between the master renaming the log files, running the distributed log split, and doing a lease recovery over the final file when the DN also dies. Apart from that, it seemed to me that since the master has an authoritative view of the cluster, it could do the log assignment better in the wake of failures (possibly also using block placements, etc.). However, I don't have data to show that this is a must-have, but it looked like a somewhat cleaner solution and would only require each region server to care about its own replication znode.
One thing I am seeing in the region server logs, though, is that the deletion from ZooKeeper happens 30 minutes to an hour after the whole WAL is replicated. There are two outstanding WALs and only the newer one is being replicated: only the znode of the newer WAL is getting updated, while the older WAL is just lying around. Still digging into it... (this is on 0.94.7)

Thanks
Varun

On Mon, May 20, 2013 at 4:14 PM, Jean-Daniel Cryans <[email protected]> wrote:
> > Yes, but the region server now has 2X the number of WALs to replicate and
> > could suffer higher replication lag as a result...
>
> In my experience this hasn't been an issue. Keep in mind that the RS
> will only replicate what's in the queue when it was recovered and
> nothing more. It means you have one more thread reading from a likely
> remote disk (low penalty), then it has to build its own set of edits
> to replicate (unless you are already severely CPU contended, that won't
> be an issue), then you send those edits to the other cluster (unless
> you are already filling that machine's pipe, it won't be an issue).
>
> Was there anything you were thinking about? You'd rather spread those
> logs to a bunch of machines?
>
> J-D
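For anyone chasing a similar lingering-queue symptom: in 0.94 the replication queue state lives directly in ZooKeeper, so it can be inspected with zkCli.sh. A rough sketch, assuming the default zookeeper.znode.parent of /hbase; the server address and the <regionserver>/<peer-id>/<wal-name> placeholders are illustrative, not taken from this cluster:

```shell
# Replication queue layout in 0.94 (default parent znode /hbase):
#   /hbase/replication/rs/<regionserver>/<peer-id>/<wal-name>
# The data stored in each WAL znode is the byte offset the source
# has replicated up to.

# Which region servers currently own replication queues?
zkCli.sh -server zk1:2181 ls /hbase/replication/rs

# WALs queued for a given peer on one server. A recovered queue
# shows up under a znode with the dead server's name appended.
zkCli.sh -server zk1:2181 ls /hbase/replication/rs/<regionserver>/<peer-id>

# Check the stored offset of the older WAL: if it never advances and
# the znode never disappears, that queue is the one that is stuck.
zkCli.sh -server zk1:2181 get /hbase/replication/rs/<regionserver>/<peer-id>/<wal-name>
```

Comparing the offset znodes of the two outstanding WALs over time would show whether the older one is being drained slowly or not touched at all.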
