Re: [sheepdog] [PATCH V2 00/11] INTRODUCE

MORITA Kazutaka Mon, 20 Aug 2012 11:29:33 -0700

At Tue, 21 Aug 2012 00:29:50 +0800,
Liu Yuan wrote:
> 
> On 08/21/2012 12:07 AM, Christoph Hellwig wrote:
> > Another thing that sprang to mind is that instead of the formal
> > recovery enable/disable we should simply always delay recovery, that
> > is only do recovery after every N seconds if changes happened.
> > Especially in the cases of whole racks going up/down or upgrades that
> > dramatically reduces the number of epochs required, and thus reduces
> > the recovery overhead.
> > 
> > I didn't actually have time to look into the implementation implications
> > of this yet, it's just high-level thoughts.
> 
> I am against delaying recovery all the time. Delaying recovery within
> a time window for maintenance or operational purposes is useful, so I
> think manually delaying recovery during a controlled window makes
> sense. But if we extend this to the whole running time, it will leave
> the cluster in a less safe (if not dangerous) state at any point. (We
> only upgrade the cluster or maintain individual nodes at certain
> times, not all the time, no?)
> 
> Trading away data reliability is always the last resort for a
> distributed system, which emphasizes data reliability compared to a
> single data instance on a local disk.


I think always delaying recovery for a few seconds is useful for many
users.  Under heavy network load, sheep can wrongly detect a node
failure, and node membership can change frequently.  Delaying recovery
for a short time makes Sheepdog tolerant of such situations.
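
To make the short delay described above concrete, here is a minimal
sketch in Python. It is not Sheepdog code: the class DelayedRecovery,
its methods, and the 3-second delay are all hypothetical names and
values chosen for illustration. It assumes recovery is postponed until
a fixed grace period after the first membership change, and is skipped
if the membership has returned to its previous state by then.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DelayedRecovery:
    """Coalesce membership changes seen within `delay` seconds.

    Hypothetical illustration only; not taken from the Sheepdog sources.
    """
    delay: float                          # assumed grace period, in seconds
    members: frozenset                    # membership used by the last recovery
    pending: Optional[frozenset] = None   # latest membership seen in the window
    deadline: Optional[float] = None      # when the current window expires
    recoveries: List[Tuple[float, frozenset]] = field(default_factory=list)

    def on_membership_change(self, now, members):
        """Record a join/leave event; open a window if none is open."""
        self.pending = frozenset(members)
        if self.deadline is None:
            self.deadline = now + self.delay

    def tick(self, now):
        """Return True if recovery starts at time `now`."""
        if self.deadline is None or now < self.deadline:
            return False
        pending, self.pending, self.deadline = self.pending, None, None
        if pending == self.members:
            # The node left and came back within the window:
            # a false failure detection, so nothing to recover.
            return False
        self.members = pending
        self.recoveries.append((now, pending))
        return True
```

In this sketch, a node that is wrongly reported as failed at t=0 and
rejoins at t=1 triggers no recovery when the window closes at t=3,
while a node that stays away for the whole window triggers exactly one
recovery.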

Thanks,

Kazutaka
-- 
sheepdog mailing list
[email protected]
http://lists.wpkg.org/mailman/listinfo/sheepdog
