There are some corner cases that could lead you to lose data, depending on your 
settings, even if forceSync is enabled. For example, if your disk write cache is 
enabled, some sequences of events can cause you to lose updates: updates that 
were forced to disk could still be lost locally, and depending on how many copies 
exist across servers, they may not be recoverable.
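To make the write-cache point concrete, here is a minimal sketch of a "forced" append, loosely analogous to what forceSync asks of the transaction log. It is not ZooKeeper code. The helper name `append_and_fsync` and the record format are hypothetical, and it uses Python's `os.fsync` instead of the Java calls ZooKeeper actually makes. The comments mark where a volatile disk write cache can still hold the data after the call returns.

```python
import os
import tempfile


def append_and_fsync(path: str, record: bytes) -> None:
    """Hypothetical helper: append one record and ask the OS to flush it to the device.

    Even after os.fsync returns, the bytes may still sit in the drive's volatile
    write cache unless the OS issues a cache-flush command (e.g. write barriers)
    and the drive honors it. A power loss at that point can lose a "forced" update.
    """
    with open(path, "ab") as f:
        f.write(record + b"\n")
        f.flush()             # move Python's userspace buffer into the OS page cache
        os.fsync(f.fileno())  # ask the OS to push the page cache down to the device


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    append_and_fsync(path, b"zxid=1 create /a")
    append_and_fsync(path, b"zxid=2 setData /a")
    with open(path, "rb") as f:
        print(len(f.read().splitlines()))  # 2
    os.remove(path)
```

Nothing in this code can tell whether the drive's cache was actually flushed. That guarantee has to come from below the application, which is why the options below are hardware or filesystem settings.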

Options I'm aware of to get around this are write barriers, battery-backed RAID 
controllers, or other solutions that use some form of non-volatile memory. I 
should also say that I'm not aware of any such case happening in production use, 
although we have observed it in lab experiments.

-Flavio

On Jun 16, 2012, at 2:33 AM, Patrick Hunt wrote:

> On Fri, Jun 15, 2012 at 12:45 PM, Raj N <[email protected]> wrote:
>> Can zookeeper recover from a
>> corrupt transaction log using existing snapshots and then replaying
>> messages from its peers?
> 
> A server will try to recover as best it can (using the snaps/logs it
> has available), and then talk to the other servers in the quorum to
> see if anyone else has a more recent committed change. In the case
> where it doesn't have the latest state, it will download what's
> necessary to get in sync with the new leader.
> 
> What might have happened in your case is that you hit a bug, perhaps a
> type of corruption that we don't handle successfully (e.g. see
> ZOOKEEPER-1449).
> 
> Patrick
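
The "download what's necessary to get in sync" step in the quoted reply can be pictured with the following simplified, hypothetical model. ZooKeeper's leader does choose between sending a diff of missing proposals (DIFF), asking the follower to truncate (TRUNC), or sending a full snapshot (SNAP). However, the function name, its parameters, and the exact comparisons here are illustrative assumptions, not the real implementation, which also accounts for epochs and other edge cases.

```python
from enum import Enum


class SyncMode(Enum):
    DIFF = "DIFF"    # send only the committed proposals the follower is missing
    TRUNC = "TRUNC"  # follower has txns beyond the leader's log: truncate them
    SNAP = "SNAP"    # follower is too far behind: send a full snapshot


def choose_sync(peer_last_zxid: int, min_committed: int, max_committed: int) -> SyncMode:
    """Hypothetical sketch of a leader deciding how to bring a follower in sync.

    peer_last_zxid: last zxid the follower recovered from its own snapshot + log.
    min_committed / max_committed: range of committed proposals the leader
    still keeps around to replay.
    """
    if peer_last_zxid > max_committed:
        return SyncMode.TRUNC
    if min_committed <= peer_last_zxid <= max_committed:
        return SyncMode.DIFF
    return SyncMode.SNAP


if __name__ == "__main__":
    print(choose_sync(5, 3, 10).value)   # DIFF
    print(choose_sync(12, 3, 10).value)  # TRUNC
    print(choose_sync(1, 3, 10).value)   # SNAP
```

In this model, a server whose local log was lost or cut short recovers to an older zxid and then falls into the DIFF or SNAP branch. That is the "talk to the quorum" recovery described above, and it only helps if enough other servers still hold the update.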
