Hello again,

Unfortunately, it doesn't seem to be a silly mistake. I have dumped the snapshots using SnapshotFormatter, and everything in the three snapshots matches except for the information missing on one of the three servers. I have reopened ZOOKEEPER-1449 because of this. I hope it is possible to figure out what happened.

Thanks for your help,
Germán.
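For anyone repeating this comparison, here is a minimal sketch, assuming each server's snapshot has already been dumped with SnapshotFormatter into its own text file. The file names and helper names are hypothetical. It makes no assumptions about the dump format: it produces a unified diff of every dump against a chosen reference dump, so lines present in the reference but missing from another server appear with a leading '-'.

```python
import difflib
from pathlib import Path
from typing import Dict, List


def read_lines(path: Path) -> List[str]:
    """Return the lines of one dump file without line endings."""
    return path.read_text(encoding="utf-8").splitlines()


def diff_against_reference(paths: Dict[str, Path], reference: str) -> Dict[str, List[str]]:
    """Unified diff of every server's dump against the reference server's dump.

    Keys are server names (hypothetical labels chosen by the caller).
    An empty list means that server's dump is identical to the reference.
    """
    ref_lines = read_lines(paths[reference])
    diffs: Dict[str, List[str]] = {}
    for name, path in paths.items():
        if name == reference:
            continue
        # lineterm="" because the input lines carry no trailing newlines.
        diffs[name] = list(
            difflib.unified_diff(
                ref_lines, read_lines(path),
                fromfile=reference, tofile=name, lineterm="",
            )
        )
    return diffs
```

For example, diff_against_reference({"server1": Path("server1.txt"), "server2": Path("server2.txt"), "server3": Path("server3.txt")}, "server1") returns an empty list for every server whose dump is identical to server1's. A line-based diff is used rather than set comparison because many fields repeat across znodes, and the surrounding context shows which znode a differing line belongs to.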
On Sat, Sep 28, 2013 at 8:33 AM, German Blanco <[email protected]> wrote:
> Thank you, I will check. I must admit I haven't searched for this enough :-(
> I have also noticed that there is a development option to force synchronisation via snapshot for 3.5.0. That should avoid the problem.
> There is, anyway, something strange that I have noticed. There were two nodes showing up in one of the followers of the ensemble (3.4.5 with 3 nodes) that were not there in the rest, and they were ephemeral nodes. I don't think it is easy for ephemeral nodes to be included in data files from another ensemble, since normally the session that created them wouldn't be there and they would expire.
> Unfortunately, I don't have the logs from when this happened anymore. They were running with DEBUG and they rotated.
> Any ideas?
>
> On Fri, Sep 27, 2013 at 2:00 AM, Alexander Shraer <[email protected]> wrote:
>> Some time in the past we were discussing adding a unique identifier for each ensemble in the config files and checking it, for example when a server tries to connect to the leader. I'm not sure if there is a Jira for this.
>>
>> On Wed, Sep 25, 2013 at 7:44 AM, German Blanco <[email protected]> wrote:
>>> Exactly.
>>> I know it is silly, but I think this is what happened, and I would feel better if there was a way to prevent it from happening again.
>>>
>>> On Wed, Sep 25, 2013 at 4:37 AM, Benjamin Reed <[email protected]> wrote:
>>>> When you say inconsistent transaction log, are you talking about a transaction log from a different ensemble instance?
>>>>
>>>> For example, you ran ZooKeeper and did some things, then you reset all the servers but one and restarted everything.
>>>>
>>>> ben
>>>>
>>>> On Tue, Sep 24, 2013 at 5:45 AM, German Blanco <[email protected]> wrote:
>>>>> Hello,
>>>>> I have run into this situation a couple of times.
>>>>> Because of an error, one ZooKeeper server in the ensemble is started with an inconsistent transaction log. This leads to serious, difficult-to-trace problems until you notice that clients connected to one of the servers see a different data tree than the others.
>>>>> I would really like to avoid this, and the amount of data in my data tree is small (around 40 kBytes). So I would like to propose a new option to force synchronization via snapshot in the ZooKeeper Leader.
>>>>> Any opinions?
>>>>> Any other options?
>>>>> Regards,
>>>>>
>>>>> Germán Blanco.
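The two ideas in this thread, checking a per-ensemble identifier when a server connects to the leader and forcing synchronization via snapshot, can be illustrated with a simplified model of how a leader might pick a sync mode for a connecting follower. This is a hedged sketch, not ZooKeeper's actual LearnerHandler logic: ensemble_id, force_snapshot_sync, EnsembleMismatch and choose_sync_mode are hypothetical names, and the zxid rules are reduced to their simplest form. In this model, a decision based only on zxids cannot notice that a follower's log came from another ensemble if its last zxid happens to fall inside the leader's committed range; such a follower gets a DIFF and keeps its divergent data.

```python
from dataclasses import dataclass
from enum import Enum


class SyncMode(Enum):
    DIFF = "DIFF"    # send the committed proposals the follower is missing
    TRUNC = "TRUNC"  # tell the follower to truncate its log back to the leader's history
    SNAP = "SNAP"    # send a full snapshot of the leader's data tree


@dataclass
class LeaderState:
    ensemble_id: str                   # hypothetical per-ensemble identifier from the config
    min_committed_zxid: int            # oldest proposal still held in the leader's log
    max_committed_zxid: int            # newest committed proposal
    force_snapshot_sync: bool = False  # hypothetical option proposed in this thread


class EnsembleMismatch(Exception):
    """Raised when a connecting server is configured for a different ensemble."""


def choose_sync_mode(leader: LeaderState, learner_ensemble_id: str,
                     peer_last_zxid: int) -> SyncMode:
    # 1. Identifier check at connection time: refuse servers from another ensemble.
    if learner_ensemble_id != leader.ensemble_id:
        raise EnsembleMismatch(
            f"learner ensemble {learner_ensemble_id!r} != leader ensemble {leader.ensemble_id!r}"
        )
    # 2. Forced snapshot: ignore the follower's log entirely.
    if leader.force_snapshot_sync:
        return SyncMode.SNAP
    # 3. Simplified zxid-only decision.
    if peer_last_zxid > leader.max_committed_zxid:
        return SyncMode.TRUNC
    if leader.min_committed_zxid <= peer_last_zxid <= leader.max_committed_zxid:
        return SyncMode.DIFF
    return SyncMode.SNAP
```

With the forced option, every sync costs a full data-tree transfer; for a data tree of around 40 kBytes, as described above, that cost is small. The identifier check instead rejects the misconfigured server outright rather than silently repairing it.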
