Agreed with Chang on all fronts. I will repro the problem and upload logs.

2010/12/1 Chang Song <[email protected]>
>
> I think it is not too difficult to reproduce.
> Just create a 3-node ensemble and have some clients create ephemeral nodes.
> Then kill one of the ensemble servers with kill -9.
> I don't remember whether it was the leader or a follower.
>
> Then, if you see those ephemeral nodes gone, restart that server's Java
> process.
>
> I think I have seen this happen twice when I repeated this same
> experiment multiple times.
>
> I am not trying to create FUD around Zookeeper. Actually, it is the exact
> opposite.
> I fell in love with Zookeeper, and I still am. I am using Zookeeper for
> our production system.
> In fact, it is THE only Java solution I believe in. Really.
>
> I just couldn't find time to reproduce and report a bug.
>
> Chang
>
>
> Dec 1, 2010, 11:08 PM, Fournier, Camille F. [Tech] wrote:
>
> > Would love to hear more about your ensemble settings to try and recreate
> > this issue. Would be a very bad thing for my deployment as well...
> >
> > Camille
> >
> > ----- Original Message -----
> > From: Chang Song <[email protected]>
> > To: [email protected] <[email protected]>
> > Cc: [email protected] <[email protected]>
> > Sent: Wed Dec 01 08:09:30 2010
> > Subject: Re: question about ZK robustness
> >
> >
> > Ted,
> >
> > I have seen inconsistency between different ensemble servers when we did
> > some torture testing.
> >
> > I killed the Java process with -9 on one ensemble server and restarted it,
> > and saw that ephemeral nodes that had disappeared from the other two
> > ensemble servers were stuck on the newly restarted server. No matter what
> > I did ("create, sync, get"), the ephemeral nodes did not disappear. I had
> > to remove the log and force a re-sync from scratch.
> >
> > I have seen this behavior twice. Exactly the same behavior. I had about
> > 2000 clients connected to the ensemble servers. I had no time to file a
> > bug report, but when I have time to do another round of torture testing,
> > I will definitely file one.
> >
> > This is not data loss, but a serious, dead serious inconsistency as far
> > as my application goes.
> > Please let me know if you happen to know of a related bug.
> >
> > Thank you.
> >
> > Chang
> >
> >
> > Dec 1, 2010, 1:41 PM, Ted Dunning wrote:
> >
> >> Sure. Let me know when. I have learned a bit more from Ben since I wrote
> >> that first bit, so I could amplify the exposition
> >> just a bit when the time comes.
> >>
> >> On Tue, Nov 30, 2010 at 8:07 PM, Mahadev Konar <[email protected]> wrote:
> >>
> >>> I meant to say, we can wait a while before we are done moving to the new
> >>> wiki tree.
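To make the symptom Chang describes concrete (ephemeral nodes gone from two servers but still present on the restarted one), here is a minimal sketch of the comparison, assuming you have already collected the znode paths visible from each server separately, for example by running `zkCli.sh -server host:port` against each server in turn and using `ls` on the parent znode. The function name `find_divergent_znodes`, the server names, and the paths are all hypothetical; this is not part of ZooKeeper.

```python
from typing import Dict, Set


def find_divergent_znodes(listings: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per server, the znode paths that are not visible on every server.

    `listings` maps a hypothetical server name (e.g. "zk1") to the set of
    znode paths a client saw while connected to that server alone.
    An empty result means all servers agree.
    """
    if not listings:
        return {}
    # Paths that every server reports: the view all servers agree on.
    common = set.intersection(*(set(paths) for paths in listings.values()))
    divergent = {}
    for server, paths in listings.items():
        # Anything a server reports beyond the common view is suspect.
        extra = set(paths) - common
        if extra:
            divergent[server] = extra
    return divergent


if __name__ == "__main__":
    # Hypothetical data mirroring the report: the ephemeral node of a
    # disconnected client is gone from zk1 and zk2 but still on zk3.
    listings = {
        "zk1": {"/app/clients/c1"},
        "zk2": {"/app/clients/c1"},
        "zk3": {"/app/clients/c1", "/app/clients/c2"},
    }
    print(find_divergent_znodes(listings))  # {'zk3': {'/app/clients/c2'}}
```

Because a follower can briefly lag the leader, a single difference is not by itself proof of the bug; what Chang reports is that the stale nodes persisted even after create, sync, and get.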
