See https://issues.apache.org/jira/browse/ZOOKEEPER-919
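
The symptom Chang describes in the quoted thread below (ephemeral nodes that are gone from two servers but still present on the restarted one) can be checked mechanically once you have a listing of ephemeral paths from each server. The following is only a rough sketch under assumptions that are not from the thread: the server names, the paths, and the "held by fewer than a majority" rule are all hypothetical, and collecting the per-server listings (for example by connecting a client to each server on its own) is left out.

```python
from typing import Dict, Set


def find_stale_ephemerals(views: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per server, the ephemeral paths that fewer than a majority of servers hold.

    `views` maps a server name to the set of ephemeral znode paths seen on
    that server. The majority rule is an illustrative assumption, not
    ZooKeeper's own definition of consistency.
    """
    quorum = len(views) // 2 + 1

    # Count how many servers report each path.
    counts: Dict[str, int] = {}
    for paths in views.values():
        for path in paths:
            counts[path] = counts.get(path, 0) + 1

    # A path is suspicious on a server if most other servers no longer have it.
    stale: Dict[str, Set[str]] = {}
    for server, paths in views.items():
        extra = {path for path in paths if counts[path] < quorum}
        if extra:
            stale[server] = extra
    return stale


# Hypothetical listings: zk3 was killed with kill -9 and restarted.
example_views = {
    "zk1": {"/workers/a"},
    "zk2": {"/workers/a"},
    "zk3": {"/workers/a", "/workers/b", "/workers/c"},
}
print(find_stale_ephemerals(example_views))
```

An empty result means every ephemeral path is held by a majority of the servers; a non-empty result names the servers and paths worth attaching to the bug report.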

2010/12/2 Benjamin Reed <[email protected]>

> Chang, this is indeed a serious bug. it would be great if we could
> reproduce it reliably. could you confirm the version of code you are
> using. could you include enough detail that we could try to reproduce it
> on our cluster?
>
> thanx
> ben
>
> On 12/01/2010 07:05 AM, Vishal Kher wrote:
> > Agreed with Chang on all fronts. I will repro the problem and upload logs.
> >
> > 2010/12/1 Chang Song <[email protected]>
> >
> >> I think it is not too difficult to reproduce.
> >> Just create 3 node ensemble, and have some clients create ephemeral nodes.
> >> And then kill one of ensemble by kill -9.
> >> I don't remember it was a leader or a follower.
> >>
> >> and then if you see those ephemeral nodes gone, restart the ensemble Java
> >> process.
> >>
> >> I think I have seen this happening twice when I continued this same
> >> experiment multiple times.
> >>
> >> I am not trying to create FUD around Zookeeper. Actually it is exact
> >> opposite.
> >> I fell in love with Zookeeper, and I still am. I am using Zookeeper for
> >> our production system.
> >> In fact, it is THE only Java solution I believe in. Really.
> >>
> >> I just couldn't find time to reproduce and report a bug.
> >>
> >> Chang
> >>
> >>
> >> Dec 1, 2010, 11:08 PM, Fournier, Camille F. [Tech] wrote:
> >>
> >>> Would love to hear more about your ensemble settings to try and recreate
> >>> this issue. Would be a very bad thing for my deployment as well...
> >>> Camille
> >>>
> >>> ----- Original Message -----
> >>> From: Chang Song <[email protected]>
> >>> To: [email protected] <[email protected]>
> >>> Cc: [email protected] <[email protected]>
> >>> Sent: Wed Dec 01 08:09:30 2010
> >>> Subject: Re: question about ZK robustness
> >>>
> >>>
> >>> Ted.
> >>>
> >>> I have seen inconsistency between different ensemble servers when we did
> >>> some torture testing.
> >>>
> >>> I killed Java process with -9 on one ensemble server, and restarted it, and saw
> >>> that ephemeral nodes that disappeared from other two ensemble servers stuck in
> >>> newly restarted ensemble. No matter what I do, "create, sync, get", the ephemeral
> >>> nodes did not disappear. I had to remove the log and force re-sync from scratch.
> >>> I had seen this behavior twice. Exactly the same behavior. I had about 2000 clients connected
> >>> ensemble servers. I had no time to file a bug report, but when I have time to do another
> >>> torture testing, I will definitely file a bug report.
> >>>
> >>> This is not a data loss, but a serious, dead serious inconsistency as far
> >>> as my application goes.
> >>> Please let me know if you happened to know related bug.
> >>>
> >>> Thank you.
> >>>
> >>> Chang
> >>>
> >>>
> >>> Dec 1, 2010, 1:41 PM, Ted Dunning wrote:
> >>>
> >>>> Sure. Let me know when. I have learned a bit more from Ben since I wrote
> >>>> that first bit so I could amplify the exposition
> >>>> just a bit when the time comes.
> >>>>
> >>>> On Tue, Nov 30, 2010 at 8:07 PM, Mahadev Konar <[email protected]> wrote:
> >>>>> I meant to say, we can wait a while before we are done moving to the new
> >>>>> wiki tree.
> >>>>>
> >>
> >
