German, today it happened on our secondary cluster, which consists of 3 nodes: the leader didn't see the node, but the two followers did.
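A minimal sketch of that per-server comparison, in case it helps anyone reproduce the check. It assumes the list of znode paths has already been collected from each server separately (for example by pointing a client at one server at a time); the server names `zk1`..`zk3`, the `/apps/...` paths and the function name are hypothetical and are not taken from our setup.

```python
from typing import Dict, List, Set


def find_inconsistent_znodes(views: Dict[str, Set[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Return znodes that are visible on some servers but not on all of them.

    `views` maps a server name to the set of znode paths that server reported.
    """
    # Every path reported by at least one server.
    all_paths = set().union(*views.values())
    report = {}
    for path in sorted(all_paths):
        seen_on = sorted(s for s, paths in views.items() if path in paths)
        missing_on = sorted(s for s, paths in views.items() if path not in paths)
        if missing_on:  # at least one server disagrees about this path
            report[path] = {"seen_on": seen_on, "missing_on": missing_on}
    return report


# Hypothetical snapshot: zk1 plays the leader, zk2 and zk3 the followers.
views = {
    "zk1": {"/apps/app-a"},
    "zk2": {"/apps/app-a", "/apps/app-stale"},
    "zk3": {"/apps/app-a", "/apps/app-stale"},
}

report = find_inconsistent_znodes(views)
for path, info in report.items():
    print(path, "seen on", info["seen_on"], "missing on", info["missing_on"])
```

Any path that ends up in the report is a candidate for the kind of stale ephemeral node described in this thread; an empty report only means the collected views agree with each other.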
Flavio, I browsed the logs but was unable to find anything interesting; only setData operations were issued. The problematic znode was last modified on 13 Jan 2015 at 17:xx, and we noticed the issue on 14 Jan 2015 at 11:xx.

2015-01-14 10:52 GMT+01:00 Flavio Junqueira <[email protected]>:

> Hi there,
> I suggest a couple of things here:
> - Use LogFormatter to look into the transaction logs to check the
>   operations that are actually coming across.
> - It would be nice to be able to reproduce it outside your app, ideally
>   as a junit test so that we can start working on it.
> I vaguely remember coming across such a problem, but I'll need to dig into
> it. Does anyone on this list recall a similar problem?
> -Flavio
>
> On Wednesday, January 14, 2015 9:14 AM, Kuba Lekstan <[email protected]> wrote:
>
> German, do you have any idea what might be causing this? Today the same
> issue happened.
>
> 2014-11-21 5:42 GMT+01:00 Yogesh Patil <[email protected]>:
>
> > Hi Zookeepers,
> > I have also been experiencing a similar problem since yesterday. I have
> > pretty much the same setup, with ephemeral znodes in place for a
> > keep-alive kind of function. I too see that, in spite of the ZK session
> > going down, the ephemeral znodes still LIVE.
> >
> > I am using ZK 3.5.0.
> >
> > Any solution/fix for this type of issue?
> >
> > --
> > Sincerely,
> >
> > *Yogesh Patil*
> >
> > On Thu, Nov 13, 2014 at 2:10 PM, Kuba Lekstan <[email protected]> wrote:
> >
> > > Sorry, forgot to mention. Version: 3.4.6.
> > >
> > > Thanks.
> > >
> > > 2014-11-13 18:11 GMT+01:00 German Blanco <[email protected]>:
> > >
> > > > Hello,
> > > >
> > > > which version of Zookeeper are you using?
> > > >
> > > > On Thu, Nov 13, 2014 at 5:25 PM, Kuba Lekstan <[email protected]> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > A bit of detail:
> > > > > We have a 5-node cluster, which we use for configuration distribution
> > > > > and for monitoring active instances of our applications.
> > > > > Each application creates its own ephemeral node, so we know which apps
> > > > > are alive, how many of them there are and what they are doing.
> > > > >
> > > > > The problem happened on 4th November: the first time around 4AM, the
> > > > > second time around 12PM.
> > > > > The first time, it was the middle of the night when I got woken up; the
> > > > > support guys told me that something was wrong with config distribution.
> > > > >
> > > > > First I checked the apps for errors but didn't find anything
> > > > > interesting, then I looked at what was in zookeeper (using
> > > > > node-zk-browser).
> > > > > I noticed that there were 3 ephemeral nodes which had been created on
> > > > > 1st Nov (while the oldest application was started on 3rd Nov). I could
> > > > > read their data but was not able to delete them - I was getting a
> > > > > NONODE exception.
> > > > >
> > > > > I thought wtf - why can't I delete these nodes? Something very bad must
> > > > > have happened with ZK.
> > > > >
> > > > > So I sshed to the leader and tried to read these nodes using the CLI,
> > > > > but I was not able to - the leader was telling me that such nodes don't
> > > > > exist.
> > > > > After this I started to ssh to the rest of the nodes in the cluster and
> > > > > tried to read these nodes there. Finally I found the server which did
> > > > > let me read the data of these nodes.
> > > > > Because of the inconsistency I decided to restart it. The restart did
> > > > > help; everything went back to a normal state. The ephemeral nodes
> > > > > disappeared.
> > > > >
> > > > > A similar situation happened at 12PM, but this time I had a lot more
> > > > > time to look at what was wrong. The second time the problem was about 3
> > > > > ephemeral nodes which were created on 1st Nov (again?).
> > > > > This time I dug a bit deeper and looked into the logs and the 4 letter
> > > > > commands - but could not find anything interesting, except that all 3
> > > > > of these nodes were created under different session IDs, and zk had no
> > > > > hosts connected under these session IDs.
> > > > > The solution was similar to the one from 4AM, but this time I deleted
> > > > > all files in the ZK data directory.
> > > > >
> > > > > Oddly enough, the problem happened twice on the same ZK node; the final
> > > > > solution was to clear the ZK data directory. After clearing the
> > > > > directory the problem didn't happen again.
> > > > >
> > > > > I tried to look for a solution or similar problems. I found posts where
> > > > > people were complaining about ephemeral nodes not being removed after
> > > > > the client session gets closed, but I was not able to find posts about
> > > > > ZK not being consistent.
> > > > >
> > > > > What do you think about this? Can we do something to fix this?
> > > > >
> > > > > Sorry for my english, I was doing my best. :)
> > > > >
> > > > > Thanks, Kuba.
