Just to update everyone, I was finally able to root-cause the issue. It seems to be
https://issues.apache.org/jira/browse/ZOOKEEPER-2901, which is related to the node id being > 127.
It's fixed in 3.5.4-beta, and it works fine there.

Thanks
Harish

On Wed, Jun 13, 2018 at 7:42 AM Andor Molnar <an...@cloudera.com> wrote:

> Hi Harish,
>
> I see 2 things which need to be clarified here:
>
> 1. A ZooKeeper session dies in 2 cases only: when the client explicitly closes
> the session (which is *not* equivalent to disconnection) or when the session
> timeout expires.
> 2. If quorum is not present, no updates are committed and client connections
> are rejected, so Kafka shouldn't be able to use the cluster.
>
> Similarly, when quorum comes back online, ZooKeeper will continue operating
> normally: it receives client connections, performs updates and expires
> sessions if necessary.
>
> I still believe therefore that your Kafka setup doesn't properly clean up
> znodes for some reason, but I'm not a Kafka expert.
>
> Regards,
> Andor
>
> On Wed, Jun 13, 2018 at 12:34 AM, harish lohar <hklo...@gmail.com> wrote:
>
> > Exactly, so in a case where there is no quorum and no update can be made,
> > is there a way to stop Kafka failing to start?
> >
> > One way is to clean up the Kafka-related znodes after bringing up quorum and
> > then starting Kafka.
> >
> > I was looking to avoid this.
> >
> > On Tue, Jun 12, 2018 at 4:59 PM Brian Lininger <brian.linin...@veeva.com>
> > wrote:
> >
> > > Hi Harish,
> > > I think I see what may be the problem for you. Based on your initial
> > > description (6 ZK nodes, 3 down) I think the problem is that you no longer
> > > have a quorum. When a Zookeeper cluster is running, updates (i.e. removing
> > > znodes) can only occur when Zookeeper has a quorum, which means a strict
> > > majority (more than 50%) of the configured Zookeeper nodes. If I understand
> > > correctly, then in your case you have 6 Zookeeper nodes configured but 3
> > > are down.
> > > This means that you
> > > only have 50.0% of the Zookeeper cluster working, and thus Zookeeper does
> > > not have a quorum, so no updates can be made. I don't know much about the
> > > new TTL feature in 3.5, but my assumption is that it works on this same
> > > principle, which is that no updates can be made to the cluster's znodes
> > > when there is no quorum. The same applies to a 3-node Zookeeper cluster:
> > > you must have 2 nodes running to form a quorum and allow any updates to
> > > occur.
> > >
> > > Please correct me if I missed something....
> > >
> > > Thanks,
> > > Brian
> > >
> > > On Tue, Jun 12, 2018 at 1:33 PM, harish lohar <hklo...@gmail.com> wrote:
> > >
> > >> ---------- Forwarded message ---------
> > >> From: harish lohar <hklo...@gmail.com>
> > >> Date: Tue, Jun 12, 2018 at 3:26 PM
> > >> Subject: Re: Kafka Failing to start due to existing ID
> > >> To: <an...@apache.org>
> > >>
> > >> Hi Andor,
> > >>
> > >> Thanks for your reply.
> > >>
> > >> This issue is independent of the number of nodes; it should be seen with
> > >> a 3-node cluster as well.
> > >>
> > >> Kafka does have a session_timeout config, but that seems to take effect
> > >> only if the zookeeper cluster is up, i.e. if kafka goes down while the
> > >> zookeeper cluster is up.
> > >>
> > >> Now let's say 2 nodes of the Zookeeper cluster are down, and then the
> > >> kafka connected to the 3rd Zookeeper node goes down: the zookeeper
> > >> cluster doesn't refresh the session for the Kafka connected to the 3rd
> > >> node.
> > >>
> > >> So when another node comes up and the zookeeper cluster becomes available,
> > >> it doesn't delete the id of the kafka which went down while the zookeeper
> > >> cluster was down.
> > >>
> > >> Regarding TTL, I have already asked on the kafka forum and am awaiting a
> > >> reply.
> > >>
> > >> Ideally, once the zookeeper cluster is up, it should delete the kafka
> > >> broker ids which are not connected, which doesn't seem to be happening.
> > >>
> > >> I hope I am making some sense :)
> > >>
> > >> Thanks
> > >> harish
> > >>
> > >> On Tue, Jun 12, 2018 at 2:59 PM Andor Molnár <an...@apache.org> wrote:
> > >>
> > >> > Hi Harish,
> > >> >
> > >> > I have a few questions to get some insight about your issue.
> > >> >
> > >> > 1. Why do you run ZooKeeper with 6 nodes while an odd number of nodes
> > >> > is recommended? (Not an issue really, just out of curiosity.)
> > >> >
> > >> > 2. Does Kafka support ZK 3.5+ with TTL nodes?
> > >> >
> > >> > I think this is more of a Kafka question, but afaik Kafka doesn't run
> > >> > and cannot take advantage of 3.5-only features of ZK. Maybe I'm wrong,
> > >> > but I think it has some cleanup mechanism to delete expired broker ids,
> > >> > or you must wait for the session to expire.
> > >> >
> > >> > Regards,
> > >> >
> > >> > Andor
> > >> >
> > >> > On 06/12/2018 04:39 PM, harish lohar wrote:
> > >> >
> > >> > Hi All,
> > >> >
> > >> > I need help with the scenario below, in case any configuration is
> > >> > available to help.
> > >> >
> > >> > I have a cluster of 6 nodes.
> > >> > 3 nodes are stopped and brought up again; kafka fails to restart since
> > >> > the broker IDs are still present in the zookeeper znode /brokers/ids/
> > >> >
> > >> > Since the cluster goes down after removing 3 nodes, the session timeout
> > >> > doesn't happen.
> > >> >
> > >> > Though I am aware of the TTL feature in zookeeper, how do I make sure
> > >> > kafka creates znodes with a TTL?
> > >> >
> > >> > Thanks
> > >> > Harish
> > >
> > > --
> > > *Brian Lininger*
> > > Technical Architect, Infrastructure & Search
> > > *Veeva Systems*
> > > brian.linin...@veeva.com
> > > www.veeva.com
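
For anyone following the quorum discussion quoted above, here is a minimal sketch of the majority rule it describes. It assumes the default majority quorum (no weights or hierarchical groups), and the function names `quorum_size` and `has_quorum` are hypothetical helpers for illustration, not part of ZooKeeper or Kafka.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority of the ensemble."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size: int, servers_up: int) -> bool:
    """True if the servers that are up form a strict majority of those configured."""
    return servers_up >= quorum_size(ensemble_size)


if __name__ == "__main__":
    # (configured servers, servers currently up), including the cases from the thread
    for total, up in [(6, 3), (6, 4), (3, 2), (5, 3)]:
        print(f"{up} of {total} up: quorum={has_quorum(total, up)}, "
              f"needed={quorum_size(total)}")
```

Under this rule a 6-node ensemble needs 4 servers and so tolerates 2 failures, the same as a 5-node ensemble, which is one reason odd ensemble sizes are usually recommended.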