Re: Port 3888 closed on Leader
Thanks, everyone. Yes, I was also wondering how a node can be leader without port 3888 being open. We just did a restart and everything was fine afterwards. Port 3888 must be in LISTEN mode on all ZK nodes, while 2888 is opened only on the leader, and the other nodes connect to it.

Thanks,
Harish

On Thu, Aug 23, 2018 at 6:47 AM Shawn Heisey wrote:
> On 8/15/2018 7:46 AM, harish lohar wrote:
> > In a deployment of 3 Node Zk Cluster we have seen that sometime port 3888
> > is absent after the cluster is formed , this causes Follower node to not
> > able to connect to leader if they restart.
> >
> > Don't leader itself should come out of clustering if this happens ??
>
> I'm not well-versed in how ZK works internally, and don't have access
> any more to systems I can check, but I seem to remember when looking at
> a live ensemble that not every ZK instance will bind to all three ports
> (2181, 2888, and 3888 if using the example configs). Surprised me when
> I noticed it, but I didn't worry about it too much since ZK seemed to be
> working correctly.
>
> Thanks,
> Shawn
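The port rule described above can be turned into a quick check. This is only a minimal sketch: the port numbers are the ones from the example configs mentioned in the thread, and the function names (`expected_listening_ports`, `is_listening`, `missing_ports`) are made up for illustration. It is not part of ZooKeeper.

```python
import socket

# Ports from the example configs mentioned in this thread (assumed values).
CLIENT_PORT = 2181
QUORUM_PORT = 2888    # opened only on the leader; the other nodes connect to it
ELECTION_PORT = 3888  # expected in LISTEN mode on every node


def expected_listening_ports(role):
    """Return the set of ports a node with the given role should listen on."""
    ports = {CLIENT_PORT, ELECTION_PORT}
    if role == "leader":
        ports.add(QUORUM_PORT)
    return ports


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_ports(host, role):
    """Return the expected ports on which host does not accept connections."""
    return sorted(p for p in expected_listening_ports(role)
                  if not is_listening(host, p))
```

For example, `missing_ports("zk1.example.com", "leader")` (hypothetical host name) returning `[3888]` would match the situation reported in this thread.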
Port 3888 closed on Leader
Hi,

In a deployment of a 3-node ZK cluster we have seen that port 3888 is sometimes absent after the cluster is formed. This prevents follower nodes from connecting to the leader if they restart.

Shouldn't the leader itself drop out of the cluster if this happens?

Thanks,
Harish
Re: Port :3888 Bind failure
Another question along the same lines: does ZooKeeper have a specific number of retries configured for the case of an intermittent problem with the interface, or does it retry indefinitely?

On Mon, Jul 23, 2018 at 2:37 PM harish lohar wrote:
> Thanks Andor,
> We found some issue with our interface configuration and correcting same
> has solved the issue.
>
> Thanks
> Harish
Re: Port :3888 Bind failure
Thanks Andor,

We found an issue with our interface configuration, and correcting it has solved the problem.

Thanks,
Harish

On Mon, Jul 23, 2018 at 11:22 AM Andor Molnar wrote:
> Hi,
>
> Is the IP address valid that you're trying to bind the server?
> Please tell me some info about your environment: cloud? docker? kubernetes?
> ZooKeeper config files would also be beneficial to take a look.
>
> Regards,
> Andor
>
> On Mon, Jul 23, 2018 at 5:02 PM, harish lohar wrote:
> > Hi ,
> >
> > I am seeing bind failure for zookeeper ports, these are random and not
> > easily reproducible.
> > There was no one else listening on these ports.
> >
> > We have recently upgraded to 3.5.4-beta , earlier i never saw this issue
> >
> > 2018-07-22 00:08:59,409 [myid:] - WARN [main:QuorumPeerConfig@644] - Non-optimial configuration, consider an odd number of servers.
> > 2018-07-22 00:08:59,707 [myid:181] - ERROR [/xx.xx.xxx.xxx:3888:QuorumCnxManager$Listener@878] - Exception while listening
> > java.net.BindException: Cannot assign requested address (Bind failed)
> >     at java.net.PlainSocketImpl.socketBind(Native Method)
> >     at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
> >     at java.net.ServerSocket.bind(ServerSocket.java:375)
> >     at java.net.ServerSocket.bind(ServerSocket.java:329)
> >     at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:856)
Port :3888 Bind failure
Hi,

I am seeing bind failures for ZooKeeper ports. They are random and not easily reproducible. Nothing else was listening on these ports.

We recently upgraded to 3.5.4-beta; I never saw this issue before.

2018-07-22 00:08:59,409 [myid:] - WARN [main:QuorumPeerConfig@644] - Non-optimial configuration, consider an odd number of servers.
2018-07-22 00:08:59,707 [myid:181] - ERROR [/xx.xx.xxx.xxx:3888:QuorumCnxManager$Listener@878] - Exception while listening
java.net.BindException: Cannot assign requested address (Bind failed)
    at java.net.PlainSocketImpl.socketBind(Native Method)
    at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)
    at java.net.ServerSocket.bind(ServerSocket.java:375)
    at java.net.ServerSocket.bind(ServerSocket.java:329)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:856)
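"Cannot assign requested address" normally corresponds to the OS error EADDRNOTAVAIL, meaning the address is not currently configured on a local interface. That is a different condition from a port that is already taken (EADDRINUSE). A minimal sketch for telling the two apart outside ZooKeeper; `try_bind` is a hypothetical helper name, and the check assumes an IPv4 address:

```python
import errno
import socket


def try_bind(address, port=0):
    """Try to bind a TCP socket to address:port.

    Returns (True, None) on success, or (False, errno_name) on failure,
    e.g. "EADDRNOTAVAIL" when the address is not assigned to a local
    interface, or "EADDRINUSE" when something already holds the port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((address, port))
            return True, None
        except OSError as e:
            return False, errno.errorcode.get(e.errno, str(e.errno))
```

Running `try_bind("<server address from zoo.cfg>", 3888)` on the affected host at the moment of failure would show which of the two conditions applies.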
Re: ZooKeeper Cluster Health Checking
We did it with a Java monitoring app that uses the ZooKeeper Java API to send four-letter-word (4lw) commands to ZooKeeper and return the output.

Thanks,
Harish

On Tue, Jul 17, 2018 at 2:00 AM adrien ruffie wrote:
> Hi Harish,
>
> thank you very much for this advise and explanation !
>
> Do you think with just a simple script shell for checking all this metrics
> is enough ? Or would better to do it in a Java with a simple monitoring
> application?
>
> Thank again,
>
> Best regards,
>
> Adrien
>
> From: harish lohar
> Sent: Tuesday, July 17, 2018 04:13:51
> To: user@zookeeper.apache.org
> Subject: Re: ZooKeeper Cluster Health Checking
>
> Hi Adrian,
> Below zookeeper commands are generally used to get health of zookeeper
> cluster
>
> stat
>
> Lists brief details for the server and connected clients.
>
> usage echo stat | nc server port
>
> This gives whether cluster is up /down. If down this will give that
> Zookeeper instance is currently not serving any request - which means
> either the leader election is failing or >= 50% of zookeeper node in
> cluster are down.
>
> mntr
>
> *New in 3.4.0:* Outputs a list of variables that could be used for
> monitoring the health of the cluster.
>
> $ echo mntr | nc localhost 2185
>
> zk_version 3.4.0
> zk_avg_latency 0
> zk_max_latency 0
> zk_min_latency 0
> zk_packets_received 70
> zk_packets_sent 69
> zk_outstanding_requests 0
> zk_server_state leader
> zk_znode_count 4
> zk_watch_count 0
> zk_ephemerals_count 0
> zk_approximate_data_size 27
> zk_followers 4 - only exposed by the Leader
> zk_synced_followers 4 - only exposed by the Leader
> zk_pending_syncs 0 - only exposed by the Leader
> zk_open_file_descriptor_count 23 - only available on Unix platforms
> zk_max_file_descriptor_count 1024 - only available on Unix platforms
>
> The output is compatible with java properties format and the content may
> change over time (new keys added). Your scripts should expect changes.
>
> ATTENTION: Some of the keys are platform specific and some of the keys are
> only exported by the Leader.
>
> The output contains multiple lines with the following format:
>
> On Mon, Jul 16, 2018 at 10:13 AM adrien ruffie wrote:
> > Hello all,
> >
> > In my company we have a Zookeeper production cluster.
> >
> > But we don't really know how can we check the health of our cluster...
> >
> > Can we advise us about this topic ?
> >
> > I know this topic may has been cropping up for a while, but I don't really
> > found any concrete solution.
> >
> > Do you use a monitoring tools ? Which can launch alert ?
> >
> > What metrics/properties/any thing which can indicate that our cluster
> > isn't in good health.
> >
> > Thank you very much and best regards
> >
> > Adrien
Re: ZooKeeper Cluster Health Checking
Hi Adrien,

The ZooKeeper commands below are generally used to check the health of a ZooKeeper cluster.

stat

Lists brief details for the server and connected clients.

Usage: echo stat | nc server port

This shows whether the cluster is up or down. If it is down, the command reports that the ZooKeeper instance is currently not serving requests, which means that either leader election is failing or at least half of the ZooKeeper nodes in the cluster are down.

mntr

*New in 3.4.0:* Outputs a list of variables that could be used for monitoring the health of the cluster.

$ echo mntr | nc localhost 2185

zk_version 3.4.0
zk_avg_latency 0
zk_max_latency 0
zk_min_latency 0
zk_packets_received 70
zk_packets_sent 69
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count 4
zk_watch_count 0
zk_ephemerals_count 0
zk_approximate_data_size 27
zk_followers 4 - only exposed by the Leader
zk_synced_followers 4 - only exposed by the Leader
zk_pending_syncs 0 - only exposed by the Leader
zk_open_file_descriptor_count 23 - only available on Unix platforms
zk_max_file_descriptor_count 1024 - only available on Unix platforms

The output is compatible with java properties format and the content may change over time (new keys added). Your scripts should expect changes.

ATTENTION: Some of the keys are platform specific and some of the keys are only exported by the Leader.

The output contains multiple lines with the following format:

On Mon, Jul 16, 2018 at 10:13 AM adrien ruffie wrote:
> Hello all,
>
> In my company we have a Zookeeper production cluster.
>
> But we don't really know how can we check the health of our cluster...
>
> Can we advise us about this topic ?
>
> I know this topic may has been cropping up for a while, but I don't really
> found any concrete solution.
>
> Do you use a monitoring tools ? Which can launch alert ?
>
> What metrics/properties/any thing which can indicate that our cluster
> isn't in good health.
>
> Thank you very much and best regards
>
> Adrien
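The `echo mntr | nc server port` check above can also be scripted. The following is a minimal sketch using a plain TCP socket; the helper names (`four_letter_word`, `parse_mntr`, `is_serving`) are made up for illustration, and the "serving" rule is an assumption based on the description above, not an official health definition.

```python
import socket


def four_letter_word(host, port, command, timeout=5.0):
    """Send a four-letter-word command (e.g. "stat", "mntr"); return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(text):
    """Parse mntr output into a dict; unknown or non-metric lines are skipped."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # key, then the rest of the line
        if len(parts) == 2 and parts[0].startswith("zk_"):
            metrics[parts[0]] = parts[1].strip()
    return metrics


def is_serving(metrics):
    """Assumed rule: a serving instance reports zk_server_state."""
    return "zk_server_state" in metrics
```

A monitoring script could call `parse_mntr(four_letter_word("localhost", 2185, "mntr"))` and alert when `is_serving` is false. Since new keys may be added over time, the parser keeps every `zk_` key rather than expecting a fixed list.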
Re: Kafka Failing to start due to existing ID
Just to update everyone: I was finally able to root-cause the issue. It appears to be https://issues.apache.org/jira/browse/ZOOKEEPER-2901, which is related to the node ID being > 127. It is fixed in 3.5.4-beta, and it works fine.

Thanks,
Harish

On Wed, Jun 13, 2018 at 7:42 AM Andor Molnar wrote:
> Hi Harish,
>
> I see 2 things which need to be clarified here:
>
> 1. ZooKeeper session dies in 2 cases only: when client explicitly closes
> the session (which is *not* equivalent to disconnection) or session timeout
> expires,
> 2. If quorum is not present, there'll be no updates committed and clients
> are rejected to connect, so Kafka shouldn't be able to use the cluster.
>
> Similarly, when quorum comes back online, ZooKeeper will continue operating
> normally: it receives client connections, performs updates and expire
> sessions if necessary.
>
> I still believe therefore that your Kafka setup doesn't properly cleanup
> znodes for some reason, but I'm not a Kafka expert.
>
> Regards,
> Andor
>
> On Wed, Jun 13, 2018 at 12:34 AM, harish lohar wrote:
> > Exactly, so in a case where there is no quorum and no update can be made,
> > is there a way to stop kafka failing to start.
> >
> > One way is to cleanup kafka related znodes after bringing up quorum and
> > then starting kafka.
> >
> > I was looking to avoid this.
> >
> > On Tue, Jun 12, 2018 at 4:59 PM Brian Lininger wrote:
> > > Hi Harish,
> > > I think I see what may be the problem for you. Based on your initial
> > > description (6 ZK nodes, 3 down) I think the problem is that you no longer
> > > have a quorum. When a Zookeeper cluster is running, updates (i.e. removing
> > > znodes) can only occur when Zookeeper has a quorum, which is 50.1% of the
> > > configured Zookeeper nodes. If I understand correctly, then in your case
> > > you have 6 Zookeeper nodes configured but 3 are down. This means that you
> > > only have 50.0% of the Zookeeper cluster working, and thus Zookeeper does
> > > not have a quorum so no updates can be made. I don't know much about the
> > > new TTL feature in 3.5, but my assumption is that it works on this same
> > > principle which is that no updates can be made to the cluster's znodes when
> > > there is no quorum. The same applies to the 3 Zookeeper node cluster, you
> > > must have 2 nodes running to form a quorum and allow any updates to occur.
> > >
> > > Please correct me if I missed something
> > >
> > > Thanks,
> > > Brian
> > >
> > > On Tue, Jun 12, 2018 at 1:33 PM, harish lohar wrote:
> > >> -- Forwarded message -
> > >> From: harish lohar
> > >> Date: Tue, Jun 12, 2018 at 3:26 PM
> > >> Subject: Re: Kafka Failing to start due to existing ID
> > >> To:
> > >>
> > >> Hi Andor,
> > >>
> > >> Thanks for your reply.
> > >>
> > >> This issue is irrespective of number of nodes, even should be seen with 3
> > >> Node cluster as well.
> > >>
> > >> Actually kafka has session_timeout config , but that seems to be in effect
> > >> only if zookeeper cluster is up i.e. if kafka goes down when zookeeper
> > >> cluster is up.
> > >>
> > >> Now let's say if 2 nodes of Zookeeper cluster is down , and then if kafka
> > >> connected to 3rd Zookeeper Node goes down zookeeper cluster doesn't refresh
> > >> the session for Kafka connected to 3rd Node.
> > >>
> > >> So when other Node comes up and zookeeper cluster becomes available it
> > >> doesn't delete the id of the kafka which went down when zookeeper cluster
> > >> was down.
> > >>
> > >> Regarding TTL I have already enquired the kafka forum and awaiting reply.
> > >>
> > >> Ideally once zookeeper cluster is up , it should delete the kafka broker
> > >> id's which are not connected which doesn't seem to be happening
> > >>
> > >> I hope I am making some sense :)
> > >>
> > >> Thanks
> > >> harish
> > >>
> > >> On Tue, Jun 12, 2018 at 2:59 PM Andor Molnár wrote:
> > >> > Hi Harish,
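The quorum arithmetic Brian describes in the quoted text (a strict majority of the configured nodes) can be written out as a small sketch. The function names are illustrative only, and the rule assumes the default majority quorum.

```python
def quorum_size(ensemble_size):
    """Smallest number of running nodes that is a strict majority."""
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size, running):
    """True if the running nodes form a majority of the configured ensemble."""
    return running >= quorum_size(ensemble_size)
```

With 6 configured nodes, `quorum_size(6)` is 4, so 3 running nodes are not enough. With 3 configured nodes, 2 running nodes are sufficient. This is also why an even-sized ensemble tolerates no more failures than the next smaller odd size.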
How to know quorum leader in ZK Cluster (other than from stat command)
Hi,

Is there a way to query any follower node and find out which node is the leader of the ZK cluster?

Thanks,
Harish
Kafka Failing to start due to existing ID
Hi All,

I need help with the scenario below, if any configuration is available that would help.

I have a cluster of 6 nodes. When 3 nodes are stopped and brought up again, Kafka fails to restart because the broker IDs are still present under the ZooKeeper znode /brokers/ids/.

Since the cluster goes down after the 3 nodes are removed, the session timeout doesn't happen.

I am aware of the TTL feature in ZooKeeper, but how can I make sure Kafka creates its znodes with a TTL?

Thanks,
Harish
Non-incremental reconfig failing while trying to bind to same local client port
Hi All,

Need help resolving below issue:

2018-05-10 00:59:16,584 [myid:1] - WARN [RecvWorker:3:QuorumCnxManager$RecvWorker@922] - Interrupting SendWorker
2018-05-10 00:59:16,584 [myid:1] - INFO [QuorumPeerListener:QuorumCnxManager$Listener@636] - My election bind port: /10.60.11.240:3888
2018-05-10 00:59:16,584 [myid:1] - INFO [QuorumPeer[myid=1](plain=/127.0.0.1:2181)(secure=disabled):NIOServerCnxnFactory@706] - binding to port localhost/127.0.0.1:2181
2018-05-10 00:59:16,585 [myid:1] - ERROR [QuorumPeer[myid=1](plain=/127.0.0.1:2181)(secure=disabled):NIOServerCnxnFactory@722] - Error reconfiguring client port to localhost/127.0.0.1:2181 Address already in use
Re: removing ZK installation
Could someone please let me know where to get a ZooKeeper RPM for CentOS?

Thanks,
Harish

On Tue, May 8, 2018 at 1:57 PM, Washko, Daniel wrote:
> Steve, how was zookeeper installed? That should be the method with which
> you remove it.
>
> If you are not sure how it was installed, you can do:
>
> rpm -qa |grep zookeeper
>
> To determine whether it was installed via an RPM package. If that does not
> unearth a matching RPM then it was probably installed some other way. More
> than likely it could have binary in an archive extracted to, maybe,
> /opt/zookeeper.
>
> If you look at the running zookeeper process it should give you an idea of
> where zookeeper is installed and where the data directory is:
>
> ps -ef |grep zookeeper
>
> How zookeeper is starting is dependent on which version of Centos you are
> running. Centos 6 uses upstart and service command. More than likely you
> will find the zookeeper init script in /etc/init.d. If this is Centos 7
> then it's systemd. As root you can run systemctl by itself to get a list of
> service scripts. Hit the "/" key and type in zookeeper. It will take you to
> any service script with zookeeper in the name. This will help you determine
> how to stop zookeeper.
>
> If neither systemd is showing a zookeeper service nor you see a service
> script in /etc/init.d (or if service zookeeper stop doesn't work), then it
> would appear that zookeeper was started in some other way, maybe manually
> without a service or systemd script.
>
> You'll want to figure this out because if you have to manually remove
> zookeeper, instead of using a package manager like RPM, you'll want to
> disable any startup scripts from running and throwing errors once Zookeeper
> is removed.
>
> On 5/8/18, 10:32 AM, "Steph van Schalkwyk" wrote:
>
>     Find where it is installed - typically /opt/zookeeper.
>     Also do a which zookeeper to see if it is linked to /usr/bin or some such
>     place.
>     Make sure zookeeper is stopped.
>     Far as I recall, Centos has Upstart, so sudo stop zookeeper and sudo
>     disable zookeeper. Or sudo systemctl stop zookeeper and sudo systemctl
>     disable zookeeper.
>     Then cat the /opt/zookeeper/conf/zoo.cfg to see where the data directories
>     and logs are. Delete the data and log directories.
>     Then delete /opt/zookeeper.
>     Steph
>
>     On Tue, May 8, 2018 at 9:07 AM, Steve Pruitt wrote:
>
>     > Hi,
>     >
>     > I need to remove ZooKeeper from a Centos machine. I tried yum remove to
>     > no avail using instructions I found online.
>     >
>     > Thanks.
>     >
>     > -S
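The step in the quoted text about reading zoo.cfg to find the data and log directories can be sketched as follows. This assumes the usual `key=value` layout of zoo.cfg with the `dataDir` and `dataLogDir` settings; the function names are illustrative, and nothing here deletes anything.

```python
def parse_zoo_cfg(text):
    """Parse key=value lines of a zoo.cfg file, ignoring comments and blanks."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def data_directories(settings):
    """Return the configured data and transaction-log directories, if any."""
    return [settings[k] for k in ("dataDir", "dataLogDir") if k in settings]
```

For example, `data_directories(parse_zoo_cfg(open("/opt/zookeeper/conf/zoo.cfg").read()))` lists the directories to review before removing them by hand, assuming the install location suggested in the thread.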
Does 3.4.11 support the reconfig feature?
Hi,

Could anyone please clarify whether the 3.4.11 release supports the reconfig feature?
Getting Authentication Not valid while running reconfig Command
I am connecting from ./zkCli.sh and trying to add a server to the ZooKeeper ensemble.

I can see that I am authenticated at the prompt:

2018-03-01 11:21:41,716 [myid:localhost:2181] - INFO [main-SendThread(localhost:2181):ZooKeeperSaslClient@274] - Client will use DIGEST-MD5 as SASL mechanism.
2018-03-01 11:21:41,770 [myid:localhost:2181] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1113] - Opening socket connection to server localhost/127.0.0.1:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
WatchedEvent state:SaslAuthenticated type:None path:null

Even setAcl doesn't work:

[zk: localhost:2181(CONNECTED) 1] setAcl /zookeeper/config world:anyone:cdrwa
Authentication is not valid : /zookeeper/config

The same issue happens with the "reconfig" command as well.

I am using the zookeeper-3.5.3-beta release.

I would appreciate a quick response.

Thanks,
Harish