[jira] [Comment Edited] (ZOOKEEPER-3815) Support a new comprehensive parent znode watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098121#comment-17098121 ]

Jordan Zimmerman edited comment on ZOOKEEPER-3815 at 5/2/20, 10:39 PM:

How is this different from ZOOKEEPER-1416?

was (Author: randgalt):
How is this different than ZOOKEEPER-1416?

> Support a new comprehensive parent znode watcher
>
> Key: ZOOKEEPER-3815
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3815
> Project: ZooKeeper
> Issue Type: New Feature
> Reporter: Mohammad Arshad
> Assignee: Mohammad Arshad
> Priority: Major

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ZOOKEEPER-3815) Support a new comprehensive parent znode watcher
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098121#comment-17098121 ]

Jordan Zimmerman commented on ZOOKEEPER-3815:

How is this different than ZOOKEEPER-1416?
[jira] [Created] (ZOOKEEPER-3815) Support a new comprehensive parent znode watcher
Mohammad Arshad created ZOOKEEPER-3815:

Summary: Support a new comprehensive parent znode watcher
Key: ZOOKEEPER-3815
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3815
Project: ZooKeeper
Issue Type: New Feature
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad

When a client registers this new watcher (for the time being, let's call it the comprehensive parent znode watcher; we can give it a better name later) on a parent znode, then:
# The client should be notified of the following events:
## When a child is added
## When a child is deleted
## When a child is updated
## When the parent is deleted
# The client should be notified with the znode data. This should be optional. There are many scenarios where the znode data is always required, and delivering it with the notification avoids unnecessary RPC calls.
# If the client keeps all child znode data in memory, there should be a way to check whether the client data is consistent with the ZooKeeper server data. This is to ensure that no notification is lost.
# This watcher should be a persistent watcher, not a one-time watcher.
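The four requirements can be pictured from the client side. Below is a minimal Java sketch, assuming a hypothetical notification type (`ParentWatchEvent`, `ParentWatchEventType`) that carries the child name, the child's data version and, optionally, its data; none of these names exist in ZooKeeper. `ChildCache` (also hypothetical) keeps an in-memory mirror of the children and offers one possible consistency check: comparing child names and data versions with what the server reports. That check is an assumption for illustration, not a mechanism the ticket specifies.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical event kinds mirroring the four notifications requested in the ticket.
enum ParentWatchEventType { CHILD_ADDED, CHILD_DELETED, CHILD_UPDATED, PARENT_DELETED }

// Hypothetical notification payload (not a ZooKeeper class).
final class ParentWatchEvent {
    final ParentWatchEventType type;
    final String child;   // child znode name; null for PARENT_DELETED
    final int version;    // child's data version after the change (assumed to be sent by the server)
    final byte[] data;    // optional znode data; null when the client did not ask for it

    ParentWatchEvent(ParentWatchEventType type, String child, int version, byte[] data) {
        this.type = type;
        this.child = child;
        this.version = version;
        this.data = data;
    }
}

// Hypothetical client-side mirror of a parent's children, fed by a persistent (not one-time) watch.
final class ChildCache {
    private final Map<String, Integer> versions = new HashMap<>();
    private final Map<String, byte[]> data = new HashMap<>();
    private boolean parentDeleted = false;

    void apply(ParentWatchEvent e) {
        switch (e.type) {
            case CHILD_ADDED:
            case CHILD_UPDATED:
                versions.put(e.child, e.version);
                if (e.data != null) {
                    data.put(e.child, e.data); // payload came with the event: no extra read RPC
                } else {
                    data.remove(e.child);      // drop cached bytes that may now be stale
                }
                break;
            case CHILD_DELETED:
                versions.remove(e.child);
                data.remove(e.child);
                break;
            case PARENT_DELETED:
                versions.clear();
                data.clear();
                parentDeleted = true;
                break;
        }
    }

    boolean isParentDeleted() {
        return parentDeleted;
    }

    byte[] getData(String child) {
        return data.get(child);
    }

    // Consistency check: same children at the same data versions as the server reports.
    // A mismatch means at least one notification was missed and the cache should be rebuilt.
    boolean isConsistentWith(Map<String, Integer> serverVersions) {
        return versions.equals(serverVersions);
    }
}
```

In this sketch a client would call `apply` for every notification delivered by the persistent watch and run `isConsistentWith` from time to time (for example after a reconnect); on a mismatch it would re-read the children and rebuild the cache.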
[jira] [Commented] (ZOOKEEPER-3814) ZooKeeper caching of config
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097875#comment-17097875 ]

Rajkiran Sura commented on ZOOKEEPER-3814:

FTR: We haven't enabled dynamic reconfig at all.

> ZooKeeper caching of config
>
> Key: ZOOKEEPER-3814
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3814
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection, quorum, server
> Affects Versions: 3.5.6
> Reporter: Rajkiran Sura
> Priority: Major
[jira] [Commented] (ZOOKEEPER-3814) ZooKeeper caching of config
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097874#comment-17097874 ]

Rajkiran Sura commented on ZOOKEEPER-3814:

Latest observation: we noticed that ZooKeeper was complaining about the dynamic.next file, even though we HAVE NOT ENABLED dynamic reconfiguration.
{quote}2020-05-02 01:43:05,870 [myid:21] - ERROR [QuorumPeer[myid=21](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1637] - Error writing next dynamic config file to disk:
{quote}
The zookeeper user did not have permissions on that config directory, so we fixed that and restarted ZooKeeper. It then wrote out the dynamic.next file below, which still lists the OLD entry for the migrated node as a member :O
{quote}$ sudo cat /opt/zookeeper/conf/zoo.cfg.dynamic.next
server.17=node1.foo.bar.com:2888:3888:participant;0.0.0.0:2181
server.19=node2.foo.bar.com:2888:3888:participant;0.0.0.0:2181
server.20=node3.foo.bar.com:2888:3888:participant;0.0.0.0:2181
server.21=node4.foo.bar.com:2888:3888:participant;0.0.0.0:2181
*server.{color:#de350b}22=node5.bar.com{color}*:2888:3888:participant;0.0.0.0:2181
{quote}
So this looks like a bug. Where is it still fetching this from, and how do we fix it? Any lead/help is very much appreciated.

Thanks in advance,
Rajkiran
[jira] [Created] (ZOOKEEPER-3814) ZooKeeper caching of config
Rajkiran Sura created ZOOKEEPER-3814:

Summary: ZooKeeper caching of config
Key: ZOOKEEPER-3814
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3814
Project: ZooKeeper
Issue Type: Bug
Components: leaderElection, quorum, server
Affects Versions: 3.5.6
Reporter: Rajkiran Sura

Hello,

We recently upgraded our 5-node ZooKeeper ensemble from v3.4.8 to v3.5.6 and encountered no issues as such.

This is what the ZooKeeper config looks like:
{quote}tickTime=2000
dataDir=/zookeeper-data/
initLimit=5
syncLimit=2
maxClientCnxns=2048
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
4lw.commands.whitelist=stat, ruok, conf, isro, mntr
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
requireClientAuthScheme=sasl
quorum.cnxn.threads.size=20
quorum.auth.enableSasl=true
quorum.auth.kerberos.servicePrincipal= zookeeper/_HOST
quorum.auth.learnerRequireSasl=true
quorum.auth.learner.saslLoginContext=QuorumLearner
quorum.auth.serverRequireSasl=true
quorum.auth.server.saslLoginContext=QuorumServer
server.17=node1.foo.bar.com:2888:3888;2181
server.19=node2.foo.bar.com:2888:3888;2181
server.20=node3.foo.bar.com:2888:3888;2181
server.21=node4.foo.bar.com:2888:3888;2181
server.22=node5.bar.com:2888:3888;2181
{quote}

After the upgrade, we had to migrate server.22 on the same node to the *FOO*.bar.com domain name due to Kerberos referral issues. We used a different server identifier, *23*, for the migrated server. Here is what the new config looked like:
{quote}server.17=node1.foo.bar.com:2888:3888;2181
server.19=node2.foo.bar.com:2888:3888;2181
server.20=node3.foo.bar.com:2888:3888;2181
server.21=node4.foo.bar.com:2888:3888;2181
*server.23=node5.{color:#00875a}foo{color}.bar.com:2888:3888;2181*
{quote}

We restarted all the nodes in the ensemble with the updated config above. The migrated node joined the quorum successfully and served all clients directly connected to it without any issues.
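The migration is a membership change: server.22 (node5.bar.com) was dropped and server.23 (node5.foo.bar.com) was added. Below is a minimal Java sketch of a hypothetical diagnostic, not part of ZooKeeper, that extracts the `server.N=host:...` entries from two config texts and reports which IDs appear in only one of them. It assumes one `server.N` entry per line, in either the static form shown above or the `:participant;0.0.0.0:2181` form that ZooKeeper writes to its dynamic config files.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper (not part of ZooKeeper): compares ensemble
// membership between two zoo.cfg-style texts.
final class EnsembleMembership {
    // Captures the id and host of "server.<id>=<host>:..." lines. Accepts both the static
    // form (host:2888:3888;2181) and the dynamic form (host:2888:3888:participant;0.0.0.0:2181).
    private static final Pattern SERVER_LINE =
            Pattern.compile("^\\s*server\\.(\\d+)\\s*=\\s*([^:\\s]+):");

    // Returns server id -> host for every server.N entry; other settings are ignored.
    static Map<Long, String> parse(String configText) {
        Map<Long, String> members = new TreeMap<>();
        for (String line : configText.split("\\R")) {
            Matcher m = SERVER_LINE.matcher(line);
            if (m.find()) {
                members.put(Long.parseLong(m.group(1)), m.group(2));
            }
        }
        return members;
    }

    // Server ids present in 'a' but absent from 'b'.
    static Set<Long> onlyIn(Map<Long, String> a, Map<Long, String> b) {
        Set<Long> ids = new TreeSet<>(a.keySet());
        ids.removeAll(b.keySet());
        return ids;
    }
}
```

Run against the old and new static configs, `onlyIn(old, new)` returns [22] and `onlyIn(new, old)` returns [23]. Run against the `zoo.cfg.dynamic.next` listing quoted in the ticket's comments, `onlyIn(dynamicNext, newStatic)` returns [22], i.e. a member the current static config no longer declares.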
Recently, when a leader election happened, server.*23*=node5.foo.bar.com (the migrated node) was chosen as leader (as it has the highest ID). But then ZooKeeper was unable to serve any clients, and *all* the servers were _somehow still_ trying to establish a channel to 22 (old DNS name: node5.bar.com), throwing the error below in a loop:
{quote}{{2020-05-02 01:43:03,026 [myid:23] - WARN [WorkerSender[myid=23]:QuorumPeer$QuorumServer@196] - Failed to resolve address: node4.bar.com}}
{{java.net.UnknownHostException: node5.bar.com: Name or service not known}}
{{ at java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)}}
{{ at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)}}
{{ at java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1515)}}
{{ at java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)}}
{{ at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1505)}}
{{ at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1364)}}
{{ at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1298)}}
{{ at java.base/java.net.InetAddress.getByName(InetAddress.java:1248)}}
{{ at org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:194)}}
{{ at org.apache.zookeeper.server.quorum.QuorumPeer.recreateSocketAddresses(QuorumPeer.java:774)}}
{{ at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:701)}}
{{ at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:620)}}
{{ at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)}}
{{ at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)}}
{{ at java.base/java.lang.Thread.run(Thread.java:834)}}
{{2020-05-02 01:43:03,026 [myid:23] - WARN [WorkerSender[myid=23]:QuorumCnxManager@679] - Cannot open channel to 22 at election address node5.bar.com:3888}}
{{java.net.UnknownHostException: node5.bar.com}}
{{ at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220)}}
{{ at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)}}
{{ at java.base/java.net.Socket.connect(Socket.java:591)}}
{{ at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:650)}}
{{ at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:714)}}
{{ at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:620)}}
{{ at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)}}
{{ at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)}}
{{ at