[jira] [Comment Edited] (ZOOKEEPER-3815) Support a new comprehensive parent znode watcher

2020-05-02 Thread Jordan Zimmerman (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098121#comment-17098121
 ] 

Jordan Zimmerman edited comment on ZOOKEEPER-3815 at 5/2/20, 10:39 PM:
---

How is this different from ZOOKEEPER-1416?


was (Author: randgalt):
How is this different than ZOOKEEPER-1416?

> Support a new comprehensive parent znode watcher
> 
>
> Key: ZOOKEEPER-3815
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3815
> Project: ZooKeeper
>  Issue Type: New Feature
>Reporter: Mohammad Arshad
>Assignee: Mohammad Arshad
>Priority: Major
>
> When a client registers this new watcher (for the time being, let's call it 
> the comprehensive parent znode watcher; we can pick a better name later) on a 
> parent znode, then:
> # The client should be notified of the following events:
> ## When a child is added
> ## When a child is deleted
> ## When a child is updated
> ## When the parent is deleted
> # The client should optionally be notified with the znode data. There are 
> many scenarios where the znode data is always required, and this can avoid 
> unnecessary RPC calls.
> # If the client keeps all child znode data in memory, there should be a way to 
> check whether the client data is consistent with the ZooKeeper server data. 
> This is to ensure that no notification is lost.
> # This watcher should be a persistent watcher, not a one-time watcher.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ZOOKEEPER-3815) Support a new comprehensive parent znode watcher

2020-05-02 Thread Jordan Zimmerman (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098121#comment-17098121
 ] 

Jordan Zimmerman commented on ZOOKEEPER-3815:
-

How is this different than ZOOKEEPER-1416?






[jira] [Created] (ZOOKEEPER-3815) Support a new comprehensive parent znode watcher

2020-05-02 Thread Mohammad Arshad (Jira)
Mohammad Arshad created ZOOKEEPER-3815:
--

 Summary: Support a new comprehensive parent znode watcher
 Key: ZOOKEEPER-3815
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3815
 Project: ZooKeeper
  Issue Type: New Feature
Reporter: Mohammad Arshad
Assignee: Mohammad Arshad


When a client registers this new watcher (for the time being, let's call it the 
comprehensive parent znode watcher; we can pick a better name later) on a parent 
znode, then:
# The client should be notified of the following events:
## When a child is added
## When a child is deleted
## When a child is updated
## When the parent is deleted
# The client should optionally be notified with the znode data. There are many 
scenarios where the znode data is always required, and this can avoid 
unnecessary RPC calls.
# If the client keeps all child znode data in memory, there should be a way to 
check whether the client data is consistent with the ZooKeeper server data. This 
is to ensure that no notification is lost.
# This watcher should be a persistent watcher, not a one-time watcher.
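
As a rough illustration of the requirements above, the sketch below models a client-side cache that applies the four kinds of notification and can then be checked against a server snapshot. It does not use any ZooKeeper client API. The names `ChildCache`, `apply`, `diff`, and the event constants are hypothetical, and it assumes the server can report a per-child version number.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# Hypothetical event kinds mirroring the four notifications listed above.
CHILD_ADDED = "child_added"
CHILD_UPDATED = "child_updated"
CHILD_DELETED = "child_deleted"
PARENT_DELETED = "parent_deleted"


@dataclass
class ChildCache:
    """Client-side mirror of a parent znode's children: name -> (data, version)."""
    children: Dict[str, Tuple[bytes, int]] = field(default_factory=dict)
    parent_exists: bool = True

    def apply(self, kind: str, name: Optional[str] = None,
              data: bytes = b"", version: int = 0) -> None:
        """Apply one notification to the cache (data is the optional payload)."""
        if kind in (CHILD_ADDED, CHILD_UPDATED):
            self.children[name] = (data, version)
        elif kind == CHILD_DELETED:
            self.children.pop(name, None)
        elif kind == PARENT_DELETED:
            self.children.clear()
            self.parent_exists = False
        else:
            raise ValueError(f"unknown event kind: {kind}")

    def diff(self, server_versions: Dict[str, int]) -> Dict[str, Set[str]]:
        """Compare cached versions with a server snapshot of name -> version."""
        cached = {name: ver for name, (_, ver) in self.children.items()}
        return {
            # Children the server has but the cache never saw.
            "missing": set(server_versions) - set(cached),
            # Children the cache still holds but the server no longer has.
            "extra": set(cached) - set(server_versions),
            # Children present on both sides with different versions.
            "stale": {name for name in cached.keys() & server_versions.keys()
                      if cached[name] != server_versions[name]},
        }
```

An empty result in all three sets would suggest that no notification was lost. Any non-empty set tells the client which children to re-read.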





[jira] [Commented] (ZOOKEEPER-3814) ZooKeeper caching of config

2020-05-02 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097875#comment-17097875
 ] 

Rajkiran Sura commented on ZOOKEEPER-3814:
--

FTR: We haven't enabled dynamic reconfig at all.

> ZooKeeper caching of config
> ---
>
> Key: ZOOKEEPER-3814
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3814
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, quorum, server
>Affects Versions: 3.5.6
>Reporter: Rajkiran Sura
>Priority: Major
>
> Hello,
> We recently upgraded our 5-node ZooKeeper ensemble from v3.4.8 to v3.5.6 and 
> encountered no issues as such.
> This is what the ZooKeeper config looks like:
> {quote}tickTime=2000
> dataDir=/zookeeper-data/
> initLimit=5
> syncLimit=2
> maxClientCnxns=2048
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=1
> 4lw.commands.whitelist=stat, ruok, conf, isro, mntr
> authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
> requireClientAuthScheme=sasl
> quorum.cnxn.threads.size=20
> quorum.auth.enableSasl=true
> quorum.auth.kerberos.servicePrincipal= zookeeper/_HOST
> quorum.auth.learnerRequireSasl=true
> quorum.auth.learner.saslLoginContext=QuorumLearner
> quorum.auth.serverRequireSasl=true
> quorum.auth.server.saslLoginContext=QuorumServer
> server.17=node1.foo.bar.com:2888:3888;2181
> server.19=node2.foo.bar.com:2888:3888;2181
> server.20=node3.foo.bar.com:2888:3888;2181
> server.21=node4.foo.bar.com:2888:3888;2181
> server.22=node5.bar.com:2888:3888;2181
> {quote}
> Post upgrade, we had to migrate server.22 on the same node, but with the 
> *FOO*.bar.com domain name due to Kerberos referral issues. We also used a 
> different server identifier, *23*, when we migrated. So, here is what the 
> new config looked like:
> {quote}server.17=node1.foo.bar.com:2888:3888;2181
> server.19=node2.foo.bar.com:2888:3888;2181
> server.20=node3.foo.bar.com:2888:3888;2181
> server.21=node4.foo.bar.com:2888:3888;2181
> *server.23=node5.{color:#00875a}foo{color}.bar.com:2888:3888;2181*
> {quote}
> We restarted all the nodes in the ensemble with the above updated config. 
> The migrated node joined the quorum successfully and was serving all clients 
> directly connected to it without any issues.
> Recently, when a leader election happened, 
> server.*23*=node5.foo.bar.com (the migrated node) was chosen as Leader (as it 
> has the highest ID). But then ZooKeeper was unable to serve any clients, and 
> *all* the servers were _somehow still_ trying to establish a channel to 22 
> (old DNS name: node5.bar.com) and were throwing the error below in a loop:
> {quote}{{2020-05-02 01:43:03,026 [myid:23] - WARN 
> [WorkerSender[myid=23]:QuorumPeer$QuorumServer@196] - Failed to resolve 
> address: node4.bar.com}}
> {{java.net.UnknownHostException: node5.bar.com: Name or service not known}}
> {{ at java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)}}
> {{ at 
> java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)}}
> {{ at 
> java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1515)}}
> {{ at 
> java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)}}
> {{ at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1505)}}
> {{ at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1364)}}
> {{ at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1298)}}
> {{ at java.base/java.net.InetAddress.getByName(InetAddress.java:1248)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:194)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumPeer.recreateSocketAddresses(QuorumPeer.java:774)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:701)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:620)}}
> {{ at 
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)}}
> {{ at 
> org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)}}
> {{ at java.base/java.lang.Thread.run(Thread.java:834)}}
> {{2020-05-02 01:43:03,026 [myid:23] - WARN 
> [WorkerSender[myid=23]:QuorumCnxManager@679] - Cannot open channel to 22 at 
> election address node5.bar.com:3888}}
> {{java.net.UnknownHostException: node5.bar.com}}
> {{ at 
> java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220)}}
> {{ at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)}}
> {{ at java.base/java.net.Socket.connect(Socket.java:591)}}
> {{ at 
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:650)}}
> {{ at 
> 

[jira] [Commented] (ZOOKEEPER-3814) ZooKeeper caching of config

2020-05-02 Thread Rajkiran Sura (Jira)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097874#comment-17097874
 ] 

Rajkiran Sura commented on ZOOKEEPER-3814:
--

Latest observation: we noticed that ZooKeeper was complaining about the 
dynamic.next file, even though we HAVE NOT ENABLED dynamic reconfiguration.
{quote}2020-05-02 01:43:05,870 [myid:21] - ERROR 
[QuorumPeer[myid=21](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1637] - 
Error writing next dynamic config file to disk:
{quote}
The zookeeper user did not have permissions on that config directory, so we 
fixed that and restarted ZooKeeper. It then dumped the dynamic.next file below, 
which contains the OLD migrated node as a member :O
{quote}$ sudo cat /opt/zookeeper/conf/zoo.cfg.dynamic.next
server.17=node1.foo.bar.com:2888:3888:participant;0.0.0.0:2181
server.19=node2.foo.bar.com:2888:3888:participant;0.0.0.0:2181
server.20=node3.foo.bar.com:2888:3888:participant;0.0.0.0:2181
server.21=node4.foo.bar.com:2888:3888:participant;0.0.0.0:2181
*server.{color:#de350b}22=node5.bar.com{color}*:2888:3888:participant;0.0.0.0:2181
{quote}
So, this looks like a bug. Where is it still fetching this from, and how do we 
fix it?
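
To make the mismatch easier to see, here is a minimal sketch that compares the server entries of the static config with those in the dumped dynamic.next file. It is only an illustration. The helper names `parse_servers` and `compare_membership` are hypothetical, the sample texts are copied from this issue with the Jira markup removed, and the parser assumes hostnames or IPv4 addresses only.

```python
import re
from typing import Dict

# Matches lines such as "server.17=node1.foo.bar.com:2888:3888;2181" and
# "server.17=node1.foo.bar.com:2888:3888:participant;0.0.0.0:2181".
# Assumption: the host part contains no colons (no bracketed IPv6 addresses).
_SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):")

STATIC_CFG = """\
server.17=node1.foo.bar.com:2888:3888;2181
server.19=node2.foo.bar.com:2888:3888;2181
server.20=node3.foo.bar.com:2888:3888;2181
server.21=node4.foo.bar.com:2888:3888;2181
server.23=node5.foo.bar.com:2888:3888;2181
"""

DYNAMIC_NEXT = """\
server.17=node1.foo.bar.com:2888:3888:participant;0.0.0.0:2181
server.19=node2.foo.bar.com:2888:3888:participant;0.0.0.0:2181
server.20=node3.foo.bar.com:2888:3888:participant;0.0.0.0:2181
server.21=node4.foo.bar.com:2888:3888:participant;0.0.0.0:2181
server.22=node5.bar.com:2888:3888:participant;0.0.0.0:2181
"""


def parse_servers(text: str) -> Dict[int, str]:
    """Return {server id: hostname} for every server.N line in a config text."""
    servers: Dict[int, str] = {}
    for line in text.splitlines():
        match = _SERVER_LINE.match(line.strip())
        if match:
            servers[int(match.group(1))] = match.group(2)
    return servers


def compare_membership(static_cfg: str, dynamic_cfg: str) -> Dict[str, Dict[int, str]]:
    """List server ids that appear in only one of the two config texts."""
    static = parse_servers(static_cfg)
    dynamic = parse_servers(dynamic_cfg)
    return {
        "only_static": {i: h for i, h in static.items() if i not in dynamic},
        "only_dynamic": {i: h for i, h in dynamic.items() if i not in static},
    }


if __name__ == "__main__":
    print(compare_membership(STATIC_CFG, DYNAMIC_NEXT))
```

With the two texts above, the comparison reports server 23 only in the static config and server 22 only in dynamic.next, which matches the stale member seen in the logs.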

Any lead/help is very much appreciated.

 

Thanks in advance, 

Rajkiran


[jira] [Created] (ZOOKEEPER-3814) ZooKeeper caching of config

2020-05-02 Thread Rajkiran Sura (Jira)
Rajkiran Sura created ZOOKEEPER-3814:


 Summary: ZooKeeper caching of config
 Key: ZOOKEEPER-3814
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3814
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, quorum, server
Affects Versions: 3.5.6
Reporter: Rajkiran Sura


Hello,

We recently upgraded our 5-node ZooKeeper ensemble from v3.4.8 to v3.5.6 and 
encountered no issues as such.

This is what the ZooKeeper config looks like:
{quote}tickTime=2000
dataDir=/zookeeper-data/
initLimit=5
syncLimit=2
maxClientCnxns=2048
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
4lw.commands.whitelist=stat, ruok, conf, isro, mntr
authProvider.1=org.apache.zookeeper.server.auth.SASLAuthenticationProvider
requireClientAuthScheme=sasl
quorum.cnxn.threads.size=20
quorum.auth.enableSasl=true
quorum.auth.kerberos.servicePrincipal= zookeeper/_HOST
quorum.auth.learnerRequireSasl=true
quorum.auth.learner.saslLoginContext=QuorumLearner
quorum.auth.serverRequireSasl=true
quorum.auth.server.saslLoginContext=QuorumServer
server.17=node1.foo.bar.com:2888:3888;2181
server.19=node2.foo.bar.com:2888:3888;2181
server.20=node3.foo.bar.com:2888:3888;2181
server.21=node4.foo.bar.com:2888:3888;2181
server.22=node5.bar.com:2888:3888;2181
{quote}
Post upgrade, we had to migrate server.22 on the same node, but with the 
*FOO*.bar.com domain name due to Kerberos referral issues. We also used a 
different server identifier, *23*, when we migrated. So, here is what the 
new config looked like:
{quote}server.17=node1.foo.bar.com:2888:3888;2181
server.19=node2.foo.bar.com:2888:3888;2181
server.20=node3.foo.bar.com:2888:3888;2181
server.21=node4.foo.bar.com:2888:3888;2181
*server.23=node5.{color:#00875a}foo{color}.bar.com:2888:3888;2181*
{quote}
We restarted all the nodes in the ensemble with the above updated config. The 
migrated node joined the quorum successfully and was serving all clients 
directly connected to it without any issues.

Recently, when a leader election happened, 
server.*23*=node5.foo.bar.com (the migrated node) was chosen as Leader (as it 
has the highest ID). But then ZooKeeper was unable to serve any clients, and 
*all* the servers were _somehow still_ trying to establish a channel to 22 (old 
DNS name: node5.bar.com) and were throwing the error below in a loop:
{quote}{{2020-05-02 01:43:03,026 [myid:23] - WARN 
[WorkerSender[myid=23]:QuorumPeer$QuorumServer@196] - Failed to resolve 
address: node4.bar.com}}
{{java.net.UnknownHostException: node5.bar.com: Name or service not known}}
{{ at java.base/java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)}}
{{ at 
java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(InetAddress.java:929)}}
{{ at 
java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1515)}}
{{ at 
java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:848)}}
{{ at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1505)}}
{{ at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1364)}}
{{ at java.base/java.net.InetAddress.getAllByName(InetAddress.java:1298)}}
{{ at java.base/java.net.InetAddress.getByName(InetAddress.java:1248)}}
{{ at 
org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer.recreateSocketAddresses(QuorumPeer.java:194)}}
{{ at 
org.apache.zookeeper.server.quorum.QuorumPeer.recreateSocketAddresses(QuorumPeer.java:774)}}
{{ at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:701)}}
{{ at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:620)}}
{{ at 
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)}}
{{ at 
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)}}
{{ at java.base/java.lang.Thread.run(Thread.java:834)}}
{{2020-05-02 01:43:03,026 [myid:23] - WARN 
[WorkerSender[myid=23]:QuorumCnxManager@679] - Cannot open channel to 22 at 
election address node5.bar.com:3888}}
{{java.net.UnknownHostException: node5.bar.com}}
{{ at 
java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:220)}}
{{ at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)}}
{{ at java.base/java.net.Socket.connect(Socket.java:591)}}
{{ at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:650)}}
{{ at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:714)}}
{{ at 
org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:620)}}
{{ at 
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)}}
{{ at 
org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)}}
{{ at