[jira] [Updated] (ZOOKEEPER-2202) Cluster crashes when reconfig adds an unreachable observer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Han updated ZOOKEEPER-2202:
    Fix Version/s: 3.5.4 (was: 3.5.3)

> Cluster crashes when reconfig adds an unreachable observer
> ----------------------------------------------------------
>
>                 Key: ZOOKEEPER-2202
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2202
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.5.0, 3.6.0
>            Reporter: Raul Gutierrez Segales
>            Assignee: Raul Gutierrez Segales
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: ZOOKEEPER-2202.patch
>
>
> While adding support for reconfig() in Kazoo
> (https://github.com/python-zk/kazoo/pull/333), I found that the cluster can
> be crashed by adding an observer whose election port isn't reachable (i.e.,
> packets for that destination are dropped, not rejected). This raises a
> SocketTimeoutException, which brings down the PrepRequestProcessor:
> {code}
> 2015-06-02 14:37:16,473 [myid:3] - WARN [ProcessThread(sid:3 cport:-1)::QuorumCnxManager@384] - Cannot open channel to 100 at election address /8.8.8.8:38703
> java.net.SocketTimeoutException: connect timed out
>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>     at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
>     at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>     at java.net.Socket.connect(Socket.java:589)
>     at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:369)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.connectNewPeers(QuorumPeer.java:1288)
>     at org.apache.zookeeper.server.quorum.QuorumPeer.setLastSeenQuorumVerifier(QuorumPeer.java:1315)
>     at org.apache.zookeeper.server.quorum.Leader.propose(Leader.java:1056)
>     at org.apache.zookeeper.server.quorum.ProposalRequestProcessor.processRequest(ProposalRequestProcessor.java:78)
>     at org.apache.zookeeper.server.PrepRequestProcessor.pRequest(PrepRequestProcessor.java:877)
>     at org.apache.zookeeper.server.PrepRequestProcessor.run(PrepRequestProcessor.java:143)
> {code}
> A simple repro: use the code in the pull request referenced above, but give
> the new observer an unreachable election address such as 8.8.8.8:3888
> instead of a free (but closed) port on the loopback interface.
> I think adding an Observer (or a Participant) that isn't currently reachable
> is a valid use case (e.g., you are still provisioning the machine and it
> isn't needed yet), so perhaps we could handle this with lower connect
> timeouts; I'm not sure.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
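The distinction the report draws between dropped and rejected packets can be seen with plain `java.net`, without ZooKeeper. The minimal sketch below is illustrative only: `ConnectProbe`, `probe`, `closedLoopbackPort` and the 1-second timeout are hypothetical names and values, not ZooKeeper code. A connect to a closed port on a reachable host usually fails fast with `ConnectException` (the host answers with a TCP reset), whereas a connect to an address whose packets are silently dropped blocks until the timeout passed to `Socket.connect` expires and then throws `SocketTimeoutException`, as in the trace above. For reference, the ZooKeeper admin guide documents a `zookeeper.cnxTimeout` system property (default 5 seconds) for opening election connections; lowering it would only shorten the stall, since the failure still has to be handled by the caller.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ConnectProbe {

    /** Possible results of one connection attempt (hypothetical classification). */
    enum Outcome { CONNECTED, REFUSED, TIMED_OUT, OTHER_ERROR }

    /**
     * Tries to open a TCP connection to host:port, waiting at most timeoutMs.
     * Never throws: every failure is mapped to an Outcome, so the calling
     * thread is not brought down by an unreachable peer.
     */
    static Outcome probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return Outcome.CONNECTED;
        } catch (SocketTimeoutException e) {
            // Nothing answered before the timeout: packets were most likely dropped.
            return Outcome.TIMED_OUT;
        } catch (ConnectException e) {
            // The host answered but rejected the connection (closed port).
            return Outcome.REFUSED;
        } catch (IOException e) {
            // Unresolvable host, no route to host, and similar failures.
            return Outcome.OTHER_ERROR;
        }
    }

    /** Returns a port that nothing is listening on (bound to port 0, then released). */
    static int closedLoopbackPort() {
        try (ServerSocket tmp = new ServerSocket(0)) {
            return tmp.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("closed loopback port: "
                + probe("127.0.0.1", closedLoopbackPort(), 1000));
        // A blackholed address such as 8.8.8.8:3888 typically yields TIMED_OUT after
        // roughly the timeout, but that depends on the local network, so it is not run here.
    }
}
```

This mirrors the reporter's observation that a closed loopback port does not reproduce the problem: it fails immediately with a refusal instead of stalling for the full timeout.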
[jira] [Updated] (ZOOKEEPER-2202) Cluster crashes when reconfig adds an unreachable observer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Nauroth updated ZOOKEEPER-2202:
    Fix Version/s: 3.5.3 (was: 3.5.2)
[jira] [Updated] (ZOOKEEPER-2202) Cluster crashes when reconfig adds an unreachable observer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Hunt updated ZOOKEEPER-2202:
    Assignee: Raul Gutierrez Segales
[jira] [Updated] (ZOOKEEPER-2202) Cluster crashes when reconfig adds an unreachable observer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raul Gutierrez Segales updated ZOOKEEPER-2202:
    Attachment: ZOOKEEPER-2202.patch

[~shralex]: does this make sense to you? I'll add a test a bit later today. Thanks!

cc: [~cnauroth], [~hdeng]
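The contents of ZOOKEEPER-2202.patch are not reproduced in this thread, so the following is only a generic illustration of one way to keep an unreachable new peer from affecting request processing, not a description of that patch. The trace shows the connection attempt running on the request-processing thread (`PrepRequestProcessor.run` down to `QuorumCnxManager.connectOne`). The sketch hands each attempt to a separate executor and turns every failure into a logged `false`, so neither a slow connect nor an exception reaches the caller. `NewPeerConnector`, `PeerConnector`, `connectNewPeersAsync` and `succeeded` are hypothetical names; no signatures of the real `QuorumPeer` or `QuorumCnxManager` classes are assumed.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.logging.Logger;

public class NewPeerConnector {
    private static final Logger LOG = Logger.getLogger(NewPeerConnector.class.getName());

    /** Hypothetical stand-in for "open an election channel to peer sid". */
    interface PeerConnector {
        void connect(long sid) throws IOException;
    }

    private final PeerConnector connector;
    // Connection attempts run on this daemon thread, never on the caller's thread.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "new-peer-connector");
        t.setDaemon(true);
        return t;
    });

    NewPeerConnector(PeerConnector connector) {
        this.connector = connector;
    }

    /**
     * Schedules one connection attempt per new peer and returns immediately.
     * A failure for one peer is logged and reported as false; it neither stops
     * the other attempts nor propagates to the caller.
     */
    List<Future<Boolean>> connectNewPeersAsync(List<Long> sids) {
        List<Future<Boolean>> attempts = new ArrayList<>();
        for (long sid : sids) {
            attempts.add(executor.submit(() -> {
                try {
                    connector.connect(sid);
                    return true;
                } catch (IOException | RuntimeException e) {
                    LOG.warning("Cannot open channel to " + sid + ": " + e);
                    return false;
                }
            }));
        }
        return attempts;
    }

    /** Waits for one attempt; interruption or an unexpected error counts as failure. */
    static boolean succeeded(Future<Boolean> attempt) {
        try {
            return attempt.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;
        }
    }

    void shutdown() {
        executor.shutdown();
    }

    public static void main(String[] args) {
        // Peer 100 simulates the unreachable observer from the report.
        NewPeerConnector c = new NewPeerConnector(sid -> {
            if (sid == 100L) {
                throw new SocketTimeoutException("connect timed out");
            }
        });
        List<Future<Boolean>> attempts = c.connectNewPeersAsync(Arrays.asList(1L, 100L, 2L));
        for (Future<Boolean> a : attempts) {
            System.out.println(succeeded(a));
        }
        c.shutdown();
    }
}
```

Running `main` reports peers 1 and 2 as connected and peer 100 as failed, with a warning logged for 100. The design choice is isolation: the caller only schedules work, so the reconfig proposal could proceed while the new peer is still being provisioned, which is the use case the reporter describes.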
[jira] [Updated] (ZOOKEEPER-2202) Cluster crashes when reconfig adds an unreachable observer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michi Mutsuzaki updated ZOOKEEPER-2202:
    Fix Version/s: 3.5.2 (was: 3.5.1)
[jira] [Updated] (ZOOKEEPER-2202) Cluster crashes when reconfig adds an unreachable observer
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raul Gutierrez Segales updated ZOOKEEPER-2202:
    Summary: Cluster crashes when reconfig adds an unreachable observer (was: Cluster crashes when reconfig adds an unreaachable observer)