[GitHub] [zookeeper] symat commented on a change in pull request #1048: ZOOKEEPER-3188: Improve resilience to network

2019-08-23 Thread GitBox
symat commented on a change in pull request #1048: ZOOKEEPER-3188: Improve 
resilience to network
URL: https://github.com/apache/zookeeper/pull/1048#discussion_r317189846
 
 

 ##
 File path: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java
 ##
 @@ -417,71 +427,111 @@ public boolean isQuorumSynced(QuorumVerifier qv) {
 protected final Proposal newLeaderProposal = new Proposal();
 
 class LearnerCnxAcceptor extends ZooKeeperCriticalThread {
-private volatile boolean stop = false;
+private final AtomicBoolean stop = new AtomicBoolean(false);
+private final AtomicBoolean fail = new AtomicBoolean(false);
 
-public LearnerCnxAcceptor() {
-super("LearnerCnxAcceptor-" + ss.getLocalSocketAddress(), zk
-.getZooKeeperServerListener());
+LearnerCnxAcceptor() {
+super("LearnerCnxAcceptor-" + serverSockets.stream()
+  .map(ServerSocket::getLocalSocketAddress)
+  .map(Objects::toString)
+  .collect(Collectors.joining(",")),
+  zk.getZooKeeperServerListener());
 }
 
 @Override
 public void run() {
-try {
-while (!stop) {
-Socket s = null;
-boolean error = false;
-try {
-s = ss.accept();
-
-// start with the initLimit, once the ack is processed
-// in LearnerHandler switch to the syncLimit
-s.setSoTimeout(self.tickTime * self.initLimit);
-s.setTcpNoDelay(nodelay);
-
-BufferedInputStream is = new BufferedInputStream(
-s.getInputStream());
-LearnerHandler fh = new LearnerHandler(s, is,
-Leader.this);
-fh.start();
-} catch (SocketException e) {
-error = true;
-if (stop) {
-LOG.info("exception while shutting down acceptor: "
-+ e);
-
-// When Leader.shutdown() calls ss.close(),
-// the call to accept throws an exception.
-// We catch and set stop to true.
-stop = true;
-} else {
-throw e;
-}
-} catch (SaslException e){
-LOG.error("Exception while connecting to quorum learner", e);
-error = true;
-} catch (Exception e) {
-error = true;
+if (!stop.get() && !serverSockets.isEmpty()) {
+ExecutorService executor = Executors.newFixedThreadPool(serverSockets.size());
+CountDownLatch latch = new CountDownLatch(serverSockets.size());
+
+serverSockets.forEach(serverSocket ->
+executor.submit(new LearnerCnxAcceptorHandler(serverSocket, latch)));
+
+try {
+latch.await();
 
 Review comment:
   The code now basically starts listener threads on all local addresses, 
then waits until all the listeners are dead, and then simply starts over again 
(assuming no stop was requested by Leader.shutdown() and no unexpected failure 
happened in the listener threads). 
   
   I don't think we need a timeout here... ideally we want to wait forever :)
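
The pattern described in this comment can be sketched as follows. This is a minimal, hypothetical illustration and not the PR's code: `MultiSocketAcceptorSketch`, `acceptOnAll`, `shutdown`, and the `accepted` counter are made-up names, and the loop only counts connections instead of handing them to a `LearnerHandler`.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the acceptor in the diff; not the PR's actual code.
class MultiSocketAcceptorSketch {
    final AtomicBoolean stop = new AtomicBoolean(false);
    final AtomicInteger accepted = new AtomicInteger();

    // Starts one accept loop per socket, then blocks until every loop has exited.
    void acceptOnAll(List<ServerSocket> serverSockets) throws InterruptedException {
        if (serverSockets.isEmpty()) {
            return; // newFixedThreadPool(0) would throw IllegalArgumentException
        }
        ExecutorService executor = Executors.newFixedThreadPool(serverSockets.size());
        CountDownLatch latch = new CountDownLatch(serverSockets.size());
        for (ServerSocket serverSocket : serverSockets) {
            executor.submit(() -> {
                try {
                    while (!stop.get()) {
                        try (Socket socket = serverSocket.accept()) {
                            accepted.incrementAndGet(); // a real handler would take over the socket
                        }
                    }
                } catch (IOException e) {
                    // Closing the server socket makes a blocked accept() throw; the loop ends here.
                } finally {
                    latch.countDown();
                }
            });
        }
        try {
            latch.await(); // deliberately no timeout: wait until all listeners are dead
        } finally {
            executor.shutdown();
        }
    }

    // Requests a stop and closes the sockets so that blocked accept() calls return.
    void shutdown(List<ServerSocket> serverSockets) throws IOException {
        stop.set(true);
        for (ServerSocket serverSocket : serverSockets) {
            serverSocket.close();
        }
    }

    public static void main(String[] args) throws Exception {
        MultiSocketAcceptorSketch acceptor = new MultiSocketAcceptorSketch();
        InetAddress loopback = InetAddress.getLoopbackAddress();
        List<ServerSocket> sockets = Arrays.asList(
                new ServerSocket(0, 50, loopback), new ServerSocket(0, 50, loopback));
        Thread closer = new Thread(() -> {
            try {
                Thread.sleep(200);
                acceptor.shutdown(sockets);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        closer.start();
        acceptor.acceptOnAll(sockets); // returns only after both accept loops have ended
        System.out.println("all listeners exited");
    }
}
```

Because `countDown()` sits in a `finally` block, the latch is released whether a listener stops normally or fails, which is why an untimed `await()` cannot hang once every socket is closed.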


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [zookeeper] symat commented on a change in pull request #1048: ZOOKEEPER-3188: Improve resilience to network

2019-08-23 Thread GitBox
symat commented on a change in pull request #1048: ZOOKEEPER-3188: Improve 
resilience to network
URL: https://github.com/apache/zookeeper/pull/1048#discussion_r317167254
 
 

 ##
 File path: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/ObserverBean.java
 ##
 @@ -49,10 +50,11 @@ public String getQuorumAddress() {
 
 public String getLearnerMaster() {
 QuorumPeer.QuorumServer learnerMaster = observer.getCurrentLearnerMaster();
-if (learnerMaster == null || learnerMaster.addr == null) {
+InetSocketAddress address = learnerMaster.addr.getReachableOrOne();
+if (learnerMaster == null || address == null) {
 
 Review comment:
   Thanks, nice catch!
   (Actually, besides this, we also got a NoSuchElementException if 
learnerMaster.addr is empty.)
   I will fix this in the next commit.
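
A minimal sketch of the guard ordering this comment is about: all null and emptiness checks run before the address is dereferenced. The `Addresses` and `Server` classes are hypothetical stand-ins for the PR's types, `isEmpty()` is an assumed helper, and the `"Unknown"` return value is only a placeholder.

```java
import java.net.InetSocketAddress;
import java.util.LinkedHashSet;
import java.util.Set;

class LearnerMasterAddressSketch {
    // Hypothetical stand-in for the multi-address holder used in the PR.
    static class Addresses {
        final Set<InetSocketAddress> all = new LinkedHashSet<>();

        boolean isEmpty() {
            return all.isEmpty();
        }

        // Like the method discussed above, this throws NoSuchElementException on an empty set.
        InetSocketAddress getReachableOrOne() {
            return all.iterator().next();
        }
    }

    // Hypothetical stand-in for QuorumPeer.QuorumServer.
    static class Server {
        Addresses addr;
    }

    static String describe(Server learnerMaster) {
        // Check every precondition first; only then touch learnerMaster.addr.
        if (learnerMaster == null || learnerMaster.addr == null || learnerMaster.addr.isEmpty()) {
            return "Unknown"; // placeholder value for this sketch
        }
        InetSocketAddress address = learnerMaster.addr.getReachableOrOne();
        return address.getHostString() + ":" + address.getPort();
    }

    public static void main(String[] args) {
        Server server = new Server();
        System.out.println(describe(null));
        System.out.println(describe(server));
        server.addr = new Addresses();
        server.addr.all.add(InetSocketAddress.createUnresolved("10.0.0.1", 2888));
        System.out.println(describe(server));
    }
}
```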




[GitHub] [zookeeper] symat commented on a change in pull request #1048: ZOOKEEPER-3188: Improve resilience to network

2019-08-14 Thread GitBox
symat commented on a change in pull request #1048: ZOOKEEPER-3188: Improve 
resilience to network
URL: https://github.com/apache/zookeeper/pull/1048#discussion_r313774702
 
 

 ##
 File path: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/admin/Commands.java
 ##
 @@ -657,11 +647,31 @@ public CommandResponse run(ZooKeeperServer zkServer, Map kwargs)
 TreeMap::new));
 }
 
+private String getMultiAddressString(QuorumPeer.QuorumServer qs) {
+return qs.addr.getAllAddresses().stream()
+.map(address -> getSingleAddressString(qs, address))
+.collect(Collectors.joining(","));
+}
+
+private String getSingleAddressString(QuorumPeer.QuorumServer qs, InetSocketAddress address) {
 
 Review comment:
   I changed the format, this is how the `voting_view` admin command responds 
now:
   ```
   {
     "current_config" : {
       "1" : {
         "server_addresses" : [ "/172.16.101.11:2888", "/172.16.102.11:2888" ],
         "election_addresses" : [ "/172.16.101.11:3888", "/172.16.102.11:3888" ],
         "client_address" : "/0.0.0.0:2181",
         "learner_type" : "participant"
       },
       "2" : {
         "server_addresses" : [ "/172.16.101.22:2888", "/172.16.102.22:2888" ],
         "election_addresses" : [ "/172.16.101.22:3888", "/172.16.102.22:3888" ],
         "client_address" : "/0.0.0.0:2181",
         "learner_type" : "participant"
       },
       "3" : {
         "server_addresses" : [ "/172.16.101.33:2888", "/172.16.102.33:2888" ],
         "election_addresses" : [ "/172.16.101.33:3888", "/172.16.102.33:3888" ],
         "client_address" : "/0.0.0.0:2181",
         "learner_type" : "participant"
       }
     },
     "command" : "voting_view",
     "error" : null
   }
   ```
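
One way to produce list-valued address fields like those in the response above is to build a list per field instead of a comma-joined string. The sketch below is an assumption about the shape only: `VotingViewSketch`, `format`, `formatAll`, and `serverEntry` are hypothetical names, and the helper prints `host:port` without the leading slash shown in the real output.

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper; not the code in Commands.java.
class VotingViewSketch {
    static String format(InetSocketAddress address) {
        return address.getHostString() + ":" + address.getPort();
    }

    static List<String> formatAll(List<InetSocketAddress> addresses) {
        return addresses.stream().map(VotingViewSketch::format).collect(Collectors.toList());
    }

    // Builds one per-server entry; a JSON serializer would render the lists as arrays.
    static Map<String, Object> serverEntry(List<InetSocketAddress> serverAddresses,
                                           List<InetSocketAddress> electionAddresses,
                                           InetSocketAddress clientAddress,
                                           String learnerType) {
        Map<String, Object> entry = new LinkedHashMap<>(); // keeps the key order stable
        entry.put("server_addresses", formatAll(serverAddresses));
        entry.put("election_addresses", formatAll(electionAddresses));
        entry.put("client_address", format(clientAddress));
        entry.put("learner_type", learnerType);
        return entry;
    }

    public static void main(String[] args) {
        Map<String, Object> entry = serverEntry(
                Arrays.asList(InetSocketAddress.createUnresolved("172.16.101.11", 2888),
                        InetSocketAddress.createUnresolved("172.16.102.11", 2888)),
                Arrays.asList(InetSocketAddress.createUnresolved("172.16.101.11", 3888),
                        InetSocketAddress.createUnresolved("172.16.102.11", 3888)),
                InetSocketAddress.createUnresolved("0.0.0.0", 2181),
                "participant");
        System.out.println(entry);
    }
}
```

Keeping each address as its own list element means a client can read the addresses without splitting on commas.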




[GitHub] [zookeeper] symat commented on a change in pull request #1048: ZOOKEEPER-3188: Improve resilience to network

2019-08-14 Thread GitBox
symat commented on a change in pull request #1048: ZOOKEEPER-3188: Improve 
resilience to network
URL: https://github.com/apache/zookeeper/pull/1048#discussion_r313738065
 
 

 ##
 File path: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/admin/Commands.java
 ##
 @@ -657,11 +647,31 @@ public CommandResponse run(ZooKeeperServer zkServer, Map kwargs)
 TreeMap::new));
 }
 
+private String getMultiAddressString(QuorumPeer.QuorumServer qs) {
+return qs.addr.getAllAddresses().stream()
+.map(address -> getSingleAddressString(qs, address))
+.collect(Collectors.joining(","));
+}
+
+private String getSingleAddressString(QuorumPeer.QuorumServer qs, InetSocketAddress address) {
 
 Review comment:
   Yes, I agree, I will change it. 
   
   This admin command is meant to show the internal view of the voting members 
(who the ZooKeeper server thinks the voting members are and where they 
listen). I wouldn't complicate this PR any further, but it might be a good idea 
to create a follow-up ticket for an admin command that shows whether the given 
server can actually reach all the different ports of the other servers (and not 
only those of the voting members). It could help debug network problems by 
showing whether certain network interfaces on some servers are unreachable. 
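
The reachability check proposed for the follow-up ticket could look roughly like the sketch below. It is only an illustration of the idea: no such admin command exists in this PR, and `ReachabilityProbeSketch`, `canConnect`, and `probeAll` are hypothetical names. It treats an address as reachable if a plain TCP connect succeeds within the timeout.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the follow-up idea; not part of ZooKeeper.
class ReachabilityProbeSketch {
    // Returns true if a TCP connection to the address can be opened within the timeout.
    static boolean canConnect(InetSocketAddress address, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(address, timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolved host: report the address as unreachable.
            return false;
        }
    }

    // Probes every address and records the result per address, in iteration order.
    static Map<InetSocketAddress, Boolean> probeAll(Collection<InetSocketAddress> addresses,
                                                    int timeoutMillis) {
        Map<InetSocketAddress, Boolean> result = new LinkedHashMap<>();
        for (InetSocketAddress address : addresses) {
            result.put(address, canConnect(address, timeoutMillis));
        }
        return result;
    }

    public static void main(String[] args) {
        // Example addresses only; replace them with the peers' quorum and election ports.
        System.out.println(probeAll(Arrays.asList(
                new InetSocketAddress("127.0.0.1", 2888),
                new InetSocketAddress("127.0.0.1", 3888)), 500));
    }
}
```

A successful connect only shows that the port accepts TCP connections from this host; it says nothing about TLS or authentication on the quorum channel.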




[GitHub] [zookeeper] symat commented on a change in pull request #1048: ZOOKEEPER-3188: Improve resilience to network

2019-08-14 Thread GitBox
symat commented on a change in pull request #1048: ZOOKEEPER-3188: Improve 
resilience to network
URL: https://github.com/apache/zookeeper/pull/1048#discussion_r313820217
 
 

 ##
 File path: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumCnxManager.java
 ##
 @@ -228,28 +240,33 @@ static public InitialMessage parse(Long protocolVersion, DataInputStream din)
 num_read, remaining, sid);
 }
 
-String addr = new String(b);
-String[] host_port;
-try {
-host_port = ConfigUtils.getHostAndPort(addr);
-} catch (ConfigException e) {
-throw new InitialMessageException("Badly formed address: %s", addr);
-}
+String[] addressStrings = new String(b).split(",");
 
 Review comment:
   Referring to our second offline discussion:
   - I incremented the PROTOCOL_VERSION.
   - I did not do the JSON format modification, as it would complicate future 
code, which would always have to ensure both forward and backward compatibility 
of the election protocol during rolling upgrades. The current solution (simply 
failing if the protocol versions mismatch) is simpler and still works 
just fine: as the servers are restarted one by one, the nodes with the old 
protocol version and the nodes with the new protocol version will form two 
partitions, but at any given time at most one partition will have the quorum.
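
The quorum argument above can be checked with a little arithmetic. The sketch below assumes a plain majority quorum (more than half of the ensemble) and is not taken from the PR; `RollingUpgradeQuorumSketch` and `hasQuorum` are hypothetical names.

```java
// Hypothetical illustration of the rolling-upgrade argument; assumes majority quorums.
class RollingUpgradeQuorumSketch {
    // A partition has quorum if it holds a strict majority of the ensemble.
    static boolean hasQuorum(int members, int ensembleSize) {
        return members > ensembleSize / 2;
    }

    public static void main(String[] args) {
        int ensembleSize = 5;
        // Servers are restarted one by one, so the upgraded partition grows by one per step.
        for (int upgraded = 0; upgraded <= ensembleSize; upgraded++) {
            int notUpgraded = ensembleSize - upgraded;
            System.out.printf("upgraded=%d old=%d newHasQuorum=%b oldHasQuorum=%b%n",
                    upgraded, notUpgraded,
                    hasQuorum(upgraded, ensembleSize),
                    hasQuorum(notUpgraded, ensembleSize));
        }
    }
}
```

Two disjoint partitions cannot both hold a strict majority, so the old-protocol and new-protocol groups can never both have quorum. With an even ensemble size there is a step at which neither group has it.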




[GitHub] [zookeeper] symat commented on a change in pull request #1048: ZOOKEEPER-3188: Improve resilience to network

2019-08-14 Thread GitBox
symat commented on a change in pull request #1048: ZOOKEEPER-3188: Improve 
resilience to network
URL: https://github.com/apache/zookeeper/pull/1048#discussion_r313776919
 
 

 ##
 File path: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java
 ##
 @@ -271,39 +278,42 @@ public boolean isQuorumSynced(QuorumVerifier qv) {
return qv.containsQuorum(ids);
 }
 
-private final ServerSocket ss;
+private final List<ServerSocket> serverSockets = new LinkedList<>();
 
 Leader(QuorumPeer self,LeaderZooKeeperServer zk) throws IOException {
 this.self = self;
 this.proposalStats = new BufferStats();
+
+Set<InetSocketAddress> addresses;
+if (self.getQuorumListenOnAllIPs()) {
+addresses = self.getQuorumAddress().getWildcardAddresses();
+} else {
+addresses = self.getQuorumAddress().getAllAddresses();
+}
+
+for (InetSocketAddress address : addresses) {
+serverSockets.add(createServerSocket(address, self.shouldUsePortUnification(), self.isSslQuorum()));
+}
+
+this.zk = zk;
+}
+
+ServerSocket createServerSocket(InetSocketAddress address, boolean portUnification, boolean sslQuorum)
+throws IOException {
+ServerSocket serverSocket;
 try {
-if (self.shouldUsePortUnification() || self.isSslQuorum()) {
-boolean allowInsecureConnection = self.shouldUsePortUnification();
-if (self.getQuorumListenOnAllIPs()) {
-ss = new UnifiedServerSocket(self.getX509Util(), allowInsecureConnection, self.getQuorumAddress().getPort());
-} else {
-ss = new UnifiedServerSocket(self.getX509Util(), allowInsecureConnection);
-}
+if (portUnification || sslQuorum) {
+serverSocket = new UnifiedServerSocket(self.getX509Util(), portUnification);
 } else {
-if (self.getQuorumListenOnAllIPs()) {
-ss = new ServerSocket(self.getQuorumAddress().getPort());
-} else {
-ss = new ServerSocket();
-}
-}
-ss.setReuseAddress(true);
-if (!self.getQuorumListenOnAllIPs()) {
-ss.bind(self.getQuorumAddress());
+serverSocket = new ServerSocket();
 }
+serverSocket.setReuseAddress(true);
+serverSocket.bind(address);
+return serverSocket;
 } catch (BindException e) {
-if (self.getQuorumListenOnAllIPs()) {
-LOG.error("Couldn't bind to port " + self.getQuorumAddress().getPort(), e);
-} else {
-LOG.error("Couldn't bind to " + self.getQuorumAddress(), e);
-}
+LOG.error("Couldn't bind to " + self.getQuorumAddress(), e);
 
 Review comment:
   thanks!




[GitHub] [zookeeper] symat commented on a change in pull request #1048: ZOOKEEPER-3188: Improve resilience to network

2019-08-14 Thread GitBox
symat commented on a change in pull request #1048: ZOOKEEPER-3188: Improve 
resilience to network
URL: https://github.com/apache/zookeeper/pull/1048#discussion_r313739495
 
 

 ##
 File path: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/admin/Commands.java
 ##
 @@ -18,16 +18,8 @@
 
 package org.apache.zookeeper.server.admin;
 
-import java.util.Arrays;
-import java.util.Collections;
-import java.util.HashMap;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Map;
-import java.util.Properties;
-import java.util.Set;
-import java.util.SortedMap;
-import java.util.TreeMap;
+import java.net.InetSocketAddress;
+import java.util.*;
 
 Review comment:
   sure, thanks!



