[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14943747#comment-14943747 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1506:
---------------------------------------------------

Great work, thanks [~rthille] and [~fpj]!

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5, 3.4.6
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Robert P. Thille
>            Priority: Blocker
>              Labels: patch
>             Fix For: 3.4.7, 3.5.0, 3.6.0
>
>         Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, Zookeeper-1506.patch, zk-dns-caching-refresh.patch
>
>
> In our zoo.cfg we use hostnames to identify the ZK servers that are part
> of an ensemble. These hostnames are configured with a low (<= 60s) TTL and
> the IP address they map to can and does change. Our procedure for
> replacing/upgrading a ZK node is to boot an entirely new instance and remap
> the hostname to the new instance's IP address. Our expectation is that when
> the original ZK node is terminated/shut down, the remaining nodes in the
> ensemble would reconnect to the new instance.
> However, what we are noticing is that the remaining ZK nodes do not attempt
> to re-resolve the hostname->IP mapping for the new server. Once the original
> ZK node is terminated, the existing servers continue to attempt contacting it
> at the old IP address. It would be great if the ZK servers could try to
> re-resolve the hostname when attempting to connect to a lost ZK server,
> instead of caching the lookup indefinitely. Currently we must do a rolling
> restart of the ZK ensemble after swapping a node -- which at three nodes
> means we periodically lose quorum.
> The exact method we are following is to boot new instances in EC2 and attach
> one of a set of three Elastic IP addresses. External to EC2 this IP address
> remains the same and maps to whatever instance it is attached to. Internal to
> EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped
> to the internal (10.x.y.z) address of the instance it is attached to.
> Therefore, in our case we would like ZK to pick up the new 10.x.y.z address
> that the elastic IP hostname gets mapped to and reconnect appropriately.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
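
The behavior requested in the issue description (look the hostname up again whenever a connection attempt to a peer fails, rather than reusing the first lookup forever) can be sketched roughly as follows. This is only an illustration, not the code from any of the attached patches: the class `ReResolvingConnector` and its methods are hypothetical names. It relies on the fact that `new InetSocketAddress(String, int)` attempts a hostname lookup each time it is called. Note that the JVM keeps its own lookup cache (controlled by the `networkaddress.cache.ttl` security property), so a fresh address object alone may not be enough if that cache is configured to hold entries for a long time.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper, not ZooKeeper code: resolves the peer hostname again before every attempt. */
final class ReResolvingConnector {
    private final String host;
    private final int port;

    ReResolvingConnector(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /** Builds a new address object; the constructor attempts a hostname lookup on each call. */
    InetSocketAddress freshAddress() {
        return new InetSocketAddress(host, port);
    }

    /** Tries to connect up to maxAttempts times, resolving the hostname again before each try. */
    Socket connect(int maxAttempts, int timeoutMs) throws IOException {
        IOException last = new IOException("no connection attempts were made");
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // Do not reuse an address resolved earlier: the hostname may now map to a new IP.
            InetSocketAddress addr = freshAddress();
            Socket socket = new Socket();
            try {
                socket.connect(addr, timeoutMs);
                return socket;
            } catch (IOException e) {
                // Covers refused connections, timeouts, and unresolved hostnames.
                socket.close();
                last = e;
            }
        }
        throw last;
    }
}
```

A caller would keep the hostname string from zoo.cfg around and call `connect` whenever the link to a peer is lost, instead of storing a resolved `InetSocketAddress` for the lifetime of the process.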
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901591#comment-14901591 ]

Hadoop QA commented on ZOOKEEPER-1506:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12761513/ZOOKEEPER-1506.patch
  against trunk revision 1702378.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 57 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2884//console

This message is automatically generated.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5, 3.4.6
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Flavio Junqueira
>            Priority: Blocker
>              Labels: patch
>             Fix For: 3.4.7
>
>         Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, Zookeeper-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14900907#comment-14900907 ]

Flavio Junqueira commented on ZOOKEEPER-1506:
---------------------------------------------

There is still one conflict in BaseSysTest. For the next patch, make sure to hit the "Submit patch" button at the top, please. This way QA runs and tells you if the patch applies fine, although it will only check trunk.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5, 3.4.6
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Robert P. Thille
>            Priority: Blocker
>              Labels: patch
>             Fix For: 3.4.7, 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> Zookeeper-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14744528#comment-14744528 ]

Robert P. Thille commented on ZOOKEEPER-1506:
---------------------------------------------

Ah, I see what I'm doing wrong. I have my editor strip trailing whitespace, but the How to Contribute page ( http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute ) asks that you not "reformat code unrelated to the bug being fixed", so I was generating the patch with 'git diff --no-prefix -b' to ignore the whitespace changes, but that seems to generate a patch that will not apply. I'll manually fix up the patch and resubmit.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5, 3.4.6
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Robert P. Thille
>            Priority: Blocker
>              Labels: patch
>             Fix For: 3.4.7, 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, Zookeeper-1506.patch,
> zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740915#comment-14740915 ]

Hadoop QA commented on ZOOKEEPER-1506:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12755248/ZOOKEEPER-1506.patch
  against trunk revision 1702378.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 57 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2871//console

This message is automatically generated.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5, 3.4.6
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Robert P. Thille
>            Priority: Blocker
>              Labels: patch
>             Fix For: 3.4.7, 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, Zookeeper-1506.patch,
> zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739716#comment-14739716 ]

Robert P. Thille commented on ZOOKEEPER-1506:
---------------------------------------------

The call to the QuorumServer constructor sets 'type' to null, but the QuorumServer constructor checks for null and doesn't set the instance's type variable in that case, so it still gets the default of PARTICIPANT.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5, 3.4.6
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Raul Gutierrez Segales
>            Priority: Blocker
>              Labels: patch
>             Fix For: 3.4.7, 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, Zookeeper-1506.patch, zk-dns-caching-refresh.patch
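
The null handling described in the comment above (a constructor that ignores a null argument so that the field keeps its default) might look something like the sketch below. This is a simplified stand-in written for illustration, not the actual QuorumServer source; the class name `PeerConfigSketch` and its accessor are hypothetical.

```java
/** Simplified stand-in for the pattern described above; not the actual ZooKeeper class. */
final class PeerConfigSketch {
    enum LearnerType { PARTICIPANT, OBSERVER }

    // The field initializer supplies the default value.
    private LearnerType type = LearnerType.PARTICIPANT;

    PeerConfigSketch(LearnerType requested) {
        // A null argument is ignored, so the default assigned above survives.
        if (requested != null) {
            this.type = requested;
        }
    }

    LearnerType type() {
        return type;
    }
}
```

With this shape, `new PeerConfigSketch(null).type()` yields `PARTICIPANT`, which matches the behavior the comment attributes to the constructor.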
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739580#comment-14739580 ]

Flavio Junqueira commented on ZOOKEEPER-1506:
---------------------------------------------

bq. don't think it matters if type is null there. I'm not dereferencing type, just comparing it to the constant

Indeed, but there is still the issue that the default was PARTICIPANT and you're changing to null. I haven't tried to determine the implications of the change, but it sounds better to just keep it as it is currently.

bq. 10.1.1.x isn't necessarily a dead address (and isn't in our network), so the tests would actually hit running Zookeeper servers, so I switched it to the standard ( https://tools.ietf.org/html/rfc5735 ) TEST-NET-1 address range.

Sounds good.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5, 3.4.6
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Raul Gutierrez Segales
>            Priority: Blocker
>              Labels: patch
>             Fix For: 3.4.7, 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, Zookeeper-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737680#comment-14737680 ]

Robert P. Thille commented on ZOOKEEPER-1506:
---------------------------------------------

0. Sorry, I'll regen the patch
1. The style stuff is largely from the patches I applied, but I'll clean that up and re-submit the patch
2. I'm no Java expert, but don't think it matters if type is null there. I'm not dereferencing type, just comparing it to the constant...
3. 10.1.1.x isn't necessarily a dead address (and isn't in our network), so the tests would actually hit running Zookeeper servers, so I switched it to the standard ( https://tools.ietf.org/html/rfc5735 ) TEST-NET-1 address range.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5, 3.4.6
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Raul Gutierrez Segales
>            Priority: Blocker
>              Labels: patch
>             Fix For: 3.4.7, 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, Zookeeper-1506.patch, zk-dns-caching-refresh.patch
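
Point 2 above turns on how Java compares references. A minimal sketch, using a hypothetical class `NullEnumComparison` rather than any ZooKeeper code: comparing a possibly-null reference to an enum constant with `==` only compares references and never dereferences the variable, whereas calling a method such as `equals` on the variable would throw if it were null.

```java
/** Hypothetical illustration of null-safe versus null-unsafe enum comparison. */
final class NullEnumComparison {
    enum LearnerType { PARTICIPANT, OBSERVER }

    /** Reference comparison: a null argument simply yields false. */
    static boolean isObserver(LearnerType type) {
        return type == LearnerType.OBSERVER;
    }

    /** Method call on the argument: a null argument throws NullPointerException. */
    static boolean isObserverViaEquals(LearnerType type) {
        return type.equals(LearnerType.OBSERVER);
    }
}
```

So a check written with `==` does not fail on null by itself; whether null is an acceptable value for the field afterwards is a separate question.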
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737571#comment-14737571 ]

Flavio Junqueira commented on ZOOKEEPER-1506:
---------------------------------------------

A few issues I observed with this patch:

# Please follow our coding style and add spaces accordingly, e.g., `if (parts.length>2)` should be `if (parts.length > 2)`
# type can be null here `if (type == LearnerType.OBSERVER)` in the case of parts.length being less than or equal to 3 and will throw an NPE
# Why have you changed the address range here `String deadAddress = new String("192.0.2." + finalOctet);`?

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5, 3.4.6
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Raul Gutierrez Segales
>            Priority: Critical
>              Labels: patch
>             Fix For: 3.4.7, 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, Zookeeper-1506.patch, zk-dns-caching-refresh.patch
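
For the third review item, the quoted line builds a test address by appending a final octet to the `192.0.2.` prefix. A rough sketch of how such a helper could be written and checked is shown below; `DeadAddressSketch` and both of its methods are hypothetical names, not code from the patch. It assumes the input is an IP literal, in which case `InetAddress.getByName` only validates the format and performs no DNS lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helper for building and recognizing addresses in 192.0.2.0/24. */
final class DeadAddressSketch {
    /** Returns an address string with the given final octet under the 192.0.2. prefix. */
    static String deadAddress(int finalOctet) {
        if (finalOctet < 0 || finalOctet > 255) {
            throw new IllegalArgumentException("final octet out of range: " + finalOctet);
        }
        return "192.0.2." + finalOctet;
    }

    /** True if the given IP literal is an IPv4 address whose first three octets are 192.0.2. */
    static boolean isTestNet1(String ipLiteral) {
        try {
            byte[] b = InetAddress.getByName(ipLiteral).getAddress();
            return b.length == 4 && (b[0] & 0xFF) == 192 && b[1] == 0 && b[2] == 2;
        } catch (UnknownHostException e) {
            // Not a valid literal (or an unresolvable name): treat it as outside the range.
            return false;
        }
    }
}
```

Plain string concatenation already produces a new `String`, so the `new String(...)` wrapper seen in the quoted line is not needed in a helper like this.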
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14737517#comment-14737517 ]

Flavio Junqueira commented on ZOOKEEPER-1506:
---------------------------------------------

[~rthille] could you generate the patch with --no-prefix, please?

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5, 3.4.6
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Raul Gutierrez Segales
>            Priority: Critical
>              Labels: patch
>             Fix For: 3.4.7, 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, Zookeeper-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715246#comment-14715246 ]

Hadoop QA commented on ZOOKEEPER-1506:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12752535/Zookeeper-1506.patch
  against trunk revision 1697551.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 70 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2842//console

This message is automatically generated.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5, 3.4.6
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Raul Gutierrez Segales
>            Priority: Critical
>              Labels: patch
>             Fix For: 3.4.7, 3.5.1, 3.6.0
>
>         Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch,
> Zookeeper-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715832#comment-14715832 ]

Hadoop QA commented on ZOOKEEPER-1506:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12752610/ZOOKEEPER-1506.patch
against trunk revision 1697551.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 70 new or modified tests.
-1 patch. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2844//console

This message is automatically generated.

> Re-try DNS hostname -> IP resolution if node connection fails
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5, 3.4.6
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Raul Gutierrez Segales
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, Zookeeper-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711955#comment-14711955 ]

Robert P. Thille commented on ZOOKEEPER-1506:

I've got a patch (against release-3.4.6) which we're using in-house which includes fixes to the tests. Not sure how applicable it'd be to the 3.4 branch (we wanted minimal changes to the stable release). I had to add one more call to s.recreateSocketAddresses() in Learner.java to get it to function properly with my (not-included, too dependent on our test environment) integration tests. I'm sending a request to Legal to get the release approval (likely a rubber-stamp).

> Re-try DNS hostname -> IP resolution if node connection fails
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Raul Gutierrez Segales
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
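Robert's comment in this message mentions an extra call to s.recreateSocketAddresses() in Learner.java. As a rough illustration of the general idea only, and not the code from the attached patches, the Java sketch below assumes a hypothetical Peer holder that remembers the configured host name and builds a fresh InetSocketAddress on request. The class name, the method name recreateSocketAddress, and the host/port values are invented for the example. Note also that InetAddress.getByName goes through the JVM's own address cache (controlled by the networkaddress.cache.ttl security property), so a fresh lookup may still return a cached answer until that cache expires.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

public class ReResolveSketch {

    /** Hypothetical holder: the configured host name plus the last resolved address. */
    static final class Peer {
        final String hostname;
        final int port;
        volatile InetSocketAddress addr;

        Peer(String hostname, int port) {
            this.hostname = hostname;
            this.port = port;
            // InetSocketAddress attempts resolution once, at construction time.
            this.addr = new InetSocketAddress(hostname, port);
        }

        /** Look the host name up again; keep the previous address if the lookup fails. */
        void recreateSocketAddress() {
            try {
                InetAddress fresh = InetAddress.getByName(hostname);
                addr = new InetSocketAddress(fresh, port);
            } catch (UnknownHostException e) {
                // Resolver failure: keep the old address and let the caller retry later.
            }
        }
    }

    public static void main(String[] args) {
        Peer peer = new Peer("localhost", 2888);
        System.out.println("before: " + peer.addr);
        // A caller would invoke this after a failed connect attempt, before retrying.
        peer.recreateSocketAddress();
        System.out.println("after:  " + peer.addr);
    }
}
```

Keeping the old address when the lookup fails is a design choice of this sketch: a transient resolver outage then degrades to the pre-fix behaviour rather than leaving the peer with no address at all.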
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708561#comment-14708561 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1506:

[~andre.c...@meteo.pt]: sorry, I dropped the ball here. I'll prepare a patch for 3.4 later today, and we'll include it on the 3.4.7 release. Thanks!

> Re-try DNS hostname -> IP resolution if node connection fails
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Raul Gutierrez Segales
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597841#comment-14597841 ]

André Cruz commented on ZOOKEEPER-1506:

Any news regarding the inclusion of this fix in a stable zookeeper version?

> Re-try DNS hostname -> IP resolution if node connection fails
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Raul Gutierrez Segales
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503825#comment-14503825 ]

Hadoop QA commented on ZOOKEEPER-1506:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12726674/ZOOKEEPER-1506-fix.patch
against trunk revision 1672934.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2641//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2641//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2641//console

This message is automatically generated.

> Re-try DNS hostname -> IP resolution if node connection fails
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michi Mutsuzaki
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503871#comment-14503871 ]

Michi Mutsuzaki commented on ZOOKEEPER-1506:

Thanks Raul for testing this. I'd try replacing calls to getHostName with getHostString. For example, I found another one in QuorumCnxManager.java:

org/apache/zookeeper/server/quorum/QuorumCnxManager.java:String addr = self.getElectionAddress().getHostName() + ":" + self.getElectionAddress().getPort();

> Re-try DNS hostname -> IP resolution if node connection fails
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michi Mutsuzaki
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
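Michi's suggestion in this message is to build the host:port string from getHostString rather than getHostName. A minimal sketch of that substitution follows, assuming a standalone helper; electionAddrString is a made-up name and zk1.example.com is a placeholder host, neither taken from the ZooKeeper source. InetSocketAddress.getHostString (available since Java 7) returns the host name if one is already known, or the literal address otherwise, without attempting a reverse lookup.

```java
import java.net.InetSocketAddress;

public class HostStringSketch {

    /** Builds "host:port" without triggering a reverse DNS lookup. */
    static String electionAddrString(InetSocketAddress electionAddr) {
        return electionAddr.getHostString() + ":" + electionAddr.getPort();
    }

    public static void main(String[] args) {
        // An unresolved address keeps the host name exactly as given.
        InetSocketAddress byName =
                InetSocketAddress.createUnresolved("zk1.example.com", 3888);
        System.out.println(electionAddrString(byName)); // prints zk1.example.com:3888

        // An address built from an IP literal has no host name attached;
        // getHostString returns the literal instead of asking the resolver.
        InetSocketAddress byLiteral = new InetSocketAddress("10.0.0.1", 3888);
        System.out.println(electionAddrString(byLiteral));
    }
}
```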
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503821#comment-14503821 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1506:

actually, maybe it does. not sure my first try was clean. couldn't get a repro after the 2nd try.

> Re-try DNS hostname -> IP resolution if node connection fails
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michi Mutsuzaki
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503671#comment-14503671 ]

Michi Mutsuzaki commented on ZOOKEEPER-1506:

The patch uses HostNameUtils.getHostString(), which supposedly avoids a reverse lookup. Maybe there is a bug in HostNameUtils.getHostString()? We can replace HostNameUtils with InetSocketAddress.getHostString since we now require Java 7.

> Re-try DNS hostname -> IP resolution if node connection fails
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michi Mutsuzaki
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
> Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503799#comment-14503799 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1506:

hmmm, it does not help [~michim]

> Re-try DNS hostname -> IP resolution if node connection fails
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michi Mutsuzaki
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503846#comment-14503846 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1506:

It doesn't. I got a consistent repro by first firewalling the participant with id 0, to force that code path. I'll try reverting the patch entirely and see if that helps.

> Re-try DNS hostname -> IP resolution if node connection fails
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michi Mutsuzaki
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504004#comment-14504004 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1506:

So I created a build with ZOOKEEPER-1506 removed and I still get the problem. It's probably due to the getHostName() calls that you pointed out. These calls can actually generate reverse lookups according to:

http://download.java.net/jdk7/archive/b123/docs/api/java/net/InetSocketAddress.html#getHostName%28%29

However, these calls have been introduced by ZOOKEEPER-107 (according to git-blame). I think we should avoid them, though let's do that in another ticket.

In conclusion, if you have a bad resolver or bogus reverse lookups (as is the case in my test scenario), you'll have issues because of these calls.

> Re-try DNS hostname -> IP resolution if node connection fails
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michi Mutsuzaki
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504007#comment-14504007 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1506:

I'll go ahead and close this again [~michim].
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503564#comment-14503564 ]

Raul Gutierrez Segales commented on ZOOKEEPER-1506:

I am running elections (in a cluster of 5 participants + 1 observer) as part of validating the 3.5.1 alpha rc proposed by [~michim]. I am getting this from time to time:

https://gist.github.com/rgs1/d11822799fdbbfa5d5f2

I only have IP addresses in zoo.cfg, and this patch seems to be triggering a reverse lookup (IP -> hostname). Given that in my current setup (a test setup with systemd-nspawn containers) hostnames don't necessarily resolve back (i.e., hostname -> IP doesn't work), participants might end up unable to connect to the leader if it's initially unavailable.

Is the reverse lookup (IP -> hostname) something expected with this patch, or a side effect? I don't see why we'd ever want or need that reverse lookup, given that it could be problematic in some setups. Thoughts?

p.s.: I will post my entire, reproducible setup a bit later.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487882#comment-14487882 ]

Rakesh R commented on ZOOKEEPER-1506:

Committed to trunk: http://svn.apache.org/viewvc?view=revision&revision=1672436
Committed to br3.5: http://svn.apache.org/viewvc?view=revision&revision=1672438

Since the patch has conflicts in the 3.4 branch, I am not resolving this issue now. [~michim], will you generate a 3.4 patch?
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487759#comment-14487759 ]

Camille Fournier commented on ZOOKEEPER-1506:

+1
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487794#comment-14487794 ]

Rakesh R commented on ZOOKEEPER-1506:

I'll commit this shortly. Thanks [~fpj], [~fournc], and all others for the reviews, and [~michim], [~mlasevich] for taking care of this issue.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386759#comment-14386759 ]

Flavio Junqueira commented on ZOOKEEPER-1506:

+1, lgtm. [~fournc], is it a go?
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386072#comment-14386072 ]

Rakesh R commented on ZOOKEEPER-1506:

Thanks [~michim] for the fix. +1, the latest patch looks good to me.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385608#comment-14385608 ]

Michi Mutsuzaki commented on ZOOKEEPER-1506:

[~fournc] [~fpj] could you take a look at the patch? Once this gets checked in, I'll create a release candidate for 3.5.1.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379493#comment-14379493 ]

Hadoop QA commented on ZOOKEEPER-1506:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707161/ZOOKEEPER-1506.patch against trunk revision 1669060.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2588//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2588//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2588//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379422#comment-14379422 ]

Michi Mutsuzaki commented on ZOOKEEPER-1506:

Thanks [~fpj], yeah I think you are right. I'll change QuorumPeer.connectNewPeers to call connectOne(long sid) and make connectOne(long sid, InetSocketAddress electionAddr) private so that everybody goes through connectOne(long sid). The only other place we use connectOne(long sid, InetSocketAddress electionAddr) is QuorumCnxManager.receiveConnection(), but this one is ok since it creates a new InetSocketAddress explicitly.
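The "everybody goes through connectOne(long sid)" idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: the class ConnectOneSketch, the PeerConfig holder, addPeer, and the attempts list are all hypothetical, and the private overload only records the address instead of opening a socket. The point it shows is that the single public entry point builds a fresh InetSocketAddress from the configured host text on every call, so the lookup is repeated (subject to the JVM's own address cache) rather than reusing an address resolved once at startup.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ConnectOneSketch {
    // Hypothetical config entry: keep host and port as text,
    // not an InetSocketAddress that was resolved once and cached.
    static final class PeerConfig {
        final String host;
        final int port;
        PeerConfig(String host, int port) { this.host = host; this.port = port; }
    }

    private final Map<Long, PeerConfig> peers = new HashMap<>();
    // Records every address handed to the private overload (for illustration).
    final List<InetSocketAddress> attempts = new ArrayList<>();

    void addPeer(long sid, String host, int port) {
        peers.put(sid, new PeerConfig(host, port));
    }

    // Single public entry point: a new InetSocketAddress is created per call,
    // which asks the JVM to resolve the configured host again.
    public boolean connectOne(long sid) {
        PeerConfig p = peers.get(sid);
        if (p == null) {
            return false; // unknown server id
        }
        return connectOne(sid, new InetSocketAddress(p.host, p.port));
    }

    // Private so callers cannot pass in a stale, previously resolved address.
    // A real implementation would open a socket here; this stub only checks
    // whether resolution succeeded.
    private boolean connectOne(long sid, InetSocketAddress electionAddr) {
        attempts.add(electionAddr);
        return !electionAddr.isUnresolved();
    }

    public static void main(String[] args) {
        ConnectOneSketch sketch = new ConnectOneSketch();
        sketch.addPeer(1L, "127.0.0.1", 3888);
        System.out.println(sketch.connectOne(1L));
    }
}
```

Keeping the two-argument overload private mirrors the intent described above: a caller that already holds a freshly created address can still be served internally, while outside callers have no way to skip the re-resolution step.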
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379910#comment-14379910 ]

Flavio Junqueira commented on ZOOKEEPER-1506:

The change looks good, thanks. About the retry logic, the change seems gratuitous. I don't see a big issue with removing it, but it sounds best to leave the retry logic as is because it is unrelated to the fix here. If needed, propose it in a new jira. Does that make sense?
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14380387#comment-14380387 ] Hadoop QA commented on ZOOKEEPER-1506: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12707249/ZOOKEEPER-1506.patch against trunk revision 1669062. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2589//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2589//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2589//console This message is automatically generated. 
> Re-try DNS hostname -> IP resolution if node connection fails
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michael Lasevich
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
> Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
>
> In our zoo.cfg we use hostnames to identify the ZK servers that are part of an ensemble. These hostnames are configured with a low (<= 60s) TTL and the IP address they map to can and does change. Our procedure for replacing/upgrading a ZK node is to boot an entirely new instance and remap the hostname to the new instance's IP address. Our expectation is that when the original ZK node is terminated/shutdown, the remaining nodes in the ensemble would reconnect to the new instance.
> However, what we are noticing is that the remaining ZK nodes do not attempt to re-resolve the hostname->IP mapping for the new server. Once the original ZK node is terminated, the existing servers continue to attempt contacting it at the old IP address. It would be great if the ZK servers could try to re-resolve the hostname when attempting to connect to a lost ZK server, instead of caching the lookup indefinitely.
> Currently we must do a rolling restart of the ZK ensemble after swapping a node -- which at three nodes means we periodically lose quorum.
> The exact method we are following is to boot new instances in EC2 and attach one of a set of three Elastic IP addresses. External to EC2 this IP address remains the same and maps to whatever instance it is attached to. Internal to EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped to the internal (10.x.y.z) address of the instance it is attached to.
> Therefore, in our case we would like ZK to pick up the new 10.x.y.z address that the elastic IP hostname gets mapped to and reconnect appropriately.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376930#comment-14376930 ]
Flavio Junqueira commented on ZOOKEEPER-1506:
One clarification here. Don't we have to guarantee that all code paths to connectOne (both signatures) recreate addresses? It looks like it isn't the case with this patch. For example, QuorumPeer.connectNewPeers invokes the version of connectOne that doesn't recreate.
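The concern about covering every code path can be illustrated with a minimal Java sketch. It is not taken from any of the attached patches: the class `PeerAddressBook` and its methods are hypothetical names, and the only assumption about ZooKeeper is the one stated in the comment above, namely that addresses have to be recreated before a connection attempt. The idea is to keep only the configured host and port, and to hand out a freshly constructed `InetSocketAddress` from a single method, so that no caller can reuse a stale IP address.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper, not part of ZooKeeper: re-resolves a peer's hostname on every request. */
public final class PeerAddressBook {
    private final Map<Long, InetSocketAddress> configured = new ConcurrentHashMap<>();

    /** Records the configured host and port for a server id without resolving the hostname. */
    public void put(long sid, String host, int port) {
        // createUnresolved stores only the host string, so no IP address is cached here.
        configured.put(sid, InetSocketAddress.createUnresolved(host, port));
    }

    /** Single entry point: every connect path that asks here gets a newly resolved address. */
    public InetSocketAddress freshAddress(long sid) {
        InetSocketAddress cfg = configured.get(sid);
        if (cfg == null) {
            throw new IllegalArgumentException("unknown server id " + sid);
        }
        // The constructor attempts the lookup; if it fails, the result is marked unresolved.
        return new InetSocketAddress(cfg.getHostString(), cfg.getPort());
    }
}
```

Note that the JVM keeps its own lookup cache (controlled by the `networkaddress.cache.ttl` security property), so recreating the address object is necessary but may not be sufficient on its own to observe a changed DNS record immediately.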
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362293#comment-14362293 ]
Hadoop QA commented on ZOOKEEPER-1506:
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12704647/ZOOKEEPER-1506.patch
against trunk revision 1666768.
    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2570//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2570//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2570//console
This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362115#comment-14362115 ]
Camille Fournier commented on ZOOKEEPER-1506:
The only concern I have is removing the retries. I think it's probably right to do, but does it imply documentation changes anywhere that we need to make? [~michim]?
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362075#comment-14362075 ]
Michi Mutsuzaki commented on ZOOKEEPER-1506:
[~fournc] [~rakeshr] could you take a look at the patch? I'd like to get this in for 3.5.1. Thanks!
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358452#comment-14358452 ]
koray sariteke commented on ZOOKEEPER-1506:
We also have this problem. When will the fix be included in a release?
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358571#comment-14358571 ]
Hadoop QA commented on ZOOKEEPER-1506:
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12675762/ZOOKEEPER-1506.patch
against trunk revision 1665315.
    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2559//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2559//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2559//console
This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358559#comment-14358559 ]
Rakesh R commented on ZOOKEEPER-1506:
> Perhaps we just have to agree to not having a test case for this
I also failed to find a better approach for writing a unit test for this. I agree to not having a test case.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14176590#comment-14176590 ]
Hadoop QA commented on ZOOKEEPER-1506:
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12675762/ZOOKEEPER-1506.patch
against trunk revision 1632209.
    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2395//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2395//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2395//console
This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174792#comment-14174792 ]
Hadoop QA commented on ZOOKEEPER-1506:
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12648826/ZOOKEEPER-1506.patch
against trunk revision 1632209.
    +1 @author. The patch does not contain any @author tags.
    -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    -1 core tests. The patch failed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.
Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2394//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2394//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2394//console
This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175143#comment-14175143 ]
Flavio Junqueira commented on ZOOKEEPER-1506:
I'm not sure I have any great suggestion for how to do a test case for this fix. Any suggestions?
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175438#comment-14175438 ]

Michi Mutsuzaki commented on ZOOKEEPER-1506:
--------------------------------------------

We can modify /etc/hosts (or a local hosts file specified in the HOSTALIASES environment variable) to simulate an IP address change. This won't work on Windows, though.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michael Lasevich
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1
> Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175512#comment-14175512 ]

Ramya Bharathi Nimmagadda commented on ZOOKEEPER-1506:
------------------------------------------------------

Like the idea. There is a similar file in Windows at %SystemRoot%\system32\drivers\etc\hosts that could be updated. Here's the format:

# The IP address should be placed in the first column followed by the corresponding host name.
# The IP address and the host name should be separated by at least one space.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michael Lasevich
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1
> Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
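To make the hosts-file idea from the two comments above concrete, here is a minimal sketch of how a test helper might rewrite a hosts-format entry so that a name points at a different IP. It is not part of any attached patch: `HostsRemap` and `remap` are hypothetical names, the helper only transforms lines in memory, and whether a running JVM notices the change also depends on its own address caching (for example the `networkaddress.cache.ttl` security property), which this sketch does not address.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical test helper: rewrites hosts-format lines ("IP name [alias...]")
// so that every entry naming `host` maps to `newIp`. Other lines are kept as-is.
final class HostsRemap {
    static List<String> remap(List<String> lines, String host, String newIp) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.trim().split("\\s+");
            boolean match = false;
            // Skip comment lines; stop scanning names at a trailing comment.
            if (!fields[0].startsWith("#")) {
                for (int i = 1; i < fields.length && !fields[i].startsWith("#"); i++) {
                    if (fields[i].equals(host)) {
                        match = true;
                        break;
                    }
                }
            }
            if (match) {
                fields[0] = newIp; // first column is the IP address
                out.add(String.join(" ", fields));
            } else {
                out.add(line);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> before = Arrays.asList(
                "# test hosts file",
                "10.0.0.1 zk1 zk1.local",
                "10.0.0.9 zk2");
        // Simulate zk1 moving to a new instance with a different internal address.
        for (String line : remap(before, "zk1", "10.0.0.2")) {
            System.out.println(line);
        }
    }
}
```

A test built on this would still need a way to point the resolver at the rewritten file, which is the part the thread leaves open.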
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174315#comment-14174315 ]

Flavio Junqueira commented on ZOOKEEPER-1506:
---------------------------------------------

Let me clarify my previous point. The description of this jira talks about bringing up a new instance to upgrade/replace a ZK node. It is starting a ZK node from scratch, essentially dropping the state of the previous incarnation of the node, that I was trying to point out could be a problem. Resolving names again as this patch proposes is fine. We should get this one in.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michael Lasevich
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1
> Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174502#comment-14174502 ]

Ramya Bharathi Nimmagadda commented on ZOOKEEPER-1506:
------------------------------------------------------

Thanks Flavio.

[~michim] Would you be able to submit a new patch with unit tests included?

Thanks

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michael Lasevich
> Priority: Critical
> Labels: patch
> Fix For: 3.4.7, 3.5.1
> Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135508#comment-14135508 ]

Joshua Buss commented on ZOOKEEPER-1506:
----------------------------------------

This is very, very annoying. Please get the fix into master as soon as possible. Thanks from another heavy user of cloud services (this affects any environment where you use the same hostnames for configuration purposes but dynamically re-build instances which get new IPs).

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michael Lasevich
> Priority: Critical
> Labels: patch
> Fix For: 3.5.1
> Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135537#comment-14135537 ]

Flavio Junqueira commented on ZOOKEEPER-1506:
---------------------------------------------

I think there is a deeper problem here that I'm worried about. If you point a name to a different server starting from scratch, then it is like the server state has been wiped out from the perspective of the ZK replication, which can lead to state loss in some corner cases. One way around this is to use reconfiguration: remove and re-introduce the server. Is something like this doable for you?

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michael Lasevich
> Priority: Critical
> Labels: patch
> Fix For: 3.5.1
> Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135572#comment-14135572 ]

Hadoop QA commented on ZOOKEEPER-1506:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12648826/ZOOKEEPER-1506.patch
  against trunk revision 1623916.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2337//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2337//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2337//console

This message is automatically generated.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michael Lasevich
> Priority: Critical
> Labels: patch
> Fix For: 3.5.1
> Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020955#comment-14020955 ]

Hadoop QA commented on ZOOKEEPER-1506:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12648826/ZOOKEEPER-1506.patch
  against trunk revision 1600481.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2125//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2125//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2125//console

This message is automatically generated.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michael Lasevich
> Priority: Critical
> Labels: patch
> Fix For: 3.5.0
> Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019799#comment-14019799 ]

MUFEED USMAN commented on ZOOKEEPER-1506:
-----------------------------------------

Nope. I do not have any patch installed. Happened to hit this JIRA during my search to understand the working relationship between ZooKeeper and DNS when the service was disrupted and did not have an automatic recovery.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michael Lasevich
> Priority: Critical
> Labels: patch
> Fix For: 3.5.0
> Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018482#comment-14018482 ]

MUFEED USMAN commented on ZOOKEEPER-1506:
-----------------------------------------

In the case where no IPs change, but a mere DNS outage occurs (for some unforeseen reason), shouldn't the ZKs be able to make use of the cached info and restore connectivity once the DNS is back up?

In the case I'm handling I faced the following:

2014-02-06 20:22:54,438 [myid:0] - INFO [WorkerReceiver[myid=0]:FastLeaderElection@542] - Notification: 0 (n.leader), 0x20171 (n.zxid), 0x1 (n.round), LOOKING (n.state), 0 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)
2014-02-06 20:22:54,451 [myid:0] - WARN [WorkerSender[myid=0]:QuorumCnxManager@368] - Cannot open channel to 1 at election address node02:3888
java.net.UnknownHostException: node02

And only ZK restarts restored services.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michael Lasevich
> Priority: Critical
> Labels: patch
> Fix For: 3.5.0
> Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018500#comment-14018500 ]

Michi Mutsuzaki commented on ZOOKEEPER-1506:
--------------------------------------------

Hi Mufeed,

Good point, I can update this.addr and this.electionAddr only if the InetSocketAddress constructor actually resolved the hostname.

By the way, are you running ZooKeeper with this patch applied? Does the patch fix the original problem?

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
> Issue Type: Improvement
> Components: server
> Affects Versions: 3.4.5
> Environment: Ubuntu 11.04 64-bit
> Reporter: Mike Heffner
> Assignee: Michael Lasevich
> Priority: Critical
> Labels: patch
> Fix For: 3.5.0
> Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
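The guard described in the reply above (replace the stored address only when the new lookup succeeded) might look roughly like the following. This is a sketch under assumptions, not the code in the attached patch: `ReResolve`, `resolveAgain` and `pick` are hypothetical names, and only `InetSocketAddress` and its `getHostString`, `getPort` and `isUnresolved` methods are real JDK API.

```java
import java.net.InetSocketAddress;

// Hypothetical helper, not taken from the ZOOKEEPER-1506 patch.
final class ReResolve {
    /**
     * Builds a new address from the same host string and port. The
     * InetSocketAddress(String, int) constructor attempts resolution and
     * marks the result as unresolved if the lookup fails.
     */
    static InetSocketAddress resolveAgain(InetSocketAddress current) {
        return new InetSocketAddress(current.getHostString(), current.getPort());
    }

    /**
     * Keeps the current address unless the candidate actually resolved, so a
     * DNS outage does not discard a previously working address.
     */
    static InetSocketAddress pick(InetSocketAddress current, InetSocketAddress candidate) {
        return candidate.isUnresolved() ? current : candidate;
    }

    public static void main(String[] args) {
        // An IP literal is used here so the example does not depend on DNS.
        InetSocketAddress electionAddr = new InetSocketAddress("127.0.0.1", 3888);
        electionAddr = pick(electionAddr, resolveAgain(electionAddr));
        System.out.println(electionAddr.isUnresolved());
    }
}
```

In a server this would presumably run on the connection-failure path, just before the next connection attempt; where exactly the patch hooks it in is not stated in this message.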
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13993789#comment-13993789 ] Diwaker Gupta commented on ZOOKEEPER-1506: -- This is a severe issue in any environment that uses DNSMasq or relies on DHCP for DNS mappings. Would really like to see this fixed for 3.5.0, or better yet, the next bugfix release in 3.4.x if there's going to be one. Re-try DNS hostname - IP resolution if node connection fails - Key: ZOOKEEPER-1506 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.4.5 Environment: Ubuntu 11.04 64-bit Reporter: Mike Heffner Assignee: Michael Lasevich Priority: Critical Labels: patch Fix For: 3.5.0 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch In our zoo.cfg we use hostnames to identify the ZK servers that are part of an ensemble. These hostnames are configured with a low (= 60s) TTL and the IP address they map to can and does change. Our procedure for replacing/upgrading a ZK node is to boot an entirely new instance and remap the hostname to the new instance's IP address. Our expectation is that when the original ZK node is terminated/shutdown, the remaining nodes in the ensemble would reconnect to the new instance. However, what we are noticing is that the remaining ZK nodes do not attempt to re-resolve the hostname-IP mapping for the new server. Once the original ZK node is terminated, the existing servers continue to attempt contacting it at the old IP address. It would be great if the ZK servers could try to re-resolve the hostname when attempting to connect to a lost ZK server, instead of caching the lookup indefinitely. Currently we must do a rolling restart of the ZK ensemble after swapping a node -- which at three nodes means we periodically lose quorum. 
The exact method we are following is to boot new instances in EC2 and attach one, of a set of three, Elastic IP address. External to EC2 this IP address remains the same and maps to whatever instance it is attached to. Internal to EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped to the internal (10.x.y.z) address of the instance it is attached to. Therefore, in our case we would like ZK to pickup the new 10.x.y.z address that the elastic IP hostname gets mapped to and reconnect appropriately. -- This message was sent by Atlassian JIRA (v6.2#6252)
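The behaviour requested in the issue description can be sketched in a few lines of Java. This is only an illustration of the idea, not the attached patch: the class `ReResolvingEndpoint` and its `recreate()` method are hypothetical names, and the sketch assumes the caller invokes `recreate()` after a failed connection attempt. Note that the JVM's own address cache (the `networkaddress.cache.ttl` security property) still applies to `InetAddress.getByName`.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

/** Hypothetical helper for illustration; not part of ZooKeeper. */
final class ReResolvingEndpoint {
    private final String hostname;
    private final int port;
    private volatile InetSocketAddress current;

    ReResolvingEndpoint(String hostname, int port) {
        this.hostname = hostname;
        this.port = port;
        // Keep the name, not an IP: createUnresolved performs no lookup.
        this.current = InetSocketAddress.createUnresolved(hostname, port);
    }

    /** The address from the most recent successful lookup, if any. */
    InetSocketAddress current() {
        return current;
    }

    /**
     * Look the hostname up again, e.g. after a connection failure.
     * If the lookup fails, the previous address is kept.
     */
    InetSocketAddress recreate() {
        try {
            InetAddress resolved = InetAddress.getByName(hostname);
            current = new InetSocketAddress(resolved, port);
        } catch (UnknownHostException e) {
            // DNS is unavailable or the name is gone: keep the old address.
        }
        return current;
    }
}
```

The design point is that the endpoint stores the hostname and re-derives the `InetSocketAddress` on demand, rather than resolving once at configuration time and reusing the result for the lifetime of the process.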
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997571#comment-13997571 ]

Flavio Junqueira commented on ZOOKEEPER-1506:
---------------------------------------------

[~diwaker], is the patch here good for you? Have you had a chance to review it?
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986397#comment-13986397 ]

Hadoop QA commented on ZOOKEEPER-1506:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12642801/ZOOKEEPER-1506.patch
  against trunk revision 1591175.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2075//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2075//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2075//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986773#comment-13986773 ]

Michael Lasevich commented on ZOOKEEPER-1506:
---------------------------------------------

[~michim] Thanks for taking this over, I clearly lost track of it. For what it's worth, we have been running this patch in production for almost a year with no issues, but it would be nice to have it properly merged.

Thank you.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986822#comment-13986822 ]

Michi Mutsuzaki commented on ZOOKEEPER-1506:
--------------------------------------------

Thanks Michael, it's good to know that you've been using this patch without an issue.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986872#comment-13986872 ]

Hadoop QA commented on ZOOKEEPER-1506:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12642801/ZOOKEEPER-1506.patch
  against trunk revision 1591175.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2076//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2076//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2076//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987036#comment-13987036 ]

Michi Mutsuzaki commented on ZOOKEEPER-1506:
--------------------------------------------

I've been running FollowerResyncConcurrencyTest for a while, but I can't reproduce the failure. Maybe it's a transient failure.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987094#comment-13987094 ]

Hadoop QA commented on ZOOKEEPER-1506:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12642801/ZOOKEEPER-1506.patch
  against trunk revision 1591175.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2077//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2077//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2077//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986164#comment-13986164 ]

Michi Mutsuzaki commented on ZOOKEEPER-1506:
--------------------------------------------

I'd like to get this fixed in 3.5. I'll rebase the patch and address Flavio's comments. I'm still not sure how we should test this though.

>         Attachments: zk-dns-caching-refresh.patch
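On the open question of how to test this: one possible approach, sketched here purely as an assumption and not taken from any attached patch, is to hide the DNS lookup behind a small interface so that a test can swap the hostname -> IP mapping without touching real DNS. All names below (`Resolver`, `FakeDns`, `PeerAddress`) are hypothetical.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical seam: production code would delegate to InetAddress.getByName. */
interface Resolver {
    InetAddress resolve(String host) throws UnknownHostException;
}

/** In-memory stand-in for DNS whose records a test can change at will. */
final class FakeDns implements Resolver {
    private final Map<String, byte[]> records = new HashMap<>();

    void map(String host, int a, int b, int c, int d) {
        records.put(host, new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
    }

    @Override
    public InetAddress resolve(String host) throws UnknownHostException {
        byte[] ip = records.get(host);
        if (ip == null) {
            throw new UnknownHostException(host);
        }
        // getByAddress builds the address directly; no lookup is performed.
        return InetAddress.getByAddress(host, ip);
    }
}

/** Peer address that is re-derived from its hostname on every recreate(). */
final class PeerAddress {
    private final String host;
    private final int port;
    private final Resolver resolver;
    private InetSocketAddress current;

    PeerAddress(String host, int port, Resolver resolver) {
        this.host = host;
        this.port = port;
        this.resolver = resolver;
        this.current = InetSocketAddress.createUnresolved(host, port);
    }

    InetSocketAddress recreate() {
        try {
            current = new InetSocketAddress(resolver.resolve(host), port);
        } catch (UnknownHostException e) {
            // Keep the last known address if the name cannot be resolved.
        }
        return current;
    }
}
```

A test would map a name to one address, call `recreate()`, remap the name to a second address, call `recreate()` again, and check that the second address is returned. This exercises the re-resolution logic only; it does not cover the JVM's own DNS cache or a real quorum reconnect.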
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986226#comment-13986226 ]

Hadoop QA commented on ZOOKEEPER-1506:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12642754/ZOOKEEPER-1506.patch
  against trunk revision 1591175.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2072//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2072//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2072//console

This message is automatically generated.

>         Attachments: ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986373#comment-13986373 ]

Hadoop QA commented on ZOOKEEPER-1506:
--------------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12642801/ZOOKEEPER-1506.patch
  against trunk revision 1591175.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests.
       Please justify why no new tests are needed for this patch.
       Also please list what manual steps were performed to verify this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2074//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2074//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2074//console

This message is automatically generated.
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

[ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934720#comment-13934720 ]

Robert Kamphuis commented on ZOOKEEPER-1506:
--------------------------------------------

As this does not seem to be progressing much, here are some tips for people working in AWS or similar enough environments. This is likely neither the only approach nor the best, but it is working for me.

- configure the zookeeper ensemble servers to connect to the Elastic IP address instead of the hostname
- on a serious failure of one of the servers, boot a replacement, and re-assign the corresponding Elastic IP to that server.
- others will reconnect correctly
- you will need to set up the security group to explicitly enable the interconnect to 2888/3888(/2181) or your ports of choice for the Elastic IPs to enable the connections to work.
- downsides:
  1. traffic between zookeeper servers goes via whatever boxes do the Elastic-IP-to-server mapping, so latency is higher. My measurements as an example: ping using private IPs vs Elastic IPs: 0.8 ms vs 1.4 ms (500-byte packets, servers in two different AZs in US-east)
  2. you will need to pay for this traffic, whereas when using the names which are mapped to the internal IPs you would not.

Also: for the clients, I am using as connect string static DNS records with names like zookeeperN.domain pointing to the ec2-A-B-C-D.compute-1.amazonaws.com names, thus pointing to the Elastic IP's name and not the IPs. These are mapped by EC2 to the active private IPs after assigning the Elastic IP to an instance. The clients will be recognised properly as coming from the correct security group(s). There is no need to add all the client IPs, of which I have many, and a changing set; just give the clients' security groups access to the zookeeper security group.

BTW: if someone knows of good resources on running zookeeper and curator-based clients in AWS, I would kindly like to know where...

>         Attachments: zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887902#comment-13887902 ]

Flavio Junqueira commented on ZOOKEEPER-1506:
---------------------------------------------

A few comments and concerns with the current patch:

# What's the point of making the QuorumServer constructors private?
# Please check the spacing; it is not aligned with the other lines.
# This patch will recreate the socket addresses every time a connection in QCM fails to get established. Sometimes this happens because the server is not up yet, and in such cases, if the mapping hasn't changed, then the recreation of the socket addresses is unnecessary. Right now, I don't have any great idea to get around it, so I just wanted to point it out.
# I was also thinking about how to test this. It would be good to have a test case.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Michael Lasevich
>              Labels: patch
>             Fix For: 3.5.0
>         Attachments: zk-dns-caching-refresh.patch

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
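The third review point above (recreating socket addresses even when the hostname->IP mapping has not changed) could be approached along the following lines. This is only a minimal sketch, not code from any attached patch: the class `ReResolvingAddress` and its method `refreshIfChanged` are hypothetical names invented for illustration, and the sketch assumes the caller invokes `refreshIfChanged` after a failed connection attempt. It uses only standard `java.net` calls.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical helper (not part of ZooKeeper): holds a peer address and
// replaces it only when a fresh lookup returns a different IP.
final class ReResolvingAddress {
    private final String host;
    private final int port;
    private volatile InetSocketAddress current;

    ReResolvingAddress(String host, int port) {
        this.host = host;
        this.port = port;
        // This constructor attempts one resolution; on failure the
        // address is left in the "unresolved" state.
        this.current = new InetSocketAddress(host, port);
    }

    InetSocketAddress get() {
        return current;
    }

    // Intended to be called after a connection attempt fails.
    // Returns true only if the stored address was actually replaced.
    boolean refreshIfChanged() {
        final InetAddress fresh;
        try {
            fresh = InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            return false; // lookup failed: keep the previous address
        }
        InetSocketAddress old = current;
        if (!old.isUnresolved() && fresh.equals(old.getAddress())) {
            return false; // same IP: nothing to recreate
        }
        current = new InetSocketAddress(fresh, port);
        return true;
    }

    public static void main(String[] args) {
        ReResolvingAddress peer = new ReResolvingAddress("localhost", 3888);
        System.out.println("changed after refresh: " + peer.refreshIfChanged());
    }
}
```

Note that `InetAddress.getByName` is itself subject to the JVM's DNS cache, so a fresh lookup here may still return a cached answer until that cache entry expires.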
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886758#comment-13886758 ]

charitymajors commented on ZOOKEEPER-1506:
------------------------------------------

This is affecting us too. This is a pretty terrible bug for anyone trying to run zk in the cloud. :(

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Michael Lasevich
>              Labels: patch
>             Fix For: 3.5.0
>         Attachments: zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886761#comment-13886761 ]

Hadoop QA commented on ZOOKEEPER-1506:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12581564/zk-dns-caching-refresh.patch
  against trunk revision 1561672.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1904//console

This message is automatically generated.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Assignee: Michael Lasevich
>              Labels: patch
>             Fix For: 3.5.0
>         Attachments: zk-dns-caching-refresh.patch
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650223#comment-13650223 ]

Hadoop QA commented on ZOOKEEPER-1506:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12581564/zk-dns-caching-refresh.patch
  against trunk revision 1463329.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1467//console

This message is automatically generated.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>              Labels: patch
>         Attachments: zk-dns-caching-refresh.patch

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647059#comment-13647059 ]

Daniel Heidebrecht commented on ZOOKEEPER-1506:
-----------------------------------------------

I am also seeing this with a 5-node ZooKeeper 3.4.5 cluster running in AWS. All nodes in the cluster must be restarted.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647087#comment-13647087 ]

Matt Wise commented on ZOOKEEPER-1506:
--------------------------------------

This is still an ongoing issue. We've tried making changes to the way Java handles DNS caching and we've been unable to solve the issue. This really seems like a simple thing to fix, and it's critical when running ZooKeeper in Amazon's cloud, where internal IP addresses change each time you boot up. Please... somebody fix this!

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571488#comment-13571488 ]

Matt Wise commented on ZOOKEEPER-1506:
--------------------------------------

This is still an ongoing issue. An AWS failure last night caused us to have to restart our entire ZooKeeper cluster one node at a time because of this bug. Can someone at least set a target date for it?

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Priority: Minor
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571550#comment-13571550 ]

Thawan Kooburat commented on ZOOKEEPER-1506:
--------------------------------------------

There is also a problem with Java DNS caching. I haven't tried this, but you might want to check this out:
http://stackoverflow.com/questions/1256556/any-way-to-make-java-honor-the-dns-caching-timeout-ttl

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.4.5
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Priority: Minor
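For readers following the Java DNS caching pointer above, here is a minimal sketch of how the JVM-level cache lifetime can be adjusted through the standard `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` security properties. The class name `DnsCacheTtl` and the values 60 and 10 are illustrative assumptions, not settings recommended by anyone in this thread. The properties are generally expected to be set before the JVM performs its first lookup, and they do nothing for code that keeps an already-resolved `InetSocketAddress` around, which matches another comment in this thread reporting that changing Java's DNS caching alone did not solve the problem.

```java
import java.security.Security;

// Hypothetical helper: sets the JVM-wide DNS cache lifetimes.
final class DnsCacheTtl {
    // positiveSeconds: how long successful lookups are cached.
    // negativeSeconds: how long failed lookups are cached.
    static void configure(int positiveSeconds, int negativeSeconds) {
        Security.setProperty("networkaddress.cache.ttl",
                Integer.toString(positiveSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(negativeSeconds));
    }

    public static void main(String[] args) {
        // Illustrative values only; call this early, before any lookups.
        configure(60, 10);
        System.out.println("networkaddress.cache.ttl = "
                + Security.getProperty("networkaddress.cache.ttl"));
    }
}
```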
[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511207#comment-13511207 ]

Matt Wise commented on ZOOKEEPER-1506:
--------------------------------------

This is also impacting our environment. We run our entire infrastructure in the cloud, and use a set of static (but short TTL'd) hostnames to point to our ZooKeeper instances. We have the exact same concern ... that right now it requires a restart of the ZooKeeper application on each system to trigger a 're-lookup' of the DNS record.

This seems like it should be customizable with a simple option. If 'dns_lookup_on_connect' is turned on, it should do a new DNS lookup for every connection to an ensemble server. If it's disabled, it can look up the DNS once and only once. Tools like 'stunnel' have this behavior and it's great in the cloud.

> Re-try DNS hostname -> IP resolution if node connection fails
> -------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1506
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: server
>    Affects Versions: 3.3.5
>         Environment: Ubuntu 11.04 64-bit
>            Reporter: Mike Heffner
>            Priority: Minor
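The 'dns_lookup_on_connect' behavior proposed in the comment above might look roughly like the sketch below. The option is the commenter's suggestion and, as far as this thread shows, not an existing ZooKeeper setting; the class `PeerAddress` and the method `addressForConnect` are hypothetical names used only for illustration.

```java
import java.net.InetSocketAddress;

// Hypothetical sketch of the proposed option: resolve on every connect
// when enabled, resolve once and reuse the result when disabled.
final class PeerAddress {
    private final String host;
    private final int port;
    private final boolean dnsLookupOnConnect;
    private final InetSocketAddress resolvedOnce; // null when option is on

    PeerAddress(String host, int port, boolean dnsLookupOnConnect) {
        this.host = host;
        this.port = port;
        this.dnsLookupOnConnect = dnsLookupOnConnect;
        this.resolvedOnce =
                dnsLookupOnConnect ? null : new InetSocketAddress(host, port);
    }

    // Returns the address to use for the next connection attempt.
    InetSocketAddress addressForConnect() {
        if (dnsLookupOnConnect) {
            // A new InetSocketAddress triggers a new resolution attempt
            // (still subject to the JVM's own DNS cache).
            return new InetSocketAddress(host, port);
        }
        return resolvedOnce;
    }

    public static void main(String[] args) {
        PeerAddress peer = new PeerAddress("localhost", 2888, true);
        System.out.println(peer.addressForConnect());
    }
}
```

With the option off, every caller sees the same address object for the lifetime of the process, which is the behavior the issue describes; with it on, each connection attempt gets a freshly constructed address.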