[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-10-05 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14943747#comment-14943747
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1506:
---

Great work, thanks [~rthille] and [~fpj]!

> Re-try DNS hostname -> IP resolution if node connection fails
> -
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.5, 3.4.6
> Environment: Ubuntu 11.04 64-bit
>Reporter: Mike Heffner
>Assignee: Robert P. Thille
>Priority: Blocker
>  Labels: patch
> Fix For: 3.4.7, 3.5.0, 3.6.0
>
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, Zookeeper-1506.patch, zk-dns-caching-refresh.patch
>
>
>In our zoo.cfg we use hostnames to identify the ZK servers that are part 
> of an ensemble. These hostnames are configured with a low (<= 60s) TTL and 
> the IP address they map to can and does change. Our procedure for 
> replacing/upgrading a ZK node is to boot an entirely new instance and remap 
> the hostname to the new instance's IP address. Our expectation is that when 
> the original ZK node is terminated/shutdown, the remaining nodes in the 
> ensemble would reconnect to the new instance.
> However, what we are noticing is that the remaining ZK nodes do not attempt 
> to re-resolve the hostname->IP mapping for the new server. Once the original 
> ZK node is terminated, the existing servers continue to attempt contacting it 
> at the old IP address. It would be great if the ZK servers could try to 
> re-resolve the hostname when attempting to connect to a lost ZK server, 
> instead of caching the lookup indefinitely. Currently we must do a rolling 
> restart of the ZK ensemble after swapping a node -- which at three nodes 
> means we periodically lose quorum.
> The exact method we are following is to boot new instances in EC2 and attach 
> one of a set of three Elastic IP addresses. External to EC2 this IP address 
> remains the same and maps to whatever instance it is attached to. Internal to 
> EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped 
> to the internal (10.x.y.z) address of the instance it is attached to. 
> Therefore, in our case we would like ZK to pick up the new 10.x.y.z address 
> that the elastic IP hostname gets mapped to and reconnect appropriately.
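
The behaviour requested in the description can be illustrated with a minimal sketch. It is not the code from any of the attached patches: `PeerEndpoint` and `resolveNow` are hypothetical names, and the only assumption is that the server keeps the configured hostname and port rather than an address resolved once at startup. Note that the JVM keeps its own lookup cache (controlled by the `networkaddress.cache.ttl` security property), so a fresh `InetSocketAddress` is necessary but may not be sufficient on its own.

```java
import java.net.InetSocketAddress;

public class ReResolveSketch {
    // Hypothetical holder for a peer's configured endpoint; not ZooKeeper's own class.
    static final class PeerEndpoint {
        private final String hostname;
        private final int port;

        PeerEndpoint(String hostname, int port) {
            this.hostname = hostname;
            this.port = port;
        }

        // Builds a new InetSocketAddress on every call. The constructor attempts
        // a lookup each time (subject to the JVM's DNS cache) instead of reusing
        // an address that was resolved once and kept forever.
        InetSocketAddress resolveNow() {
            return new InetSocketAddress(hostname, port);
        }
    }

    public static void main(String[] args) {
        PeerEndpoint peer = new PeerEndpoint("localhost", 2888);
        // Call resolveNow() before each connection attempt, e.g. after a failure.
        InetSocketAddress first = peer.resolveNow();
        InetSocketAddress second = peer.resolveNow();
        System.out.println(first.getHostString() + " unresolved=" + first.isUnresolved());
        System.out.println("fresh object per attempt: " + (first != second));
    }
}
```

The point of the design is that a reconnect path which calls `resolveNow()` on each attempt can follow a hostname whose IP mapping has changed, whereas a cached `InetSocketAddress` keeps pointing at the old IP.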



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-09-21 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14901591#comment-14901591
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12761513/ZOOKEEPER-1506.patch
  against trunk revision 1702378.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 57 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2884//console

This message is automatically generated.

> Re-try DNS hostname -> IP resolution if node connection fails
> -
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.5, 3.4.6
> Environment: Ubuntu 11.04 64-bit
>Reporter: Mike Heffner
>Assignee: Flavio Junqueira
>Priority: Blocker
>  Labels: patch
> Fix For: 3.4.7
>
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, Zookeeper-1506.patch, zk-dns-caching-refresh.patch


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-09-21 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14900907#comment-14900907
 ] 

Flavio Junqueira commented on ZOOKEEPER-1506:
-

There is still one conflict in BaseSysTest. For the next patch, please make 
sure to hit the "Submit patch" button at the top. That way QA runs and tells 
you whether the patch applies cleanly, although it only checks trunk.

> Re-try DNS hostname -> IP resolution if node connection fails
> -
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.5, 3.4.6
> Environment: Ubuntu 11.04 64-bit
>Reporter: Mike Heffner
>Assignee: Robert P. Thille
>Priority: Blocker
>  Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> Zookeeper-1506.patch, zk-dns-caching-refresh.patch


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-09-14 Thread Robert P. Thille (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14744528#comment-14744528
 ] 

Robert P. Thille commented on ZOOKEEPER-1506:
-

Ah, I see what I'm doing wrong. I have my editor strip trailing whitespace, but 
the How to Contribute page ( 
http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute ) asks that you not 
"reformat code unrelated to the bug being fixed", so I was generating the patch 
with 'git diff --no-prefix -b' to ignore the whitespace changes, but that seems 
to generate a patch that will not apply.  I'll manually fix up the patch and 
resubmit.

> Re-try DNS hostname -> IP resolution if node connection fails
> -
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.5, 3.4.6
> Environment: Ubuntu 11.04 64-bit
>Reporter: Mike Heffner
>Assignee: Robert P. Thille
>Priority: Blocker
>  Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, Zookeeper-1506.patch, 
> zk-dns-caching-refresh.patch


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-09-11 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14740915#comment-14740915
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12755248/ZOOKEEPER-1506.patch
  against trunk revision 1702378.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 57 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2871//console

This message is automatically generated.

> Re-try DNS hostname -> IP resolution if node connection fails
> -
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.5, 3.4.6
> Environment: Ubuntu 11.04 64-bit
>Reporter: Mike Heffner
>Assignee: Robert P. Thille
>Priority: Blocker
>  Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, Zookeeper-1506.patch, 
> zk-dns-caching-refresh.patch


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-09-10 Thread Robert P. Thille (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739716#comment-14739716
 ] 

Robert P. Thille commented on ZOOKEEPER-1506:
-

The call to the QuorumServer constructor sets 'type' to null, but the 
QuorumServer constructor checks for null and doesn't set the instance's type 
variable in that case, so it still gets the default of PARTICIPANT.
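
The pattern described in this comment can be sketched as follows. This is a simplified stand-in, not the actual ZooKeeper source: the `QuorumServer` class below has only the fields needed for the illustration, and its constructor signature is an assumption.

```java
public class QuorumServerTypeSketch {
    enum LearnerType { PARTICIPANT, OBSERVER }

    // Simplified stand-in for the QuorumServer discussed above; the real class
    // has more fields and constructors.
    static final class QuorumServer {
        final long id;
        // The field initializer supplies the default.
        LearnerType type = LearnerType.PARTICIPANT;

        QuorumServer(long id, LearnerType type) {
            this.id = id;
            // A null argument leaves the default in place.
            if (type != null) {
                this.type = type;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(new QuorumServer(1, null).type);                 // PARTICIPANT
        System.out.println(new QuorumServer(2, LearnerType.OBSERVER).type); // OBSERVER
    }
}
```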


> Re-try DNS hostname -> IP resolution if node connection fails
> -
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.5, 3.4.6
> Environment: Ubuntu 11.04 64-bit
>Reporter: Mike Heffner
>Assignee: Raul Gutierrez Segales
>Priority: Blocker
>  Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, Zookeeper-1506.patch, zk-dns-caching-refresh.patch


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-09-10 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14739580#comment-14739580
 ] 

Flavio Junqueira commented on ZOOKEEPER-1506:
-

bq. don't think it matters if type is null there. I'm not dereferencing type, 
just comparing it to the constant

Indeed, but there is still the issue that the default was PARTICIPANT and 
you're changing it to null. I haven't tried to determine the implications of 
the change, but it sounds better to just keep it as it is currently. 

bq. 10.1.1.x isn't necessarily a dead address (and isn't in our network), so 
the tests would actually hit running Zookeeper servers, so I switched it to the 
standard ( https://tools.ietf.org/html/rfc5735 ) TEST-NET-1 address range.

Sounds good.

> Re-try DNS hostname -> IP resolution if node connection fails
> -
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.5, 3.4.6
> Environment: Ubuntu 11.04 64-bit
>Reporter: Mike Heffner
>Assignee: Raul Gutierrez Segales
>Priority: Blocker
>  Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, Zookeeper-1506.patch, zk-dns-caching-refresh.patch


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-09-09 Thread Robert P. Thille (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737680#comment-14737680
 ] 

Robert P. Thille commented on ZOOKEEPER-1506:
-

0. Sorry, I'll regen the patch
1. The style stuff is largely from the patches I applied, but I'll clean that 
up and re-submit the patch
2. I'm no Java expert, but I don't think it matters if type is null there. I'm 
not dereferencing type, just comparing it to the constant...
3. 10.1.1.x isn't necessarily a dead address (and isn't in our network), so the 
tests would actually hit running Zookeeper servers, so I switched it to the 
standard ( https://tools.ietf.org/html/rfc5735 ) TEST-NET-1 address range.
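
For point 3, RFC 5735 lists 192.0.2.0/24 as TEST-NET-1, a block set aside for documentation and examples, so it is a safer choice for a "dead" address than a private range that may be in use locally. A minimal sketch of building and checking such an address follows; `deadAddress` and `isTestNet1` are hypothetical helper names, not the ones used in the patch.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class TestNetSketch {
    // Hypothetical helper: builds an address inside 192.0.2.0/24 (TEST-NET-1).
    static String deadAddress(int finalOctet) {
        if (finalOctet < 0 || finalOctet > 255) {
            throw new IllegalArgumentException("octet out of range: " + finalOctet);
        }
        return "192.0.2." + finalOctet;
    }

    // Returns true when the given IPv4 literal falls inside 192.0.2.0/24.
    // For an IP literal, getByName only parses the text; no DNS lookup is made.
    static boolean isTestNet1(String ipLiteral) {
        byte[] b;
        try {
            b = InetAddress.getByName(ipLiteral).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid address: " + ipLiteral, e);
        }
        return b.length == 4
                && (b[0] & 0xFF) == 192
                && (b[1] & 0xFF) == 0
                && (b[2] & 0xFF) == 2;
    }

    public static void main(String[] args) {
        String dead = deadAddress(17);
        System.out.println(dead + " in TEST-NET-1: " + isTestNet1(dead));
        System.out.println("10.1.1.5 in TEST-NET-1: " + isTestNet1("10.1.1.5"));
    }
}
```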

> Re-try DNS hostname -> IP resolution if node connection fails
> -
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.5, 3.4.6
> Environment: Ubuntu 11.04 64-bit
>Reporter: Mike Heffner
>Assignee: Raul Gutierrez Segales
>Priority: Blocker
>  Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, Zookeeper-1506.patch, zk-dns-caching-refresh.patch


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-09-09 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737571#comment-14737571
 ] 

Flavio Junqueira commented on ZOOKEEPER-1506:
-

A few issues I observed with this patch:

# Please follow our coding style and add spaces accordingly, e.g., `if 
(parts.length>2)` should be `if (parts.length > 2)`
# type can be null here `if (type == LearnerType.OBSERVER)` in the case of 
parts.length being less than or equal to 3 and will throw an NPE
# Why have you changed the address range here `String deadAddress = new 
String("192.0.2." + finalOctet);`?
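
Whether a null `type` can actually throw depends on how it is used. In Java, `==` between an enum reference and a constant only compares references, so a null operand yields `false`; calling a method on the reference does throw. The sketch below illustrates both cases under stated assumptions: `parseType`, `isObserver`, and `nameOf` are hypothetical helpers, and the `host:port:electionPort[:type]` layout and the `zk1.example.com` hostname are used purely for illustration.

```java
import java.util.Locale;

public class LearnerTypeParseSketch {
    enum LearnerType { PARTICIPANT, OBSERVER }

    // Hypothetical parser: returns null when the optional fourth field is absent.
    static LearnerType parseType(String value) {
        String[] parts = value.split(":");
        LearnerType type = null;
        if (parts.length > 3) {
            type = LearnerType.valueOf(parts[3].toUpperCase(Locale.ROOT));
        }
        return type;
    }

    // Reference comparison: evaluates to false for null, nothing is dereferenced.
    static boolean isObserver(LearnerType type) {
        return type == LearnerType.OBSERVER;
    }

    // Method call on the reference: throws NullPointerException for null.
    static String nameOf(LearnerType type) {
        return type.name();
    }

    public static void main(String[] args) {
        LearnerType absent = parseType("zk1.example.com:2888:3888");
        System.out.println("type=" + absent + " observer=" + isObserver(absent));
        LearnerType given = parseType("zk1.example.com:2888:3888:observer");
        System.out.println("type=" + given + " observer=" + isObserver(given));
    }
}
```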

> Re-try DNS hostname -> IP resolution if node connection fails
> -
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.5, 3.4.6
> Environment: Ubuntu 11.04 64-bit
>Reporter: Mike Heffner
>Assignee: Raul Gutierrez Segales
>Priority: Critical
>  Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, Zookeeper-1506.patch, zk-dns-caching-refresh.patch


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-09-09 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737517#comment-14737517
 ] 

Flavio Junqueira commented on ZOOKEEPER-1506:
-

[~rthille] could you generate the patch with --no-prefix, please?

> Re-try DNS hostname -> IP resolution if node connection fails
> -
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.5, 3.4.6
> Environment: Ubuntu 11.04 64-bit
>Reporter: Mike Heffner
>Assignee: Raul Gutierrez Segales
>Priority: Critical
>  Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, Zookeeper-1506.patch, zk-dns-caching-refresh.patch


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715246#comment-14715246
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12752535/Zookeeper-1506.patch
  against trunk revision 1697551.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 70 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2842//console

This message is automatically generated.

> Re-try DNS hostname -> IP resolution if node connection fails
> -
>
> Key: ZOOKEEPER-1506
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.5, 3.4.6
> Environment: Ubuntu 11.04 64-bit
>Reporter: Mike Heffner
>Assignee: Raul Gutierrez Segales
>Priority: Critical
>  Labels: patch
> Fix For: 3.4.7, 3.5.1, 3.6.0
>
> Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
> Zookeeper-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-08-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715832#comment-14715832
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12752610/ZOOKEEPER-1506.patch
  against trunk revision 1697551.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 70 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2844//console

This message is automatically generated.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5, 3.4.6
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Raul Gutierrez Segales
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, Zookeeper-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-08-25 Thread Robert P. Thille (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711955#comment-14711955
 ] 

Robert P. Thille commented on ZOOKEEPER-1506:
-

I've got a patch (against release-3.4.6) which we're using in-house which 
includes fixes to the tests.  Not sure how applicable it'd be to the 3.4 branch 
(we wanted minimal changes to the stable release).  I had to add one more call 
to s.recreateSocketAddresses() in Learner.java to get it to function properly 
with my (not-included, too dependent on our test environment) integration 
tests.  I'm sending a request to Legal to get the release approval (likely a 
rubber-stamp).
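
For readers following along, the idea behind a call like recreateSocketAddresses() can be sketched as below. This is a minimal illustration and not the attached patch: the class PeerAddress and its methods are hypothetical names. The only JDK behaviour it relies on is that new InetSocketAddress(host, port) attempts resolution when it is constructed, so building a fresh one after a failed connect can pick up a changed hostname -> IP mapping (subject to the JVM's own DNS cache settings).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical stand-in for a quorum peer's address record; not ZooKeeper code. */
class PeerAddress {
    private final String host;
    private final int port;
    private volatile InetSocketAddress addr;

    PeerAddress(String host, int port) {
        this.host = host;
        this.port = port;
        // Resolved once here; without a refresh this result is reused indefinitely.
        this.addr = new InetSocketAddress(host, port);
    }

    /** Rebuild the address from the configured host string, forcing a new resolution attempt. */
    void recreateSocketAddress() {
        this.addr = new InetSocketAddress(host, port);
    }

    InetSocketAddress current() {
        return addr;
    }

    /** Try to connect; on failure, refresh the address so the next attempt uses a new lookup. */
    Socket connect(int timeoutMs) throws IOException {
        Socket s = new Socket();
        try {
            s.connect(addr, timeoutMs);
            return s;
        } catch (IOException e) {
            s.close();
            recreateSocketAddress();
            throw e;
        }
    }
}
```

A caller would retry connect() after the exception; the second attempt then uses whatever address the hostname resolves to at that point.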

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Raul Gutierrez Segales
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-08-23 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708561#comment-14708561
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1506:
---

[~andre.c...@meteo.pt]: sorry, I dropped the ball here. I'll prepare a patch 
for 3.4 later today, and we'll include it on the 3.4.7 release. Thanks!

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Raul Gutierrez Segales
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-06-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597841#comment-14597841
 ] 

André Cruz commented on ZOOKEEPER-1506:
---

Any news regarding the inclusion of this fix in a stable zookeeper version?

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Raul Gutierrez Segales
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-04-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503825#comment-14503825
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12726674/ZOOKEEPER-1506-fix.patch
  against trunk revision 1672934.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2641//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2641//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2641//console

This message is automatically generated.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michi Mutsuzaki
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-04-20 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503871#comment-14503871
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1506:


Thanks Raul for testing this. I'd try replacing calls to getHostName with 
getHostString. For example, I found another one in QuorumCnxManager.java:

org/apache/zookeeper/server/quorum/QuorumCnxManager.java:String 
addr = self.getElectionAddress().getHostName() + ":" + 
self.getElectionAddress().getPort();
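
As an illustration of the suggested replacement (a sketch under assumptions, not the committed change), the same "host:port" string can be built with getHostString(). The class HostPort and the method format are hypothetical names. InetSocketAddress.getHostString() is available from Java 7 and returns the hostname, or the literal address string if there is no hostname, without attempting a reverse lookup; getHostName() may perform one when the address was created from a literal IP.

```java
import java.net.InetSocketAddress;

/** Hypothetical helper; the name is illustrative and does not come from ZooKeeper. */
class HostPort {
    /** Build "host:port" from the string the address carries, with no reverse DNS lookup. */
    static String format(InetSocketAddress a) {
        return a.getHostString() + ":" + a.getPort();
    }
}
```

For an address created from a hostname this yields that hostname; for one created from an IP literal it yields the literal itself, so a slow or misconfigured reverse resolver is never consulted.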


 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michi Mutsuzaki
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-04-20 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503821#comment-14503821
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1506:
---

actually, maybe it does. not sure my first try was clean. couldn't get a repro 
after the 2nd try. 

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michi Mutsuzaki
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-04-20 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503671#comment-14503671
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1506:


The patch uses HostNameUtils.getHostString(), which supposedly avoids a reverse 
lookup. Maybe there is a bug in HostNameUtils.getHostString()?

We can replace HostNameUtils with InetSocketAddress.getHostString since we now 
require Java 7.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michi Mutsuzaki
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-04-20 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503799#comment-14503799
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1506:
---

hmmm, it does not help [~michim]

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michi Mutsuzaki
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-04-20 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503846#comment-14503846
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1506:
---

It doesn't. I got a consistent repro by first firewalling the participant with 
id 0, to force that code path.

I'll try reverting the patch entirely and see if that helps.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michi Mutsuzaki
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch


 In our zoo.cfg we use hostnames to identify the ZK servers that are part of 
 an ensemble. These hostnames are configured with a low (= 60s) TTL and the 
 IP address they map to can and does change. Our procedure for 
 replacing/upgrading a ZK node is to boot an entirely new instance and remap 
 the hostname to the new instance's IP address. Our expectation is that when 
 the original ZK node is terminated/shutdown, the remaining nodes in the 
 ensemble would reconnect to the new instance.
 However, what we are noticing is that the remaining ZK nodes do not attempt 
 to re-resolve the hostname-IP mapping for the new server. Once the original 
 ZK node is terminated, the existing servers continue to attempt contacting it 
 at the old IP address. It would be great if the ZK servers could try to 
 re-resolve the hostname when attempting to connect to a lost ZK server, 
 instead of caching the lookup indefinitely. Currently we must do a rolling 
 restart of the ZK ensemble after swapping a node -- which at three nodes 
 means we periodically lose quorum.
 The exact method we are following is to boot new instances in EC2 and attach 
 one of a set of three Elastic IP addresses. External to EC2 this IP address 
 remains the same and maps to whatever instance it is attached to. Internal to 
 EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped 
 to the internal (10.x.y.z) address of the instance it is attached to. 
 Therefore, in our case we would like ZK to pick up the new 10.x.y.z address 
 that the elastic IP hostname gets mapped to and reconnect appropriately.
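
The behavior requested above can be illustrated with a minimal Java sketch. It is not ZooKeeper code: the class `PeerEndpoint` and its method `resolveNow` are hypothetical names chosen for illustration. The sketch keeps only the configured host string and port, and builds a new `InetSocketAddress` before each connection attempt, since the `InetSocketAddress(String, int)` constructor attempts resolution when it is called and the resulting object does not change afterwards.

```java
import java.net.InetSocketAddress;

/** Hypothetical holder for a peer endpoint; an illustration, not ZooKeeper code. */
final class PeerEndpoint {
    private final String host; // hostname exactly as written in the configuration
    private final int port;

    PeerEndpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    /**
     * Builds a fresh InetSocketAddress. The constructor attempts name
     * resolution on every call, so invoking this before each connection
     * attempt avoids pinning the first IP address that was ever resolved.
     */
    InetSocketAddress resolveNow() {
        return new InetSocketAddress(host, port);
    }

    public static void main(String[] args) {
        PeerEndpoint peer = new PeerEndpoint("localhost", 2888);
        // Two separate objects, each resolved at the time it was created.
        InetSocketAddress first = peer.resolveNow();
        InetSocketAddress second = peer.resolveNow();
        System.out.println(first + " / " + second);
    }
}
```

Whether a new lookup actually reaches the DNS server may also depend on JVM-level caching (the `networkaddress.cache.ttl` security property), so a low record TTL alone might not be sufficient.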



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-04-20 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504004#comment-14504004
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1506:
---

So I created a build with ZOOKEEPER-1506 removed and I still get the problem.

It's probably due to the getHostName() calls that you pointed out. These calls can 
actually generate reverse lookups according to:

http://download.java.net/jdk7/archive/b123/docs/api/java/net/InetSocketAddress.html#getHostName%28%29

However, these calls were introduced by ZOOKEEPER-107 (according to 
git-blame). I think we should avoid them, though let's do that in another ticket.

In conclusion, if you have a bad resolver or bogus reverse lookups (as is the 
case in my test scenario), you'll have issues because of these calls.
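
To make the difference concrete, here is a small Java sketch (the class `HostStringDemo` and its `describe` helper are hypothetical, not part of ZooKeeper). It uses `InetSocketAddress.getHostString()`, available since Java 7, which returns the hostname if one is already known and otherwise the literal address, without attempting a reverse lookup. This is only one possible way to avoid the lookup and is not necessarily what a follow-up ticket would choose.

```java
import java.net.InetSocketAddress;

/** Hypothetical helper; shows getHostString() as a lookup-free alternative. */
final class HostStringDemo {
    /** Formats an endpoint for logging without triggering a reverse lookup. */
    static String describe(InetSocketAddress addr) {
        return addr.getHostString() + ":" + addr.getPort();
    }

    public static void main(String[] args) {
        // Created from an IP literal, so no hostname is attached to the address.
        InetSocketAddress addr = new InetSocketAddress("10.0.0.1", 3888);
        System.out.println(describe(addr)); // prints "10.0.0.1:3888"
        // By contrast, addr.getHostName() may perform a reverse (PTR) lookup
        // here, which can be slow or fail with a bad resolver.
    }
}
```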

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michi Mutsuzaki
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-04-20 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504007#comment-14504007
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1506:
---

I'll go ahead and close this again [~michim].

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michi Mutsuzaki
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506-fix.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-04-20 Thread Raul Gutierrez Segales (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503564#comment-14503564
 ] 

Raul Gutierrez Segales commented on ZOOKEEPER-1506:
---

I am running elections (in a 5-participant + 1-observer cluster) as part of 
validating the 3.5.1 alpha rc proposed by [~michim]. I am getting this from 
time to time:

https://gist.github.com/rgs1/d11822799fdbbfa5d5f2

I only have IP addresses in zoo.cfg and this patch seems to be triggering a 
reverse lookup (IP -> hostname). Given that in my current setup (a test setup, 
with systemd-nspawn containers) hostnames don't necessarily resolve back (i.e., 
hostname -> IP doesn't work), participants might end up unable to connect to 
the leader if it's initially unavailable.

Is the reverse lookup (IP -> hostname) something expected with this patch or a 
side effect? I don't see why we'd ever want/need that reverse lookup given that 
it could be problematic in some setups.

Thoughts?

p.s.: will post my entire, reproducible, setup a bit later. 

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michi Mutsuzaki
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-04-09 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487882#comment-14487882
 ] 

Rakesh R commented on ZOOKEEPER-1506:
-

Committed to trunk : http://svn.apache.org/viewvc?view=revision&revision=1672436
Committed to br3.5 : http://svn.apache.org/viewvc?view=revision&revision=1672438

Since the patch has conflicts in the 3.4 branch, I am not resolving this issue now. 
[~michim] will you generate a 3.4 patch?

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-04-09 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487759#comment-14487759
 ] 

Camille Fournier commented on ZOOKEEPER-1506:
-

+1

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-04-09 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487794#comment-14487794
 ] 

Rakesh R commented on ZOOKEEPER-1506:
-

I'll commit this shortly.

Thanks [~fpj], [~fournc], all others for the reviews and [~michim], 
[~mlasevich] for taking care of this issue.


 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-03-30 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386759#comment-14386759
 ] 

Flavio Junqueira commented on ZOOKEEPER-1506:
-

+1, lgtm. [~fournc], is it a go?

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-03-29 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386072#comment-14386072
 ] 

Rakesh R commented on ZOOKEEPER-1506:
-

Thanks [~michim] for the fix. +1 latest patch looks good to me.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-03-28 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385608#comment-14385608
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1506:


[~fournc] [~fpj] could you take a look at the patch? Once this gets checked in, 
I'll create a release candidate for 3.5.1.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-03-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379493#comment-14379493
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12707161/ZOOKEEPER-1506.patch
  against trunk revision 1669060.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2588//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2588//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2588//console

This message is automatically generated.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-03-25 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379422#comment-14379422
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1506:


Thanks [~fpj], yeah I think you are right. I'll change 
QuorumPeer.connectNewPeers to call connectOne(long sid) and make 
connectOne(long sid, InetSocketAddress electionAddr) private so that everybody 
goes through connectOne(long sid). The only other place we use connectOne(long 
sid, InetSocketAddress electionAddr) is QuorumCnxManager.receiveConnection() 
but this one is ok since it creates a new InetSocketAddress explicitly.
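
The refactoring described above can be sketched roughly as follows. This is a simplified stand-in, not the real QuorumCnxManager: the class `ConnectionManagerSketch`, the `addPeer` and `lastAttempt` methods, and the way peers are stored are all assumptions made for illustration. Only the shape is taken from the comment: a single non-private `connectOne(long sid)` that re-resolves the address, and a private two-argument overload that callers cannot reach with a stale address.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a connection manager; not the real QuorumCnxManager. */
final class ConnectionManagerSketch {
    /** Configured endpoints by server id, kept unresolved (assumed layout). */
    private final Map<Long, InetSocketAddress> configured = new ConcurrentHashMap<>();
    private volatile InetSocketAddress lastAttempt;

    void addPeer(long sid, String host, int port) {
        // Only the host string and port are remembered, never a resolved IP.
        configured.put(sid, InetSocketAddress.createUnresolved(host, port));
    }

    /** Single entry point: always re-resolves the hostname before connecting. */
    boolean connectOne(long sid) {
        InetSocketAddress cfg = configured.get(sid);
        if (cfg == null) {
            return false; // unknown server id
        }
        InetSocketAddress fresh = new InetSocketAddress(cfg.getHostString(), cfg.getPort());
        return connectOne(sid, fresh);
    }

    /** Private so that callers cannot pass in a previously resolved address. */
    private boolean connectOne(long sid, InetSocketAddress electionAddr) {
        lastAttempt = electionAddr;
        // A real implementation would open a socket here; this sketch only
        // records the attempt and reports whether resolution succeeded.
        return !electionAddr.isUnresolved();
    }

    InetSocketAddress lastAttempt() {
        return lastAttempt;
    }
}
```

With this shape, every outgoing connection attempt goes through a fresh resolution step, which is the property the comment is after.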

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch


 In our zoo.cfg we use hostnames to identify the ZK servers that are part of 
 an ensemble. These hostnames are configured with a low (= 60s) TTL and the 
 IP address they map to can and does change. Our procedure for 
 replacing/upgrading a ZK node is to boot an entirely new instance and remap 
 the hostname to the new instance's IP address. Our expectation is that when 
 the original ZK node is terminated/shutdown, the remaining nodes in the 
 ensemble would reconnect to the new instance.
 However, what we are noticing is that the remaining ZK nodes do not attempt 
 to re-resolve the hostname-IP mapping for the new server. Once the original 
 ZK node is terminated, the existing servers continue to attempt contacting it 
 at the old IP address. It would be great if the ZK servers could try to 
 re-resolve the hostname when attempting to connect to a lost ZK server, 
 instead of caching the lookup indefinitely. Currently we must do a rolling 
 restart of the ZK ensemble after swapping a node -- which at three nodes 
 means we periodically lose quorum.
 The exact method we are following is to boot new instances in EC2 and attach 
 one, of a set of three, Elastic IP address. External to EC2 this IP address 
 remains the same and maps to whatever instance it is attached to. Internal to 
 EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped 
 to the internal (10.x.y.z) address of the instance it is attached to. 
 Therefore, in our case we would like ZK to pick up the new 10.x.y.z address 
 that the elastic IP hostname gets mapped to and reconnect appropriately.
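
The caching behaviour described in the report follows from how `java.net.InetSocketAddress` works: the host name is looked up once, in the constructor, and the resulting IP is kept for the lifetime of the object. Below is a minimal sketch of how a fresh lookup can be forced by building a new object from the old host string and port. The class `AddressRefresher` and its `refresh` method are hypothetical names used for illustration; they are not part of ZooKeeper.

```java
import java.net.InetSocketAddress;

// Hypothetical helper for illustration; the names are not ZooKeeper's.
class AddressRefresher {
    // Returns a new address for the same host string and port. The
    // constructor performs the lookup again, so a changed DNS record can be
    // picked up (subject to the JVM's own lookup cache).
    static InetSocketAddress refresh(InetSocketAddress old) {
        // getHostString() returns the host text without a reverse lookup.
        return new InetSocketAddress(old.getHostString(), old.getPort());
    }

    public static void main(String[] args) {
        // An IP literal is used so the example runs without a DNS server.
        InetSocketAddress stale = InetSocketAddress.createUnresolved("127.0.0.1", 2888);
        InetSocketAddress fresh = refresh(stale);
        System.out.println("stale unresolved: " + stale.isUnresolved());
        System.out.println("fresh unresolved: " + fresh.isUnresolved());
    }
}
```

The JVM also caches successful lookups (the `networkaddress.cache.ttl` security property), so a re-created address only sees a new IP once that cache entry has expired.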



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-03-25 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14379910#comment-14379910
 ] 

Flavio Junqueira commented on ZOOKEEPER-1506:
-

The change looks good, thanks. About the retry logic, the change seems 
gratuitous. I don't see a big issue with removing it, but it sounds best to 
leave the retry logic as is because it is unrelated to the fix here. If needed, 
propose it in a new jira. Does it make sense?

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-03-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14380387#comment-14380387
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12707249/ZOOKEEPER-1506.patch
  against trunk revision 1669062.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2589//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2589//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2589//console

This message is automatically generated.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-03-23 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376930#comment-14376930
 ] 

Flavio Junqueira commented on ZOOKEEPER-1506:
-

One clarification here. Don't we have to guarantee that all code paths to 
connectOne (both signatures) recreate addresses? It looks like it isn't the 
case with this patch. For example, QuorumPeer.connectNewPeers invokes the 
version of connectOne that doesn't recreate. 

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-03-15 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362293#comment-14362293
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12704647/ZOOKEEPER-1506.patch
  against trunk revision 1666768.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2570//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2570//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2570//console

This message is automatically generated.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-03-14 Thread Camille Fournier (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362115#comment-14362115
 ] 

Camille Fournier commented on ZOOKEEPER-1506:
-

The only concern I have is removing the retries. I think it's probably right to 
do but does it imply documentation changes anywhere that we need to make? 
[~michim]?

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-03-14 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14362075#comment-14362075
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1506:


[~fournc][~rakeshr] could you take a look at the patch? I'd like to get this in 
for 3.5.1. Thanks!

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-03-12 Thread koray sariteke (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358452#comment-14358452
 ] 

koray sariteke commented on ZOOKEEPER-1506:
---

We also have that problem. When will the fix be included in a release?

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-03-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358571#comment-14358571
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12675762/ZOOKEEPER-1506.patch
  against trunk revision 1665315.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2559//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2559//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2559//console

This message is automatically generated.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2015-03-12 Thread Rakesh R (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358559#comment-14358559
 ] 

Rakesh R commented on ZOOKEEPER-1506:
-

bq.Perhaps we just have to agree to not having a test case for this
I also failed to find a better approach to writing a unit test for this. I agree to 
not having a test case.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-10-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14176590#comment-14176590
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12675762/ZOOKEEPER-1506.patch
  against trunk revision 1632209.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2395//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2395//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2395//console

This message is automatically generated.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1, 3.6.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch


 In our zoo.cfg we use hostnames to identify the ZK servers that are part of 
 an ensemble. These hostnames are configured with a low (= 60s) TTL and the 
 IP address they map to can and does change. Our procedure for 
 replacing/upgrading a ZK node is to boot an entirely new instance and remap 
 the hostname to the new instance's IP address. Our expectation is that when 
 the original ZK node is terminated/shutdown, the remaining nodes in the 
 ensemble would reconnect to the new instance.
 However, what we are noticing is that the remaining ZK nodes do not attempt 
 to re-resolve the hostname-IP mapping for the new server. Once the original 
 ZK node is terminated, the existing servers continue to attempt contacting it 
 at the old IP address. It would be great if the ZK servers could try to 
 re-resolve the hostname when attempting to connect to a lost ZK server, 
 instead of caching the lookup indefinitely. Currently we must do a rolling 
 restart of the ZK ensemble after swapping a node -- which at three nodes 
 means we periodically lose quorum.
 The exact method we are following is to boot new instances in EC2 and attach 
 one of a set of three Elastic IP addresses. External to EC2 this IP address 
 remains the same and maps to whatever instance it is attached to. Internal to 
 EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped 
 to the internal (10.x.y.z) address of the instance it is attached to. 
 Therefore, in our case we would like ZK to pick up the new 10.x.y.z address 
 that the elastic IP hostname gets mapped to and reconnect appropriately.
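
The behaviour requested above can be illustrated with a minimal Java sketch. It assumes the peer is stored as a hostname plus port rather than as an already-resolved address; the class name `PeerAddress` and its methods are hypothetical and are not taken from the attached patches or from ZooKeeper's code.

```java
import java.net.InetSocketAddress;

/** Hypothetical holder for one ensemble peer; not ZooKeeper's actual class. */
final class PeerAddress {
    private final String hostname;
    private final int port;
    // Last address handed out; replaced when recreate() gets a fresh lookup.
    private volatile InetSocketAddress current;

    PeerAddress(String hostname, int port) {
        this.hostname = hostname;
        this.port = port;
        // This constructor attempts to resolve the hostname immediately.
        this.current = new InetSocketAddress(hostname, port);
    }

    InetSocketAddress current() {
        return current;
    }

    /** Re-runs the lookup; intended to be called after a failed connection attempt. */
    InetSocketAddress recreate() {
        InetSocketAddress fresh = new InetSocketAddress(hostname, port);
        if (!fresh.isUnresolved()) {
            current = fresh; // keep the previous address if the lookup failed
        }
        return current;
    }
}
```

Note that the JVM keeps its own lookup cache, governed by the `networkaddress.cache.ttl` security property, so a fresh `InetSocketAddress` may still return a cached answer until that TTL expires.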



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-10-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174792#comment-14174792
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12648826/ZOOKEEPER-1506.patch
  against trunk revision 1632209.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2394//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2394//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2394//console

This message is automatically generated.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.4.7, 3.5.1

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-10-17 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175143#comment-14175143
 ] 

Flavio Junqueira commented on ZOOKEEPER-1506:
-

I'm not sure I have any great suggestion for how to do a test case for this 
fix. Any suggestion?



[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-10-17 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175438#comment-14175438
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1506:


We can modify /etc/hosts (or a local hosts file specified in the HOSTALIASES 
environment variable) to simulate an IP address change. This won't work on 
Windows though.
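
A different, platform-independent way to simulate an address change is to put the lookup behind a small interface and swap in a fake during tests. This is a swapped-in technique, not the hosts-file approach suggested above, and the names `HostResolver` and `FakeResolver` are hypothetical; they do not come from the patch.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical lookup seam; production code could delegate to InetAddress.getByName. */
interface HostResolver {
    InetAddress resolve(String hostname);
}

/** Test double whose mapping can be changed mid-test, much like editing a hosts file. */
final class FakeResolver implements HostResolver {
    private final Map<String, byte[]> table = new ConcurrentHashMap<>();

    /** Points the hostname at the IPv4 address a.b.c.d. */
    void remap(String hostname, int a, int b, int c, int d) {
        table.put(hostname, new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
    }

    @Override
    public InetAddress resolve(String hostname) {
        byte[] raw = table.get(hostname);
        try {
            if (raw == null) {
                throw new UnknownHostException(hostname);
            }
            // getByAddress builds the object directly, without any DNS lookup.
            return InetAddress.getByAddress(hostname, raw);
        } catch (UnknownHostException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test could remap a name between two connection attempts and check that the second attempt targets the new address.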



[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-10-17 Thread Ramya Bharathi Nimmagadda (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175512#comment-14175512
 ] 

Ramya Bharathi Nimmagadda commented on ZOOKEEPER-1506:
--

Like the idea. There is a similar file in Windows at 
%SystemRoot%\system32\drivers\etc\hosts that could be updated. 

Here's the format:
# The IP address should be placed in the first column followed by the 
# corresponding host name.
# The IP address and the host name should be separated by at least one space.
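
To make the format concrete, here is a small Java sketch that reads hosts-style text into a hostname-to-IP map. It is illustrative only: the class name `HostsFormat` is hypothetical, and the parser handles just the rules quoted above plus `#` comments.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for reading hosts-style text; illustrative only. */
final class HostsFormat {
    /** Returns hostname -> IP for every "IP name [alias...]" line. */
    static Map<String, String> parse(String text) {
        Map<String, String> byName = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            // Everything after '#' is a comment.
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            // First column is the IP; the rest are names separated by whitespace.
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // blank line, comment-only line, or IP without a name
            }
            for (int i = 1; i < fields.length; i++) {
                byName.put(fields[i], fields[0]);
            }
        }
        return byName;
    }
}
```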



[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-10-16 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174315#comment-14174315
 ] 

Flavio Junqueira commented on ZOOKEEPER-1506:
-

Let me clarify my previous point. The description of this jira talks about 
bringing up a new instance to upgrade/replace a ZK node. It is starting a ZK 
node from scratch, essentially dropping the state of the previous incarnation 
of the node, that I was trying to point out could be a problem. Resolving 
names again as this patch proposes is fine. We should get this one in.



[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-10-16 Thread Ramya Bharathi Nimmagadda (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174502#comment-14174502
 ] 

Ramya Bharathi Nimmagadda commented on ZOOKEEPER-1506:
--

Thanks Flavio.
[~michim] Would you be able to submit a new patch with unit tests included? 
Thanks







[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-09-16 Thread Joshua Buss (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135508#comment-14135508
 ] 

Joshua Buss commented on ZOOKEEPER-1506:


This is very, very annoying.  Please get the fix into master as soon as 
possible. Thanks from another heavy user of cloud services (this affects any 
environment where you use the same hostnames for configuration purposes but 
dynamically re-build instances which get new IPs).

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.5.1

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-09-16 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135537#comment-14135537
 ] 

Flavio Junqueira commented on ZOOKEEPER-1506:
-

I think there is a deeper problem here that I'm worried about. If you point a 
name to a different server starting from scratch, then it is like the server 
state has been wiped out from the perspective of the ZK replication, which can 
lead to state loss in some corner cases. One way around this is to use 
reconfiguration: remove and re-introduce the server. Is something like this 
doable for you?   



[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-09-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135572#comment-14135572
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12648826/ZOOKEEPER-1506.patch
  against trunk revision 1623916.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 2.0.3) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2337//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2337//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2337//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-06-07 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14020955#comment-14020955
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12648826/ZOOKEEPER-1506.patch
  against trunk revision 1600481.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2125//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2125//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2125//console

This message is automatically generated.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch





--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-06-06 Thread MUFEED USMAN (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14019799#comment-14019799
 ] 

MUFEED USMAN commented on ZOOKEEPER-1506:
-

Nope. I do not have any patch installed. Happened to hit this JIRA during my 
search to understand the working relationship between ZooKeeper and DNS when 
the service was disrupted and did not have an automatic recovery.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch


 In our zoo.cfg we use hostnames to identify the ZK servers that are part of 
 an ensemble. These hostnames are configured with a low (<= 60s) TTL and the 
 IP address they map to can and does change. Our procedure for 
 replacing/upgrading a ZK node is to boot an entirely new instance and remap 
 the hostname to the new instance's IP address. Our expectation is that when 
 the original ZK node is terminated/shutdown, the remaining nodes in the 
 ensemble would reconnect to the new instance.
 However, what we are noticing is that the remaining ZK nodes do not attempt 
 to re-resolve the hostname->IP mapping for the new server. Once the original 
 ZK node is terminated, the existing servers continue to attempt contacting it 
 at the old IP address. It would be great if the ZK servers could try to 
 re-resolve the hostname when attempting to connect to a lost ZK server, 
 instead of caching the lookup indefinitely. Currently we must do a rolling 
 restart of the ZK ensemble after swapping a node -- which at three nodes 
 means we periodically lose quorum.
 The exact method we are following is to boot new instances in EC2 and attach 
 one of a set of three Elastic IP addresses. External to EC2 this IP address 
 remains the same and maps to whatever instance it is attached to. Internal to 
 EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped 
 to the internal (10.x.y.z) address of the instance it is attached to. 
 Therefore, in our case we would like ZK to pick up the new 10.x.y.z address 
 that the elastic IP hostname gets mapped to and reconnect appropriately.
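
For illustration only, here is a minimal Java sketch of the behaviour requested 
above: rebuild the socket address from the configured hostname before each 
reconnect attempt, rather than reusing the address object created at startup. 
The class and method names are hypothetical and are not taken from ZooKeeper or 
from any of the attached patches. The JVM also keeps its own lookup cache 
(governed by the networkaddress.cache.ttl security property), so a changed 
record may still take some time to become visible.

```java
import java.net.InetSocketAddress;

// Hypothetical helper for illustration; not ZooKeeper code.
final class ReResolveSketch {
    private ReResolveSketch() {}

    /**
     * Returns a new address built from the same host string and port.
     * The InetSocketAddress(String, int) constructor attempts a lookup each
     * time it is called, so a remapped hostname can yield a different IP.
     */
    static InetSocketAddress resolveAgain(InetSocketAddress previous) {
        // getHostString() returns the hostname (or IP literal) without
        // triggering a reverse lookup.
        return new InetSocketAddress(previous.getHostString(), previous.getPort());
    }

    public static void main(String[] args) {
        // An IP literal keeps this demo independent of any DNS server.
        InetSocketAddress startup = new InetSocketAddress("127.0.0.1", 2888);
        InetSocketAddress retry = resolveAgain(startup);
        System.out.println(retry.getHostString() + ":" + retry.getPort()
                + " unresolved=" + retry.isUnresolved());
    }
}
```

A caller would invoke resolveAgain only after a connection attempt to the 
previously resolved address has failed, so the common path pays no extra lookup.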





[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-06-04 Thread MUFEED USMAN (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018482#comment-14018482
 ] 

MUFEED USMAN commented on ZOOKEEPER-1506:
-

In the case where no IPs change but a mere DNS outage occurs (for some 
unforeseen reason), shouldn't the ZKs be able to make use of the cached 
info and restore connectivity once DNS is back up?

In the case I'm handling, I saw the following:

2014-02-06 20:22:54,438 [myid:0] - INFO  [WorkerReceiver[myid=0]:FastLeaderElection@542] - Notification: 0 (n.leader), 0x20171 (n.zxid), 0x1 (n.round), LOOKING (n.state), 0 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)
2014-02-06 20:22:54,451 [myid:0] - WARN  [WorkerSender[myid=0]:QuorumCnxManager@368] - Cannot open channel to 1 at election address node02:3888
java.net.UnknownHostException: node02

Only restarting ZK restored service.




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-06-04 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018500#comment-14018500
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1506:


Hi Mufeed,

Good point, I can update this.addr and this.electionAddr only if the 
InetSocketAddress constructor actually resolved the hostname.

By the way, are you running ZooKeeper with this patch applied? Does the patch 
fix the original problem?
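
As a rough sketch of the idea in the first paragraph above (replace the stored 
addresses only when the new lookup actually resolved), something like the 
following could work. The field names addr and electionAddr come from the 
comment; the class name, the method names, and the constructor shape are 
assumptions made for illustration and are not the code in the attached patch.

```java
import java.net.InetSocketAddress;

// Hypothetical stand-in for a quorum peer entry; not the actual patch.
final class PeerAddressSketch {
    private final String hostname;
    private final int port;
    private final int electionPort;
    private InetSocketAddress addr;
    private InetSocketAddress electionAddr;

    PeerAddressSketch(String hostname, int port, int electionPort) {
        this.hostname = hostname;
        this.port = port;
        this.electionPort = electionPort;
        this.addr = new InetSocketAddress(hostname, port);
        this.electionAddr = new InetSocketAddress(hostname, electionPort);
    }

    /** Keeps the current address when the new lookup failed. */
    static InetSocketAddress preferResolved(InetSocketAddress current,
                                            InetSocketAddress candidate) {
        // The constructor does not throw on a failed lookup; it returns an
        // address flagged as unresolved instead.
        return candidate.isUnresolved() ? current : candidate;
    }

    /** Re-runs the lookup; a failed lookup leaves the matching field untouched. */
    synchronized void recreateSocketAddresses() {
        this.addr = preferResolved(this.addr,
                new InetSocketAddress(hostname, port));
        this.electionAddr = preferResolved(this.electionAddr,
                new InetSocketAddress(hostname, electionPort));
    }

    synchronized InetSocketAddress getAddr() { return addr; }

    synchronized InetSocketAddress getElectionAddr() { return electionAddr; }
}
```

With this shape, a temporary DNS outage keeps the last known good address in 
place, so connectivity can resume once the peer is reachable again without a 
restart.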



[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-05-16 Thread Diwaker Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993789#comment-13993789
 ] 

Diwaker Gupta commented on ZOOKEEPER-1506:
--

This is a severe issue in any environment that uses DNSMasq or relies on DHCP 
for DNS mappings. Would really like to see this fixed for 3.5.0, or better yet, 
the next bugfix release in 3.4.x if there's going to be one.



[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-05-14 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997571#comment-13997571
 ] 

Flavio Junqueira commented on ZOOKEEPER-1506:
-

[~diwaker], is the patch here good for you? Have you had a chance to review it?



[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986397#comment-13986397
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12642801/ZOOKEEPER-1506.patch
  against trunk revision 1591175.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2075//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2075//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2075//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-05-01 Thread Michael Lasevich (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986773#comment-13986773
 ] 

Michael Lasevich commented on ZOOKEEPER-1506:
-

[~michim] Thanks for taking this over, I clearly lost track of it. For what it's 
worth, we have been running this patch in production for almost a year with no 
issues, but it would be nice to have it properly merged. Thank you.



[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-05-01 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986822#comment-13986822
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1506:


Thanks Michael, it's good to know that you've been using this patch without an 
issue.



[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986872#comment-13986872
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12642801/ZOOKEEPER-1506.patch
  against trunk revision 1591175.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2076//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2076//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2076//console

This message is automatically generated.



[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-05-01 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987036#comment-13987036
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1506:


I've been running FollowerResyncConcurrencyTest for a while, but I can't 
reproduce the failure. Maybe it's a transient failure.



[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-05-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987094#comment-13987094
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12642801/ZOOKEEPER-1506.patch
  against trunk revision 1591175.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2077//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2077//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2077//console

This message is automatically generated.

 Re-try DNS hostname - IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch


 In our zoo.cfg we use hostnames to identify the ZK servers that are part of 
 an ensemble. These hostnames are configured with a low (<= 60s) TTL and the 
 IP address they map to can and does change. Our procedure for 
 replacing/upgrading a ZK node is to boot an entirely new instance and remap 
 the hostname to the new instance's IP address. Our expectation is that when 
 the original ZK node is terminated/shut down, the remaining nodes in the 
 ensemble would reconnect to the new instance.
 However, what we are noticing is that the remaining ZK nodes do not attempt 
 to re-resolve the hostname->IP mapping for the new server. Once the original 
 ZK node is terminated, the existing servers continue to attempt contacting it 
 at the old IP address. It would be great if the ZK servers could try to 
 re-resolve the hostname when attempting to connect to a lost ZK server, 
 instead of caching the lookup indefinitely. Currently we must do a rolling 
 restart of the ZK ensemble after swapping a node -- which at three nodes 
 means we periodically lose quorum.
 The exact method we are following is to boot new instances in EC2 and attach 
 one of a set of three Elastic IP addresses. External to EC2 this IP address 
 remains the same and maps to whatever instance it is attached to. Internal to 
 EC2, the elastic IP hostname has a TTL of about 45-60 seconds and is remapped 
 to the internal (10.x.y.z) address of the instance it is attached to. 
 Therefore, in our case we would like ZK to pick up the new 10.x.y.z address 
 that the elastic IP hostname gets mapped to and reconnect appropriately.
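
 A minimal sketch of the behaviour requested above, assuming a hypothetical 
 PeerAddress holder (this is not ZooKeeper's actual code, nor the attached 
 patch). A Java InetSocketAddress resolves its hostname once, at construction, 
 so picking up a remapped hostname means building a new instance after a failed 
 connect. The new lookup still goes through the JVM's own address cache.

```java
import java.net.InetSocketAddress;

/** Hypothetical holder for one ensemble peer; not a ZooKeeper class. */
final class PeerAddress {
    private final String hostname;
    private final int port;
    private volatile InetSocketAddress resolved;

    PeerAddress(String hostname, int port) {
        this.hostname = hostname;
        this.port = port;
        // The hostname is looked up here, once; the result is then fixed.
        this.resolved = new InetSocketAddress(hostname, port);
    }

    /** Address to use for the next connection attempt. */
    InetSocketAddress current() {
        return resolved;
    }

    /**
     * Call after a failed connect. Constructing a new InetSocketAddress
     * triggers a new lookup; the old address is kept if that lookup fails.
     */
    InetSocketAddress reResolve() {
        InetSocketAddress fresh = new InetSocketAddress(hostname, port);
        if (!fresh.isUnresolved()) {
            resolved = fresh;
        }
        return resolved;
    }
}
```

 A caller would try current(), and on a connection failure call reResolve() 
 before the next attempt.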



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-04-30 Thread Michi Mutsuzaki (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986164#comment-13986164
 ] 

Michi Mutsuzaki commented on ZOOKEEPER-1506:


I'd like to get this fixed in 3.5. I'll rebase the patch and address Flavio's 
comments. I'm still not sure how we should test this though.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.5.0

 Attachments: zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986226#comment-13986226
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12642754/ZOOKEEPER-1506.patch
  against trunk revision 1591175.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2072//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2072//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2072//console

This message is automatically generated.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1506.patch, zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-04-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986373#comment-13986373
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12642801/ZOOKEEPER-1506.patch
  against trunk revision 1591175.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2074//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2074//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2074//console

This message is automatically generated.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
Priority: Critical
  Labels: patch
 Fix For: 3.5.0

 Attachments: ZOOKEEPER-1506.patch, ZOOKEEPER-1506.patch, 
 zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-03-14 Thread Robert Kamphuis (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934720#comment-13934720
 ] 

Robert Kamphuis commented on ZOOKEEPER-1506:


As this does not seem to be progressing much, here are some tips for people 
working in AWS or similar enough environments. This is likely neither the only 
nor the best approach, but it is working for me. 
- configure the zookeeper ensemble servers to connect to the elastic IP address 
instead of the hostname 
- on a serious failure of one of the servers, boot a replacement and re-assign 
the corresponding elastic IP to that server 
- the others will reconnect correctly 
- you will need to set up the Security Group to explicitly allow the 
interconnect on 2888/3888(/2181), or your ports of choice, for the elastic IPs 
so that the connections work 
- downsides: 
-# traffic between zookeeper servers goes via whatever boxes do the 
elastic-IP-to-server mapping, which means higher latency. My measurements as an 
example: ping using private IPs vs. elastic IPs: 0.8 ms vs. 1.4 ms (500-byte 
packets, servers in two different AZs in US-east) 
-# you will need to pay for this traffic, whereas you would not when using the 
names that are mapped to the internal IPs 

Also: for the clients, I use static DNS records as the connect string, with 
names like zookeeperN.domain pointing to 
ec2-A-B-C-D.compute-1.amazonaws.com, i.e. to the elastic IP's name and not 
the IP itself. EC2 maps these to the active private IPs once the elastic IP is 
assigned to an instance. The clients are then recognised properly as coming 
from the correct security group(s). There is no need to add all the client IPs, 
of which I have many, in a changing set; just grant the clients' security 
groups access to the zookeeper security group. 
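
To illustrate the client-side part, here is a small sketch that builds such a 
connect string. The class name, the example.com domain and the port are 
placeholders chosen for illustration; only the comma-separated host:port 
format is ZooKeeper's own.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; the naming scheme zookeeperN.<domain> follows the comment above. */
final class ConnectStrings {
    /** Builds "zookeeper1.<domain>:<port>,zookeeper2.<domain>:<port>,..." for the given server count. */
    static String forEnsemble(String domain, int servers, int clientPort) {
        List<String> parts = new ArrayList<>();
        for (int i = 1; i <= servers; i++) {
            parts.add("zookeeper" + i + "." + domain + ":" + clientPort);
        }
        return String.join(",", parts);
    }
}
```

For three servers, ConnectStrings.forEnsemble("example.com", 3, 2181) gives 
zookeeper1.example.com:2181,zookeeper2.example.com:2181,zookeeper3.example.com:2181, 
which can be passed as the connect string when creating the client.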

BTW: if someone knows of good resources on running zookeeper and curator-based 
clients in AWS, I would like to know where... 


 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
  Labels: patch
 Fix For: 3.5.0

 Attachments: zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-01-31 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887902#comment-13887902
 ] 

Flavio Junqueira commented on ZOOKEEPER-1506:
-

A few comments and concerns with the current patch:

# What's the point of making the QuorumServer constructors private?
# Please check the spacing; it is not aligned with the other lines. 
# This patch will recreate the socket addresses every time a connection in QCM 
(QuorumCnxManager) fails to get established. Sometimes this happens because the 
server is not up yet, and in such cases, if the mapping hasn't changed, the 
recreation of the socket addresses is unnecessary. Right now, I don't have any 
great idea for getting around it, so I just wanted to point it out.
# I was also thinking about how to test this. It would be good to have a test 
case.
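
One possible direction for the last two points, sketched under assumptions: 
the class and method names below are hypothetical and are not taken from the 
patch or from ZooKeeper. The lookup is injected as a function, so a test can 
simulate a DNS remap without a real resolver, and the existing address object 
is reused whenever the IP has not changed.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.function.Function;

/** Hypothetical helper: refreshes a peer address only when its IP actually changed. */
final class ChangeAwareRefresh {
    // Injected lookup (hostname -> IP, or null on failure) so tests can fake DNS.
    private final Function<String, InetAddress> lookup;

    ChangeAwareRefresh(Function<String, InetAddress> lookup) {
        this.lookup = lookup;
    }

    /**
     * Returns the same 'current' instance when the lookup fails or yields the
     * same IP; otherwise returns a new address with the new IP and same port.
     */
    InetSocketAddress refresh(String hostname, InetSocketAddress current) {
        InetAddress latest = lookup.apply(hostname);
        if (latest == null || latest.equals(current.getAddress())) {
            return current;
        }
        return new InetSocketAddress(latest, current.getPort());
    }

    /** Builds an IPv4 address from four octets without any DNS lookup (handy in tests). */
    static InetAddress ip(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

A test can point the injected function at one IP, call refresh(), switch it to 
another IP, call refresh() again, and check that only the second call produces 
a new address.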

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
  Labels: patch
 Fix For: 3.5.0

 Attachments: zk-dns-caching-refresh.patch





--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-01-30 Thread charitymajors (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886758#comment-13886758
 ] 

charitymajors commented on ZOOKEEPER-1506:
--

This is affecting us too.  This is a pretty terrible bug for anyone trying to 
run zk in the cloud.  :(

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
  Labels: patch
 Fix For: 3.5.0

 Attachments: zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2014-01-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886761#comment-13886761
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12581564/zk-dns-caching-refresh.patch
  against trunk revision 1561672.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1904//console

This message is automatically generated.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Assignee: Michael Lasevich
  Labels: patch
 Fix For: 3.5.0

 Attachments: zk-dns-caching-refresh.patch




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2013-05-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13650223#comment-13650223
 ] 

Hadoop QA commented on ZOOKEEPER-1506:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12581564/zk-dns-caching-refresh.patch
  against trunk revision 1463329.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/1467//console

This message is automatically generated.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
  Labels: patch
 Attachments: zk-dns-caching-refresh.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2013-05-01 Thread Daniel Heidebrecht (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647059#comment-13647059
 ] 

Daniel Heidebrecht commented on ZOOKEEPER-1506:
---

I am also seeing this with a 5 node Zookeeper 3.4.5 cluster running in AWS. All 
nodes in the cluster must be restarted.

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner



[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2013-05-01 Thread Matt Wise (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13647087#comment-13647087
 ] 

Matt Wise commented on ZOOKEEPER-1506:
--

This is still an ongoing issue. We've tried making changes to the way Java 
handles DNS caching and we've been unable to solve the issue. This really seems 
like a simple thing to fix, and it's critical when running Zookeeper in Amazon's 
cloud, where internal IP addresses change each time you boot up.

Please... somebody fix this!
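
For context, the JVM-level knob for DNS caching is the networkaddress.cache.ttl 
security property; whether that is the change that was tried above is an 
assumption. The sketch below, with a hypothetical helper class, shows how it 
can be set programmatically. A likely reason it does not help on its own is 
that the TTL only affects new lookups, while an InetSocketAddress that was 
resolved at startup keeps its IP regardless.

```java
import java.security.Security;

/** Hypothetical helper around the JVM's DNS cache setting. */
final class DnsCacheSettings {
    /**
     * Caps how long the JVM caches successful lookups, in seconds.
     * Best set early, before the JVM performs its first lookup.
     */
    static void setPositiveTtlSeconds(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    /** Returns the configured value, or null if the property is not set. */
    static String positiveTtl() {
        return Security.getProperty("networkaddress.cache.ttl");
    }
}
```

Even with a short TTL, a server only sees the new IP if it performs a new 
lookup, which is what this issue asks for.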

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner



[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2013-02-05 Thread Matt Wise (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571488#comment-13571488
 ] 

Matt Wise commented on ZOOKEEPER-1506:
--

This is still an ongoing issue. An AWS failure last night forced us to restart 
our entire ZooKeeper cluster one node at a time because of this bug. 
Can someone at least set a target date for it?

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Priority: Minor




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2013-02-05 Thread Thawan Kooburat (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13571550#comment-13571550
 ] 

Thawan Kooburat commented on ZOOKEEPER-1506:


There is also a problem with Java DNS caching. I haven't tried this, but you 
might want to check this out:
http://stackoverflow.com/questions/1256556/any-way-to-make-java-honor-the-dns-caching-timeout-ttl
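
For reference, the JVM-level cache mentioned here is governed by the 
`networkaddress.cache.ttl` security property. A minimal sketch, assuming a 
60-second cap to match the TTL in the report; the class and method names are 
made up for illustration and are not part of ZooKeeper:

```java
import java.security.Security;

public class DnsCacheTtl {
    /**
     * Caps how long the JVM caches successful hostname lookups, in seconds.
     * Should be called before the JVM performs its first lookup, because the
     * cache policy is read once when the networking classes initialize.
     */
    public static void limitPositiveCache(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    public static void main(String[] args) {
        // 60 is an assumed value, chosen to match the "<= 60s" TTL in the report.
        limitPositiveCache(60);
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```

Note that this only bounds lookups made through InetAddress. An 
InetSocketAddress that has already been constructed keeps the IP it resolved 
at construction time, so lowering the TTL alone would probably not be enough 
unless the caller also builds a fresh address before reconnecting.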

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.4.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Priority: Minor




[jira] [Commented] (ZOOKEEPER-1506) Re-try DNS hostname -> IP resolution if node connection fails

2012-12-05 Thread Matt Wise (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13511207#comment-13511207
 ] 

Matt Wise commented on ZOOKEEPER-1506:
--

This is also impacting our environment. We run our entire infrastructure in the 
cloud, and use a set of static (but short-TTL) hostnames to point to our 
ZooKeeper instances. We have the exact same concern: right now it 
requires a restart of the ZooKeeper application on each system to trigger a 
're-lookup' of the DNS record.

This seems like it should be customizable with a simple option. If 
'dns_lookup_on_connect' is turned on, it should do a new DNS lookup for every 
connection to an ensemble server. If it's disabled, it can look up the DNS 
record once and only once. Tools like 'stunnel' have this behavior and it's 
great in the cloud.
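
A rough sketch of how such an option might behave. 'dns_lookup_on_connect' is 
only the name proposed in this comment, and PeerEndpoint and 
addressForConnect are hypothetical names, not existing ZooKeeper code:

```java
import java.net.InetSocketAddress;

/** Hypothetical holder for one ensemble peer; not part of ZooKeeper. */
public class PeerEndpoint {
    private final String host;
    private final int port;
    // Stands in for the proposed 'dns_lookup_on_connect' option.
    private final boolean lookupOnConnect;
    private InetSocketAddress cached;

    public PeerEndpoint(String host, int port, boolean lookupOnConnect) {
        this.host = host;
        this.port = port;
        this.lookupOnConnect = lookupOnConnect;
    }

    /** Returns the address to dial for the next connection attempt. */
    public synchronized InetSocketAddress addressForConnect() {
        if (lookupOnConnect || cached == null) {
            // This constructor attempts a hostname lookup each time it runs;
            // the answer is still subject to the JVM's own DNS cache.
            cached = new InetSocketAddress(host, port);
        }
        return cached;
    }
}
```

With the flag off, the first resolved address is reused for the life of the 
object, which is the behavior reported in this issue. With it on, every 
connection attempt builds a new address from the hostname, so a remapped 
record can be picked up once the cached entry expires.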

 Re-try DNS hostname -> IP resolution if node connection fails
 -

 Key: ZOOKEEPER-1506
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1506
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.3.5
 Environment: Ubuntu 11.04 64-bit
Reporter: Mike Heffner
Priority: Minor

