[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user ijuma commented on the issue: https://github.com/apache/zookeeper/pull/451 Really happy to see this in a released version, thanks everyone. @anmolnar after testing this with Apache Kafka, it seems like the combination of a reasonably long backoff (1 second) with randomization when the list is small can cause connection timeouts in some cases that one would not expect. The concrete example we saw was usage of localhost which caused ipv4 and ipv6 addresses to be resolved while the only the ipv4 one worked. We had tests that would fail quite often with a connection timeout of 6 seconds because the ipv6 one would be picked continuously. @rajinisivaram filed a ZooKeeper issue to track a potential future improvement to this logic: https://issues.apache.org/jira/browse/ZOOKEEPER-3100 ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user ijuma commented on the issue: https://github.com/apache/zookeeper/pull/451 Also, as I'm sure many are wondering, a discussion about the next release including this fix is taking place in the mailing list: https://lists.apache.org/thread.html/8d3dcd84b0ad47563e2cf2619eb0775066ff975453865938dd4e7380@%3Cdev.zookeeper.apache.org%3E ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user ijuma commented on the issue: https://github.com/apache/zookeeper/pull/451 Thanks @anmolnar, @fpj and @hanm for getting this in. ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/451 Thanks @hanm ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/451 Committed to branch-3.4. Please close the pull request @anmolnar. Please port it to branch-3.5 and master (if you like :-), and create necessary JIRAs for auxiliary tasks (documentation update, etc). ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/451 @hanm I updated the pull request according to your suggestions. Please take a look. I also updated javadoc accordingly. ZooKeeper docs could be updated in a separate PR as suggested. ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/451 The patch looks good, I think we just need to get the public API change issue settled before we can merge this in. ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/451 There are also some initial comments about JVM DNS caching >> Re-resolving at StaticHostProvider level may not be sufficient as InetAddress.getAllByName(String host) itself uses a Java-level cache inside InetAddress and turns to name service (e.g. DNS) only if the host could not be found in the Java-level cache. I think it's a good point that we brought this up - and I think we can't do much here as there are multiple level of DNS caching not just in JVM but also in OS, switches, etc, so the best we could do here is to make sure we re-resolve address when necessary, as this pull request is doing, and meanwhile let user know this might not work in the end due to multiple level of DNS caching. In that case, I think some links to the Oracle documents about JVM DNS caching would be helpful so users don't blame ZooKeeper if things don't work, but I think we can do that documentation update separately after this pull request is merged. ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user hanm commented on the issue: https://github.com/apache/zookeeper/pull/451 I am reviewing this and will merge before Memorial Day if no other issues. ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/451 I agree with @ijuma @afine You were involved in this patch too. Are you willing to merge it in @fpj 's absence? ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user ijuma commented on the issue: https://github.com/apache/zookeeper/pull/451 If there are no other concerns, it would be great for this to be merged. ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/451 @fpj I think this patch is ready for merging as it is. Are you still having concerns? ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/451 @afine @ijuma I've finished refactoring of StaticHostProvider. The implementation follows that I explained in my email as **Option-3**: > - Do not cache IPs, > - Shuffle the names and resolve with getAllByName() every time when next() is called, > - Use getAllByName(), but shuffle the list and return the first IP to properly handle hostnames associated with multiple IPs, > - JDK's built-in caching will prevent name servers from being flooded and will do the re-resolution automatically when cache expires, The `Resolver` interface which is also introduced in this patch is the solution for the problem that @afine raised: to properly mock out the `getAllByName()` call in unit tests. ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/451 @ijuma I feel your pain. :) No worries, I'm on it. Doing my best to push committers and others to review changes. Additionally I'd like to make a small refactoring to the proposed logic before committing, because I'm not entirely convinced about this manual caching/shuffling logic that was implemented originally. You can see the details in one of my comments above and on the `dev` mailing list. ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user ijuma commented on the issue: https://github.com/apache/zookeeper/pull/451 @anmolnar, it's been more than 1 month since the last comment on this PR. Is there anything that still needs to be addressed? The original PR was submitted in January 2017 and it got stalled after a while, I'm hoping the same doesn't happen here. :) ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/451 @ijuma Sure, no problem. I'm waiting for some feedback from the community here and on the mailing list and I hope I can commit soon. ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user ijuma commented on the issue: https://github.com/apache/zookeeper/pull/451 Thanks for picking this up @anmolnar, looking forward to this being fixed. :) ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/451 Going one step back I wonder why we try to deal with multiple addresses at all. HostProvider should just make a transformation from unresolved InetSocketAddresses to resolved InetSocketAddresses. The easiest way as I can see it is to create a new instance of InetSocketAddress if the input is unresolved every time `next()` is called. Otherwise just pass it through. JVM will deal with the rest: resolution, caching and re-resolution once the cache is expires (30 secs). ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/451 @jeffwidman > Should this PR be targeting branch-3.4 or target trunk and then backport to the 3.4 series? The original PR targets 3.4 which is explained in a comment from @fpj on the jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2184?focusedCommentId=15823099=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15823099 > For the 3.5 branch, we will need a different patch because of the reconfiguration changes to StaticHostProvider. I'll work on it once the 3.4 patch gets a +1. ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user jeffwidman commented on the issue: https://github.com/apache/zookeeper/pull/451 Should this PR be targeting `branch-3.4` or target `trunk` and then backport to the 3.4 series? ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/451 @phunt @afine Did you have a chance to take a look? I think we've addressed all issues that were mentioned in the original PR. ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user mfenes commented on the issue: https://github.com/apache/zookeeper/pull/451 Looking at the static initialization block in InetAddressCachePolicy more deeply, the default TTL is 30 seconds if there is no SecurityManager installed. So caching a positive lookup forever in the Java-level cache is the default only if there is a SecurityManager installed and the TTL is not overridden by "networkaddress.cache.ttl" to a different value. Default caching policy for a negative lookup is 0 (never cache). Now the only question is whether 30 seconds default caching is ok or too much for ZK. ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/451 Just confirmed on 3.4 branch: ZK uses 30 secs cache TTL on my mac. ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user anmolnar commented on the issue: https://github.com/apache/zookeeper/pull/451 @mfenes The only solution I can think of is to set DNS cache TTL `networkaddress.cache.ttl` to a configurable, non-infinite value. ---
[GitHub] zookeeper issue #451: ZOOKEEPER-2184: Zookeeper Client should re-resolve hos...
Github user mfenes commented on the issue: https://github.com/apache/zookeeper/pull/451 Re-resolving at StaticHostProvider level may not be sufficient as InetAddress.getAllByName(String host) itself uses a Java-level cache inside InetAddress and turns to name service (e.g. DNS) only if the host could not be found in the Java-level cache. Unfortunately, when Java resolves a new host using the name service, it puts the host and its addresses in the cache with TTL cache FOREVER. This means, once a host gets resolved by Java, it will never again turn to the name service to re-resolve it. If a host's addresses get updated in DNS, the address cache in Java will still contain the old entry forever. So re-resolving at StaticHostProvider won't help in this case, as InetAddress.getAllByName(String host) will still return the old address(es) I think. Check the getCachedAddresses method inside InetAddress, the get() method of static final class Cache inside InetAddress and sun.net.InetAddressCachePolicy.get() which returns cachePolicy with default value -1 (FOREVER) if it is not overridden by Security properties "networkaddress.cache.ttl" and "networkaddress.cache.negative.ttl". ---