ZooKeeper_branch34_jdk7 - Build # 1912 - Failure

2018-05-08 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_jdk7/1912/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 38.93 KB...]
[junit] Running org.apache.zookeeper.test.RepeatStartupTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
4.682 sec
[junit] Running org.apache.zookeeper.test.RestoreCommittedLogTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.833 sec
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedClientTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.758 sec
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedServerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.632 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailDesignatedClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.072 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailNotifyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.463 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.562 sec
[junit] Running org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.482 sec
[junit] Running org.apache.zookeeper.test.SaslClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.073 sec
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.443 sec
[junit] Running org.apache.zookeeper.test.SessionTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
10.65 sec
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.615 sec
[junit] Running org.apache.zookeeper.test.StandaloneTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.857 sec
[junit] Running org.apache.zookeeper.test.StatTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.578 sec
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.252 sec
[junit] Running org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.516 sec
[junit] Running org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
6.986 sec
[junit] Running org.apache.zookeeper.test.UpgradeTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.311 sec
[junit] Running org.apache.zookeeper.test.WatchedEventTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.083 sec
[junit] Running org.apache.zookeeper.test.WatcherFuncTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.776 sec
[junit] Running org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
28.17 sec
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
5.176 sec
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.55 sec

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk7/build.xml:1382: 
The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_jdk7/build.xml:1385: 
Tests failed!

Total time: 29 minutes 19 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testFailedTxnAsPartOfQuorumLoss

Error Message:
KeeperErrorCode = Session expired for /zk2

Stack Trace:
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired for /zk2
at org.apache.zookeeper.KeeperException.create(KeeperException.java:130)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:)
at 

[jira] [Commented] (ZOOKEEPER-2982) Re-try DNS hostname -> IP resolution

2018-05-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468194#comment-16468194
 ] 

Hudson commented on ZOOKEEPER-2982:
---

FAILURE: Integrated in Jenkins build ZooKeeper-trunk #15 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/15/])
ZOOKEEPER-2982: Re-try DNS hostname -> IP resolution if node connection (fpj: 
rev b1f6279a8c4708d1df7dd1128dc4fdf41fc7e24a)
* (edit) src/java/main/org/apache/zookeeper/server/quorum/Learner.java


> Re-try DNS hostname -> IP resolution
> 
>
> Key: ZOOKEEPER-2982
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2982
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.5.1, 3.5.3
>Reporter: Eron Wright 
>Assignee: Flavio Junqueira
>Priority: Blocker
> Fix For: 3.5.4, 3.6.0
>
> Attachments: 3.5.3-beta.zip, fixed.log
>
>
> ZOOKEEPER-1506 fixed a DNS resolution issue in 3.4.  Some portions of the fix 
> haven't yet been ported to 3.5.
> To recap the outstanding problem in 3.5, if a given ZK server is started 
> before all peer addresses are resolvable, that server may cache a negative 
> lookup result and forever fail to resolve the address.For example, 
> deploying ZK 3.5 to Kubernetes using a StatefulSet plus a Service (headless) 
> may fail because the DNS records are created lazily.
> {code}
> 2018-02-18 09:11:22,583 [myid:0] - WARN  
> [QuorumPeer[myid=0](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Follower@95]
>  - Exception when following the leader
> java.net.UnknownHostException: zk-2.zk.default.svc.cluster.local
> at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at 
> org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:227)
> at 
> org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:256)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:76)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {code}
> In the above example, the address `zk-2.zk.default.svc.cluster.local` was not 
> resolvable when the server started, but became resolvable shortly thereafter. 
>The server should eventually succeed but doesn't.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


ZooKeeper-trunk - Build # 15 - Failure

2018-05-08 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk/15/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 140.54 KB...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.597 sec, Thread: 1, Class: org.apache.zookeeper.test.SessionInvalidationTest
[junit] Running org.apache.zookeeper.test.SessionTest in thread 4
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest in thread 1
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.852 sec, Thread: 1, Class: org.apache.zookeeper.test.SessionTimeoutTest
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
1
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.088 sec, Thread: 1, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.49 sec, Thread: 2, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 1
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 2
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.574 sec, Thread: 2, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 2
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.718 sec, Thread: 2, Class: org.apache.zookeeper.test.StatTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 2
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.76 sec, Thread: 2, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.069 sec, Thread: 2, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.645 sec, Thread: 2, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 2
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
12.233 sec, Thread: 4, Class: org.apache.zookeeper.test.SessionTest
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 4
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
70.974 sec, Thread: 3, Class: org.apache.zookeeper.test.QuorumZxidSyncTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
5.882 sec, Thread: 2, Class: org.apache.zookeeper.test.TruncateTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 2
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.073 sec, Thread: 2, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 3
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 2
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.993 sec, Thread: 3, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 3
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.085 sec, Thread: 3, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
20.806 sec, Thread: 1, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.681 sec, Thread: 1, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 14, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 
84.101 sec, Thread: 6, Class: org.apache.zookeeper.test.QuorumTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
8.364 sec, Thread: 3, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 
17.983 sec, Thread: 4, Class: 
org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Test org.apache.zookeeper.test.WatchEventWhenAutoResetTest FAILED
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
28.398 sec, Thread: 2, Class: org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 104, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
329.4 sec, Thread: 5, Class: org.apache.zookeeper.test.NettyNettySuiteTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
255.292 sec, 

Re: Name resolution in StaticHostProvider

2018-05-08 Thread Andor Molnar
Let's put it this way:

1- The interpretation of next() is "give me the next server's IP address
that I should connect to and do whatever preparation (resolution) is
necessary to do so".
The answer could be:
a) I'm done, here it is,
b) there's a problem with the next item in the list, for instance it's
unreasolvable (or else), please deal with it

When it returns, the caller can decide if it wants to advance to the next
item (call next() again) or exit with a fatal error (list items must always
be resolvable) or exit because it doesn't need to connect anymore (client
closed the socket). (...and there could be a whole bunch of other possible
cases...)

So, to wrap up, if you take a look at the retry logic already built in
ClientCnxn, the while-try-catch block is 100+ lines long. First, it already
has the retry logic, second we don't want to go down the road of copying
part of the logic to the HostProvider's level. Reliability, testability,
corner-cases, etc, etc.

What do you think?






On Tue, May 8, 2018 at 1:55 PM, Flavio Junqueira  wrote:

> The refactoring did not seem justifiable at first, so my reaction to it.
> You have clarified the reason for including the changes, and I actually
> like it.
>
> About the exception, there are two points for me:
>
> 1- You don't really need to throw to execute the plan you described.
> 2- In the case we do throw, which is not entirely unreasonable, I'd think
> about the expected behaviour of a method called "next()". In general, I'd
> expect it to either return me the next item or error saying that it cannot
> return an item. The "UnknownHostException" is not doing the latter, though.
> It is indicating that one of the elements of the host provider set is
> "broken" (not resolvable). That's one interpretation. Another
> interpretation is that the logic in "next" needs to indicate to the caller
> that there is a problem with an item in the set of elements of the host
> provider.
>
> For (2) let's converge on what semantics we are trying to provide first,
> please.
>
> -Flavio
>
> > On 8 May 2018, at 21:20, Andor Molnar  wrote:
> >
> > Sorry, I thought you were against the whole refactoring.
> >
> > "2- That we discuss separately whether we want to change the behaviour of
> > the next()in the HostProvider interface."
> >
> > From this it seemed to me, it's not just a polishing issue, but maybe
> I've
> > gotten you wrong.
> > Anyway, there're 2 contention points we've encountered so far:
> >
> > 1. Do we need to refactor at all?
> > 2. If we do so, shall next() throw UnknownHostException or deal with
> error
> > inside.
> >
> > And I'd still go with:
> > 1. Yes
> > 2. Yes, throw
> >
> > So, I would leave the PR as it is now.
> >
> > Andor
> >
> >
> >
> >
> >
> > On Tue, May 8, 2018 at 12:11 PM, Flavio Junqueira 
> wrote:
> >
> >> Can you list what the contention points are according to you? Feel free
> to
> >> do that in the PR as well, I want to make sure we have the same
> >> understanding of the points that still need to be resolved. From where I
> >> stand, there is no major issue pending other than one polishing issue
> that
> >> I brought upon in the PR.
> >>
> >> -Flavio
> >>
> >>> On 8 May 2018, at 21:08, Andor Molnar  wrote:
> >>>
> >>> I'm happy to do that once we have an agreement.
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, May 8, 2018 at 8:34 AM, Flavio Junqueira 
> wrote:
> >>>
>  It might be a good idea to document whatever we end up doing.
> 
>  -Flavio
> 
> > On 8 May 2018, at 17:22, Andor Molnar  wrote:
> >
> > "If refactoring is necessary to address the issue, then it should be
> >> part
> > of the PR."
> >
> > Agreed. I think it is and it also makes things a lot more simpler, so
>  it's
> > easier to review.
> >
> > "I'm not sure what kind of confirmation you are after here. Could you
> > elaborate, please?"
> >
> > I'm just wondering what could have been the reason for caching host
> > addresses in StaticHostProvider in the first place.
> >
> > "The other solution, if I remember enough of it, tried to avoid
> >> resolving
> > unnecessarily, but perhaps the DNS client cache is enough..."
> >
> > That's exactly my point: what JDK is doing internally is perfectly
> fine
>  for
> > us and we don't need extra logic here.
> >
> > "Could you elaborate on this point, please? What exactly do you mean
> >> with
> > this statement?"
> >
> > See my previous point. Caching is already done in all DNS servers in
> >> the
> > chain and also there's also caching in JDK already, which means by
>  calling
> > getByName() you don't necessarily fire a DNS request, only when the
> TTL
>  is
> > expired.
> >
> > Andor
> >
> >
> >
> >
> > On Tue, May 8, 2018 at 8:12 AM, Flavio Junqueira 

Re: Question on merge script

2018-05-08 Thread Michael Han
Hi Flavio,

The merge script is branch agnostic - it only cares about the pull request
number. As long as in the pull request the correct target branch is
specified, the merge script will do its job by merging the change to the
specified target branch. I guess we could commit the same script to
branch-3.5 but the current script in master should be able to do what you
asked.

On Tue, May 8, 2018 at 4:06 PM, Flavio Junqueira  wrote:

> Could anyone remind me why we don't have the merge script on branch-3.5?
> Say I have a change that targets branch-3.5 alone. Shouldn't I be able to
> have a PR that targets branch-3.5 and use the merge script?
>
> Thanks,
> -Flavio




-- 
Cheers
Michael


[GitHub] zookeeper pull request #468: ZOOKEEPER-2982: Re-try DNS hostname -> IP resol...

2018-05-08 Thread EronWright
Github user EronWright closed the pull request at:

https://github.com/apache/zookeeper/pull/468


---


Success: ZOOKEEPER- PreCommit Build #1667

2018-05-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1667/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 40.24 MB...]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1667//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1667//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1667//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment with id 16468095 added to ZOOKEEPER-3012.
 [exec] Session logged out. Session was 
JSESSIONID=2909EE1CC999ACD6DAEA2FFEC9D4FB5B.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 35 minutes 37 seconds
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-3012
Putting comment on the pull request
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-3012) Fix unit test: testDataDirAndDataLogDir should not use hardcode test folders

2018-05-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468095#comment-16468095
 ] 

Hadoop QA commented on ZOOKEEPER-3012:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1667//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1667//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1667//console

This message is automatically generated.

> Fix unit test: testDataDirAndDataLogDir should not use hardcode test folders
> 
>
> Key: ZOOKEEPER-3012
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3012
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server, tests
>Affects Versions: 3.5.3, 3.4.11
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
>  Labels: unit-test
> Fix For: 3.5.4, 3.6.0, 3.4.13
>
>
> The following arrange methods uses hard coded values:
> {noformat}
> when(configMock.getDataDir()).thenReturn("/tmp/zookeeper");
> when(configMock.getDataLogDir()).thenReturn("/tmp/zookeeperLog");
> {noformat}
> Which makes the test fail if the folders exist on the running machine.
> Random test folders should be created and removed during cleanup.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Question on merge script

2018-05-08 Thread Flavio Junqueira
Could anyone remind me why we don't have the merge script on branch-3.5? Say I 
have a change that targets branch-3.5 alone. Shouldn't I be able to have a PR 
that targets branch-3.5 and use the merge script?

Thanks,
-Flavio

[GitHub] zookeeper issue #513: ZOOKEEPER-2982: Re-try DNS hostname -> IP resolution i...

2018-05-08 Thread fpj
Github user fpj commented on the issue:

https://github.com/apache/zookeeper/pull/513
  
The reviews and approvals for this change are in pull request #468 . The 
intention of this PR was to rebase it to master, while PR #468 targeted 
branch-3.5.


---


[GitHub] zookeeper issue #468: ZOOKEEPER-2982: Re-try DNS hostname -> IP resolution i...

2018-05-08 Thread fpj
Github user fpj commented on the issue:

https://github.com/apache/zookeeper/pull/468
  
Merged to branch 3.5 as part of merging pull request #513, which targeted 
master.


---


[jira] [Resolved] (ZOOKEEPER-2982) Re-try DNS hostname -> IP resolution

2018-05-08 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira resolved ZOOKEEPER-2982.
-
Resolution: Fixed

Issue resolved by pull request 513
[https://github.com/apache/zookeeper/pull/513]

> Re-try DNS hostname -> IP resolution
> 
>
> Key: ZOOKEEPER-2982
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2982
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.5.1, 3.5.3
>Reporter: Eron Wright 
>Assignee: Flavio Junqueira
>Priority: Blocker
> Fix For: 3.6.0, 3.5.4
>
> Attachments: 3.5.3-beta.zip, fixed.log
>
>
> ZOOKEEPER-1506 fixed a DNS resolution issue in 3.4.  Some portions of the fix 
> haven't yet been ported to 3.5.
> To recap the outstanding problem in 3.5, if a given ZK server is started 
> before all peer addresses are resolvable, that server may cache a negative 
> lookup result and forever fail to resolve the address.For example, 
> deploying ZK 3.5 to Kubernetes using a StatefulSet plus a Service (headless) 
> may fail because the DNS records are created lazily.
> {code}
> 2018-02-18 09:11:22,583 [myid:0] - WARN  
> [QuorumPeer[myid=0](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Follower@95]
>  - Exception when following the leader
> java.net.UnknownHostException: zk-2.zk.default.svc.cluster.local
> at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at 
> org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:227)
> at 
> org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:256)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:76)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {code}
> In the above example, the address `zk-2.zk.default.svc.cluster.local` was not 
> resolvable when the server started, but became resolvable shortly thereafter. 
>The server should eventually succeed but doesn't.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #513: ZOOKEEPER-2982: Re-try DNS hostname -> IP resol...

2018-05-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/513


---


Success: ZOOKEEPER- PreCommit Build #1666

2018-05-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1666/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 79.69 MB...]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1666//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1666//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1666//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment with id 16468058 added to ZOOKEEPER-3012.
 [exec] Session logged out. Session was 
JSESSIONID=5CF47CC7216226BA6BDA6E9E67CEE2F1.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 and 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 are the same file

BUILD SUCCESSFUL
Total time: 21 minutes 13 seconds
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-3012
Putting comment on the pull request
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8



###
## FAILED TESTS (if any) 
##
All tests passed

[GitHub] zookeeper pull request #515: ZOOKEEPER-3012. Fix unit test: testDataDirAndDa...

2018-05-08 Thread anmolnar
GitHub user anmolnar opened a pull request:

https://github.com/apache/zookeeper/pull/515

ZOOKEEPER-3012. Fix unit test: testDataDirAndDataLogDir should not use 
hardcode test folders

Replaced by creating random folders.

https://issues.apache.org/jira/browse/ZOOKEEPER-3012


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anmolnar/zookeeper ZOOKEEPER-3012_34

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/515.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #515


commit bf8200cffccee482ac39d56114a914cd52144bfe
Author: Andor Molnar 
Date:   2018-05-08T22:30:12Z

ZOOKEEPER-3012. Replaced hardcoded folder names with creating tmp 
directories




---


[GitHub] zookeeper pull request #514: ZOOKEEPER-3012. Fix unit test: testDataDirAndDa...

2018-05-08 Thread anmolnar
GitHub user anmolnar opened a pull request:

https://github.com/apache/zookeeper/pull/514

ZOOKEEPER-3012. Fix unit test: testDataDirAndDataLogDir should not use 
hardcode test folders

A little bit more than just fixing the hardcoded folder names. Because the 
original issue was only on the 3.4 branch, the unit test has also been added to 
that branch only.

In this PR I add the fixed version of test to the master branch. Should be 
committed to 3.5 as well and I'll create a separate PR for 3.4

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anmolnar/zookeeper ZOOKEEPER-3012

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/514.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #514


commit 15ff7951701a55d2b9e94af6d4dd73288a2c51a4
Author: Andor Molnar 
Date:   2018-05-08T22:04:23Z

ZOOKEEPER-3012. Added unit test to validate dataDir and dataLogDir params




---


Success: ZOOKEEPER- PreCommit Build #1665

2018-05-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1665/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 79.02 MB...]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +0 tests included.  The patch appears to be a documentation 
patch that doesn't require tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1665//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1665//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1665//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment with id 16468019 added to ZOOKEEPER-2982.
 [exec] Session logged out. Session was 
JSESSIONID=98FCB33386E69426E532E964EED13F88.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 17 minutes 18 seconds
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-2982
Putting comment on the pull request
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2982) Re-try DNS hostname -> IP resolution

2018-05-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468019#comment-16468019
 ] 

Hadoop QA commented on ZOOKEEPER-2982:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1665//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1665//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1665//console

This message is automatically generated.

> Re-try DNS hostname -> IP resolution
> 
>
> Key: ZOOKEEPER-2982
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2982
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.5.1, 3.5.3
>Reporter: Eron Wright 
>Assignee: Flavio Junqueira
>Priority: Blocker
> Fix For: 3.5.4, 3.6.0
>
> Attachments: 3.5.3-beta.zip, fixed.log
>
>
> ZOOKEEPER-1506 fixed a DNS resolution issue in 3.4.  Some portions of the fix 
> haven't yet been ported to 3.5.
> To recap the outstanding problem in 3.5, if a given ZK server is started 
> before all peer addresses are resolvable, that server may cache a negative 
> lookup result and forever fail to resolve the address.For example, 
> deploying ZK 3.5 to Kubernetes using a StatefulSet plus a Service (headless) 
> may fail because the DNS records are created lazily.
> {code}
> 2018-02-18 09:11:22,583 [myid:0] - WARN  
> [QuorumPeer[myid=0](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Follower@95]
>  - Exception when following the leader
> java.net.UnknownHostException: zk-2.zk.default.svc.cluster.local
> at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at 
> org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:227)
> at 
> org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:256)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:76)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {code}
> In the above example, the address `zk-2.zk.default.svc.cluster.local` was not 
> resolvable when the server started, but became resolvable shortly thereafter. 
>The server should eventually succeed but doesn't.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper issue #468: ZOOKEEPER-2982: Re-try DNS hostname -> IP resolution i...

2018-05-08 Thread EronWright
Github user EronWright commented on the issue:

https://github.com/apache/zookeeper/pull/468
  
Opened a PR for rebasing onto master: #513.



---


[GitHub] zookeeper pull request #513: ZOOKEEPER-2982: Re-try DNS hostname -> IP resol...

2018-05-08 Thread EronWright
GitHub user EronWright opened a pull request:

https://github.com/apache/zookeeper/pull/513

ZOOKEEPER-2982: Re-try DNS hostname -> IP resolution if node connection 
fails

This PR ports a fix from the 3.4 to the master branch, that was originally 
made in ZOOKEEPER-1506.

Closes ZOOKEEPER-2982.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/EronWright/zookeeper ZOOKEEPER-2982-master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/513.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #513


commit 485d71b9b776417590615e63fb045474e78898de
Author: Eron Wright 
Date:   2018-02-19T23:05:44Z

ZOOKEEPER-2982 Re-try DNS hostname -> IP resolution if node connection fails




---


Re: Name resolution in StaticHostProvider

2018-05-08 Thread Flavio Junqueira
The refactoring did not seem justifiable at first, so my reaction to it. You 
have clarified the reason for including the changes, and I actually like it.

About the exception, there are two points for me:

1- You don't really need to throw to execute the plan you described.
2- In the case we do throw, which is not entirely unreasonable, I'd think about 
the expected behaviour of a method called "next()". In general, I'd expect it 
to either return me the next item or error saying that it cannot return an 
item. The "UnknownHostException" is not doing the latter, though. It is 
indicating that one of the elements of the host provider set is "broken" (not 
resolvable). That's one interpretation. Another interpretation is that the 
logic in "next" needs to indicate to the caller that there is a problem with an 
item in the set of elements of the host provider. 

For (2) let's converge on what semantics we are trying to provide first, please.

-Flavio

> On 8 May 2018, at 21:20, Andor Molnar  wrote:
> 
> Sorry, I thought you were against the whole refactoring.
> 
> "2- That we discuss separately whether we want to change the behaviour of
> the next()in the HostProvider interface."
> 
> From this it seemed to me, it's not just a polishing issue, but maybe I've
> gotten you wrong.
> Anyway, there're 2 contention points we've encountered so far:
> 
> 1. Do we need to refactor at all?
> 2. If we do so, shall next() throw UnknownHostException or deal with error
> inside.
> 
> And I'd still go with:
> 1. Yes
> 2. Yes, throw
> 
> So, I would leave the PR as it is now.
> 
> Andor
> 
> 
> 
> 
> 
> On Tue, May 8, 2018 at 12:11 PM, Flavio Junqueira  wrote:
> 
>> Can you list what the contention points are according to you? Feel free to
>> do that in the PR as well, I want to make sure we have the same
>> understanding of the points that still need to be resolved. From where I
>> stand, there is no major issue pending other than one polishing issue that
>> I brought upon in the PR.
>> 
>> -Flavio
>> 
>>> On 8 May 2018, at 21:08, Andor Molnar  wrote:
>>> 
>>> I'm happy to do that once we have an agreement.
>>> 
>>> 
>>> 
>>> 
>>> On Tue, May 8, 2018 at 8:34 AM, Flavio Junqueira  wrote:
>>> 
 It might be a good idea to document whatever we end up doing.
 
 -Flavio
 
> On 8 May 2018, at 17:22, Andor Molnar  wrote:
> 
> "If refactoring is necessary to address the issue, then it should be
>> part
> of the PR."
> 
> Agreed. I think it is and it also makes things a lot more simpler, so
 it's
> easier to review.
> 
> "I'm not sure what kind of confirmation you are after here. Could you
> elaborate, please?"
> 
> I'm just wondering what could have been the reason for caching host
> addresses in StaticHostProvider in the first place.
> 
> "The other solution, if I remember enough of it, tried to avoid
>> resolving
> unnecessarily, but perhaps the DNS client cache is enough..."
> 
> That's exactly my point: what JDK is doing internally is perfectly fine
 for
> us and we don't need extra logic here.
> 
> "Could you elaborate on this point, please? What exactly do you mean
>> with
> this statement?"
> 
> See my previous point. Caching is already done in all DNS servers in
>> the
> chain and also there's also caching in JDK already, which means by
 calling
> getByName() you don't necessarily fire a DNS request, only when the TTL
 is
> expired.
> 
> Andor
> 
> 
> 
> 
> On Tue, May 8, 2018 at 8:12 AM, Flavio Junqueira 
>> wrote:
> 
>> Hi Andor,
>> 
>> Thanks for your work on addressing the issue.
>> 
>>> On 8 May 2018, at 16:06, Andor Molnar  wrote:
>>> 
>>> Hi,
>>> 
>>> Updating this thread, because the PR is still being review on GitHub.
>>> 
>>> So, the reason why I refactored the original behaviour of
>>> StaticHostProvider is that I believe that it's trying to do something
>> which
>>> is not its responsibility.
>> 
>> If refactoring is necessary to address the issue, then it should be
>> part
>> of the PR. Otherwise, it is better to refactor in a separate PR.
>> 
>> 
>>> Please tell me if there's a good historical
>>> reason for that.
>> 
>> I'm not sure what kind of confirmation you are after here. Could you
>> elaborate, please?
>> 
>>> 
>>> My approach is giving the user the following to options:
>>> 1- Use static IP addresses, if you don't want to deal with DNS
 resolution
>>> at all - we guarantee that no DNS logic will involved in this case at
>> all.
>> 
>> Sounds fine.
>> 
>>> 2- Use DNS hostnames if you have a reliable DNS service for
>> resolution
>>> (with HA, secondary servers, 

[jira] [Updated] (ZOOKEEPER-1159) ClientCnxn does not propagate session expiration indication

2018-05-08 Thread Andor Molnar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andor Molnar updated ZOOKEEPER-1159:

Fix Version/s: (was: 3.5.3)
   (was: 3.6.0)
   3.4.0

> ClientCnxn does not propagate session expiration indication
> ---
>
> Key: ZOOKEEPER-1159
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1159
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.0
>Reporter: Andrew Purtell
>Assignee: Andor Molnar
>Priority: Major
> Fix For: 3.4.0
>
>
> ClientCnxn does not always propagate session expiration indication up to 
> clients. If a reconnection attempt fails because the session has since 
> expired, the KeeperCode is still Disconnected, but shouldn't it be set to 
> Expired? Perhaps like so:
> {code}
> --- a/src/java/main/org/apache/zookeeper/ClientCnxn.java
> +++ b/src/java/main/org/apache/zookeeper/ClientCnxn.java
> @@ -1160,6 +1160,7 @@ public class ClientCnxn {
>  clientCnxnSocket.doTransport(to, pendingQueue, 
> outgoingQueue);
>  
>  } catch (Exception e) {
> +Event.KeeperState eventState = 
> Event.KeeperState.Disconnected;
>  if (closing) {
>  if (LOG.isDebugEnabled()) {
>  // closing so this is expected
> @@ -1172,6 +1173,7 @@ public class ClientCnxn {
>  // this is ugly, you have a better way speak up
>  if (e instanceof SessionExpiredException) {
>  LOG.info(e.getMessage() + ", closing socket 
> connection");
> +eventState = Event.KeeperState.Expired;
>  } else if (e instanceof SessionTimeoutException) {
>  LOG.info(e.getMessage() + RETRY_CONN_MSG);
>  } else if (e instanceof EndOfStreamException) {
> @@ -1191,7 +1193,7 @@ public class ClientCnxn {
>  if (state.isAlive()) {
>  eventThread.queueEvent(new WatchedEvent(
>  Event.EventType.None,
> -Event.KeeperState.Disconnected,
> +eventState,
>  null));
>  }
>  clientCnxnSocket.updateNow();
> {code}
> This affects HBase. HBase master and region server processes will shut down 
> by design if their session has expired, but will attempt to reconnect if they 
> think they have been disconnected. The above prevents proper termination.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-1159) ClientCnxn does not propagate session expiration indication

2018-05-08 Thread Andor Molnar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andor Molnar resolved ZOOKEEPER-1159.
-
   Resolution: Won't Fix
Fix Version/s: (was: 3.5.4)
   3.5.3

> ClientCnxn does not propagate session expiration indication
> ---
>
> Key: ZOOKEEPER-1159
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1159
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: java client
>Affects Versions: 3.4.0
>Reporter: Andrew Purtell
>Assignee: Andor Molnar
>Priority: Major
> Fix For: 3.6.0, 3.5.3
>
>
> ClientCnxn does not always propagate session expiration indication up to 
> clients. If a reconnection attempt fails because the session has since 
> expired, the KeeperCode is still Disconnected, but shouldn't it be set to 
> Expired? Perhaps like so:
> {code}
> --- a/src/java/main/org/apache/zookeeper/ClientCnxn.java
> +++ b/src/java/main/org/apache/zookeeper/ClientCnxn.java
> @@ -1160,6 +1160,7 @@ public class ClientCnxn {
>  clientCnxnSocket.doTransport(to, pendingQueue, 
> outgoingQueue);
>  
>  } catch (Exception e) {
> +Event.KeeperState eventState = 
> Event.KeeperState.Disconnected;
>  if (closing) {
>  if (LOG.isDebugEnabled()) {
>  // closing so this is expected
> @@ -1172,6 +1173,7 @@ public class ClientCnxn {
>  // this is ugly, you have a better way speak up
>  if (e instanceof SessionExpiredException) {
>  LOG.info(e.getMessage() + ", closing socket 
> connection");
> +eventState = Event.KeeperState.Expired;
>  } else if (e instanceof SessionTimeoutException) {
>  LOG.info(e.getMessage() + RETRY_CONN_MSG);
>  } else if (e instanceof EndOfStreamException) {
> @@ -1191,7 +1193,7 @@ public class ClientCnxn {
>  if (state.isAlive()) {
>  eventThread.queueEvent(new WatchedEvent(
>  Event.EventType.None,
> -Event.KeeperState.Disconnected,
> +eventState,
>  null));
>  }
>  clientCnxnSocket.updateNow();
> {code}
> This affects HBase. HBase master and region server processes will shut down 
> by design if their session has expired, but will attempt to reconnect if they 
> think they have been disconnected. The above prevents proper termination.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Success: ZOOKEEPER- PreCommit Build #1664

2018-05-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1664/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 79.30 MB...]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +1 tests included.  The patch appears to include 7 new or 
modified tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1664//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1664//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1664//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment with id 16467905 added to ZOOKEEPER-2959.
 [exec] Session logged out. Session was 
JSESSIONID=7689382D06FE0BEA3C16B20E8D2F2376.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 17 minutes 35 seconds
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-2959
Putting comment on the pull request
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-05-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467905#comment-16467905
 ] 

Hadoop QA commented on ZOOKEEPER-2959:
--

+1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 7 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1664//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1664//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1664//console

This message is automatically generated.

> ignore accepted epoch and LEADERINFO ack from observers when a newly elected 
> leader computes new epoch
> --
>
> Key: ZOOKEEPER-2959
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: xiangyq000
>Assignee: Bogdan Kanivets
>Priority: Blocker
>
> Once the ZooKeeper cluster finishes the election for new leader, all learners 
> report their accepted epoch to the leader for the computation of new cluster 
> epoch.
> org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
> {code:java}
> private final HashSet connectingFollowers = new HashSet();
> public long getEpochToPropose(long sid, long lastAcceptedEpoch) throws 
> InterruptedException, IOException {
> synchronized(connectingFollowers) {
> if (!waitingForNewEpoch) {
> return epoch;
> }
> if (lastAcceptedEpoch >= epoch) {
> epoch = lastAcceptedEpoch+1;
> }
> connectingFollowers.add(sid);
> QuorumVerifier verifier = self.getQuorumVerifier();
> if (connectingFollowers.contains(self.getId()) &&
> 
> verifier.containsQuorum(connectingFollowers)) {
> waitingForNewEpoch = false;
> self.setAcceptedEpoch(epoch);
> connectingFollowers.notifyAll();
> } else {
> long start = Time.currentElapsedTime();
> long cur = start;
> long end = start + self.getInitLimit()*self.getTickTime();
> while(waitingForNewEpoch && cur < end) {
> connectingFollowers.wait(end - cur);
> cur = Time.currentElapsedTime();
> }
> if (waitingForNewEpoch) {
> throw new InterruptedException("Timeout while waiting for 
> epoch from quorum");
> }
> }
> return epoch;
> }
> }
> {code}
> The computation will get an outcome once :
> # The leader has call method "getEpochToPropose"
> # The number of all reporters is greater than half of participants.
> The problem is, an observer server will also send its accepted epoch to the 
> leader, while this procedure treat observers as participants.
> Supposed that the cluster consists of 1 leader, 2 followers and 1 observer, 
> and now the leader and the observer have reported their accepted epochs while 
> neither of the followers has. Thus, the connectingFollowers set consists of 
> two elements, resulting in a size of 2, which is greater than half quorum, 
> namely, 2. Then QuorumVerifier#containsQuorum will return true, because it 
> does not check whether the elements of the parameter are participants.
> The same flaw exists in 
> org.apache.zookeeper.server.quorum.Leader#waitForEpochAck



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #512: ZOOKEEPER-2959: ignore accepted epoch and LEADE...

2018-05-08 Thread lavacat
GitHub user lavacat opened a pull request:

https://github.com/apache/zookeeper/pull/512

ZOOKEEPER-2959: ignore accepted epoch and LEADERINFO ack from observers

https://issues.apache.org/jira/browse/ZOOKEEPER-2959

- added getVotingMembers id check in getEpochToPropose and waitForEpochAck
- removed unused learnerType param in waitForNewLeaderAck
- unit tests
- refactored common test helpers into ZabUtils

credit: Xiang Yongqiang (https://github.com/xyq000) for original PR and 
reporting the issue

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lavacat/zookeeper master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/512.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #512


commit 08d574237955865c9e474d6be93ed924d779d9ae
Author: Bogdan Kanivets 
Date:   2018-04-20T00:02:59Z

ZOOKEEPER-2959: ignore accepted epoch and LEADERINFO ack from observers

https://issues.apache.org/jira/browse/ZOOKEEPER-2959

- added getVotingMembers id check in getEpochToPropose and waitForEpochAck
- removed unused learnerType param in waitForNewLeaderAck
- unit tests
- refactored common test helpers into ZabUtils

credit: Xiang Yongqiang (https://github.com/xyq000) for original PR and 
reporting the issue




---


Re: Name resolution in StaticHostProvider

2018-05-08 Thread Andor Molnar
Sorry, I thought you were against the whole refactoring.

"2- That we discuss separately whether we want to change the behaviour of
the next()in the HostProvider interface."

>From this it seemed to me, it's not just a polishing issue, but maybe I've
gotten you wrong.
Anyway, there're 2 contention points we've encountered so far:

1. Do we need to refactor at all?
2. If we do so, shall next() throw UnknownHostException or deal with error
inside.

And I'd still go with:
1. Yes
2. Yes, throw

So, I would leave the PR as it is now.

Andor





On Tue, May 8, 2018 at 12:11 PM, Flavio Junqueira  wrote:

> Can you list what the contention points are according to you? Feel free to
> do that in the PR as well, I want to make sure we have the same
> understanding of the points that still need to be resolved. From where I
> stand, there is no major issue pending other than one polishing issue that
> I brought upon in the PR.
>
> -Flavio
>
> > On 8 May 2018, at 21:08, Andor Molnar  wrote:
> >
> > I'm happy to do that once we have an agreement.
> >
> >
> >
> >
> > On Tue, May 8, 2018 at 8:34 AM, Flavio Junqueira  wrote:
> >
> >> It might be a good idea to document whatever we end up doing.
> >>
> >> -Flavio
> >>
> >>> On 8 May 2018, at 17:22, Andor Molnar  wrote:
> >>>
> >>> "If refactoring is necessary to address the issue, then it should be
> part
> >>> of the PR."
> >>>
> >>> Agreed. I think it is and it also makes things a lot more simpler, so
> >> it's
> >>> easier to review.
> >>>
> >>> "I'm not sure what kind of confirmation you are after here. Could you
> >>> elaborate, please?"
> >>>
> >>> I'm just wondering what could have been the reason for caching host
> >>> addresses in StaticHostProvider in the first place.
> >>>
> >>> "The other solution, if I remember enough of it, tried to avoid
> resolving
> >>> unnecessarily, but perhaps the DNS client cache is enough..."
> >>>
> >>> That's exactly my point: what JDK is doing internally is perfectly fine
> >> for
> >>> us and we don't need extra logic here.
> >>>
> >>> "Could you elaborate on this point, please? What exactly do you mean
> with
> >>> this statement?"
> >>>
> >>> See my previous point. Caching is already done in all DNS servers in
> the
> >>> chain and also there's also caching in JDK already, which means by
> >> calling
> >>> getByName() you don't necessarily fire a DNS request, only when the TTL
> >> is
> >>> expired.
> >>>
> >>> Andor
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, May 8, 2018 at 8:12 AM, Flavio Junqueira 
> wrote:
> >>>
>  Hi Andor,
> 
>  Thanks for your work on addressing the issue.
> 
> > On 8 May 2018, at 16:06, Andor Molnar  wrote:
> >
> > Hi,
> >
> > Updating this thread, because the PR is still being review on GitHub.
> >
> > So, the reason why I refactored the original behaviour of
> > StaticHostProvider is that I believe that it's trying to do something
>  which
> > is not its responsibility.
> 
>  If refactoring is necessary to address the issue, then it should be
> part
>  of the PR. Otherwise, it is better to refactor in a separate PR.
> 
> 
> > Please tell me if there's a good historical
> > reason for that.
> 
>  I'm not sure what kind of confirmation you are after here. Could you
>  elaborate, please?
> 
> >
> > My approach is giving the user the following to options:
> > 1- Use static IP addresses, if you don't want to deal with DNS
> >> resolution
> > at all - we guarantee that no DNS logic will involved in this case at
>  all.
> 
>  Sounds fine.
> 
> > 2- Use DNS hostnames if you have a reliable DNS service for
> resolution
> > (with HA, secondary servers, backups, etc.) - we must use DNS in the
>  right
> > way in this case e.g. do NOT cache IP address for a longer period
> that
>  DNS
> > server allows to and re-resolve after TTL expries, because it's
> >> mandatory
> > by protocol.
> 
>  Sounds fine.
> 
> >
> > My 2 cents here:
> > - the fix which was originally posted for re-resolution is a
> workaround
>  and
> > doesn't satisfy the requirement for #2,
> 
>  The other solution, if I remember enough of it, tried to avoid
> resolving
>  unnecessarily, but perhaps the DNS client cache is enough to avoid the
>  unnecessary round-trips. This observation actually brings me to the
> next
>  point:
> 
> > - the solution is already built-in in JDK and DNS clients in the
> right
>  way
> 
>  Could you elaborate on this point, please? What exactly do you mean
> with
>  this statement?
> 
> > - can't see a reason why we shouldn't use that
> >
> > I checked this in some other projects as well and found very similar
> > approach in hadoop-common's SecurityUtil.java. It 

Re: Name resolution in StaticHostProvider

2018-05-08 Thread Andor Molnar
I'm happy to do that once we have an agreement.




On Tue, May 8, 2018 at 8:34 AM, Flavio Junqueira  wrote:

> It might be a good idea to document whatever we end up doing.
>
> -Flavio
>
> > On 8 May 2018, at 17:22, Andor Molnar  wrote:
> >
> > "If refactoring is necessary to address the issue, then it should be part
> > of the PR."
> >
> > Agreed. I think it is and it also makes things a lot more simpler, so
> it's
> > easier to review.
> >
> > "I'm not sure what kind of confirmation you are after here. Could you
> > elaborate, please?"
> >
> > I'm just wondering what could have been the reason for caching host
> > addresses in StaticHostProvider in the first place.
> >
> > "The other solution, if I remember enough of it, tried to avoid resolving
> > unnecessarily, but perhaps the DNS client cache is enough..."
> >
> > That's exactly my point: what JDK is doing internally is perfectly fine
> for
> > us and we don't need extra logic here.
> >
> > "Could you elaborate on this point, please? What exactly do you mean with
> > this statement?"
> >
> > See my previous point. Caching is already done in all DNS servers in the
> > chain and also there's also caching in JDK already, which means by
> calling
> > getByName() you don't necessarily fire a DNS request, only when the TTL
> is
> > expired.
> >
> > Andor
> >
> >
> >
> >
> > On Tue, May 8, 2018 at 8:12 AM, Flavio Junqueira  wrote:
> >
> >> Hi Andor,
> >>
> >> Thanks for your work on addressing the issue.
> >>
> >>> On 8 May 2018, at 16:06, Andor Molnar  wrote:
> >>>
> >>> Hi,
> >>>
> >>> Updating this thread, because the PR is still being review on GitHub.
> >>>
> >>> So, the reason why I refactored the original behaviour of
> >>> StaticHostProvider is that I believe that it's trying to do something
> >> which
> >>> is not its responsibility.
> >>
> >> If refactoring is necessary to address the issue, then it should be part
> >> of the PR. Otherwise, it is better to refactor in a separate PR.
> >>
> >>
> >>> Please tell me if there's a good historical
> >>> reason for that.
> >>
> >> I'm not sure what kind of confirmation you are after here. Could you
> >> elaborate, please?
> >>
> >>>
> >>> My approach is giving the user the following to options:
> >>> 1- Use static IP addresses, if you don't want to deal with DNS
> resolution
> >>> at all - we guarantee that no DNS logic will involved in this case at
> >> all.
> >>
> >> Sounds fine.
> >>
> >>> 2- Use DNS hostnames if you have a reliable DNS service for resolution
> >>> (with HA, secondary servers, backups, etc.) - we must use DNS in the
> >> right
> >>> way in this case e.g. do NOT cache IP address for a longer period that
> >> DNS
> >>> server allows to and re-resolve after TTL expries, because it's
> mandatory
> >>> by protocol.
> >>
> >> Sounds fine.
> >>
> >>>
> >>> My 2 cents here:
> >>> - the fix which was originally posted for re-resolution is a workaround
> >> and
> >>> doesn't satisfy the requirement for #2,
> >>
> >> The other solution, if I remember enough of it, tried to avoid resolving
> >> unnecessarily, but perhaps the DNS client cache is enough to avoid the
> >> unnecessary round-trips. This observation actually brings me to the next
> >> point:
> >>
> >>> - the solution is already built-in in JDK and DNS clients in the right
> >> way
> >>
> >> Could you elaborate on this point, please? What exactly do you mean with
> >> this statement?
> >>
> >>> - can't see a reason why we shouldn't use that
> >>>
> >>> I checked this in some other projects as well and found very similar
> >>> approach in hadoop-common's SecurityUtil.java. It has 2 built-in
> plugins
> >>> for that:
> >>> - Standard resolver uses java's built-in getByName().
> >>> - Qualified resolver still uses getByName(), but adds some logic to
> avoid
> >>> incorrect re-resolutions and reverse IP lookups.
> >>>
> >>> Please let me know your thoughts.
> >>>
> >>> Regards,
> >>> Andor
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Mar 6, 2018 at 8:12 AM, Andor Molnar 
> wrote:
> >>>
>  Hi Abe,
> 
>  Unfortunately we haven't got any feedback yet. What do you think of
>  implementing Option #3?
> 
>  Regards,
>  Andor
> 
> 
>  On Thu, Feb 22, 2018 at 6:06 PM, Andor Molnar 
> >> wrote:
> 
> > Did anybody happen to take a quick look by any chance?
> >
> > I don't want to push this too hard, because I know it's a time
> >> consuming
> > topic to think about, but this is a blocker in 3.5 which has been
> >> hanging
> > around for a while and any feedback would be extremely helpful to
> >> close it
> > quickly.
> >
> > Thanks,
> > Andor
> >
> >
> >
> > On Mon, Feb 19, 2018 at 12:18 PM, Andor Molnar 
> > wrote:
> >
> >> Hi all,
> >>
> >> We need more eyes and brains on the following PR:
> 

Re: Name resolution in StaticHostProvider

2018-05-08 Thread Flavio Junqueira
Can you list what the contention points are according to you? Feel free to do 
that in the PR as well, I want to make sure we have the same understanding of 
the points that still need to be resolved. From where I stand, there is no 
major issue pending other than one polishing issue that I brought upon in the 
PR.

-Flavio

> On 8 May 2018, at 21:08, Andor Molnar  wrote:
> 
> I'm happy to do that once we have an agreement.
> 
> 
> 
> 
> On Tue, May 8, 2018 at 8:34 AM, Flavio Junqueira  wrote:
> 
>> It might be a good idea to document whatever we end up doing.
>> 
>> -Flavio
>> 
>>> On 8 May 2018, at 17:22, Andor Molnar  wrote:
>>> 
>>> "If refactoring is necessary to address the issue, then it should be part
>>> of the PR."
>>> 
>>> Agreed. I think it is and it also makes things a lot more simpler, so
>> it's
>>> easier to review.
>>> 
>>> "I'm not sure what kind of confirmation you are after here. Could you
>>> elaborate, please?"
>>> 
>>> I'm just wondering what could have been the reason for caching host
>>> addresses in StaticHostProvider in the first place.
>>> 
>>> "The other solution, if I remember enough of it, tried to avoid resolving
>>> unnecessarily, but perhaps the DNS client cache is enough..."
>>> 
>>> That's exactly my point: what JDK is doing internally is perfectly fine
>> for
>>> us and we don't need extra logic here.
>>> 
>>> "Could you elaborate on this point, please? What exactly do you mean with
>>> this statement?"
>>> 
>>> See my previous point. Caching is already done in all DNS servers in the
>>> chain and also there's also caching in JDK already, which means by
>> calling
>>> getByName() you don't necessarily fire a DNS request, only when the TTL
>> is
>>> expired.
>>> 
>>> Andor
>>> 
>>> 
>>> 
>>> 
>>> On Tue, May 8, 2018 at 8:12 AM, Flavio Junqueira  wrote:
>>> 
 Hi Andor,
 
 Thanks for your work on addressing the issue.
 
> On 8 May 2018, at 16:06, Andor Molnar  wrote:
> 
> Hi,
> 
> Updating this thread, because the PR is still being review on GitHub.
> 
> So, the reason why I refactored the original behaviour of
> StaticHostProvider is that I believe that it's trying to do something
 which
> is not its responsibility.
 
 If refactoring is necessary to address the issue, then it should be part
 of the PR. Otherwise, it is better to refactor in a separate PR.
 
 
> Please tell me if there's a good historical
> reason for that.
 
 I'm not sure what kind of confirmation you are after here. Could you
 elaborate, please?
 
> 
> My approach is giving the user the following to options:
> 1- Use static IP addresses, if you don't want to deal with DNS
>> resolution
> at all - we guarantee that no DNS logic will involved in this case at
 all.
 
 Sounds fine.
 
> 2- Use DNS hostnames if you have a reliable DNS service for resolution
> (with HA, secondary servers, backups, etc.) - we must use DNS in the
 right
> way in this case e.g. do NOT cache IP address for a longer period that
 DNS
> server allows to and re-resolve after TTL expries, because it's
>> mandatory
> by protocol.
 
 Sounds fine.
 
> 
> My 2 cents here:
> - the fix which was originally posted for re-resolution is a workaround
 and
> doesn't satisfy the requirement for #2,
 
 The other solution, if I remember enough of it, tried to avoid resolving
 unnecessarily, but perhaps the DNS client cache is enough to avoid the
 unnecessary round-trips. This observation actually brings me to the next
 point:
 
> - the solution is already built-in in JDK and DNS clients in the right
 way
 
 Could you elaborate on this point, please? What exactly do you mean with
 this statement?
 
> - can't see a reason why we shouldn't use that
> 
> I checked this in some other projects as well and found very similar
> approach in hadoop-common's SecurityUtil.java. It has 2 built-in
>> plugins
> for that:
> - Standard resolver uses java's built-in getByName().
> - Qualified resolver still uses getByName(), but adds some logic to
>> avoid
> incorrect re-resolutions and reverse IP lookups.
> 
> Please let me know your thoughts.
> 
> Regards,
> Andor
> 
> 
> 
> 
> 
> 
> On Tue, Mar 6, 2018 at 8:12 AM, Andor Molnar 
>> wrote:
> 
>> Hi Abe,
>> 
>> Unfortunately we haven't got any feedback yet. What do you think of
>> implementing Option #3?
>> 
>> Regards,
>> Andor
>> 
>> 
>> On Thu, Feb 22, 2018 at 6:06 PM, Andor Molnar 
 wrote:
>> 
>>> Did anybody happen to take a quick look by any chance?
>>> 
>>> I don't want to push this too hard, because I 

[GitHub] zookeeper issue #503: ZOOKEEPER-2959: ignore accepted epoch and LEADERINFO a...

2018-05-08 Thread shralex
Github user shralex commented on the issue:

https://github.com/apache/zookeeper/pull/503
  
+1 looks good


---


[jira] [Resolved] (ZOOKEEPER-3035) what does these opeartion code mean

2018-05-08 Thread Edward Ribeiro (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Ribeiro resolved ZOOKEEPER-3035.
---
Resolution: Not A Problem

> what does these opeartion code mean
> ---
>
> Key: ZOOKEEPER-3035
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3035
> Project: ZooKeeper
>  Issue Type: Wish
>Reporter: liyuzhou
>Priority: Minor
>
> I'm reading the source code, but I often can not understand the operation 
> code mean in OpCode.java. For example , the sync operation code is 9, but I 
> can't understand what does this mean, and the source code has nothing about 
> the code description. Do we have some wiki or document abount operation code?
> {code:java}
> public interface OpCode {
> public final int notification = 0;
> public final int setACL = 7;
> public final int getChildren = 8;
> public final int sync = 9;
> public final int ping = 11;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3035) what does these opeartion code mean

2018-05-08 Thread Edward Ribeiro (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467666#comment-16467666
 ] 

Edward Ribeiro commented on ZOOKEEPER-3035:
---

Please, don't open Jira issues to ask questions. Use ZK mailing lists (user 
and/or dev) to ask such things.

 

The OpCode is the number that identifies a given operation (getData, setData, 
exists, etc) on the wire protocol. When the message is serialized/deserialize 
the value tells what kind of message it is.

> what does these opeartion code mean
> ---
>
> Key: ZOOKEEPER-3035
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3035
> Project: ZooKeeper
>  Issue Type: Wish
>Reporter: liyuzhou
>Priority: Minor
>
> I'm reading the source code, but I often can not understand the operation 
> code mean in OpCode.java. For example , the sync operation code is 9, but I 
> can't understand what does this mean, and the source code has nothing about 
> the code description. Do we have some wiki or document abount operation code?
> {code:java}
> public interface OpCode {
> public final int notification = 0;
> public final int setACL = 7;
> public final int getChildren = 8;
> public final int sync = 9;
> public final int ping = 11;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


ZooKeeper_branch34_openjdk7 - Build # 1909 - Failure

2018-05-08 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/1909/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 39.68 KB...]
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedServerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.625 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailDesignatedClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.813 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailNotifyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.585 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.716 sec
[junit] Running org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.704 sec
[junit] Running org.apache.zookeeper.test.SaslClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.096 sec
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.824 sec
[junit] Running org.apache.zookeeper.test.SessionTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
11.729 sec
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.971 sec
[junit] Running org.apache.zookeeper.test.StandaloneTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.885 sec
[junit] Running org.apache.zookeeper.test.StatTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.198 sec
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.367 sec
[junit] Running org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.84 sec
[junit] Running org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
10.514 sec
[junit] Running org.apache.zookeeper.test.UpgradeTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.676 sec
[junit] Running org.apache.zookeeper.test.WatchedEventTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.119 sec
[junit] Running org.apache.zookeeper.test.WatcherFuncTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.641 sec
[junit] Running org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
30.91 sec
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
9.468 sec
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.73 sec

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/build.xml:1382:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/build.xml:1385:
 Tests failed!

Total time: 40 minutes 47 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Recording test results
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  
org.apache.zookeeper.test.NonRecoverableErrorTest.testZooKeeperServiceAvailableOnLeader

Error Message:
IOException is expected due to error injected to transaction log commit

Stack Trace:
junit.framework.AssertionFailedError: IOException is expected due to error 
injected to transaction log commit
at 
org.apache.zookeeper.test.NonRecoverableErrorTest.testZooKeeperServiceAvailableOnLeader(NonRecoverableErrorTest.java:113)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:55)

[jira] [Updated] (ZOOKEEPER-3035) what does these opeartion code mean

2018-05-08 Thread liyuzhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyuzhou updated ZOOKEEPER-3035:

Description: 
I'm reading the source code, but I often can not understand the operation code 
mean in OpCode.java. For example , the sync operation code is 9, but I can't 
understand what does this mean, and the source code has nothing about the code 
description. Do we have some wiki or document abount operation code?
{code:java}
public interface OpCode {
public final int notification = 0;

public final int setACL = 7;

public final int getChildren = 8;

public final int sync = 9;

public final int ping = 11;
}
{code}

  was:
I'm reading the source code, but I often can not understand the operation code 
mean in OpCode.java. For example , the sync operation code is 9, but I can't 
understand what does this mean, and the source code has nothing about the code 
descrtion. Do we have some wiki or document abount operation code? 

{code:java}
public interface OpCode {
public final int notification = 0;

public final int setACL = 7;

public final int getChildren = 8;

public final int sync = 9;

public final int ping = 11;
}
{code}



> what does these opeartion code mean
> ---
>
> Key: ZOOKEEPER-3035
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3035
> Project: ZooKeeper
>  Issue Type: Wish
>Reporter: liyuzhou
>Priority: Minor
>
> I'm reading the source code, but I often can not understand the operation 
> code mean in OpCode.java. For example , the sync operation code is 9, but I 
> can't understand what does this mean, and the source code has nothing about 
> the code description. Do we have some wiki or document abount operation code?
> {code:java}
> public interface OpCode {
> public final int notification = 0;
> public final int setACL = 7;
> public final int getChildren = 8;
> public final int sync = 9;
> public final int ping = 11;
> }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3035) what does these opeartion code mean

2018-05-08 Thread liyuzhou (JIRA)
liyuzhou created ZOOKEEPER-3035:
---

 Summary: what does these opeartion code mean
 Key: ZOOKEEPER-3035
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3035
 Project: ZooKeeper
  Issue Type: Wish
Reporter: liyuzhou


I'm reading the source code, but I often can not understand the operation code 
mean in OpCode.java. For example , the sync operation code is 9, but I can't 
understand what does this mean, and the source code has nothing about the code 
descrtion. Do we have some wiki or document abount operation code? 

{code:java}
public interface OpCode {
public final int notification = 0;

public final int setACL = 7;

public final int getChildren = 8;

public final int sync = 9;

public final int ping = 11;
}
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Success: ZOOKEEPER- PreCommit Build #1663

2018-05-08 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1663/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 78.71 MB...]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec] 
 [exec] +0 tests included.  The patch appears to be a documentation 
patch that doesn't require tests.
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1663//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1663//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1663//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] 
 [exec] Error: No value specified for option "issue"
 [exec] Session logged out. Session was 
JSESSIONID=B9434B3CE5020CC43C2C0AE35ED89C1D.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD SUCCESSFUL
Total time: 17 minutes 19 seconds
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-2982
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Success
Sending email for trigger: Success
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8



###
## FAILED TESTS (if any) 
##
All tests passed

Re: Name resolution in StaticHostProvider

2018-05-08 Thread Flavio Junqueira
It might be a good idea to document whatever we end up doing.

-Flavio

> On 8 May 2018, at 17:22, Andor Molnar  wrote:
> 
> "If refactoring is necessary to address the issue, then it should be part
> of the PR."
> 
> Agreed. I think it is and it also makes things a lot more simpler, so it's
> easier to review.
> 
> "I'm not sure what kind of confirmation you are after here. Could you
> elaborate, please?"
> 
> I'm just wondering what could have been the reason for caching host
> addresses in StaticHostProvider in the first place.
> 
> "The other solution, if I remember enough of it, tried to avoid resolving
> unnecessarily, but perhaps the DNS client cache is enough..."
> 
> That's exactly my point: what JDK is doing internally is perfectly fine for
> us and we don't need extra logic here.
> 
> "Could you elaborate on this point, please? What exactly do you mean with
> this statement?"
> 
> See my previous point. Caching is already done in all DNS servers in the
> chain and also there's also caching in JDK already, which means by calling
> getByName() you don't necessarily fire a DNS request, only when the TTL is
> expired.
> 
> Andor
> 
> 
> 
> 
> On Tue, May 8, 2018 at 8:12 AM, Flavio Junqueira  wrote:
> 
>> Hi Andor,
>> 
>> Thanks for your work on addressing the issue.
>> 
>>> On 8 May 2018, at 16:06, Andor Molnar  wrote:
>>> 
>>> Hi,
>>> 
>>> Updating this thread, because the PR is still being review on GitHub.
>>> 
>>> So, the reason why I refactored the original behaviour of
>>> StaticHostProvider is that I believe that it's trying to do something
>> which
>>> is not its responsibility.
>> 
>> If refactoring is necessary to address the issue, then it should be part
>> of the PR. Otherwise, it is better to refactor in a separate PR.
>> 
>> 
>>> Please tell me if there's a good historical
>>> reason for that.
>> 
>> I'm not sure what kind of confirmation you are after here. Could you
>> elaborate, please?
>> 
>>> 
>>> My approach is giving the user the following to options:
>>> 1- Use static IP addresses, if you don't want to deal with DNS resolution
>>> at all - we guarantee that no DNS logic will involved in this case at
>> all.
>> 
>> Sounds fine.
>> 
>>> 2- Use DNS hostnames if you have a reliable DNS service for resolution
>>> (with HA, secondary servers, backups, etc.) - we must use DNS in the
>> right
>>> way in this case e.g. do NOT cache IP address for a longer period that
>> DNS
>>> server allows to and re-resolve after TTL expries, because it's mandatory
>>> by protocol.
>> 
>> Sounds fine.
>> 
>>> 
>>> My 2 cents here:
>>> - the fix which was originally posted for re-resolution is a workaround
>> and
>>> doesn't satisfy the requirement for #2,
>> 
>> The other solution, if I remember enough of it, tried to avoid resolving
>> unnecessarily, but perhaps the DNS client cache is enough to avoid the
>> unnecessary round-trips. This observation actually brings me to the next
>> point:
>> 
>>> - the solution is already built-in in JDK and DNS clients in the right
>> way
>> 
>> Could you elaborate on this point, please? What exactly do you mean with
>> this statement?
>> 
>>> - can't see a reason why we shouldn't use that
>>> 
>>> I checked this in some other projects as well and found very similar
>>> approach in hadoop-common's SecurityUtil.java. It has 2 built-in plugins
>>> for that:
>>> - Standard resolver uses java's built-in getByName().
>>> - Qualified resolver still uses getByName(), but adds some logic to avoid
>>> incorrect re-resolutions and reverse IP lookups.
>>> 
>>> Please let me know your thoughts.
>>> 
>>> Regards,
>>> Andor
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Tue, Mar 6, 2018 at 8:12 AM, Andor Molnar  wrote:
>>> 
 Hi Abe,
 
 Unfortunately we haven't got any feedback yet. What do you think of
 implementing Option #3?
 
 Regards,
 Andor
 
 
 On Thu, Feb 22, 2018 at 6:06 PM, Andor Molnar 
>> wrote:
 
> Did anybody happen to take a quick look by any chance?
> 
> I don't want to push this too hard, because I know it's a time
>> consuming
> topic to think about, but this is a blocker in 3.5 which has been
>> hanging
> around for a while and any feedback would be extremely helpful to
>> close it
> quickly.
> 
> Thanks,
> Andor
> 
> 
> 
> On Mon, Feb 19, 2018 at 12:18 PM, Andor Molnar 
> wrote:
> 
>> Hi all,
>> 
>> We need more eyes and brains on the following PR:
>> 
>> https://github.com/apache/zookeeper/pull/451
>> 
>> I added a comment few days ago about the way we currently do DNS name
>> resolution in this class and a suggestion on how we could simplify
>> things a
>> little bit. We talked about it with Abe Fine, but we're a little bit
>> unsure
>> and cannot get a conclusion. It would be extremely handy to get more
>> 

Re: Name resolution in StaticHostProvider

2018-05-08 Thread Andor Molnar
"If refactoring is necessary to address the issue, then it should be part
of the PR."

Agreed. I think it is and it also makes things a lot more simpler, so it's
easier to review.

"I'm not sure what kind of confirmation you are after here. Could you
elaborate, please?"

I'm just wondering what could have been the reason for caching host
addresses in StaticHostProvider in the first place.

"The other solution, if I remember enough of it, tried to avoid resolving
unnecessarily, but perhaps the DNS client cache is enough..."

That's exactly my point: what JDK is doing internally is perfectly fine for
us and we don't need extra logic here.

"Could you elaborate on this point, please? What exactly do you mean with
this statement?"

See my previous point. Caching is already done in all DNS servers in the
chain and also there's also caching in JDK already, which means by calling
getByName() you don't necessarily fire a DNS request, only when the TTL is
expired.

Andor




On Tue, May 8, 2018 at 8:12 AM, Flavio Junqueira  wrote:

> Hi Andor,
>
> Thanks for your work on addressing the issue.
>
> > On 8 May 2018, at 16:06, Andor Molnar  wrote:
> >
> > Hi,
> >
> > Updating this thread, because the PR is still being review on GitHub.
> >
> > So, the reason why I refactored the original behaviour of
> > StaticHostProvider is that I believe that it's trying to do something
> which
> > is not its responsibility.
>
> If refactoring is necessary to address the issue, then it should be part
> of the PR. Otherwise, it is better to refactor in a separate PR.
>
>
> > Please tell me if there's a good historical
> > reason for that.
>
> I'm not sure what kind of confirmation you are after here. Could you
> elaborate, please?
>
> >
> > My approach is giving the user the following to options:
> > 1- Use static IP addresses, if you don't want to deal with DNS resolution
> > at all - we guarantee that no DNS logic will involved in this case at
> all.
>
> Sounds fine.
>
> > 2- Use DNS hostnames if you have a reliable DNS service for resolution
> > (with HA, secondary servers, backups, etc.) - we must use DNS in the
> right
> > way in this case e.g. do NOT cache IP address for a longer period that
> DNS
> > server allows to and re-resolve after TTL expries, because it's mandatory
> > by protocol.
>
> Sounds fine.
>
> >
> > My 2 cents here:
> > - the fix which was originally posted for re-resolution is a workaround
> and
> > doesn't satisfy the requirement for #2,
>
> The other solution, if I remember enough of it, tried to avoid resolving
> unnecessarily, but perhaps the DNS client cache is enough to avoid the
> unnecessary round-trips. This observation actually brings me to the next
> point:
>
> > - the solution is already built-in in JDK and DNS clients in the right
> way
>
> Could you elaborate on this point, please? What exactly do you mean with
> this statement?
>
> > - can't see a reason why we shouldn't use that
> >
> > I checked this in some other projects as well and found very similar
> > approach in hadoop-common's SecurityUtil.java. It has 2 built-in plugins
> > for that:
> > - Standard resolver uses java's built-in getByName().
> > - Qualified resolver still uses getByName(), but adds some logic to avoid
> > incorrect re-resolutions and reverse IP lookups.
> >
> > Please let me know your thoughts.
> >
> > Regards,
> > Andor
> >
> >
> >
> >
> >
> >
> > On Tue, Mar 6, 2018 at 8:12 AM, Andor Molnar  wrote:
> >
> >> Hi Abe,
> >>
> >> Unfortunately we haven't got any feedback yet. What do you think of
> >> implementing Option #3?
> >>
> >> Regards,
> >> Andor
> >>
> >>
> >> On Thu, Feb 22, 2018 at 6:06 PM, Andor Molnar 
> wrote:
> >>
> >>> Did anybody happen to take a quick look by any chance?
> >>>
> >>> I don't want to push this too hard, because I know it's a time
> consuming
> >>> topic to think about, but this is a blocker in 3.5 which has been
> hanging
> >>> around for a while and any feedback would be extremely helpful to
> close it
> >>> quickly.
> >>>
> >>> Thanks,
> >>> Andor
> >>>
> >>>
> >>>
> >>> On Mon, Feb 19, 2018 at 12:18 PM, Andor Molnar 
> >>> wrote:
> >>>
>  Hi all,
> 
>  We need more eyes and brains on the following PR:
> 
>  https://github.com/apache/zookeeper/pull/451
> 
>  I added a comment few days ago about the way we currently do DNS name
>  resolution in this class and a suggestion on how we could simplify
> things a
>  little bit. We talked about it with Abe Fine, but we're a little bit
> unsure
>  and cannot get a conclusion. It would be extremely handy to get more
>  feedback from you.
> 
>  To add some colour to it, let me elaborate on the situation here:
> 
>  In general, the task that StaticHostProvider does is to get a list of
>  potentially unresolved InetSocketAddress objects, resolve them and
> iterate
>  over the resolved 

Re: Name resolution in StaticHostProvider

2018-05-08 Thread Flavio Junqueira
Hi Andor,

Thanks for your work on addressing the issue.

> On 8 May 2018, at 16:06, Andor Molnar  wrote:
> 
> Hi,
> 
> Updating this thread, because the PR is still being review on GitHub.
> 
> So, the reason why I refactored the original behaviour of
> StaticHostProvider is that I believe that it's trying to do something which
> is not its responsibility.

If refactoring is necessary to address the issue, then it should be part of the 
PR. Otherwise, it is better to refactor in a separate PR.


> Please tell me if there's a good historical
> reason for that.

I'm not sure what kind of confirmation you are after here. Could you elaborate, 
please?

> 
> My approach is giving the user the following to options:
> 1- Use static IP addresses, if you don't want to deal with DNS resolution
> at all - we guarantee that no DNS logic will involved in this case at all.

Sounds fine.

> 2- Use DNS hostnames if you have a reliable DNS service for resolution
> (with HA, secondary servers, backups, etc.) - we must use DNS in the right
> way in this case e.g. do NOT cache IP address for a longer period that DNS
> server allows to and re-resolve after TTL expries, because it's mandatory
> by protocol.

Sounds fine.

> 
> My 2 cents here:
> - the fix which was originally posted for re-resolution is a workaround and
> doesn't satisfy the requirement for #2,

The other solution, if I remember enough of it, tried to avoid resolving 
unnecessarily, but perhaps the DNS client cache is enough to avoid the 
unnecessary round-trips. This observation actually brings me to the next point:

> - the solution is already built-in in JDK and DNS clients in the right way

Could you elaborate on this point, please? What exactly do you mean with this 
statement?

> - can't see a reason why we shouldn't use that
> 
> I checked this in some other projects as well and found very similar
> approach in hadoop-common's SecurityUtil.java. It has 2 built-in plugins
> for that:
> - Standard resolver uses java's built-in getByName().
> - Qualified resolver still uses getByName(), but adds some logic to avoid
> incorrect re-resolutions and reverse IP lookups.
> 
> Please let me know your thoughts.
> 
> Regards,
> Andor
> 
> 
> 
> 
> 
> 
> On Tue, Mar 6, 2018 at 8:12 AM, Andor Molnar  wrote:
> 
>> Hi Abe,
>> 
>> Unfortunately we haven't got any feedback yet. What do you think of
>> implementing Option #3?
>> 
>> Regards,
>> Andor
>> 
>> 
>> On Thu, Feb 22, 2018 at 6:06 PM, Andor Molnar  wrote:
>> 
>>> Did anybody happen to take a quick look by any chance?
>>> 
>>> I don't want to push this too hard, because I know it's a time consuming
>>> topic to think about, but this is a blocker in 3.5 which has been hanging
>>> around for a while and any feedback would be extremely helpful to close it
>>> quickly.
>>> 
>>> Thanks,
>>> Andor
>>> 
>>> 
>>> 
>>> On Mon, Feb 19, 2018 at 12:18 PM, Andor Molnar 
>>> wrote:
>>> 
 Hi all,
 
 We need more eyes and brains on the following PR:
 
 https://github.com/apache/zookeeper/pull/451
 
 I added a comment few days ago about the way we currently do DNS name
 resolution in this class and a suggestion on how we could simplify things a
 little bit. We talked about it with Abe Fine, but we're a little bit unsure
 and cannot get a conclusion. It would be extremely handy to get more
 feedback from you.
 
 To add some colour to it, let me elaborate on the situation here:
 
 In general, the task that StaticHostProvider does is to get a list of
 potentially unresolved InetSocketAddress objects, resolve them and iterate
 over the resolved objects by calling next() method.
 
 *Option #1 (current logic)*
 - Resolve addresses with getAllByName() which returns a list of IP
 addresses associated with the address.
 - Cache all these IP's, shuffle them and iterate over.
 - If client is unable to connect to an IP, remove all IPs from the list
 which the original servername was resolved to and re-resolve it.
 
 *Option #2 (getByName())*
 - Resolve address with getByName() instead which returns only the first
 IP address of the name,
 - Do not cache IPs,
 - Shuffle the *names* and resolve with getByName() *every time* when
 next() is called,
 - JDK's built-in caching will prevent name servers from being flooded
 and will do the re-resolution automatically when cache expires,
 - Names with multiple IPs will be handled by DNS servers which (if
 configured properly) return IPs in different order - this is called DNS
 Round Robin -, so getByName() will return different IP on each call.
 
 *Options #3*
 - There's a small problem with option#2: if DNS server is not configured
 properly and handles the round-robin case in a way that it always return
 the IP list in the same order, getByName() will never 

[GitHub] zookeeper pull request #451: ZOOKEEPER-2184: Zookeeper Client should re-reso...

2018-05-08 Thread fpj
Github user fpj commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/451#discussion_r186759876
  
--- Diff: src/java/main/org/apache/zookeeper/client/StaticHostProvider.java 
---
@@ -111,9 +154,18 @@ public InetSocketAddress next(long spinDelay) {
 lastIndex = 0;
 }
 
-return serverAddresses.get(currentIndex);
+InetSocketAddress curAddr = serverAddresses.get(currentIndex);
+
+String curHostString = getHostString(curAddr);
+List resolvedAddresses = new 
ArrayList(Arrays.asList(this.resolver.getAllByName(curHostString)));
+if (resolvedAddresses.isEmpty()) {
+throw new UnknownHostException("No IP address returned for 
address: " + curHostString);
--- End diff --

Ok.


---


Re: Let's cut a ZK 3.5.4-beta release

2018-05-08 Thread Patrick Hunt
On Tue, May 8, 2018 at 1:58 AM, Flavio Junqueira  wrote:

> Hi Pat,
>
> I'm ready to merge ZK-2982. It is a one line change that is actually a
> mistake that was made during the port of the changes from 3.4 to 3.5. It is
> just bringing in the line that was missed.
>
>
It looks like that was missed from 3.4, is that right? Sounds fine to me. I
will be cutting the release soon, so please get it in asap otw we might
miss the window.


> As you are the RM, I'll follow your instructions. I'm fine with either
> merging it today or doing a 3.5.5 immediately after. Let me know where you
> are with the release candidate and we can decide on the best course of
> action.
>
>
If you are confident with the fix and can get it in before I start 3.5.4
it's fine with me. Thanks!

Patrick



> -Flavio
>
> > On 8 May 2018, at 05:56, Patrick Hunt  wrote:
> >
> > There's been plenty of time for someone to get these into 3.5.4 before
> now.
> > I really want to get 3.5.4 out so that folks can access/assess what's
> > currently committed.
> >
> > I'm happy to drive a follow-on release with these changes soon after
> 3.5.4
> > ships if there's sufficient progress.
> >
> > Patrick
> >
> > On Mon, May 7, 2018 at 2:00 PM, Alexander Shraer 
> wrote:
> >
> >> Lets also get this in: https://issues.apache.org/
> >> jira/browse/ZOOKEEPER-2959
> >> This affects anyone running observers.
> >>
> >> On Mon, May 7, 2018 at 1:45 PM, Flavio Junqueira 
> wrote:
> >>
> >>> We should strongly consider merging this:
> >>>
> >>> https://issues.apache.org/jira/browse/ZOOKEEPER-2982
> >>>
> >>> As it is causing problems to anyone using ZK on K8s.
> >>>
> >>> -Flavio
> >>>
>  On 7 May 2018, at 17:32, Rakesh Radhakrishnan 
> >>> wrote:
> 
>  I have added my feedback to the PR. Please take a look at it.
> 
> 
>  Rakesh
> 
>  On Mon, May 7, 2018 at 8:38 AM, Patrick Hunt 
> wrote:
> 
> > No one has looked at 2901, while I'm not super happy with it it seems
> >>> fine
> > - I'll commit it as-is if I don't hear anything by EOD tomorrow
> >>> (Monday).
> > After which I'll start the release process.
> >
> > Patrick
> >
> > On Mon, Mar 26, 2018 at 2:09 AM, Andor Molnar 
> >>> wrote:
> >
> >> I'm currently working on ZOOKEEPER-2184. PR has been open for ages
> on
> >>> 3.4
> >> branch, please review if you have some capacity.
> >> I'll port the fix to the 3.5 branch too, if we have an agreement and
> >> 3.4-version is merged.
> >>
> >> ZK-2982 is somewhat related, I believe my changes will fix that one
> >>> too.
> >>
> >> ZK-1818 is probably tough, but patch is already available. Somebody
> > should
> >> pick it up, which I'm happy to do once finished with above stuff.
> >>
> >> Regards,
> >> Andor
> >>
> >>
> >>
> >> On Mon, Mar 26, 2018 at 3:39 AM, Michael Han 
> >> wrote:
> >>
> >>> +1 on 3.5.4 release planning.
> >>>
> > There are 10 open blocker issues marked for 3.5.4. Can I get some
> > help
> >>> to sort out those issues?
> >>>
> >>> The url posted does not work for me, here is the query I use:
> >>> https://goo.gl/3MJZMN
> >>>
> >>> Just had a chance to go through the JIRA and did some clean ups.
> >>> - 1159: lower the priority from blocker to major.
> >>> - 761: resolved because the code is merged.
> >>>
> >>> ZOOKEEPER-2903, 2184, 2982 looks like real blockers for the 3.5.4
> >> release.
> >>> The rest of the blockers are legacy that's get postponed
> >> indefinitely.
> >>>
> >>>
> >>> On Wed, Mar 14, 2018 at 11:59 AM, Flavio Junqueira  >
> >> wrote:
> >>>
>  Ok, I can have a look at ZK-2901. I'd like to get ZK-2982 in as
> >> well
> > as
> >>> it
>  is causing us problems with Kubernetes. The fix is simple, but I'm
>  wondering about adding a test case.
> 
>  -Flavio
> 
> > On 14 Mar 2018, at 17:30, Patrick Hunt  wrote:
> >
> > I would like to cut 3.5.4. Need more eyes on 2901 though.
> >
> > Patrick
> >
> > On Wed, Mar 14, 2018 at 4:26 AM, Flavio Junqueira <
> f...@apache.org
> >>>
>  wrote:
> >
> >> I think ZK-2901 is close to being merged, yes? And with that,
> >> will
> >> we
>  cut
> >> a 3.5.4 release?
> >>
> >> -Flavio
> >>
> >>> On 7 Dec 2017, at 00:27, Patrick Hunt 
> wrote:
> >>>
> >>> I haven't forgotten about this - we've been stuck on
> > ZOOKEEPER-2901
> >>> . I
> >>> think we getting closer but it's been tricky to navigate
> > addressing
> >>> the
> >>> issue vs backward compat 

[GitHub] zookeeper pull request #451: ZOOKEEPER-2184: Zookeeper Client should re-reso...

2018-05-08 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/451#discussion_r186741712
  
--- Diff: src/java/main/org/apache/zookeeper/client/StaticHostProvider.java 
---
@@ -30,76 +31,118 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-
 /**
  * Most simple HostProvider, resolves only on instantiation.
- * 
+ *
  */
 @InterfaceAudience.Public
 public final class StaticHostProvider implements HostProvider {
+public interface Resolver {
+InetAddress[] getAllByName(String name) throws 
UnknownHostException;
+}
+
 private static final Logger LOG = LoggerFactory
 .getLogger(StaticHostProvider.class);
 
-private final List serverAddresses = new 
ArrayList(
-5);
+private final List serverAddresses = new 
ArrayList(5);
 
 private int lastIndex = -1;
 
 private int currentIndex = -1;
 
+private Resolver resolver;
+
 /**
  * Constructs a SimpleHostSet.
- * 
+ *
  * @param serverAddresses
  *possibly unresolved ZooKeeper server addresses
  * @throws IllegalArgumentException
  * if serverAddresses is empty or resolves to an empty list
  */
 public StaticHostProvider(Collection 
serverAddresses) {
-for (InetSocketAddress address : serverAddresses) {
-try {
-InetAddress ia = address.getAddress();
-InetAddress resolvedAddresses[] = 
InetAddress.getAllByName((ia != null) ? ia.getHostAddress() :
-address.getHostName());
-for (InetAddress resolvedAddress : resolvedAddresses) {
-// If hostName is null but the address is not, we can 
tell that
-// the hostName is an literal IP address. Then we can 
set the host string as the hostname
-// safely to avoid reverse DNS lookup.
-// As far as i know, the only way to check if the 
hostName is null is use toString().
-// Both the two implementations of InetAddress are 
final class, so we can trust the return value of
-// the toString() method.
-if (resolvedAddress.toString().startsWith("/")
-&& resolvedAddress.getAddress() != null) {
-this.serverAddresses.add(
-new 
InetSocketAddress(InetAddress.getByAddress(
-address.getHostName(),
-resolvedAddress.getAddress()),
-address.getPort()));
-} else {
-this.serverAddresses.add(new 
InetSocketAddress(resolvedAddress.getHostAddress(), address.getPort()));
-}
-}
-} catch (UnknownHostException e) {
-LOG.error("Unable to connect to server: {}", address, e);
+this.resolver = new Resolver() {
+@Override
+public InetAddress[] getAllByName(String name) throws 
UnknownHostException {
+return InetAddress.getAllByName(name);
 }
-}
-
-if (this.serverAddresses.isEmpty()) {
+};
+init(serverAddresses);
+}
+
+/**
+ * Introduced for testing purposes. getAllByName() is a static method 
of InetAddress, therefore cannot be easily mocked.
+ * By abstraction of Resolver interface we can easily inject a mocked 
implementation in tests.
+ *
+ * @param serverAddresses
+ *possibly unresolved ZooKeeper server addresses
+ * @param resolver
+ *custom resolver implementation
+ * @throws IllegalArgumentException
+ * if serverAddresses is empty or resolves to an empty list
+ */
+public StaticHostProvider(Collection 
serverAddresses, Resolver resolver) {
+this.resolver = resolver;
+init(serverAddresses);
+}
+
+/**
+ * Common init method for all constructors.
+ * Resolve all unresolved server addresses, put them in a list and 
shuffle.
+ */
+private void init(Collection serverAddresses) {
+if (serverAddresses.isEmpty()) {
 throw new IllegalArgumentException(
 "A HostProvider may not be empty!");
 }
+
+this.serverAddresses.addAll(serverAddresses);
 Collections.shuffle(this.serverAddresses);
 }
 
+/**
+ * Evaluate to a 

Re: Name resolution in StaticHostProvider

2018-05-08 Thread Andor Molnar
Hi,

Updating this thread, because the PR is still being review on GitHub.

So, the reason why I refactored the original behaviour of
StaticHostProvider is that I believe that it's trying to do something which
is not its responsibility. Please tell me if there's a good historical
reason for that.

My approach is giving the user the following to options:
1- Use static IP addresses, if you don't want to deal with DNS resolution
at all - we guarantee that no DNS logic will involved in this case at all.
2- Use DNS hostnames if you have a reliable DNS service for resolution
(with HA, secondary servers, backups, etc.) - we must use DNS in the right
way in this case e.g. do NOT cache IP address for a longer period that DNS
server allows to and re-resolve after TTL expries, because it's mandatory
by protocol.

My 2 cents here:
- the fix which was originally posted for re-resolution is a workaround and
doesn't satisfy the requirement for #2,
- the solution is already built-in in JDK and DNS clients in the right way
- can't see a reason why we shouldn't use that

I checked this in some other projects as well and found very similar
approach in hadoop-common's SecurityUtil.java. It has 2 built-in plugins
for that:
- Standard resolver uses java's built-in getByName().
- Qualified resolver still uses getByName(), but adds some logic to avoid
incorrect re-resolutions and reverse IP lookups.

Please let me know your thoughts.

Regards,
Andor






On Tue, Mar 6, 2018 at 8:12 AM, Andor Molnar  wrote:

> Hi Abe,
>
> Unfortunately we haven't got any feedback yet. What do you think of
> implementing Option #3?
>
> Regards,
> Andor
>
>
> On Thu, Feb 22, 2018 at 6:06 PM, Andor Molnar  wrote:
>
>> Did anybody happen to take a quick look by any chance?
>>
>> I don't want to push this too hard, because I know it's a time consuming
>> topic to think about, but this is a blocker in 3.5 which has been hanging
>> around for a while and any feedback would be extremely helpful to close it
>> quickly.
>>
>> Thanks,
>> Andor
>>
>>
>>
>> On Mon, Feb 19, 2018 at 12:18 PM, Andor Molnar 
>> wrote:
>>
>>> Hi all,
>>>
>>> We need more eyes and brains on the following PR:
>>>
>>> https://github.com/apache/zookeeper/pull/451
>>>
>>> I added a comment few days ago about the way we currently do DNS name
>>> resolution in this class and a suggestion on how we could simplify things a
>>> little bit. We talked about it with Abe Fine, but we're a little bit unsure
>>> and cannot get a conclusion. It would be extremely handy to get more
>>> feedback from you.
>>>
>>> To add some colour to it, let me elaborate on the situation here:
>>>
>>> In general, the task that StaticHostProvider does is to get a list of
>>> potentially unresolved InetSocketAddress objects, resolve them and iterate
>>> over the resolved objects by calling next() method.
>>>
>>> *Option #1 (current logic)*
>>> - Resolve addresses with getAllByName() which returns a list of IP
>>> addresses associated with the address.
>>> - Cache all these IP's, shuffle them and iterate over.
>>> - If client is unable to connect to an IP, remove all IPs from the list
>>> which the original servername was resolved to and re-resolve it.
>>>
>>> *Option #2 (getByName())*
>>> - Resolve address with getByName() instead which returns only the first
>>> IP address of the name,
>>> - Do not cache IPs,
>>> - Shuffle the *names* and resolve with getByName() *every time* when
>>> next() is called,
>>> - JDK's built-in caching will prevent name servers from being flooded
>>> and will do the re-resolution automatically when cache expires,
>>> - Names with multiple IPs will be handled by DNS servers which (if
>>> configured properly) return IPs in different order - this is called DNS
>>> Round Robin -, so getByName() will return different IP on each call.
>>>
>>> *Options #3*
>>> - There's a small problem with option#2: if DNS server is not configured
>>> properly and handles the round-robin case in a way that it always return
>>> the IP list in the same order, getByName() will never return the next ip,
>>> - In order to overcome that, use getAllByName() instead, shuffle the
>>> list and return the first IP.
>>>
>>> All feedback if much appreciated.
>>>
>>> Thanks,
>>> Andor
>>>
>>>
>>>
>>>
>>
>


[GitHub] zookeeper pull request #451: ZOOKEEPER-2184: Zookeeper Client should re-reso...

2018-05-08 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/451#discussion_r186735649
  
--- Diff: src/java/main/org/apache/zookeeper/client/HostProvider.java ---
@@ -53,7 +54,7 @@
  * @param spinDelay
  *Milliseconds to wait if all hosts have been tried once.
  */
-public InetSocketAddress next(long spinDelay);
+public InetSocketAddress next(long spinDelay) throws 
UnknownHostException;
--- End diff --

1- See my comment below. Now I think the API change is necessary, but I 
need to double check how it was detected previously.
2- I started a thread on the mailing list a while ago for exactly the same 
reason. Let me update it.


---


[GitHub] zookeeper pull request #451: ZOOKEEPER-2184: Zookeeper Client should re-reso...

2018-05-08 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/451#discussion_r186734788
  
--- Diff: src/java/main/org/apache/zookeeper/client/StaticHostProvider.java 
---
@@ -111,9 +154,18 @@ public InetSocketAddress next(long spinDelay) {
 lastIndex = 0;
 }
 
-return serverAddresses.get(currentIndex);
+InetSocketAddress curAddr = serverAddresses.get(currentIndex);
+
+String curHostString = getHostString(curAddr);
+List resolvedAddresses = new 
ArrayList(Arrays.asList(this.resolver.getAllByName(curHostString)));
+if (resolvedAddresses.isEmpty()) {
+throw new UnknownHostException("No IP address returned for 
address: " + curHostString);
--- End diff --

> We need a way to break the loop in the case the client closes, though.

That's actually a good reason for _not_ dealing with the error here. 
Because the caller - ClientCnxn - is be able to detect client closes, but 
StatisHostProvider is not.


---


[GitHub] zookeeper pull request #451: ZOOKEEPER-2184: Zookeeper Client should re-reso...

2018-05-08 Thread eribeiro
Github user eribeiro commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/451#discussion_r186694289
  
--- Diff: src/java/main/org/apache/zookeeper/client/StaticHostProvider.java 
---
@@ -30,76 +31,118 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-
 /**
  * Most simple HostProvider, resolves only on instantiation.
- * 
+ *
  */
 @InterfaceAudience.Public
 public final class StaticHostProvider implements HostProvider {
+public interface Resolver {
+InetAddress[] getAllByName(String name) throws 
UnknownHostException;
+}
+
 private static final Logger LOG = LoggerFactory
 .getLogger(StaticHostProvider.class);
 
-private final List serverAddresses = new 
ArrayList(
-5);
+private final List serverAddresses = new 
ArrayList(5);
 
 private int lastIndex = -1;
 
 private int currentIndex = -1;
 
+private Resolver resolver;
+
 /**
  * Constructs a SimpleHostSet.
- * 
+ *
  * @param serverAddresses
  *possibly unresolved ZooKeeper server addresses
  * @throws IllegalArgumentException
  * if serverAddresses is empty or resolves to an empty list
  */
 public StaticHostProvider(Collection 
serverAddresses) {
-for (InetSocketAddress address : serverAddresses) {
-try {
-InetAddress ia = address.getAddress();
-InetAddress resolvedAddresses[] = 
InetAddress.getAllByName((ia != null) ? ia.getHostAddress() :
-address.getHostName());
-for (InetAddress resolvedAddress : resolvedAddresses) {
-// If hostName is null but the address is not, we can 
tell that
-// the hostName is an literal IP address. Then we can 
set the host string as the hostname
-// safely to avoid reverse DNS lookup.
-// As far as i know, the only way to check if the 
hostName is null is use toString().
-// Both the two implementations of InetAddress are 
final class, so we can trust the return value of
-// the toString() method.
-if (resolvedAddress.toString().startsWith("/")
-&& resolvedAddress.getAddress() != null) {
-this.serverAddresses.add(
-new 
InetSocketAddress(InetAddress.getByAddress(
-address.getHostName(),
-resolvedAddress.getAddress()),
-address.getPort()));
-} else {
-this.serverAddresses.add(new 
InetSocketAddress(resolvedAddress.getHostAddress(), address.getPort()));
-}
-}
-} catch (UnknownHostException e) {
-LOG.error("Unable to connect to server: {}", address, e);
+this.resolver = new Resolver() {
+@Override
+public InetAddress[] getAllByName(String name) throws 
UnknownHostException {
+return InetAddress.getAllByName(name);
 }
-}
-
-if (this.serverAddresses.isEmpty()) {
+};
+init(serverAddresses);
+}
+
+/**
+ * Introduced for testing purposes. getAllByName() is a static method 
of InetAddress, therefore cannot be easily mocked.
+ * By abstraction of Resolver interface we can easily inject a mocked 
implementation in tests.
+ *
+ * @param serverAddresses
+ *possibly unresolved ZooKeeper server addresses
+ * @param resolver
+ *custom resolver implementation
+ * @throws IllegalArgumentException
+ * if serverAddresses is empty or resolves to an empty list
+ */
+public StaticHostProvider(Collection 
serverAddresses, Resolver resolver) {
+this.resolver = resolver;
+init(serverAddresses);
+}
+
+/**
+ * Common init method for all constructors.
+ * Resolve all unresolved server addresses, put them in a list and 
shuffle.
+ */
+private void init(Collection serverAddresses) {
+if (serverAddresses.isEmpty()) {
 throw new IllegalArgumentException(
 "A HostProvider may not be empty!");
 }
+
+this.serverAddresses.addAll(serverAddresses);
 Collections.shuffle(this.serverAddresses);
 }
 
+/**
+ * Evaluate to a 

[GitHub] zookeeper pull request #451: ZOOKEEPER-2184: Zookeeper Client should re-reso...

2018-05-08 Thread fpj
Github user fpj commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/451#discussion_r186690327
  
--- Diff: src/java/main/org/apache/zookeeper/client/StaticHostProvider.java 
---
@@ -111,9 +154,18 @@ public InetSocketAddress next(long spinDelay) {
 lastIndex = 0;
 }
 
-return serverAddresses.get(currentIndex);
+InetSocketAddress curAddr = serverAddresses.get(currentIndex);
+
+String curHostString = getHostString(curAddr);
+List resolvedAddresses = new 
ArrayList(Arrays.asList(this.resolver.getAllByName(curHostString)));
+if (resolvedAddresses.isEmpty()) {
+throw new UnknownHostException("No IP address returned for 
address: " + curHostString);
+}
+Collections.shuffle(resolvedAddresses);
--- End diff --

Ok.


---


[GitHub] zookeeper pull request #451: ZOOKEEPER-2184: Zookeeper Client should re-reso...

2018-05-08 Thread fpj
Github user fpj commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/451#discussion_r186690242
  
--- Diff: src/java/main/org/apache/zookeeper/client/StaticHostProvider.java 
---
@@ -111,9 +154,18 @@ public InetSocketAddress next(long spinDelay) {
 lastIndex = 0;
 }
 
-return serverAddresses.get(currentIndex);
+InetSocketAddress curAddr = serverAddresses.get(currentIndex);
+
+String curHostString = getHostString(curAddr);
+List resolvedAddresses = new 
ArrayList(Arrays.asList(this.resolver.getAllByName(curHostString)));
+if (resolvedAddresses.isEmpty()) {
+throw new UnknownHostException("No IP address returned for 
address: " + curHostString);
--- End diff --

I think it is fine to loop indefinitely because the client needs it for 
progress. We need a way to break the loop in the case the client closes, though.


---


[GitHub] zookeeper pull request #451: ZOOKEEPER-2184: Zookeeper Client should re-reso...

2018-05-08 Thread fpj
Github user fpj commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/451#discussion_r186689865
  
--- Diff: src/java/main/org/apache/zookeeper/client/HostProvider.java ---
@@ -53,7 +54,7 @@
  * @param spinDelay
  *Milliseconds to wait if all hosts have been tried once.
  */
-public InetSocketAddress next(long spinDelay);
+public InetSocketAddress next(long spinDelay) throws 
UnknownHostException;
--- End diff --

My preference is the following:

1- That we do not make this API change. It is not necessary to solve the 
problem of this issue.
2- That we discuss separately whether we want to change the behaviour of 
the `next()` in the `HostProvider` interface.


---


Re: Let's cut a ZK 3.5.4-beta release

2018-05-08 Thread Flavio Junqueira
Hi Pat,

I'm ready to merge ZK-2982. It is a one line change that is actually a mistake 
that was made during the port of the changes from 3.4 to 3.5. It is just 
bringing in the line that was missed.

As you are the RM, I'll follow your instructions. I'm fine with either merging 
it today or doing a 3.5.5 immediately after. Let me know where you are with the 
release candidate and we can decide on the best course of action.

-Flavio

> On 8 May 2018, at 05:56, Patrick Hunt  wrote:
> 
> There's been plenty of time for someone to get these into 3.5.4 before now.
> I really want to get 3.5.4 out so that folks can access/assess what's
> currently committed.
> 
> I'm happy to drive a follow-on release with these changes soon after 3.5.4
> ships if there's sufficient progress.
> 
> Patrick
> 
> On Mon, May 7, 2018 at 2:00 PM, Alexander Shraer  wrote:
> 
>> Lets also get this in: https://issues.apache.org/
>> jira/browse/ZOOKEEPER-2959
>> This affects anyone running observers.
>> 
>> On Mon, May 7, 2018 at 1:45 PM, Flavio Junqueira  wrote:
>> 
>>> We should strongly consider merging this:
>>> 
>>> https://issues.apache.org/jira/browse/ZOOKEEPER-2982
>>> 
>>> As it is causing problems to anyone using ZK on K8s.
>>> 
>>> -Flavio
>>> 
 On 7 May 2018, at 17:32, Rakesh Radhakrishnan 
>>> wrote:
 
 I have added my feedback to the PR. Please take a look at it.
 
 
 Rakesh
 
 On Mon, May 7, 2018 at 8:38 AM, Patrick Hunt  wrote:
 
> No one has looked at 2901, while I'm not super happy with it it seems
>>> fine
> - I'll commit it as-is if I don't hear anything by EOD tomorrow
>>> (Monday).
> After which I'll start the release process.
> 
> Patrick
> 
> On Mon, Mar 26, 2018 at 2:09 AM, Andor Molnar 
>>> wrote:
> 
>> I'm currently working on ZOOKEEPER-2184. PR has been open for ages on
>>> 3.4
>> branch, please review if you have some capacity.
>> I'll port the fix to the 3.5 branch too, if we have an agreement and
>> 3.4-version is merged.
>> 
>> ZK-2982 is somewhat related, I believe my changes will fix that one
>>> too.
>> 
>> ZK-1818 is probably tough, but patch is already available. Somebody
> should
>> pick it up, which I'm happy to do once finished with above stuff.
>> 
>> Regards,
>> Andor
>> 
>> 
>> 
>> On Mon, Mar 26, 2018 at 3:39 AM, Michael Han 
>> wrote:
>> 
>>> +1 on 3.5.4 release planning.
>>> 
> There are 10 open blocker issues marked for 3.5.4. Can I get some
> help
>>> to sort out those issues?
>>> 
>>> The url posted does not work for me, here is the query I use:
>>> https://goo.gl/3MJZMN
>>> 
>>> Just had a chance to go through the JIRA and did some clean ups.
>>> - 1159: lower the priority from blocker to major.
>>> - 761: resolved because the code is merged.
>>> 
>>> ZOOKEEPER-2903, 2184, 2982 looks like real blockers for the 3.5.4
>> release.
>>> The rest of the blockers are legacy that's get postponed
>> indefinitely.
>>> 
>>> 
>>> On Wed, Mar 14, 2018 at 11:59 AM, Flavio Junqueira 
>> wrote:
>>> 
 Ok, I can have a look at ZK-2901. I'd like to get ZK-2982 in as
>> well
> as
>>> it
 is causing us problems with Kubernetes. The fix is simple, but I'm
 wondering about adding a test case.
 
 -Flavio
 
> On 14 Mar 2018, at 17:30, Patrick Hunt  wrote:
> 
> I would like to cut 3.5.4. Need more eyes on 2901 though.
> 
> Patrick
> 
> On Wed, Mar 14, 2018 at 4:26 AM, Flavio Junqueira >> 
 wrote:
> 
>> I think ZK-2901 is close to being merged, yes? And with that,
>> will
>> we
 cut
>> a 3.5.4 release?
>> 
>> -Flavio
>> 
>>> On 7 Dec 2017, at 00:27, Patrick Hunt  wrote:
>>> 
>>> I haven't forgotten about this - we've been stuck on
> ZOOKEEPER-2901
>>> . I
>>> think we getting closer but it's been tricky to navigate
> addressing
>>> the
>>> issue vs backward compat vs making things worse. I was about to
>> sign
 off
>>> then noticed I had missed something. Jordan has been working to
 address.
>>> 
>>> Camille, Edward -- it would be good if you could take a look at
>> 2901
>> given
>>> you participated in the original creation/commit of this
>> feature.
>>> 
>>> Regards,
>>> 
>>> Patrick
>>> 
>>> On Tue, Nov 21, 2017 at 11:32 AM, Karan Mehta <
>>> karanmeht...@gmail.com>
>>> wrote:
>>> 
 Can we get ZOOKEEPER-2770

[jira] [Commented] (ZOOKEEPER-2982) Re-try DNS hostname -> IP resolution

2018-05-08 Thread Flavio Junqueira (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467049#comment-16467049
 ] 

Flavio Junqueira commented on ZOOKEEPER-2982:
-

Based on this comment, I'm +1 from this change:

https://issues.apache.org/jira/browse/ZOOKEEPER-2982?focusedCommentId=16371886=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16371886

It would have been good to have a test case, but we haven't been able to come 
up with anything, so I suggest we leave it for future work. We also have a +1 
from [~andorm] in the pull request.

> Re-try DNS hostname -> IP resolution
> 
>
> Key: ZOOKEEPER-2982
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2982
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.5.1, 3.5.3
>Reporter: Eron Wright 
>Assignee: Flavio Junqueira
>Priority: Blocker
> Fix For: 3.5.4, 3.6.0
>
> Attachments: 3.5.3-beta.zip, fixed.log
>
>
> ZOOKEEPER-1506 fixed a DNS resolution issue in 3.4.  Some portions of the fix 
> haven't yet been ported to 3.5.
> To recap the outstanding problem in 3.5, if a given ZK server is started 
> before all peer addresses are resolvable, that server may cache a negative 
> lookup result and forever fail to resolve the address.For example, 
> deploying ZK 3.5 to Kubernetes using a StatefulSet plus a Service (headless) 
> may fail because the DNS records are created lazily.
> {code}
> 2018-02-18 09:11:22,583 [myid:0] - WARN  
> [QuorumPeer[myid=0](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Follower@95]
>  - Exception when following the leader
> java.net.UnknownHostException: zk-2.zk.default.svc.cluster.local
> at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at 
> org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:227)
> at 
> org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:256)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:76)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {code}
> In the above example, the address `zk-2.zk.default.svc.cluster.local` was not 
> resolvable when the server started, but became resolvable shortly thereafter. 
>The server should eventually succeed but doesn't.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-2982) Re-try DNS hostname -> IP resolution

2018-05-08 Thread Flavio Junqueira (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Flavio Junqueira reassigned ZOOKEEPER-2982:
---

Assignee: Flavio Junqueira

> Re-try DNS hostname -> IP resolution
> 
>
> Key: ZOOKEEPER-2982
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2982
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.5.1, 3.5.3
>Reporter: Eron Wright 
>Assignee: Flavio Junqueira
>Priority: Blocker
> Fix For: 3.5.4, 3.6.0
>
> Attachments: 3.5.3-beta.zip, fixed.log
>
>
> ZOOKEEPER-1506 fixed a DNS resolution issue in 3.4.  Some portions of the fix 
> haven't yet been ported to 3.5.
> To recap the outstanding problem in 3.5, if a given ZK server is started 
> before all peer addresses are resolvable, that server may cache a negative 
> lookup result and forever fail to resolve the address.For example, 
> deploying ZK 3.5 to Kubernetes using a StatefulSet plus a Service (headless) 
> may fail because the DNS records are created lazily.
> {code}
> 2018-02-18 09:11:22,583 [myid:0] - WARN  
> [QuorumPeer[myid=0](plain=/0:0:0:0:0:0:0:0:2181)(secure=disabled):Follower@95]
>  - Exception when following the leader
> java.net.UnknownHostException: zk-2.zk.default.svc.cluster.local
> at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> at java.net.Socket.connect(Socket.java:589)
> at 
> org.apache.zookeeper.server.quorum.Learner.sockConnect(Learner.java:227)
> at 
> org.apache.zookeeper.server.quorum.Learner.connectToLeader(Learner.java:256)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:76)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {code}
> In the above example, the address `zk-2.zk.default.svc.cluster.local` was not 
> resolvable when the server started, but became resolvable shortly thereafter. 
>The server should eventually succeed but doesn't.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)