[GitHub] zookeeper pull request #587: ZOOKEEPER-3106: Zookeeper client supports IPv6 ...

2018-08-01 Thread maoling
Github user maoling commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/587#discussion_r207090394
  
--- Diff: 
src/java/main/org/apache/zookeeper/client/ConnectStringParser.java ---
@@ -68,14 +69,26 @@ public ConnectStringParser(String connectString) {
 List<String> hostsList = split(connectString,",");
 for (String host : hostsList) {
 int port = DEFAULT_PORT;
-int pidx = host.lastIndexOf(':');
-if (pidx >= 0) {
-// otherwise : is at the end of the string, ignore
-if (pidx < host.length() - 1) {
-port = Integer.parseInt(host.substring(pidx + 1));
-}
-host = host.substring(0, pidx);
+if (!connectString.startsWith("[")) {//IPv4
+   int pidx = host.lastIndexOf(':');
+   if (pidx >= 0) {
+   // otherwise : is at the end of the string, ignore
+   if (pidx < host.length() - 1) {
+   port = Integer.parseInt(host.substring(pidx + 1));
+   }
+   host = host.substring(0, pidx);
+   }
+} else {//IPv6
--- End diff --

@enixon thanks for your review. After collecting enough suggestions, I will 
polish up this issue. 


---


[jira] [Commented] (ZOOKEEPER-3062) introduce fsync.warningthresholdms constant for FileTxnLog LOG.warn message

2018-08-01 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16566277#comment-16566277
 ] 

Hudson commented on ZOOKEEPER-3062:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #131 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/131/])
ZOOKEEPER-3062: mention fsync.warningthresholdms in FileTxnLog LOG.warn (phunt: 
rev 7cf8035c3a5ca05bce2d183b41bf410709a5f6ee)
* (edit) src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java
* (edit) 
src/java/test/org/apache/zookeeper/server/persistence/FileTxnLogTest.java


> introduce fsync.warningthresholdms constant for FileTxnLog LOG.warn message
> ---
>
> Key: ZOOKEEPER-3062
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3062
> Project: ZooKeeper
>  Issue Type: Task
>Affects Versions: 3.5.4, 3.6.0, 3.4.13
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5, 3.4.14
>
> Attachments: ZOOKEEPER-3062.patch, ZOOKEEPER-3062.patch
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The
> {code}
> fsync-ing the write ahead log in ... took ... ms which will adversely effect 
> operation latency. File size is ... bytes. See the ZooKeeper troubleshooting 
> guide
> {code}
> warning should mention the {{fsync.warningthresholdms}} configurable 
> property: that would make the property easier to discover, and when 
> interpreting historical vs. current logs, or logs from different ensembles, 
> differences in configuration would be easier to spot.
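For context, here is a minimal sketch of the kind of check the ticket is about. The class and method names are illustrative only, not the actual FileTxnLog code; the property name and its documented 1000 ms default are real, but the exact message differs:

{code}
import java.io.IOException;
import java.nio.channels.FileChannel;

class FsyncWarnSketch {
    // Read the configurable threshold once (1000 ms is the documented default).
    static final String PROP = "fsync.warningthresholdms";
    static final long FSYNC_WARNING_THRESHOLD_MS = Long.getLong(PROP, 1000L);

    static void fsyncAndWarn(FileChannel log, long fileSizeBytes) throws IOException {
        long start = System.currentTimeMillis();
        log.force(false); // fsync the write-ahead log
        long tookMs = System.currentTimeMillis() - start;
        if (tookMs > FSYNC_WARNING_THRESHOLD_MS) {
            // Naming the property in the message is exactly what the ticket asks for.
            System.err.printf(
                "fsync-ing the write ahead log took %d ms, above %s=%d;"
                        + " file size is %d bytes%n",
                tookMs, PROP, FSYNC_WARNING_THRESHOLD_MS, fileSizeBytes);
        }
    }
}
{code}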



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


ZooKeeper_branch35_jdk8 - Build # 1068 - Failure

2018-08-01 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk8/1068/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 62.63 KB...]
[junit] Running org.apache.zookeeper.test.SaslSuperUserTest in thread 7
[junit] Running org.apache.zookeeper.test.ServerCnxnTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.106 sec, Thread: 7, Class: org.apache.zookeeper.test.SaslSuperUserTest
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest in thread 
7
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.401 sec, Thread: 3, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.82 sec, Thread: 7, Class: org.apache.zookeeper.test.SessionInvalidationTest
[junit] Running org.apache.zookeeper.test.SessionTest in thread 3
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest in thread 7
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.441 sec, Thread: 7, Class: org.apache.zookeeper.test.SessionTimeoutTest
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
7
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.11 sec, Thread: 7, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 7
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
145.339 sec, Thread: 5, Class: org.apache.zookeeper.test.RecoveryTest
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 5
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.421 sec, Thread: 5, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 5
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
15.108 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.165 sec, Thread: 5, Class: org.apache.zookeeper.test.StatTest
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 5
[junit] Tests run: 26, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.1 sec, Thread: 3, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.086 sec, Thread: 5, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 5
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
92.284 sec, Thread: 1, Class: org.apache.zookeeper.test.RestoreCommittedLogTest
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.004 sec, Thread: 3, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.147 sec, Thread: 3, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 3
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
4.971 sec, Thread: 3, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
23.42 sec, Thread: 7, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 7
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.147 sec, Thread: 7, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 7
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
18.384 sec, Thread: 5, Class: org.apache.zookeeper.test.TruncateTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 5
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.384 sec, Thread: 5, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
19.751 sec, Thread: 1, Class: 
org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
14.941 sec, Thread: 7, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time 

[GitHub] zookeeper pull request #588: [ZOOKEEPER-3109] Avoid long unavailable time du...

2018-08-01 Thread lvfangmin
GitHub user lvfangmin opened a pull request:

https://github.com/apache/zookeeper/pull/588

[ZOOKEEPER-3109] Avoid long unavailable time due to voter changed mind 
during leader election

For more details, please check the description in 
https://issues.apache.org/jira/browse/ZOOKEEPER-3109 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lvfangmin/zookeeper ZOOKEEPER-3109

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/588.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #588


commit 9611393b3d4d9e1a0327a5b8bf678e526c7fc5a7
Author: Fangmin Lyu 
Date:   2018-08-01T22:49:57Z

Avoid long unavailable time due to voter changed mind when activating the 
leader during election




---


[jira] [Updated] (ZOOKEEPER-3109) Avoid long unavailable time due to voter changed mind when activating the leader during election

2018-08-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-3109:
--
Labels: pull-request-available  (was: )

> Avoid long unavailable time due to voter changed mind when activating the 
> leader during election
> 
>
> Key: ZOOKEEPER-3109
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3109
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum, server
>Affects Versions: 3.6.0
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>
> Occasionally we find that it takes a long time to elect a leader, possibly 
> longer than one minute, depending on how large initLimit and tickTime are set.
>
> This exposes an issue in the leader election protocol. During leader election, 
> before a voter goes to the LEADING/FOLLOWING state, it waits for finalizeWait 
> time before changing its state. Depending on the order of notifications, a 
> voter might change its mind just after voting for a server. If the server it 
> was previously voting for has a majority of votes after counting this one, 
> that server will go to the LEADING state. In some corner cases, the leader may 
> end up timing out while waiting for the epoch ACK from a majority, because of 
> the voter that changed its mind. This usually happens when there is an even 
> number of servers in the ensemble (either because one of the servers is down, 
> or it is being restarted and takes a long time to restart). If there are 5 
> servers in the ensemble, we'll find two of them in the LEADING/FOLLOWING state 
> and another two in the LOOKING state, but the LOOKING servers cannot join the 
> quorum since they're waiting for a majority of servers to be FOLLOWING the 
> current leader before changing to FOLLOWING as well.
>
> As far as we know, a voter will change its mind if it receives a vote from 
> another host which has just started and begins voting for itself, or if a 
> server takes a long time to shut down its previous ZK server and starts voting 
> for itself when it begins the leader election process.
>
> Also, a follower may abandon the leader if the leader is not ready to accept 
> learner connections when the follower tries to connect to it.
>
> To solve this issue, there are multiple options:
> 1. increase the finalizeWait time
> 2. smartly detect this state on the leader and quit earlier
>
> The 1st option is straightforward and easier to implement, but it will cause 
> longer leader election times in common cases.
>
> The 2nd option is more complex, but it can solve the problem efficiently 
> without sacrificing performance in common cases. The leader remembers the 
> first majority of servers voting for it, and checks whether any of them 
> changed their mind while it's waiting for the epoch ACK. The leader will wait 
> for some time before quitting the LEADING state, since one changed voter may 
> not be a problem if a majority of voters are still voting for it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3109) Avoid long unavailable time due to voter changed mind when activating the leader during election

2018-08-01 Thread Fangmin Lv (JIRA)
Fangmin Lv created ZOOKEEPER-3109:
-

 Summary: Avoid long unavailable time due to voter changed mind 
when activating the leader during election
 Key: ZOOKEEPER-3109
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3109
 Project: ZooKeeper
  Issue Type: Improvement
  Components: quorum, server
Affects Versions: 3.6.0
Reporter: Fangmin Lv
Assignee: Fangmin Lv
 Fix For: 3.6.0


Occasionally we find that it takes a long time to elect a leader, possibly 
longer than one minute, depending on how large initLimit and tickTime are set.

This exposes an issue in the leader election protocol. During leader election, 
before a voter goes to the LEADING/FOLLOWING state, it waits for finalizeWait 
time before changing its state. Depending on the order of notifications, a 
voter might change its mind just after voting for a server. If the server it 
was previously voting for has a majority of votes after counting this one, 
that server will go to the LEADING state. In some corner cases, the leader may 
end up timing out while waiting for the epoch ACK from a majority, because of 
the voter that changed its mind. This usually happens when there is an even 
number of servers in the ensemble (either because one of the servers is down, 
or it is being restarted and takes a long time to restart). If there are 5 
servers in the ensemble, we'll find two of them in the LEADING/FOLLOWING state 
and another two in the LOOKING state, but the LOOKING servers cannot join the 
quorum since they're waiting for a majority of servers to be FOLLOWING the 
current leader before changing to FOLLOWING as well.

As far as we know, a voter will change its mind if it receives a vote from 
another host which has just started and begins voting for itself, or if a 
server takes a long time to shut down its previous ZK server and starts voting 
for itself when it begins the leader election process.

Also, a follower may abandon the leader if the leader is not ready to accept 
learner connections when the follower tries to connect to it.

To solve this issue, there are multiple options:
# increase the finalizeWait time
# smartly detect this state on the leader and quit earlier

The 1st option is straightforward and easier to implement, but it will cause 
longer leader election times in common cases.

The 2nd option is more complex, but it can solve the problem efficiently 
without sacrificing performance in common cases. The leader remembers the 
first majority of servers voting for it, and checks whether any of them 
changed their mind while it's waiting for the epoch ACK. The leader will wait 
for some time before quitting the LEADING state, since one changed voter may 
not be a problem if a majority of voters are still voting for it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3109) Avoid long unavailable time due to voter changed mind when activating the leader during election

2018-08-01 Thread Fangmin Lv (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fangmin Lv updated ZOOKEEPER-3109:
--
Description: 
Occasionally we find that it takes a long time to elect a leader, possibly 
longer than one minute, depending on how large initLimit and tickTime are set.

This exposes an issue in the leader election protocol. During leader election, 
before a voter goes to the LEADING/FOLLOWING state, it waits for finalizeWait 
time before changing its state. Depending on the order of notifications, a 
voter might change its mind just after voting for a server. If the server it 
was previously voting for has a majority of votes after counting this one, 
that server will go to the LEADING state. In some corner cases, the leader may 
end up timing out while waiting for the epoch ACK from a majority, because of 
the voter that changed its mind. This usually happens when there is an even 
number of servers in the ensemble (either because one of the servers is down, 
or it is being restarted and takes a long time to restart). If there are 5 
servers in the ensemble, we'll find two of them in the LEADING/FOLLOWING state 
and another two in the LOOKING state, but the LOOKING servers cannot join the 
quorum since they're waiting for a majority of servers to be FOLLOWING the 
current leader before changing to FOLLOWING as well.

As far as we know, a voter will change its mind if it receives a vote from 
another host which has just started and begins voting for itself, or if a 
server takes a long time to shut down its previous ZK server and starts voting 
for itself when it begins the leader election process.

Also, a follower may abandon the leader if the leader is not ready to accept 
learner connections when the follower tries to connect to it.

To solve this issue, there are multiple options:

1. increase the finalizeWait time

2. smartly detect this state on the leader and quit earlier

The 1st option is straightforward and easier to implement, but it will cause 
longer leader election times in common cases.

The 2nd option is more complex, but it can solve the problem efficiently 
without sacrificing performance in common cases. The leader remembers the 
first majority of servers voting for it, and checks whether any of them 
changed their mind while it's waiting for the epoch ACK. The leader will wait 
for some time before quitting the LEADING state, since one changed voter may 
not be a problem if a majority of voters are still voting for it.

  was:
Occasionally we find that it takes a long time to elect a leader, possibly 
longer than one minute, depending on how large initLimit and tickTime are set.

This exposes an issue in the leader election protocol. During leader election, 
before a voter goes to the LEADING/FOLLOWING state, it waits for finalizeWait 
time before changing its state. Depending on the order of notifications, a 
voter might change its mind just after voting for a server. If the server it 
was previously voting for has a majority of votes after counting this one, 
that server will go to the LEADING state. In some corner cases, the leader may 
end up timing out while waiting for the epoch ACK from a majority, because of 
the voter that changed its mind. This usually happens when there is an even 
number of servers in the ensemble (either because one of the servers is down, 
or it is being restarted and takes a long time to restart). If there are 5 
servers in the ensemble, we'll find two of them in the LEADING/FOLLOWING state 
and another two in the LOOKING state, but the LOOKING servers cannot join the 
quorum since they're waiting for a majority of servers to be FOLLOWING the 
current leader before changing to FOLLOWING as well.

As far as we know, a voter will change its mind if it receives a vote from 
another host which has just started and begins voting for itself, or if a 
server takes a long time to shut down its previous ZK server and starts voting 
for itself when it begins the leader election process.

Also, a follower may abandon the leader if the leader is not ready to accept 
learner connections when the follower tries to connect to it.

To solve this issue, there are multiple options:
# increase the finalizeWait time
# smartly detect this state on the leader and quit earlier

The 1st option is straightforward and easier to implement, but it will cause 
longer leader election times in common cases.

The 2nd option is more complex, but it can solve the problem efficiently 
without sacrificing performance in common cases. The leader remembers the 
first majority of servers voting for it, and checks whether any of them 
changed their mind while it's waiting for the epoch ACK. The leader will wait 
for some time before quitting the LEADING state, since one changed voter may 
not be a problem if a majority of voters are still voting for it.


> Avoid long unavailable time due to voter changed mind when activating the 
> leader during election
> 

Re: Test failures (SASL) with Java 11 - any ideas?

2018-08-01 Thread Enrico Olivelli
On Wed, Aug 1, 2018 at 21:08, Patrick Hunt wrote:

> We had discussed dropping java6 as a supported platform recently. Perhaps
> yet another reason to move forward with that?
>

So if we drop java6 we can use kerby. It shouldn't be difficult, just port
the 3.5 branch config.
I don't know if it is possible to drop java6 in a point release.


Enrico



>
> Patrick
>
> On Sun, Jul 22, 2018 at 8:13 PM Rakesh Radhakrishnan 
> wrote:
>
> >   Do you know why 3.4 is not using kerby?
> >
> > In short, Kerby was failing with java-6.  Please refer jira:
> > https://jira.apache.org/jira/browse/ZOOKEEPER-2689
> >
> > "ZooKeeper runs in Java, release 1.6 or greater (JDK 6 or greater)."
> > https://zookeeper.apache.org/doc/r3.4.13/zookeeperAdmin.html
> >
> >
> > Rakesh
> >
> > On Sat, Jul 21, 2018 at 9:06 PM, Enrico Olivelli 
> > wrote:
> >
> > >
> > >
> > > On Sat, Jul 21, 2018 at 17:17, Patrick Hunt wrote:
> > >
> > >> On Sat, Jul 21, 2018 at 1:21 AM Enrico Olivelli 
> > >> wrote:
> > >>
> > >> > On Sat, Jul 21, 2018 at 09:22, Patrick Hunt wrote:
> > >> >
> > >> > > Interestingly I don't see the auth tests that are failing in 3.4
> > >> failing
> > >> > on
> > >> > > trunk (they pass), instead a number of tests fail with "Address
> > >> already
> > >> > in
> > >> > > use"
> > >> > >
> > >> > >
> > >> > https://builds.apache.org/view/S-Z/view/ZooKeeper/job/
> > >> ZooKeeper-trunk-java11/3/#showFailuresLink
> > >> > >
> > >> > > 3.4 is using
> > >> > >  > >> value="2.0.0-M15"/>
> > >> > >  > value="1.0.0-M20"/>
> > >> > > while trunk moved to kerby, wonder if that could be it (sasl fails
> > at
> > >> > > least)?
> > >> > > 
> > >> > >
> > >> >
> > >> > True.
> > >> > I can't find any report of errors about Kerby + jdk11 by googling a
> > >> little.
> > >> > Maybe we are the first :)
> > >> >
> > >> >
> > >> To be clear it looks like kerby (master) is working while directory
> > (3.4)
> > >> is not - perhaps we need to update?
> > >>
> > >
> > > Do you know why 3.4 is not using kerby?
> > >
> > > Enrico
> > >
> > >
> > >> Patrick
> > >>
> > >>
> > >> > I did not start to run extensive tests of my applications on jdk11,
> I
> > >> will
> > >> > start next week.
> > >> >
> > >> > While switching from 8 to 10 I had problems with a bunch of fixes to
> > >> > the Kerberos impl in Java which made tests not work on testing
> > >> > environments due to stricter checks on the env
> > >> >
> > >> > Enrico
> > >> >
> > >> >
> > >> > > Patrick
> > >> > >
> > >> > > On Sat, Jul 21, 2018 at 12:02 AM Patrick Hunt 
> > >> wrote:
> > >> > >
> > >> > > > Thanks Enrico. Possible. However afaict Jenkins is running build
> > 19
> > >> > > (build
> > >> > > > 11-ea+19) and I didn't notice anything obvious in the notes for
> > 19+
> > >> > > related
> > >> > > > to sasl/kerb.
> > >> > > >
> > >> > > > Patrick
> > >> > > >
> > >> > > > On Fri, Jul 20, 2018 at 11:48 PM Enrico Olivelli <
> > >> eolive...@gmail.com>
> > >> > > > wrote:
> > >> > > >
> > >> > > >> In java11 there are a bunch of news about Kerberos, maybe it is
> > >> > related
> > >> > > >>
> > >> > > >> http://jdk.java.net/11/release-notes
> > >> > > >>
> > >> > > >> My 2 cents
> > >> > > >> Enrico
> > >> > > >>
> > >> > > >> On Sat, Jul 21, 2018 at 08:03, Patrick Hunt wrote:
> > >> > > >>
> > >> > > >> > Hey folks, I added a couple Jenkins jobs based on Java 11
> which
> > >> is
> > >> > set
> > >> > > >> to
> > >> > > >> > release in September. Jenkins is running a pre-release
> > >> > > >> >
> > >> > > >> >
> > >> > > >>
> > >> > >
> > >> > https://builds.apache.org/view/S-Z/view/ZooKeeper/job/
> > >> ZooKeeper_branch34_java11/
> > >> > > >> >
> > >> > > >> > java version "11-ea" 2018-09-25
> > >> > > >> > Java(TM) SE Runtime Environment 18.9 (build 11-ea+19)
> > >> > > >> > Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11-ea+19, mixed
> > >> mode)
> > >> > > >> >
> > >> > > >> >
> > >> > > >> > Anyone have insight into what's failing here?
> > >> > > >> >
> > >> > > >> > 2018-07-20 14:39:27,126 [myid:2] - ERROR
> > >> > > >> > [QuorumConnectionThread-[myid=2]-3:QuorumCnxManager@268] -
> > >> > Exception
> > >> > > >> > while connecting, id: [0, localhost/127.0.0.1:11223], addr:
> > {},
> > >> > > >> > closing learner connection
> > >> > > >> > javax.security.sasl.SaslException: An error:
> > >> > > >> > (java.security.PrivilegedActionException:
> > >> > > >> > javax.security.sasl.SaslException: GSS initiate failed
> [Caused
> > >> by
> > >> > > >> > GSSException: No valid credentials provided (Mechanism level:
> > >> > Message
> > >> > > >> > stream modified (41) - Message stream modified)]) occurred
> when
> > >> > > >> > evaluating Zookeeper Quorum Member's  received SASL token.
> > >> > > >> >
> > >> > > >> > ...
> > >> > > >> >
> > >> > > >> > Entered Krb5Context.initSecContext with state=STATE_NEW
> > >> > > >> > Found ticket for lear...@example.com to go to
> > >> > > >> > krbtgt/example@example.com expiring on Sat Jul 21
> 14:39:01
> > 

[jira] [Commented] (ZOOKEEPER-3082) Fix server snapshot behavior when out of disk space

2018-08-01 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565860#comment-16565860
 ] 

Brian Nixon commented on ZOOKEEPER-3082:


[~andorm] my (possibly incorrect) read on ZOOKEEPER-1621 is that the issue is 
related to this one but not strictly a subset. Here we've removed the 
possibility of the snapshot side of recovery being lost during a disk-full 
event. There, the issue seems to be in ensuring the transaction log side of 
recovery is not corrupted by writing empty/incomplete log files. That issue 
will continue to be present even with the patch from this issue applied.

> Fix server snapshot behavior when out of disk space
> ---
>
> Key: ZOOKEEPER-3082
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3082
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.0, 3.4.12, 3.5.5
>Reporter: Brian Nixon
>Assignee: Brian Nixon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When the ZK server tries to make a snapshot and the machine is out of disk 
> space, snapshot creation fails and throws an IOException. An empty snapshot 
> file is created (probably because the server is able to create the directory 
> entry) but the server is not able to write to the file.
>  
> If snapshot creation fails, the server commits suicide. When it restarts, it 
> will do so from the last known good snapshot. However, when it tries to make 
> a snapshot again, the same thing happens. This results in lots of empty 
> snapshot files being created. If the DataDirCleanupManager eventually garbage 
> collects the good snapshot files, then only the empty files remain. At this 
> point, the server is well and truly screwed.
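As an illustration of one way around the empty-file trap described above (a sketch only, not necessarily the approach the attached patch takes): write the snapshot to a temporary file, fsync it, and only then rename it into place, deleting the partial file on failure:

{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

final class AtomicSnapshotSketch {
    static void writeSnapshot(Path snapDir, String name, byte[] data) throws IOException {
        Path tmp = Files.createTempFile(snapDir, name, ".tmp");
        try {
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                ch.write(ByteBuffer.wrap(data));
                ch.force(true); // a full disk fails loudly here, before the rename
            }
            // Only a fully written, fsync-ed file ever appears under the real name.
            Files.move(tmp, snapDir.resolve(name), StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            Files.deleteIfExists(tmp); // don't leave an empty/partial snapshot behind
            throw e;
        }
    }
}
{code}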



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper issue #566: ZOOKEEPER-3062: mention fsync.warningthresholdms in Fi...

2018-08-01 Thread phunt
Github user phunt commented on the issue:

https://github.com/apache/zookeeper/pull/566
  
lgtm. +1, thanks @cpoerschke .

Perhaps consider logging the value during startup (initial read of the 
value) instead?
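A minimal sketch of that suggestion, assuming slf4j as ZooKeeper does; the class and field names are illustrative:

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class FsyncThresholdConfig {
    private static final Logger LOG = LoggerFactory.getLogger(FsyncThresholdConfig.class);
    static final String PROP = "fsync.warningthresholdms";
    static final long FSYNC_WARNING_THRESHOLD_MS = Long.getLong(PROP, 1000L);
    static {
        // Log the configured value once, when it is first read at startup.
        LOG.info("{} = {} ms", PROP, FSYNC_WARNING_THRESHOLD_MS);
    }
}
{code}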


---


[jira] [Resolved] (ZOOKEEPER-3062) mention fsync.warningthresholdms in FileTxnLog LOG.warn message

2018-08-01 Thread Patrick Hunt (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt resolved ZOOKEEPER-3062.
-
  Resolution: Fixed
Hadoop Flags: Reviewed

LGTM. Thanks [~cpoerschke]!

> mention fsync.warningthresholdms in FileTxnLog LOG.warn message
> ---
>
> Key: ZOOKEEPER-3062
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3062
> Project: ZooKeeper
>  Issue Type: Task
>Affects Versions: 3.5.4, 3.6.0, 3.4.13
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5, 3.4.14
>
> Attachments: ZOOKEEPER-3062.patch, ZOOKEEPER-3062.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The
> {code}
> fsync-ing the write ahead log in ... took ... ms which will adversely effect 
> operation latency. File size is ... bytes. See the ZooKeeper troubleshooting 
> guide
> {code}
> warning should mention the {{fsync.warningthresholdms}} configurable 
> property: that would make the property easier to discover, and when 
> interpreting historical vs. current logs, or logs from different ensembles, 
> differences in configuration would be easier to spot.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3062) introduce fsync.warningthresholdms constant for FileTxnLog LOG.warn message

2018-08-01 Thread Patrick Hunt (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-3062:

Summary: introduce fsync.warningthresholdms constant for FileTxnLog 
LOG.warn message  (was: mention fsync.warningthresholdms in FileTxnLog LOG.warn 
message)

> introduce fsync.warningthresholdms constant for FileTxnLog LOG.warn message
> ---
>
> Key: ZOOKEEPER-3062
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3062
> Project: ZooKeeper
>  Issue Type: Task
>Affects Versions: 3.5.4, 3.6.0, 3.4.13
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5, 3.4.14
>
> Attachments: ZOOKEEPER-3062.patch, ZOOKEEPER-3062.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The
> {code}
> fsync-ing the write ahead log in ... took ... ms which will adversely effect 
> operation latency. File size is ... bytes. See the ZooKeeper troubleshooting 
> guide
> {code}
> warning should mention the {{fsync.warningthresholdms}} configurable 
> property: that would make the property easier to discover, and when 
> interpreting historical vs. current logs, or logs from different ensembles, 
> differences in configuration would be easier to spot.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3062) mention fsync.warningthresholdms in FileTxnLog LOG.warn message

2018-08-01 Thread Patrick Hunt (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-3062:

Fix Version/s: 3.4.14
   3.5.5
   3.6.0

> mention fsync.warningthresholdms in FileTxnLog LOG.warn message
> ---
>
> Key: ZOOKEEPER-3062
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3062
> Project: ZooKeeper
>  Issue Type: Task
>Affects Versions: 3.5.4, 3.6.0, 3.4.13
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5, 3.4.14
>
> Attachments: ZOOKEEPER-3062.patch, ZOOKEEPER-3062.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The
> {code}
> fsync-ing the write ahead log in ... took ... ms which will adversely effect 
> operation latency. File size is ... bytes. See the ZooKeeper troubleshooting 
> guide
> {code}
> warning should mention the {{fsync.warningthresholdms}} configurable 
> property: that would make the property easier to discover, and when 
> interpreting historical vs. current logs, or logs from different ensembles, 
> differences in configuration would be easier to spot.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3062) mention fsync.warningthresholdms in FileTxnLog LOG.warn message

2018-08-01 Thread Patrick Hunt (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-3062:

Affects Version/s: 3.6.0
   3.5.4
   3.4.13

> mention fsync.warningthresholdms in FileTxnLog LOG.warn message
> ---
>
> Key: ZOOKEEPER-3062
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3062
> Project: ZooKeeper
>  Issue Type: Task
>Affects Versions: 3.5.4, 3.6.0, 3.4.13
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5, 3.4.14
>
> Attachments: ZOOKEEPER-3062.patch, ZOOKEEPER-3062.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The
> {code}
> fsync-ing the write ahead log in ... took ... ms which will adversely effect 
> operation latency. File size is ... bytes. See the ZooKeeper troubleshooting 
> guide
> {code}
> warning should mention the {{fsync.warningthresholdms}} configurable 
> property: that would make the property easier to discover, and when 
> interpreting historical vs. current logs, or logs from different ensembles, 
> differences in configuration would be easier to spot.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3062) mention fsync.warningthresholdms in FileTxnLog LOG.warn message

2018-08-01 Thread Patrick Hunt (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-3062:
---

Assignee: Christine Poerschke

> mention fsync.warningthresholdms in FileTxnLog LOG.warn message
> ---
>
> Key: ZOOKEEPER-3062
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3062
> Project: ZooKeeper
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Labels: pull-request-available
> Attachments: ZOOKEEPER-3062.patch, ZOOKEEPER-3062.patch
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> The
> {code}
> fsync-ing the write ahead log in ... took ... ms which will adversely effect 
> operation latency. File size is ... bytes. See the ZooKeeper troubleshooting 
> guide
> {code}
> warning should mention the {{fsync.warningthresholdms}} configurable 
> property: that would make the property easier to discover, and when 
> interpreting historical vs. current logs, or logs from different ensembles, 
> differences in configuration would be easier to spot.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #566: ZOOKEEPER-3062: mention fsync.warningthresholdm...

2018-08-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/566


---


Re: Trying to find pattern in Flaky Tests

2018-08-01 Thread Patrick Hunt
Looks like 16808 has been resolved - I haven't noticed it after the recent
changes.

Note that INFRA recently added openjdk10 to Jenkins and I added a job or
two which seem to be working OK.

Java 11 is failing on 3.4 due to broken libraries (according to Rakesh on
another thread) but we're also seeing failures on trunk which are unrelated
to that issue. Perhaps someone can take a look?

Patrick

On Tue, Jul 24, 2018 at 3:49 PM Patrick Hunt  wrote:

> FYI, there's also this which I just reported:
> https://issues.apache.org/jira/browse/INFRA-16808
>
> Patrick
>
> On Fri, Jul 20, 2018 at 12:01 AM Patrick Hunt  wrote:
>
>> Something that's significantly different about the 3.4 and 3.5/master
>> Jenkins jobs is that 3.5/master has
>>
>> test.junit.threads=8
>>
>> set while this is not supported in 3.4 (see build.xml). It's very likely
>> that the parallelization of the tests is causing the discrepancy.
>>
>> setting threads > 1 significantly improves the speed of the jobs, that's
>> why it was originally added to 3.5+.
>> See a358280fb2b3cc7852cded3fe67769765a519beb
>>
>> Perhaps we should try one/more of the 3.5/master jobs with threads=1 and
>> see?
>>
>> Patrick
>>
>>
>>
>> On Thu, Jul 19, 2018 at 1:26 PM Molnár Andor  wrote:
>>
>>> Sorry guys for this awful email. Looks like Apache converted my nicely
>>> illustrated email into plain text. :(
>>>
>>> Maybe I could attach the test reports as images, but I think you already
>>> got the idea.
>>>
>>>
>>> Andor
>>>
>>>
>>>
>>> On 07/18/2018 05:42 PM, Andor Molnar wrote:
>>> > Hi,
>>> >
>>> > *branch-3.4*
>>> >
>>> > I've taken a quick look at our Jenkins builds, and in terms of flaky
>>> > tests it looks like branch-3.4 is in pretty good shape. The build hasn't
>>> > failed for 5-6 days on any of the JDKs, which I think is pretty awesome.
>>> >
>>> > *branch-3.5*
>>> >
>>> > This branch is in very bad condition, which is quite unfortunate given
>>> > we're in the middle of stabilising it. :)
>>> > Especially on JDK8: the last successful build was 11 days ago. JDK9 (50%
>>> > failing) and JDK10 (30% failing) are looking better in the last 10
>>> > builds.
>>> >
>>> > Interestingly (apart from a few quite rare ones) it looks like there's
>>> > only one test which is quite nasty on this branch:
>>> > testManyChildWatchersAutoReset
>>> >
>>> > There's a Jira about fixing it, and a fix has been merged that increases
>>> > the test's timeout, but it's also possible that a bug on the branch is
>>> > causing the test to fail even with the 10 min timeout.
>>> >
>>> > I wasn't able to repro the failing test on my machine (Mac and CentOS7);
>>> > it always finished in 30-40 seconds maximum. On the Jenkins slaves it
>>> > shows the following:
>>> >
>>> > *JDK 8:*
>>> >
>>> > Report creation timed out.
>>> >
>>> >
>>> > *JDK 9:*
>>> >
>>> > New Failures: DisconnectedWatcherTest.testManyChildWatchersAutoReset
>>> > (org.apache.zookeeper.test), duration in seconds per build:
>>> >
>>> > build 351:  45.604
>>> > build 350: 600.337
>>> > build 349:  21.904
>>> > build 348: 583.063
>>> > build 347: 600.325
>>> > build 346: 600.383
>>> > build 345: 600.362
>>> > build 344:  21.139
>>> > build 343:  24.031
>>> >
>>> > (per-build test reports under
>>> > https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper_branch35_java9/<build>/testReport/org.apache.zookeeper.test/DisconnectedWatcherTest/testManyChildWatchersAutoReset)

Re: Test failures (SASL) with Java 11 - any ideas?

2018-08-01 Thread Patrick Hunt
We had discussed dropping java6 as a supported platform recently. Perhaps
yet another reason to move forward with that?

Patrick

On Sun, Jul 22, 2018 at 8:13 PM Rakesh Radhakrishnan 
wrote:

>   Do you know why 3.4 is not using kerby?
>
> In short, Kerby was failing with java-6.  Please refer jira:
> https://jira.apache.org/jira/browse/ZOOKEEPER-2689
>
> "ZooKeeper runs in Java, release 1.6 or greater (JDK 6 or greater)."
> https://zookeeper.apache.org/doc/r3.4.13/zookeeperAdmin.html
>
>
> Rakesh
>
> On Sat, Jul 21, 2018 at 9:06 PM, Enrico Olivelli 
> wrote:
>
> >
> >
> > On Sat, Jul 21, 2018 at 17:17, Patrick Hunt wrote:
> >
> >> On Sat, Jul 21, 2018 at 1:21 AM Enrico Olivelli 
> >> wrote:
> >>
> >> > On Sat, Jul 21, 2018 at 09:22, Patrick Hunt wrote:
> >> >
> >> > > Interestingly I don't see the auth tests that are failing in 3.4
> >> failing
> >> > on
> >> > > trunk (they pass), instead a number of tests fail with "Address
> >> already
> >> > in
> >> > > use"
> >> > >
> >> > >
> >> > https://builds.apache.org/view/S-Z/view/ZooKeeper/job/
> >> ZooKeeper-trunk-java11/3/#showFailuresLink
> >> > >
> >> > > 3.4 is using
> >> > >  >> value="2.0.0-M15"/>
> >> > >  value="1.0.0-M20"/>
> >> > > while trunk moved to kerby, wonder if that could be it (sasl fails
> at
> >> > > least)?
> >> > > 
> >> > >
> >> >
> >> > True.
> >> > I can't find any report of errors about Kerby + jdk11 by googling a
> >> little.
> >> > Maybe we are the first :)
> >> >
> >> >
> >> To be clear it looks like kerby (master) is working while directory
> (3.4)
> >> is not - perhaps we need to update?
> >>
> >
> > Do you know why 3.4 is not using kerby?
> >
> > Enrico
> >
> >
> >> Patrick
> >>
> >>
> >> > I did not start to run extensive tests of my applications on jdk11, I
> >> will
> >> > start next week.
> >> >
> >> > While switching from 8 to 10 I had problems with a bunch of fixes to
> >> > the Kerberos impl in Java which made tests not work on testing
> >> > environments due to stricter checks on the env
> >> >
> >> > Enrico
> >> >
> >> >
> >> > > Patrick
> >> > >
> >> > > On Sat, Jul 21, 2018 at 12:02 AM Patrick Hunt 
> >> wrote:
> >> > >
> >> > > > Thanks Enrico. Possible. However afaict Jenkins is running build
> 19
> >> > > (build
> >> > > > 11-ea+19) and I didn't notice anything obvious in the notes for
> 19+
> >> > > related
> >> > > > to sasl/kerb.
> >> > > >
> >> > > > Patrick
> >> > > >
> >> > > > On Fri, Jul 20, 2018 at 11:48 PM Enrico Olivelli <
> >> eolive...@gmail.com>
> >> > > > wrote:
> >> > > >
> >> > > >> In java11 there are a bunch of news about Kerberos, maybe it is
> >> > related
> >> > > >>
> >> > > >> http://jdk.java.net/11/release-notes
> >> > > >>
> >> > > >> My 2 cents
> >> > > >> Enrico
> >> > > >>
> >> > > >> On Sat, Jul 21, 2018 at 08:03, Patrick Hunt wrote:
> >> > > >>
> >> > > >> > Hey folks, I added a couple Jenkins jobs based on Java 11 which
> >> is
> >> > set
> >> > > >> to
> >> > > >> > release in September. Jenkins is running a pre-release
> >> > > >> >
> >> > > >> >
> >> > > >>
> >> > >
> >> > https://builds.apache.org/view/S-Z/view/ZooKeeper/job/
> >> ZooKeeper_branch34_java11/
> >> > > >> >
> >> > > >> > java version "11-ea" 2018-09-25
> >> > > >> > Java(TM) SE Runtime Environment 18.9 (build 11-ea+19)
> >> > > >> > Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11-ea+19, mixed
> >> mode)
> >> > > >> >
> >> > > >> >
> >> > > >> > Anyone have insight into what's failing here?
> >> > > >> >
> >> > > >> > 2018-07-20 14:39:27,126 [myid:2] - ERROR
> >> > > >> > [QuorumConnectionThread-[myid=2]-3:QuorumCnxManager@268] -
> >> > Exception
> >> > > >> > while connecting, id: [0, localhost/127.0.0.1:11223], addr:
> {},
> >> > > >> > closing learner connection
> >> > > >> > javax.security.sasl.SaslException: An error:
> >> > > >> > (java.security.PrivilegedActionException:
> >> > > >> > javax.security.sasl.SaslException: GSS initiate failed [Caused
> >> by
> >> > > >> > GSSException: No valid credentials provided (Mechanism level:
> >> > Message
> >> > > >> > stream modified (41) - Message stream modified)]) occurred when
> >> > > >> > evaluating Zookeeper Quorum Member's  received SASL token.
> >> > > >> >
> >> > > >> > ...
> >> > > >> >
> >> > > >> > Entered Krb5Context.initSecContext with state=STATE_NEW
> >> > > >> > Found ticket for lear...@example.com to go to
> >> > > >> > krbtgt/example@example.com expiring on Sat Jul 21 14:39:01
> >> UTC
> >> > > >> > 2018
> >> > > >> > Service ticket not found in the subject
> >> > > >> > 2018-07-20 14:39:27,127 [myid:0] - ERROR
> >> > > >> > [QuorumConnectionThread-[myid=0]-2:SaslQuorumAuthServer@133] -
> >> > Failed
> >> > > >> > to authenticate using SASL
> >> > > >> >
> >> > > >> >
> >> > > >> >
> >> > > >> >
> >> > > >>
> >> > >
> >> > https://builds.apache.org/view/S-Z/view/ZooKeeper/job/
> >> ZooKeeper_branch34_java11/2/#showFailuresLink
> >> > > >> >
> >> > > >> > Patrick
> >> > > >> >
> >> > > >> --
> >> > > >>
> >> > > >>

[jira] [Commented] (ZOOKEEPER-3108) deprecated myid file and use a new property "server.id" in the zoo.cfg

2018-08-01 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565814#comment-16565814
 ] 

Brian Nixon commented on ZOOKEEPER-3108:


This seems like a good idea to me (provided myid files are still supported) to 
give admins a bit more flexibility.

One reason I can think of to keep using a separate myid file is that the server 
id is the one property guaranteed to be unique for a given peer across the 
ensemble. All other properties and jvm flags may be identical across every 
instance. This makes reasoning about configuration files very easy - one simply 
propagates the same file everywhere and no custom logic is needed when 
comparing them.

Here's a link to an old discussion around myid -> 
http://zookeeper-user.578899.n2.nabble.com/The-idea-behind-myid-td3711269.html

>  deprecated myid file and use a new property "server.id" in the zoo.cfg
> ---
>
> Key: ZOOKEEPER-3108
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3108
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.5.0
>Reporter: maoling
>Assignee: maoling
>Priority: Major
>
> When using ZK in distributed mode, we need to touch a myid file in the 
> dataDir and write a unique number to it. That is inconvenient and not 
> user-friendly. Look at an example from another distributed system such as 
> Kafka: it just uses broker.id=0 in server.properties to identify a unique 
> server node. This issue proposes to abandon the myid file and use a new 
> property such as server.id=0 in zoo.cfg. This fix will be applied to the 
> master branch and branch-3.5+, keeping branch-3.4 unchanged.
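To make the proposal concrete, a hypothetical zoo.cfg might look like the following; server.id is the ticket's suggested property, not an existing option:

{code}
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
# proposed: replaces the separate <dataDir>/myid file
server.id=1
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
{code}

Note that this illustrates the trade-off from the comment above: with server.id inline, the config file is no longer identical on every node.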



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #587: ZOOKEEPER-3106: Zookeeper client supports IPv6 ...

2018-08-01 Thread enixon
Github user enixon commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/587#discussion_r206983579
  
--- Diff: 
src/java/main/org/apache/zookeeper/client/ConnectStringParser.java ---
@@ -68,14 +69,26 @@ public ConnectStringParser(String connectString) {
 List<String> hostsList = split(connectString,",");
 for (String host : hostsList) {
 int port = DEFAULT_PORT;
-int pidx = host.lastIndexOf(':');
-if (pidx >= 0) {
-// otherwise : is at the end of the string, ignore
-if (pidx < host.length() - 1) {
-port = Integer.parseInt(host.substring(pidx + 1));
-}
-host = host.substring(0, pidx);
+if (!connectString.startsWith("[")) {//IPv4
+   int pidx = host.lastIndexOf(':');
+   if (pidx >= 0) {
+   // otherwise : is at the end of the string, ignore
+   if (pidx < host.length() - 1) {
+   port = Integer.parseInt(host.substring(pidx + 1));
+   }
+   host = host.substring(0, pidx);
+   }
+} else {//IPv6
+   int pidx = host.lastIndexOf(':');
+   int bracketIdx = host.lastIndexOf(']');
+   if (pidx >=0 && bracketIdx >=0 && pidx > bracketIdx) {
+   if (pidx < host.length() - 1) {
+   port = Integer.parseInt(host.substring(pidx + 1));
+   }
+   host = host.substring(0, pidx);
+   }
--- End diff --

nit - you've added tabs with your whitespace


---


[GitHub] zookeeper pull request #587: ZOOKEEPER-3106: Zookeeper client supports IPv6 ...

2018-08-01 Thread enixon
Github user enixon commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/587#discussion_r206987004
  
--- Diff: 
src/java/main/org/apache/zookeeper/client/ConnectStringParser.java ---
@@ -68,14 +69,26 @@ public ConnectStringParser(String connectString) {
 List<String> hostsList = split(connectString,",");
 for (String host : hostsList) {
 int port = DEFAULT_PORT;
-int pidx = host.lastIndexOf(':');
-if (pidx >= 0) {
-// otherwise : is at the end of the string, ignore
-if (pidx < host.length() - 1) {
-port = Integer.parseInt(host.substring(pidx + 1));
-}
-host = host.substring(0, pidx);
+if (!connectString.startsWith("[")) {//IPv4
+   int pidx = host.lastIndexOf(':');
+   if (pidx >= 0) {
+   // otherwise : is at the end of the string, ignore
+   if (pidx < host.length() - 1) {
+   port = Integer.parseInt(host.substring(pidx + 1));
+   }
+   host = host.substring(0, pidx);
+   }
+} else {//IPv6
--- End diff --

purely selfish request - could you add an example to this comment? 
something like // IPv6 e.g. [2001:db8:1::242:ac11:2]:1234.

Having that on hand made reasoning about the string parsing logic much 
easier for me.


---
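To make the parsing rules discussed in this review concrete, here is a hedged sketch of bracket-aware host:port splitting in the spirit of the diff (a sketch, not the patch itself; note that it inspects each host entry rather than the whole connect string):

{code}
// Split one entry of the connect string into address and port, treating
// a leading '[' as an IPv6 literal, e.g. "zk1:2181" or
// "[2001:db8:1::242:ac11:2]:2181".
static String[] splitHostPort(String host, int defaultPort) {
    int port = defaultPort;
    if (host.startsWith("[")) { // IPv6 literal
        int bracketIdx = host.indexOf(']');
        if (bracketIdx < 0) {
            throw new IllegalArgumentException("unclosed bracket: " + host);
        }
        int pidx = host.lastIndexOf(':');
        if (pidx > bracketIdx && pidx < host.length() - 1) {
            port = Integer.parseInt(host.substring(pidx + 1));
        }
        host = host.substring(1, bracketIdx); // strip the brackets
    } else { // IPv4 address or hostname
        int pidx = host.lastIndexOf(':');
        if (pidx >= 0) {
            if (pidx < host.length() - 1) {
                port = Integer.parseInt(host.substring(pidx + 1));
            }
            host = host.substring(0, pidx);
        }
    }
    return new String[] { host, Integer.toString(port) };
}
{code}

For example, splitHostPort("[2001:db8:1::242:ac11:2]:2181", 2181) yields the address "2001:db8:1::242:ac11:2" and port "2181".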


[GitHub] zookeeper pull request #:

2018-08-01 Thread maoling
Github user maoling commented on the pull request:


https://github.com/apache/zookeeper/commit/a2623a625a4778720f7d5482d0a66e9b37ae556f#commitcomment-29922823
  
@nkalmar Thanks for your nice explanation.
Could these security problems also exist in JMX and Jetty?


---


[GitHub] zookeeper pull request #:

2018-08-01 Thread nkalmar
Github user nkalmar commented on the pull request:


https://github.com/apache/zookeeper/commit/a2623a625a4778720f7d5482d0a66e9b37ae556f#commitcomment-29917129
  
@maoling, the problem is that there is no security implemented. Any user 
who can access ZooKeeper can send commands to the ensemble. While all 4lw 
commands are read-only, some take quite some time, so a DoS attack is 
actually possible. 

So it has been deemed insecure and deprecated, as far as I know. Originally 
I implemented the 4lw command for this PR, but I was advised to remove it 
from 3.6 and 3.5.
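For illustration, a four-letter-word command needs nothing more than a raw TCP connection, which is why unauthenticated access matters here. A sketch using "stat", one of the standard read-only commands:

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class FourLetterWordSketch {
    public static void main(String[] args) throws IOException {
        // Any client that can reach the client port can issue the command.
        try (Socket s = new Socket("localhost", 2181)) {
            s.getOutputStream().write("stat".getBytes(StandardCharsets.US_ASCII));
            s.shutdownOutput();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII))) {
                r.lines().forEach(System.out::println);
            }
        }
    }
}
{code}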


---