[jira] [Commented] (ZOOKEEPER-3038) Cleanup some nitpicks in TTL implementation

2018-05-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469974#comment-16469974
 ] 

Hudson commented on ZOOKEEPER-3038:
---

FAILURE: Integrated in Jenkins build ZooKeeper-trunk #18 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/18/])
ZOOKEEPER-3038: Cleanup some nitpicks in TTL implementation (phunt: rev 
6e64125f2aafc29253904c43ee44233c907e5fca)
* (add) src/java/main/org/apache/zookeeper/server/EphemeralTypeEmulate353.java
* (edit) src/java/main/org/apache/zookeeper/server/EphemeralType.java
* (edit) src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
* (delete) src/java/main/org/apache/zookeeper/server/OldEphemeralType.java
* (edit) src/java/test/org/apache/zookeeper/server/Emulate353TTLTest.java


> Cleanup some nitpicks in TTL implementation
> ---
>
> Key: ZOOKEEPER-3038
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3038
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
> Fix For: 3.5.4, 3.6.0
>
>
> A few nitpicks that need to be cleaned up:
> 1. Rename OldEphemeralType --> EphemeralTypeEmulate353
> 2. Remove unused method: getTTL()
> 3. Remove unused import from QuorumPeer
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-05-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469973#comment-16469973
 ] 

Hudson commented on ZOOKEEPER-2959:
---

FAILURE: Integrated in Jenkins build ZooKeeper-trunk #18 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/18/])
ZOOKEEPER-2959: ignore accepted epoch and LEADERINFO ack from observers 
(ashraer: rev 088dfdf188663f6bad79b0e87b710737b318537d)
* (edit) src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java
* (edit) src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java
* (edit) src/java/main/org/apache/zookeeper/server/quorum/Leader.java
* (add) src/java/test/org/apache/zookeeper/server/quorum/ZabUtils.java
* (add) 
src/java/test/org/apache/zookeeper/server/quorum/LeaderWithObserverTest.java


> ignore accepted epoch and LEADERINFO ack from observers when a newly elected 
> leader computes new epoch
> --
>
> Key: ZOOKEEPER-2959
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: xiangyq000
>Assignee: Bogdan Kanivets
>Priority: Blocker
> Fix For: 3.5.4, 3.6.0, 3.4.13
>
>
> Once the ZooKeeper cluster finishes the election for new leader, all learners 
> report their accepted epoch to the leader for the computation of new cluster 
> epoch.
> org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
> {code:java}
> private final HashSet<Long> connectingFollowers = new HashSet<Long>();
> public long getEpochToPropose(long sid, long lastAcceptedEpoch)
>         throws InterruptedException, IOException {
>     synchronized(connectingFollowers) {
>         if (!waitingForNewEpoch) {
>             return epoch;
>         }
>         if (lastAcceptedEpoch >= epoch) {
>             epoch = lastAcceptedEpoch+1;
>         }
>         connectingFollowers.add(sid);
>         QuorumVerifier verifier = self.getQuorumVerifier();
>         if (connectingFollowers.contains(self.getId()) &&
>                 verifier.containsQuorum(connectingFollowers)) {
>             waitingForNewEpoch = false;
>             self.setAcceptedEpoch(epoch);
>             connectingFollowers.notifyAll();
>         } else {
>             long start = Time.currentElapsedTime();
>             long cur = start;
>             long end = start + self.getInitLimit()*self.getTickTime();
>             while(waitingForNewEpoch && cur < end) {
>                 connectingFollowers.wait(end - cur);
>                 cur = Time.currentElapsedTime();
>             }
>             if (waitingForNewEpoch) {
>                 throw new InterruptedException("Timeout while waiting for epoch from quorum");
>             }
>         }
>         return epoch;
>     }
> }
> {code}
> The computation produces an outcome once:
> # The leader has called the method "getEpochToPropose"
> # The number of reporters is greater than half of the participants.
> The problem is that an observer also sends its accepted epoch to the 
> leader, and this procedure treats observers as participants.
> Suppose that the cluster consists of 1 leader, 2 followers and 1 observer, 
> and that the leader and the observer have reported their accepted epochs while 
> neither of the followers has. The connectingFollowers set then holds two 
> elements, and a size of 2 is greater than half of the 3 participants, so it 
> counts as a quorum. QuorumVerifier#containsQuorum returns true because it 
> does not check whether the elements of its parameter are participants.
> The same flaw exists in 
> org.apache.zookeeper.server.quorum.Leader#waitForEpochAck
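
To make the scenario in the description concrete, here is a minimal, self-contained sketch. It is not ZooKeeper code and not the committed fix: the class ObserverQuorumSketch, both method names, and the server ids are assumptions chosen for illustration. It only contrasts a majority check that counts every reporter with one that first drops ids that are not participants.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a majority check; this is NOT ZooKeeper's QuorumVerifier.
class ObserverQuorumSketch {

    // Naive check: counts every reporter, so an observer inflates the tally.
    static boolean naiveContainsQuorum(Set<Long> reporters, Set<Long> participants) {
        return reporters.size() > participants.size() / 2;
    }

    // Participant-aware check: only ids that are voting members are counted.
    static boolean participantContainsQuorum(Set<Long> reporters, Set<Long> participants) {
        Set<Long> voters = new HashSet<>(reporters);
        voters.retainAll(participants);
        return voters.size() > participants.size() / 2;
    }

    public static void main(String[] args) {
        // Made-up ids: 1 = leader, 2 and 3 = followers, 4 = observer.
        Set<Long> participants = new HashSet<>();
        participants.add(1L);
        participants.add(2L);
        participants.add(3L);

        // Only the leader and the observer have reported so far.
        Set<Long> reporters = new HashSet<>();
        reporters.add(1L);
        reporters.add(4L);

        System.out.println("naive: " + naiveContainsQuorum(reporters, participants));
        System.out.println("participant-aware: " + participantContainsQuorum(reporters, participants));
    }
}
```

With the leader and the observer as the only reporters, the naive check accepts the set (2 > 1) while the participant-aware check rejects it (1 is not greater than 1), which is the behaviour the report asks for.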





ZooKeeper-trunk - Build # 18 - Failure

2018-05-09 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk/18/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 140.86 KB...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.466 sec, Thread: 2, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Running org.apache.zookeeper.test.SessionTest in thread 5
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest in thread 2
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.731 sec, Thread: 2, Class: org.apache.zookeeper.test.SessionTimeoutTest
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
2
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.119 sec, Thread: 2, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 2
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
13.047 sec, Thread: 5, Class: org.apache.zookeeper.test.SessionTest
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 5
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.766 sec, Thread: 5, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
133.314 sec, Thread: 7, Class: org.apache.zookeeper.test.RecoveryTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 5
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 7
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.355 sec, Thread: 5, Class: org.apache.zookeeper.test.StatTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.242 sec, Thread: 7, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 7
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 5
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.071 sec, Thread: 7, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 7
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.753 sec, Thread: 5, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 5
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
18.773 sec, Thread: 2, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 2
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.11 sec, Thread: 2, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 2
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
4.624 sec, Thread: 2, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
102.803 sec, Thread: 4, Class: org.apache.zookeeper.test.RestoreCommittedLogTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 4
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.094 sec, Thread: 4, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 4
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
20.19 sec, Thread: 7, Class: org.apache.zookeeper.test.TruncateTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 7
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.138 sec, Thread: 7, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
22.214 sec, Thread: 5, Class: 
org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
20.807 sec, Thread: 4, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
33.873 sec, Thread: 2, Class: org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
280.887 sec, Thread: 6, Class: org.apache.zookeeper.test.ReconfigTest
[junit] Tests run: 104, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
564.093 sec, Thread: 1, Class: org.apache.zookeeper.test.NettyNettySuiteTest
[junit] Tests run: 104, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
560.713 sec, Thread: 8, Class: org.apache.zookeeper.test.NioNettySuiteTest
[junit] Te

[jira] [Resolved] (ZOOKEEPER-2903) Port ZOOKEEPER-2901 to 3.5.4

2018-05-09 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt resolved ZOOKEEPER-2903.
-
  Resolution: Fixed
Hadoop Flags: Reviewed

Resolved with commit 282dc836802f67ed4814a36bdf45423cff27b577 - applied the 
2901 master PR.

> Port ZOOKEEPER-2901 to 3.5.4
> 
>
> Key: ZOOKEEPER-2903
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2903
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: server
>Affects Versions: 3.5.3
>Reporter: Jordan Zimmerman
>Assignee: Jordan Zimmerman
>Priority: Blocker
> Fix For: 3.5.4
>
>
> The TTL/Server ID bug is quite serious and should be back-ported to the 3.5.x 
> branch





[GitHub] zookeeper issue #378: [ZOOKEEPER-2903] Backport of ZOOKEEPER-2901 changes

2018-05-09 Thread phunt
Github user phunt commented on the issue:

https://github.com/apache/zookeeper/pull/378
  
@Randgalt can you close this out? I applied the master PR to branch-3.5 and 
committed it already. I think this is taken care of, lmk otw.


---


Re: Apache ZooKeeper meetup May 9th in Palo Alto?

2018-05-09 Thread Patrick Hunt
On Wed, May 9, 2018 at 9:24 PM, Jeff Widman  wrote:

> Many thanks to Patrick, Andor and the rest of the Cloudera team for hosting
> us tonight. I appreciated the chance to compare notes with other users and
> also discuss some of where the future of Zookeeper is heading.
>
>
Thanks everyone for attending. It was great to see some old as well as new
faces.


> Cheers,
> Jeff
>
> PS: Apologies to the remote folks for the dead sound at the end--the
> hangout connection to the conference room sound system died, and nobody in
> the room knew how to reconnect it.
>
>
Yea, sorry about that. IT did some magic to connect the hangout into the
internal audio system of the room and I wasn't able to figure out the codes
they used.

Regards,

Patrick


>
> On Fri, Apr 27, 2018 at 6:40 PM, Srikanth Viswanathan <
> srikant...@gmail.com>
> wrote:
>
> > Confirming attendance from Seattle as well. Looking forward to the
> > presentations. Particularly excited for the containers talk!
> >
> > On Thu, Apr 26, 2018, 10:29 Patrick Hunt  wrote:
> >
> > > Ok, great. I have two speakers already lined up: one on ZK and
> containers
> > > and another on ZK failure modes and recovery. If you haven't reached
> out
> > to
> > > me yet and you have something to talk about please LMK asap.
> > >
> > > I believe we will have video conference available but I'm not entirely
> > sure
> > > yet - I will try.
> > >
> > > I'll finalize things and send out a more detailed agenda.
> > >
> > > Regards,
> > >
> > > Patrick
> > >
> > > On Thu, Apr 26, 2018 at 1:33 AM, Shivam Goel 
> > wrote:
> > >
> > > > Count me in !!
> > > >
> > > > On Mon, Apr 23, 2018 at 10:08 AM Patrick Hunt 
> > wrote:
> > > >
> > > > > Hi folks. I am interested in hosting a ZooKeeper meetup May 9th in
> > > > > Cloudera's Palo Alto offices. It's been a while since we last got
> > > > together,
> > > > > lots of recent changes and some big plans, new additions to the PMC
> > and
> > > > > committer lists, new contributors.
> > > > >
> > > > > I was hoping to use the mailing lists to gauge interest. Please
> reply
> > > if
> > > > > you
> > > > > think you would be able to attend or would prefer a different date.
> > > Also
> > > > > let me know if there's something you would like to present to the
> > > group.
> > > > > Food
> > > > > and beer will be provided.
> > > > >
> > > > > Looking forward to hearing from everyone.
> > > > >
> > > > > Regards,
> > > > >
> > > > > Patrick
> > > > >
> > > >
> > >
> >
>
>
>
> --
>
> *Jeff Widman*
> jeffwidman.com  | 740-WIDMAN-J (943-6265)
> <><
>


Re: Apache ZooKeeper meetup May 9th in Palo Alto?

2018-05-09 Thread Jeff Widman
Many thanks to Patrick, Andor and the rest of the Cloudera team for hosting
us tonight. I appreciated the chance to compare notes with other users and
also discuss some of where the future of Zookeeper is heading.

Cheers,
Jeff

PS: Apologies to the remote folks for the dead sound at the end--the
hangout connection to the conference room sound system died, and nobody in
the room knew how to reconnect it.


On Fri, Apr 27, 2018 at 6:40 PM, Srikanth Viswanathan 
wrote:

> Confirming attendance from Seattle as well. Looking forward to the
> presentations. Particularly excited for the containers talk!
>
> On Thu, Apr 26, 2018, 10:29 Patrick Hunt  wrote:
>
> > Ok, great. I have two speakers already lined up: one on ZK and containers
> > and another on ZK failure modes and recovery. If you haven't reached out
> to
> > me yet and you have something to talk about please LMK asap.
> >
> > I believe we will have video conference available but I'm not entirely
> sure
> > yet - I will try.
> >
> > I'll finalize things and send out a more detailed agenda.
> >
> > Regards,
> >
> > Patrick
> >
> > On Thu, Apr 26, 2018 at 1:33 AM, Shivam Goel 
> wrote:
> >
> > > Count me in !!
> > >
> > > On Mon, Apr 23, 2018 at 10:08 AM Patrick Hunt 
> wrote:
> > >
> > > > Hi folks. I am interested in hosting a ZooKeeper meetup May 9th in
> > > > Cloudera's Palo Alto offices. It's been a while since we last got
> > > together,
> > > > lots of recent changes and some big plans, new additions to the PMC
> and
> > > > committer lists, new contributors.
> > > >
> > > > I was hoping to use the mailing lists to gauge interest. Please reply
> > if
> > > > you
> > > > think you would be able to attend or would prefer a different date.
> > Also
> > > > let me know if there's something you would like to present to the
> > group.
> > > > Food
> > > > and beer will be provided.
> > > >
> > > > Looking forward to hearing from everyone.
> > > >
> > > > Regards,
> > > >
> > > > Patrick
> > > >
> > >
> >
>



-- 

*Jeff Widman*
jeffwidman.com  | 740-WIDMAN-J (943-6265)
<><


[GitHub] zookeeper issue #516: ZOOKEEPER-3038 Cleanup some nitpicks in TTL implementa...

2018-05-09 Thread phunt
Github user phunt commented on the issue:

https://github.com/apache/zookeeper/pull/516
  
+1 - thanks Andor.


---


[jira] [Resolved] (ZOOKEEPER-3038) Cleanup some nitpicks in TTL implementation

2018-05-09 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt resolved ZOOKEEPER-3038.
-
   Resolution: Fixed
Fix Version/s: 3.5.4
   3.6.0

Issue resolved by pull request 516
[https://github.com/apache/zookeeper/pull/516]

> Cleanup some nitpicks in TTL implementation
> ---
>
> Key: ZOOKEEPER-3038
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3038
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
> Fix For: 3.6.0, 3.5.4
>
>
> A few nitpicks that need to be cleaned up:
> 1. Rename OldEphemeralType --> EphemeralTypeEmulate353
> 2. Remove unused method: getTTL()
> 3. Remove unused import from QuorumPeer
>  





[GitHub] zookeeper pull request #516: ZOOKEEPER-3038 Cleanup some nitpicks in TTL imp...

2018-05-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/516


---


[jira] [Resolved] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-05-09 Thread Alexander Shraer (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Shraer resolved ZOOKEEPER-2959.
-
   Resolution: Fixed
Fix Version/s: 3.4.13
   3.6.0
   3.5.4

> ignore accepted epoch and LEADERINFO ack from observers when a newly elected 
> leader computes new epoch
> --
>
> Key: ZOOKEEPER-2959
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: xiangyq000
>Assignee: Bogdan Kanivets
>Priority: Blocker
> Fix For: 3.5.4, 3.6.0, 3.4.13
>
>





[GitHub] zookeeper issue #512: ZOOKEEPER-2959: ignore accepted epoch and LEADERINFO a...

2018-05-09 Thread shralex
Github user shralex commented on the issue:

https://github.com/apache/zookeeper/pull/512
  
+1 looks good


---


ZooKeeper_branch35_jdk8 - Build # 953 - Still Failing

2018-05-09 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk8/953/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 60.65 KB...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.07 sec, Thread: 5, Class: org.apache.zookeeper.test.SaslClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.523 sec, Thread: 3, Class: 
org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Running org.apache.zookeeper.test.ServerCnxnTest in thread 5
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.544 sec, Thread: 4, Class: org.apache.zookeeper.test.SaslSuperUserTest
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest in thread 
3
[junit] Running org.apache.zookeeper.test.SessionTest in thread 4
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.518 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionInvalidationTest
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest in thread 3
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.718 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionTimeoutTest
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
3
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.078 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.354 sec, Thread: 5, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 3
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 5
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.519 sec, Thread: 5, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 5
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.843 sec, Thread: 5, Class: org.apache.zookeeper.test.StatTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 5
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.696 sec, Thread: 5, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 5
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.057 sec, Thread: 5, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 5
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.509 sec, Thread: 5, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
13.255 sec, Thread: 4, Class: org.apache.zookeeper.test.SessionTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 5
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 4
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
70.78 sec, Thread: 6, Class: org.apache.zookeeper.test.QuorumZxidSyncTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
6.373 sec, Thread: 5, Class: org.apache.zookeeper.test.TruncateTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 5
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.068 sec, Thread: 5, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 5
[junit] Tests run: 14, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 
76.46 sec, Thread: 2, Class: org.apache.zookeeper.test.QuorumTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
18.552 sec, Thread: 3, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.228 sec, Thread: 5, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 6
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 2
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.082 sec, Thread: 2, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 3
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 5
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.676 sec, Thread: 5, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
8.264 sec, Thread: 3, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[jun

Re: [VOTE] Migrate ZK to Maven build

2018-05-09 Thread Prasanth Mathialagan
+1

On Mon, May 7, 2018 at 12:44 AM, Norbert Kalmar 
wrote:

> Yes, the plan is to backport to 3.5 and also 3.4, but the possibility to
> backport just the package changes is still in question.
>
> I am writing up a document which I will share for comments. Should be done
> in 1 or 2 days. I am also checking if they will work, so that I can get
> maven to work as expected on my local branch.
>
> Thanks for the patience!
> Norbert
>
> On Mon, May 7, 2018 at 7:36 AM Andor Molnar  wrote:
>
> > Correct. Once it's successful on master, it should be backported to all
> > branches back to 3.4.
> > However, I think we've just voted on the idea of Maven, the detailed
> > document of the plan is coming from Norbert soon which we'll be able to
> > comment on in detail.
> >
> > Andor
> >
> >
> >
> > On Sun, May 6, 2018 at 8:24 PM, Patrick Hunt  wrote:
> >
> > > The JIRA says 3.6.0 (master). Is that what we're voting on? It seems
> like
> > > we should backport this to all active branches, no? Thoughts?
> > >
> > > Patrick
> > >
> > > On Tue, May 1, 2018 at 9:29 PM, Michael Han  wrote:
> > >
> > > > +1
> > > >
> > > > On Mon, Apr 23, 2018 at 4:06 AM, Jordan Zimmerman <
> > > > jor...@jordanzimmerman.com> wrote:
> > > >
> > > > > +1 (non binding)
> > > > >
> > > > > > On Apr 23, 2018, at 6:21 PM, Mohammad arshad <
> > > > mohammad.ars...@huawei.com>
> > > > > wrote:
> > > > > >
> > > > > > +1
> > > > > >
> > > > > > -Original Message-
> > > > > > From: Andor Molnar [mailto:an...@cloudera.com]
> > > > > > Sent: Monday, April 23, 2018 4:43 PM
> > > > > > To: dev@zookeeper.apache.org
> > > > > > Subject: Re: [VOTE] Migrate ZK to Maven build
> > > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > On Mon, Apr 23, 2018 at 10:30 AM, Tamas Penzes <
> > tam...@cloudera.com>
> > > > > wrote:
> > > > > >
> > > > > >> +1 (non-binding)
> > > > > >>
> > > > > >> On Fri, Apr 20, 2018 at 4:06 PM, Norbert Kalmar <
> > > nkal...@cloudera.com
> > > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >>> Hi,
> > > > > >>>
> > > > > >>> Let's start a vote on migrating to maven instead of ant.
> > > > > >>> https://issues.apache.org/jira/browse/ZOOKEEPER-3021
> > > > > >>>
> > > > > >>> *Shall we migrate ZooKeeper build from ant to Maven?*
> > > > > >>>
> > > > > >>> Please reply with [Yes / +1] or [No / -1] to this thread.
> > > > > >>>
> > > > > >>> Thanks,
> > > > > >>> Norbert
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >> --
> > > > > >> *Tamás Pénzes* | Engineering Manager
> > > > > >> e. tam...@cloudera.com
> > > > > >> cloudera.com 
> > > > > >>
> > > > > >> --
> > > > > >>
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Cheers
> > > > Michael
> > > >
> > >
> >
>


Re: [SUGGESTION] JvmPauseMonitor in ZooKeeper

2018-05-09 Thread Prasanth Mathialagan
Hi,
This looks cool :) I have a suggestion. It would be nice if we could add
the current heap size (or % of heap used) to the log entry whenever the
sleep threshold is exceeded by a wide margin. It could be helpful.

On Wed, May 9, 2018 at 11:26 AM, Patrick Hunt  wrote:

> On Wed, May 9, 2018 at 11:11 AM, Norbert Kalmar 
> wrote:
>
> > Thanks Patrick, great question.
> > My understanding is that this tool shows not only whether the JVM spends too
> > much time in GC, but also whether there is a JVM pause for any other reason
> > (the tool only distinguishes GC pauses from all other pauses). This could be
> > a slow fsync (although we do have logs for that) or even server/OS related.
> >
> > But again, this is just my interpretation. I will ask the source of the
> > idea what extra benefits this gives them over the Java GC log.
> >
> > I checked ZK and I don't see it enabled by default, but GC logging can be
> > set with JVM parameters easily, so that shouldn't be a key factor anyway.
> >
> >
> I think that would be a useful change regardless - to make it on by default,
> I mean. Also some docs wrt our recommendations, how to troubleshoot, etc...
> Adding a feature is useful, but ensuring people know about it and can use
> it effectively is even more so.
>
> Regards,
>
> Patrick
>
>
> > Regards,
> > Norbert
> >
> > On Wed, May 9, 2018 at 7:57 PM Patrick Hunt  wrote:
> >
> > > > Do you know why they did this rather than just enabling GC logging by
> > > > default? Why re-invent the wheel?
> > > >
> > > > I seem to remember seeing a push to enable GC logging by default a few
> > > > years ago. In particular around the time when the JVM added GC log
> > > > rolling as a feature. Here's an example:
> > > >
> > > > https://batmat.net/2016/10/17/always-enable-gc-logs-and-how-to-enable-logs-rotation-with-hotspot/
> > > > My understanding is that the overhead is so low that it's feasible to do
> > > > this.
> > >
> > > Good improvement though regardless which way we go.
> > >
> > > Regards,
> > >
> > > Patrick
> > >
> > > On Wed, May 9, 2018 at 9:36 AM, Andor Molnar 
> wrote:
> > >
> > > > +1 cool!
> > > >
> > > >
> > > > On Wed, May 9, 2018 at 7:59 AM, Norbert Kalmar  >
> > > > wrote:
> > > >
> > > > > Okay, thanks Ed, I created the Jira, will look into it soon :)
> > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-3037
> > > > >
> > > > > Regards,
> > > > > Norbert
> > > > >
> > > > > On Wed, May 9, 2018 at 4:44 PM Edward Ribeiro <
> > > edward.ribe...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > > +1. Sounds like a really nice feature to have. Let's open a ticket and
> > > > > > > open a PR. :)
> > > > > >
> > > > > > Ed
> > > > > >
> > > > > > Em qua, 9 de mai de 2018 11:15, Norbert Kalmar <
> > nkal...@cloudera.com
> > > >
> > > > > > escreveu:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > > I just got a tip that we could improve on the logging in ZooKeeper.
> > > > > > > > After a ZK crash or client timeout, it is sometimes hard to
> > > > > > > > determine from the logs what happened. Knowing whether ZK was
> > > > > > > > responsive at the time would help a lot. For example, ZK might
> > > > > > > > spend a lot of time waiting on GC (there is still some
> > > > > > > > misconception that ZK is a storage system).
> > > > > > > >
> > > > > > > > To help detect this, Hadoop already has a great tool called JVM
> > > > > > > > Pause Monitor. (As the name suggests, it can also be used for
> > > > > > > > monitoring, but it also helps post-mortem in a lot of cases.)
> > > > > > > > Basically it has a daemon that sleeps for one second, and if the
> > > > > > > > sleep time exceeds the 1s by more than the threshold (1s: INFO,
> > > > > > > > 10s: WARN by default - this can be configurable in our case, see
> > > > > > > > below), it will alert/make a log entry. It can also monitor the
> > > > > > > > time GC took.
> > > > > > > >
> > > > > > > > Now, this class is in hadoop-common. I wouldn't want to depend on
> > > > > > > > hadoop-common because of this one feature (it is actually a
> > > > > > > > single class). Since this is a straightforward implementation, and
> > > > > > > > the few commits it has had in the past five years were nothing
> > > > > > > > really serious, I think we could just copy this class into
> > > > > > > > ZooKeeper and introduce it as a configurable feature that is off
> > > > > > > > by default.
> > > > > > > >
> > > > > > > > The class:
> > > > > > > >
> > > > > > > > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
> > > > > > > >
> > > > > > > > What do you think?
> > > > > > >
> > > > > > > Regards,
> > > > > > > Norbert
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
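
The pause-detection idea described in the thread above (sleep for a fixed interval, then compare the real elapsed time against thresholds) can be sketched roughly as follows. This is only an illustration, not Hadoop's JvmPauseMonitor and not the eventual ZooKeeper code; the class name PauseMonitorSketch, its methods, and the use of System.err instead of a logger are all assumptions made for the example. It also leaves out the GC-time reporting mentioned in the proposal.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a sleep-based pause detector; all names are illustrative. */
final class PauseMonitorSketch implements Runnable {
    enum Level { NONE, INFO, WARN }

    private final long sleepMs;          // how long each sleep is supposed to take
    private final long infoThresholdMs;  // extra delay above which INFO is reported
    private final long warnThresholdMs;  // extra delay above which WARN is reported

    PauseMonitorSketch(long sleepMs, long infoThresholdMs, long warnThresholdMs) {
        this.sleepMs = sleepMs;
        this.infoThresholdMs = infoThresholdMs;
        this.warnThresholdMs = warnThresholdMs;
    }

    /** Maps the time by which a sleep overshot its target to a report level. */
    Level classify(long extraMs) {
        if (extraMs > warnThresholdMs) {
            return Level.WARN;
        }
        if (extraMs > infoThresholdMs) {
            return Level.INFO;
        }
        return Level.NONE;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            long extraMs = elapsedMs - sleepMs;
            Level level = classify(extraMs);
            if (level != Level.NONE) {
                // A real implementation would go through the server's logging framework.
                System.err.println(level + ": detected pause of approximately " + extraMs + " ms");
            }
        }
    }

    /** Starts the monitor on a daemon thread so it never keeps the JVM alive. */
    static Thread startDaemon(PauseMonitorSketch monitor) {
        Thread t = new Thread(monitor, "pause-monitor-sketch");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

With the defaults quoted in the thread, this would be constructed as new PauseMonitorSketch(1000, 1000, 10000) and handed to startDaemon at server startup when the feature is switched on.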


Re: Question on merge script

2018-05-09 Thread Edward Ribeiro
FYI, the merge script was created in the Spark project, then ported to the
Kafka project, and the Kafka version was ported to ZK. :)

Ed

On Wed, May 9, 2018 at 19:46, Patrick Hunt wrote:

> I believe we forked the script and the process/docs off another TLP,
> perhaps spark or kafka? Might be worth checking what they are currently
> doing/changed.
>
> Patrick
>
> On Wed, May 9, 2018 at 1:45 PM, Flavio Junqueira  wrote:
>
> > Thanks for the feedback, Pat. I think the wiki page with merge script
> > instructions needs updating. I'll explore it a bit further and will
> update
> > it.
> >
> > -Flavio
> >
> > > On 9 May 2018, at 20:05, Patrick Hunt  wrote:
> > >
> > > On Wed, May 9, 2018 at 1:18 AM, Flavio Junqueira 
> wrote:
> > >
> > >> Hey Michael,
> > >>
> > >> I was trying to merge yesterday a PR generated against branch-3.5, and
> > >> fetching the PR branch did not give me the merge script. I ended up
> > asking
> > >> the contributor to change the target branch to master so that I avoid
> > any
> > >> small hacks with the merge script.
> > >>
> > >>
> > > fwiw that's not the workflow I use. I always fetch the latest repo
> > content,
> > > then switch to the master and use the script to merge/push a PR. It
> > doesn't
> > > matter which PR or branch you want to merge, you just run the script
> off
> > > master and it handles the rest. If the branch/PR is off 3.4 it all just
> > > works.
> > >
> > >
> > >> We should consider doing the following two things, and let me know if
> it
> > >> makes sense:
> > >> 1- Clarifying that if a change is supposed to go to both branch-3.5
> and
> > >> master, the PR should be against master
> > >>
> > >
> > > As long as it applies cleanly to master and br35 (etc...) this is not
> > > really necessary. You use the merge script to merge it into the target
> > > branch, then after you push that change to apache git repo it will ask
> > you
> > > if you want to merge to other branches. Typically I would ask the OP to
> > > post multiple PRs if there are conflicts. I don't usually commit to
> just
> > > one branch if the change is necessary for multiple branches and there
> are
> > > conflicts. (I wait for all the PRs covering all the branches cleanly)
> > >
> > >
> > >> 2- Perhaps merging to branch-3.5 so that I see the script when I
> fetch a
> > >> PR branch off branch-3.5. This is unusual, but it is not unreasonable
> > that
> > >> we have eventually PRs for branch-3.5 only.
> > >>
> > >> I'm focusing on 3.5, but the same reasoning applies to 3.4.
> > >>
> > >>
> > > I always just start with master checked out and run the script. Seems
> > fine
> > > to me and it means we don't need to maintain multiple versions of the
> > > scripts and keep them in sync. What's the benefit of doing otw?
> > >
> > > Patrick
> > >
> > >
> > >> -Flavio
> > >>
> > >>
> > >>> On 9 May 2018, at 01:49, Michael Han  wrote:
> > >>>
> > >>> Hi Flavio,
> > >>>
> > >>> The merge script is branch agnostic - it only cares about the pull
> > >> request
> > >>> number. As long as in the pull request the correct target branch is
> > >>> specified, the merge script will do its job by merging the change to
> > the
> > >>> specified target branch. I guess we could commit the same script to
> > >>> branch-3.5 but the current script in master should be able to do what
> > you
> > >>> asked.
> > >>>
> > >>> On Tue, May 8, 2018 at 4:06 PM, Flavio Junqueira 
> > wrote:
> > >>>
> >  Could anyone remind me why we don't have the merge script on
> > branch-3.5?
> >  Say I have a change that targets branch-3.5 alone. Shouldn't I be
> able
> > >> to
> >  have a PR that targets branch-3.5 and use the merge script?
> > 
> >  Thanks,
> >  -Flavio
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Cheers
> > >>> Michael
> > >>
> > >>
> >
> >
>


[jira] [Commented] (ZOOKEEPER-3039) TxnLogToolkit uses Scanner badly

2018-05-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469663#comment-16469663
 ] 

Hadoop QA commented on ZOOKEEPER-3039:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to cause Findbugs (version 3.0.1) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1670//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1670//console

This message is automatically generated.

> TxnLogToolkit uses Scanner badly
> 
>
> Key: ZOOKEEPER-3039
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3039
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.4, 3.6.0
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
>
> If more than 1 CRC error is found in the Txn log file, TxnLogToolkit fails to 
> get an answer for the second one, because it has already closed the Scanner, 
> which probably closed the input stream as well, so an exception is thrown:
> {noformat}
> ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
> CRC ERROR - 4/5/18 5:16:05 AM PDT session 0x16295bafcc4 cxid 0x1 zxid 
> 0x10002 closeSession null
> Would you like to fix it (Yes/No/Abort) ? y
> CRC ERROR - 4/5/18 5:17:34 AM PDT session 0x26295bafcc9 cxid 0x0 zxid 
> 0x20001 closeSession null
> Would you like to fix it (Yes/No/Abort) ? Exception in thread "main" 
> java.util.NoSuchElementException
> at java.util.Scanner.throwFor(Scanner.java:862)
> at java.util.Scanner.next(Scanner.java:1371)
> at 
> org.apache.zookeeper.server.persistence.TxnLogToolkit.askForFix(TxnLogToolkit.java:208)
> at 
> org.apache.zookeeper.server.persistence.TxnLogToolkit.dump(TxnLogToolkit.java:175)
> at 
> org.apache.zookeeper.server.persistence.TxnLogToolkit.main(TxnLogToolkit.java:101){noformat}





Failed: ZOOKEEPER- PreCommit Build #1670

2018-05-09 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1670/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 34.13 KB...]
 [exec] 
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] -1 findbugs.  The patch appears to cause Findbugs (version 
3.0.1) to fail.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1670//testReport/
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1670//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Comment with id 16469663 added to ZOOKEEPER-3039.
 [exec] Session logged out. Session was 
JSESSIONID=512669126B8E6687EBF08C697E317D02.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build@2/patchprocess’
 and 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build@2/patchprocess’
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build@2/build.xml:1737:
 exec returned: 2

Total time: 2 minutes 34 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
were found. Configuration error?
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-3039
Putting comment on the pull request
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8



###
## FAILED TESTS (if any) 
##
No tests ran.

Failed: ZOOKEEPER- PreCommit Build #1669

2018-05-09 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1669/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 2.14 MB...]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning 
messages.
 [exec] 
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] -1 contrib tests.  The patch failed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1669//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1669//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1669//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] 
 [exec] Remote error: User is not authorized to perform the request. 
Response code: 401.
 [exec] Session logged out. Session was 
JSESSIONID=F783B71B20A1410026B509D0A7D9D51C.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] mv: 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 and 
‘/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess’
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1737:
 exec returned: 2

Total time: 3 minutes 10 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Recording test results
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
[description-setter] Description set: ZOOKEEPER-3039
Putting comment on the pull request
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8
Setting JDK_1_8_LATEST__HOME=/home/jenkins/tools/java/latest1.8



###
## FAILED TESTS (if any) 
##
All tests passed

[GitHub] zookeeper pull request #517: ZOOKEEPER-3039 TxnLogToolkit uses Scanner badly

2018-05-09 Thread anmolnar
GitHub user anmolnar opened a pull request:

https://github.com/apache/zookeeper/pull/517

ZOOKEEPER-3039 TxnLogToolkit uses Scanner badly

Fixed by creating a single Scanner for all queries in the main() method.
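
The failure mode and the fix can be illustrated with a small stand-alone sketch. It is not the TxnLogToolkit code: the class SharedScannerSketch, its helper ClosableInput (an in-memory stand-in for System.in), and both ask methods are hypothetical names used only for this example. It relies on the documented behaviour that closing a java.util.Scanner also closes its underlying source when that source is Closeable.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

/** Hypothetical sketch contrasting a Scanner per question with one shared Scanner. */
final class SharedScannerSketch {

    /** In-memory stand-in for System.in that yields no more data once closed. */
    static final class ClosableInput extends ByteArrayInputStream {
        private boolean closed;

        ClosableInput(String text) {
            super(text.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public synchronized int read() {
            return closed ? -1 : super.read();
        }

        @Override
        public synchronized int read(byte[] b, int off, int len) {
            return closed ? -1 : super.read(b, off, len);
        }

        @Override
        public synchronized void close() {
            closed = true;
        }
    }

    /** Problem pattern: closing the Scanner also closes the stream behind it. */
    static String askWithNewScanner(InputStream in) {
        try (Scanner scanner = new Scanner(in)) {
            return scanner.next();
        }
    }

    /** Fixed pattern: the caller owns one Scanner and reuses it for every question. */
    static String ask(Scanner shared) {
        return shared.next();
    }
}
```

Calling askWithNewScanner twice on the same stream throws java.util.NoSuchElementException on the second call, matching the stack trace in the Jira. Calling ask repeatedly with one Scanner created up front returns each answer in turn.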

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anmolnar/zookeeper ZOOKEEPER-3039

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/517.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #517


commit 8ff6dad5c7905db04b1ed836fc233ac648b2b5f5
Author: Andor Molnar 
Date:   2018-05-09T23:15:59Z

ZOOKEEPER-3039. Use the same Scanner for all queries




---


[jira] [Updated] (ZOOKEEPER-3039) TxnLogToolkit uses Scanner badly

2018-05-09 Thread Andor Molnar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andor Molnar updated ZOOKEEPER-3039:

Affects Version/s: 3.6.0
   3.5.4

> TxnLogToolkit uses Scanner badly
> 
>
> Key: ZOOKEEPER-3039
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3039
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.4, 3.6.0
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
>
> If more than 1 CRC error is found in the Txn log file, TxnLogToolkit fails to 
> get an answer for the second one, because it has already closed the Scanner, 
> which probably closed the input stream as well, so an exception is thrown:
> {noformat}
> ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
> CRC ERROR - 4/5/18 5:16:05 AM PDT session 0x16295bafcc4 cxid 0x1 zxid 
> 0x10002 closeSession null
> Would you like to fix it (Yes/No/Abort) ? y
> CRC ERROR - 4/5/18 5:17:34 AM PDT session 0x26295bafcc9 cxid 0x0 zxid 
> 0x20001 closeSession null
> Would you like to fix it (Yes/No/Abort) ? Exception in thread "main" 
> java.util.NoSuchElementException
> at java.util.Scanner.throwFor(Scanner.java:862)
> at java.util.Scanner.next(Scanner.java:1371)
> at 
> org.apache.zookeeper.server.persistence.TxnLogToolkit.askForFix(TxnLogToolkit.java:208)
> at 
> org.apache.zookeeper.server.persistence.TxnLogToolkit.dump(TxnLogToolkit.java:175)
> at 
> org.apache.zookeeper.server.persistence.TxnLogToolkit.main(TxnLogToolkit.java:101){noformat}





[jira] [Created] (ZOOKEEPER-3039) TxnLogToolkit uses Scanner badly

2018-05-09 Thread Andor Molnar (JIRA)
Andor Molnar created ZOOKEEPER-3039:
---

 Summary: TxnLogToolkit uses Scanner badly
 Key: ZOOKEEPER-3039
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3039
 Project: ZooKeeper
  Issue Type: Bug
Reporter: Andor Molnar
Assignee: Andor Molnar


If more than 1 CRC error is found in the Txn log file, TxnLogToolkit fails to 
get an answer for the second one, because it has already closed the Scanner, 
which probably closed the input stream as well, so an exception is thrown:
{noformat}
ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
CRC ERROR - 4/5/18 5:16:05 AM PDT session 0x16295bafcc4 cxid 0x1 zxid 
0x10002 closeSession null
Would you like to fix it (Yes/No/Abort) ? y
CRC ERROR - 4/5/18 5:17:34 AM PDT session 0x26295bafcc9 cxid 0x0 zxid 
0x20001 closeSession null
Would you like to fix it (Yes/No/Abort) ? Exception in thread "main" 
java.util.NoSuchElementException
at java.util.Scanner.throwFor(Scanner.java:862)
at java.util.Scanner.next(Scanner.java:1371)
at 
org.apache.zookeeper.server.persistence.TxnLogToolkit.askForFix(TxnLogToolkit.java:208)
at 
org.apache.zookeeper.server.persistence.TxnLogToolkit.dump(TxnLogToolkit.java:175)
at 
org.apache.zookeeper.server.persistence.TxnLogToolkit.main(TxnLogToolkit.java:101){noformat}





[jira] [Commented] (ZOOKEEPER-3038) Cleanup some nitpicks in TTL implementation

2018-05-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469641#comment-16469641
 ] 

Hadoop QA commented on ZOOKEEPER-3038:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1668//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1668//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1668//console

This message is automatically generated.

> Cleanup some nitpicks in TTL implementation
> ---
>
> Key: ZOOKEEPER-3038
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3038
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
>
> A few nitpicks which need to be cleaned up:
> 1. Rename OldEphemeralType --> EphemeralTypeEmulate353
>  2. Remove unused method: getTTL()
> 3. Remove unused import from QuorumPeer
>  





Re: Question on merge script

2018-05-09 Thread Patrick Hunt
I believe we forked the script and the process/docs off another TLP,
perhaps spark or kafka? Might be worth checking what they are currently
doing/changed.

Patrick

On Wed, May 9, 2018 at 1:45 PM, Flavio Junqueira  wrote:

> Thanks for the feedback, Pat. I think the wiki page with merge script
> instructions needs updating. I'll explore it a bit further and will update
> it.
>
> -Flavio
>
> > On 9 May 2018, at 20:05, Patrick Hunt  wrote:
> >
> > On Wed, May 9, 2018 at 1:18 AM, Flavio Junqueira  wrote:
> >
> >> Hey Michael,
> >>
> >> I was trying to merge yesterday a PR generated against branch-3.5, and
> >> fetching the PR branch did not give me the merge script. I ended up
> asking
> >> the contributor to change the target branch to master so that I avoid
> any
> >> small hacks with the merge script.
> >>
> >>
> > fwiw that's not the workflow I use. I always fetch the latest repo
> content,
> > then switch to the master and use the script to merge/push a PR. It
> doesn't
> > matter which PR or branch you want to merge, you just run the script off
> > master and it handles the rest. If the branch/PR is off 3.4 it all just
> > works.
> >
> >
> >> We should consider doing the following two things, and let me know if it
> >> makes sense:
> >> 1- Clarifying that if a change is supposed to go to both branch-3.5 and
> >> master, the PR should be against master
> >>
> >
> > As long as it applies cleanly to master and br35 (etc...) this is not
> > really necessary. You use the merge script to merge it into the target
> > branch, then after you push that change to apache git repo it will ask
> you
> > if you want to merge to other branches. Typically I would ask the OP to
> > post multiple PRs if there are conflicts. I don't usually commit to just
> > one branch if the change is necessary for multiple branches and there are
> > conflicts. (I wait for all the PRs covering all the branches cleanly)
> >
> >
> >> 2- Perhaps merging to branch-3.5 so that I see the script when I fetch a
> >> PR branch off branch-3.5. This is unusual, but it is not unreasonable
> that
> >> we have eventually PRs for branch-3.5 only.
> >>
> >> I'm focusing on 3.5, but the same reasoning applies to 3.4.
> >>
> >>
> > I always just start with master checked out and run the script. Seems
> fine
> > to me and it means we don't need to maintain multiple versions of the
> > scripts and keep them in sync. What's the benefit of doing otw?
> >
> > Patrick
> >
> >
> >> -Flavio
> >>
> >>
> >>> On 9 May 2018, at 01:49, Michael Han  wrote:
> >>>
> >>> Hi Flavio,
> >>>
> >>> The merge script is branch agnostic - it only cares about the pull
> >> request
> >>> number. As long as in the pull request the correct target branch is
> >>> specified, the merge script will do its job by merging the change to
> the
> >>> specified target branch. I guess we could commit the same script to
> >>> branch-3.5 but the current script in master should be able to do what
> you
> >>> asked.
> >>>
> >>> On Tue, May 8, 2018 at 4:06 PM, Flavio Junqueira 
> wrote:
> >>>
>  Could anyone remind me why we don't have the merge script on
> branch-3.5?
>  Say I have a change that targets branch-3.5 alone. Shouldn't I be able
> >> to
>  have a PR that targets branch-3.5 and use the merge script?
> 
>  Thanks,
>  -Flavio
> >>>
> >>>
> >>>
> >>>
> >>> --
> >>> Cheers
> >>> Michael
> >>
> >>
>
>


[jira] [Updated] (ZOOKEEPER-3038) Cleanup some nitpicks in TTL implementation

2018-05-09 Thread Andor Molnar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andor Molnar updated ZOOKEEPER-3038:

Description: 
A few nitpicks which need to be cleaned up:

1. Rename OldEphemeralType --> EphemeralTypeEmulate353
 2. Remove unused method: getTTL()
3. Remove unused import from QuorumPeer

 

  was:
A few nitpicks which need to be cleaned up:

1. Rename OldEphemeralType --> EphemeralTypeEmulate353
2. Remove unused method: getTTL()
3. Log message fix
4. Remove unused import from QuorumPeer

 


> Cleanup some nitpicks in TTL implementation
> ---
>
> Key: ZOOKEEPER-3038
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3038
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
>
> A few nitpicks which need to be cleaned up:
> 1. Rename OldEphemeralType --> EphemeralTypeEmulate353
>  2. Remove unused method: getTTL()
> 3. Remove unused import from QuorumPeer
>  





[GitHub] zookeeper pull request #516: ZOOKEEPER-3038 Cleanup some nitpicks in TTL imp...

2018-05-09 Thread anmolnar
GitHub user anmolnar opened a pull request:

https://github.com/apache/zookeeper/pull/516

ZOOKEEPER-3038 Cleanup some nitpicks in TTL implementation

A few nitpicks which need to be cleaned up:

1. Rename OldEphemeralType --> EphemeralTypeEmulate353
2. Remove unused method: getTTL()
3. Remove unused import from QuorumPeer


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anmolnar/zookeeper ZOOKEEPER-3038

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/516.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #516


commit 0ba34b35a8fd9c063ca89c1c5697378c608671a1
Author: Andor Molnar 
Date:   2018-05-09T22:42:32Z

ZOOKEEPER-3038. Code review fixes detailed in the Jira




---


[jira] [Created] (ZOOKEEPER-3038) Cleanup some nitpicks in TTL implementation

2018-05-09 Thread Andor Molnar (JIRA)
Andor Molnar created ZOOKEEPER-3038:
---

 Summary: Cleanup some nitpicks in TTL implementation
 Key: ZOOKEEPER-3038
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3038
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.5.3
Reporter: Andor Molnar
Assignee: Andor Molnar


A few nitpicks which need to be cleaned up:

1. Rename OldEphemeralType --> EphemeralTypeEmulate353
2. Remove unused method: getTTL()
3. Log message fix
4. Remove unused import from QuorumPeer

 





[jira] [Commented] (ZOOKEEPER-2901) Session ID that is negative causes mis-calculation of Ephemeral Type

2018-05-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469623#comment-16469623
 ] 

Hudson commented on ZOOKEEPER-2901:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #16 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/16/])
ZOOKEEPER-2901: TTL Nodes don't work with Server IDs > 127 (phunt: rev 
ceaeccd6e310983d37e685a9d5fff3d7e75cf125)
* (edit) src/java/main/org/apache/zookeeper/server/ContainerManager.java
* (add) src/java/main/org/apache/zookeeper/server/OldEphemeralType.java
* (edit) src/java/test/org/apache/zookeeper/test/TruncateTest.java
* (edit) src/java/test/org/apache/zookeeper/test/ClientBase.java
* (edit) src/java/main/org/apache/zookeeper/server/SessionTrackerImpl.java
* (edit) src/java/main/org/apache/zookeeper/ZooKeeper.java
* (edit) src/docs/src/documentation/content/xdocs/zookeeperProgrammers.xml
* (edit) src/java/main/org/apache/zookeeper/server/DataTree.java
* (edit) src/java/main/org/apache/zookeeper/server/EphemeralType.java
* (edit) src/java/main/org/apache/zookeeper/server/quorum/QuorumPeer.java
* (edit) src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java
* (edit) src/java/test/org/apache/zookeeper/server/CreateContainerTest.java
* (edit) src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml
* (edit) src/java/main/org/apache/zookeeper/cli/CreateCommand.java
* (add) src/java/test/org/apache/zookeeper/server/Emulate353TTLTest.java
* (edit) src/java/test/org/apache/zookeeper/server/CreateTTLTest.java
* (add) src/java/test/org/apache/zookeeper/server/ServerIdTest.java
* (edit) src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java
* (edit) src/java/test/config/findbugsExcludeFile.xml
* (edit) src/java/test/org/apache/zookeeper/server/EphemeralTypeTest.java


> Session ID that is negative causes mis-calculation of Ephemeral Type
> 
>
> Key: ZOOKEEPER-2901
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2901
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3
> Environment: Running 3.5.3-beta in Docker container
>Reporter: Mark Johnson
>Assignee: Jordan Zimmerman
>Priority: Blocker
> Fix For: 3.5.4, 3.6.0
>
>
> In the code that determines the EphemeralType it is looking at the owner 
> (which is the client ID or connection ID):
> EphemeralType.java:
>     public static EphemeralType get(long ephemeralOwner) {
>         if (ephemeralOwner == CONTAINER_EPHEMERAL_OWNER) {
>             return CONTAINER;
>         }
>         if (ephemeralOwner < 0) {
>             return TTL;
>         }
>         return (ephemeralOwner == 0) ? VOID : NORMAL;
>     }
> However my connection ID is:
> header.getClientId(): -720548323429908480
> This causes the code to think this is a TTL Ephemeral node instead of a
> NORMAL Ephemeral node.
> This also explains why this is random - if my client ID is non-negative
> then the node gets added correctly.
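
Why the sign check misfires can be shown with a short sketch. It assumes, as the commit title "TTL Nodes don't work with Server IDs > 127" suggests, that the server ID ends up in the most significant byte of the 64-bit session ID; the exact bit layout, the class SignBitSketch, and the method composeId are assumptions made for illustration, not the ZooKeeper implementation. The classify method mirrors the sign test from the quoted get() method and leaves out the container case.

```java
/** Hypothetical sketch: a server ID above 127 in the top byte makes the ID negative. */
final class SignBitSketch {
    enum Kind { VOID, NORMAL, TTL }

    /** Mirrors the sign-based check quoted in the issue (container case omitted). */
    static Kind classify(long ephemeralOwner) {
        if (ephemeralOwner < 0) {
            return Kind.TTL;
        }
        return (ephemeralOwner == 0) ? Kind.VOID : Kind.NORMAL;
    }

    /** Assumed layout: one-byte server ID in bits 56..63, everything else below it. */
    static long composeId(long serverId, long low56) {
        return (serverId << 56) | (low56 & 0x00FFFFFFFFFFFFFFL);
    }
}
```

Under this layout, composeId(127, 1) is positive and classifies as NORMAL, while composeId(128, 1) has the sign bit set and is classified as TTL, which is the behaviour reported for the client ID -720548323429908480.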





[jira] [Commented] (ZOOKEEPER-3012) Fix unit test: testDataDirAndDataLogDir should not use hardcode test folders

2018-05-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469624#comment-16469624
 ] 

Hudson commented on ZOOKEEPER-3012:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #16 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/16/])
ZOOKEEPER-3012: Fix unit test: testDataDirAndDataLogDir should not use (phunt: 
rev 43f117ef5098573d7378956358c653475a4b993e)
* (edit) 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java


> Fix unit test: testDataDirAndDataLogDir should not use hardcode test folders
> 
>
> Key: ZOOKEEPER-3012
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3012
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server, tests
>Affects Versions: 3.5.3, 3.4.11
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
>  Labels: unit-test
> Fix For: 3.5.4, 3.6.0, 3.4.13
>
>
> The following arrange methods use hard-coded values:
> {noformat}
> when(configMock.getDataDir()).thenReturn("/tmp/zookeeper");
> when(configMock.getDataLogDir()).thenReturn("/tmp/zookeeperLog");
> {noformat}
> Which makes the test fail if the folders exist on the running machine.
> Random test folders should be created and removed during cleanup.
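
One way to get random test folders with cleanup is sketched below using only the JDK. This is not the committed fix; the class TempDirSketch and its two helpers are hypothetical names, and how the resulting paths are fed into the mocked config is left to the test.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helpers for per-test temporary directories. */
final class TempDirSketch {

    /** Creates a uniquely named directory under the JVM's default temp location. */
    static Path createTestDir(String prefix) {
        try {
            return Files.createTempDirectory(prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes the directory and everything below it, children before parents. */
    static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            List<Path> deepestFirst =
                    walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : deepestFirst) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test could call createTestDir in its setup, pass the resulting path to the mock in place of "/tmp/zookeeper" and "/tmp/zookeeperLog", and call deleteRecursively in its teardown.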





[GitHub] zookeeper pull request #515: ZOOKEEPER-3012. Fix unit test: testDataDirAndDa...

2018-05-09 Thread anmolnar
Github user anmolnar closed the pull request at:

https://github.com/apache/zookeeper/pull/515


---


[GitHub] zookeeper issue #515: ZOOKEEPER-3012. Fix unit test: testDataDirAndDataLogDi...

2018-05-09 Thread phunt
Github user phunt commented on the issue:

https://github.com/apache/zookeeper/pull/515
  
+1, thanks @anmolnar 


---


[jira] [Resolved] (ZOOKEEPER-3012) Fix unit test: testDataDirAndDataLogDir should not use hardcode test folders

2018-05-09 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt resolved ZOOKEEPER-3012.
-
  Resolution: Fixed
Hadoop Flags: Reviewed

> Fix unit test: testDataDirAndDataLogDir should not use hardcode test folders
> 
>
> Key: ZOOKEEPER-3012
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3012
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server, tests
>Affects Versions: 3.5.3, 3.4.11
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
>  Labels: unit-test
> Fix For: 3.5.4, 3.6.0, 3.4.13
>
>
> The following arrange methods use hard-coded values:
> {noformat}
> when(configMock.getDataDir()).thenReturn("/tmp/zookeeper");
> when(configMock.getDataLogDir()).thenReturn("/tmp/zookeeperLog");
> {noformat}
> Which makes the test fail if the folders exist on the running machine.
> Random test folders should be created and removed during cleanup.





[GitHub] zookeeper issue #514: ZOOKEEPER-3012. Fix unit test: testDataDirAndDataLogDi...

2018-05-09 Thread phunt
Github user phunt commented on the issue:

https://github.com/apache/zookeeper/pull/514
  
+1, thanks @anmolnar 


---


[GitHub] zookeeper pull request #514: ZOOKEEPER-3012. Fix unit test: testDataDirAndDa...

2018-05-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/514


---


[jira] [Resolved] (ZOOKEEPER-2901) Session ID that is negative causes mis-calculation of Ephemeral Type

2018-05-09 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt resolved ZOOKEEPER-2901.
-
   Resolution: Fixed
Fix Version/s: 3.5.4
   3.6.0

Issue resolved by pull request 377
[https://github.com/apache/zookeeper/pull/377]

> Session ID that is negative causes mis-calculation of Ephemeral Type
> 
>
> Key: ZOOKEEPER-2901
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2901
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3
> Environment: Running 3.5.3-beta in Docker container
>Reporter: Mark Johnson
>Assignee: Jordan Zimmerman
>Priority: Blocker
> Fix For: 3.6.0, 3.5.4
>
>
> In the code that determines the EphemeralType it is looking at the owner 
> (which is the client ID or connection ID):
> EphemeralType.java:
>    public static EphemeralType get(long ephemeralOwner) {
>        if (ephemeralOwner == CONTAINER_EPHEMERAL_OWNER) {
>            return CONTAINER;
>        }
>        if (ephemeralOwner < 0) {
>            return TTL;
>        }
>        return (ephemeralOwner == 0) ? VOID : NORMAL;
>    }
> However my connection ID is:
> header.getClientId(): -720548323429908480
> This causes the code to think this is a TTL Ephemeral node instead of a
> NORMAL Ephemeral node.
> This also explains why this is random - if my client ID is non-negative
> then the node gets added correctly.
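
To make the report above concrete, here is a small standalone sketch that copies the quoted sign check and feeds it the reported client ID. The class name, the enum, and the sentinel value chosen for CONTAINER_EPHEMERAL_OWNER are assumptions for illustration; the highByte helper is only there to show which bits make the ID negative, and this is not the fix that was merged:

```java
class EphemeralOwnerSketch {
    enum Type { VOID, NORMAL, CONTAINER, TTL }

    // Assumed sentinel value; the quoted code only shows the constant's name.
    static final long CONTAINER_EPHEMERAL_OWNER = Long.MIN_VALUE;

    // Same decision logic as the quoted EphemeralType.get().
    static Type classify(long ephemeralOwner) {
        if (ephemeralOwner == CONTAINER_EPHEMERAL_OWNER) {
            return Type.CONTAINER;
        }
        if (ephemeralOwner < 0) {
            return Type.TTL;
        }
        return (ephemeralOwner == 0) ? Type.VOID : Type.NORMAL;
    }

    // Top 8 bits of the ID as an unsigned value; any value >= 128 makes the long negative.
    static long highByte(long id) {
        return id >>> 56;
    }

    public static void main(String[] args) {
        long reported = -720548323429908480L; // client ID from the report
        System.out.println(classify(reported));
        System.out.println(highByte(reported));
    }
}
```

Any owner ID whose top bit is set takes the TTL branch, which matches the observation that the failure depends on the client ID.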





[GitHub] zookeeper pull request #377: [ZOOKEEPER-2901] TTL Nodes don't work with Serv...

2018-05-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/377


---


[jira] [Commented] (ZOOKEEPER-2959) ignore accepted epoch and LEADERINFO ack from observers when a newly elected leader computes new epoch

2018-05-09 Thread Bogdan Kanivets (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16469516#comment-16469516
 ] 

Bogdan Kanivets commented on ZOOKEEPER-2959:


I think this is ready to merge. There are 3 PRs for 3.4, 3.5 and master.

Steps to reproduce the bug:

Start with 3 servers. Config:

 
{code:java}
clientPort=2181
leaderServes=yes
server.1=:2888:3888
server.2=:2888:3888
server.3=:2888:3888:observer
{code}
 

On server.2 block follower port from server.1 to server.2:
{code:java}
sudo iptables -A INPUT -s  -p tcp --destination-port 2888 -j DROP
{code}
Start server.1, server.2 and server.3
Wait for server.2 to declare itself a leader and then fail in 
waitForNewLeaderAck

 
{code:java}
2018-04-16 20:56:25,990 [myid:2] - INFO 
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@361] - LEADING - LEADER 
ELECTION TOOK - 3903
2018-04-16 20:56:27,275 [myid:2] - INFO 
[LearnerHandler-/:29223:LearnerHandler@329] - Follower sid: 3 : 
info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@136ca5bc
2018-04-16 20:56:27,281 [myid:2] - INFO 
[LearnerHandler-/:29223:LearnerHandler@384] - Synchronizing with 
Follower sid: 3 maxCommittedLog=0x0 minCommittedLog=0x0 peerLastZxid=0x0
2018-04-16 20:56:27,281 [myid:2] - INFO 
[LearnerHandler-/:29223:LearnerHandler@393] - leader and follower 
are in sync, zxid=0x0
2018-04-16 20:56:27,282 [myid:2] - INFO 
[LearnerHandler-/:29223:LearnerHandler@458] - Sending DIFF
2018-04-16 20:56:27,291 [myid:2] - INFO 
[LearnerHandler-/:29223:LearnerHandler@518] - Received 
NEWLEADER-ACK message from 3
2018-04-16 20:56:47,283 [myid:2] - INFO 
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@502] - Shutting down
2018-04-16 20:56:47,284 [myid:2] - INFO 
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Leader@508] - Shutdown called
java.lang.Exception: shutdown Leader! reason: Waiting for a quorum of 
followers, only synced with sids: [ 2 ]
at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:508)
at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:406)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:859){code}

On server.2, check that currentEpoch has been incremented in the currentEpoch file. This 
is the bug: the epoch is incremented in getEpochToPropose because server.3, an observer, is 
counted in connectingFollowers.
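
As an illustration of the counting problem (a sketch, not the code from any of the three PRs), the following compares a naive majority check over everything that connected with one that only counts voting members. The class and method names are hypothetical; the server IDs mirror the config above, where server.1 and server.2 are participants and server.3 is an observer:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class EpochQuorumSketch {
    // Naive check: every connected learner counts, observers included.
    static boolean naiveQuorum(Set<Long> votingMembers, Set<Long> connected) {
        return connected.size() > votingMembers.size() / 2;
    }

    // Stricter check: only connected servers that are voting members count.
    static boolean participantQuorum(Set<Long> votingMembers, Set<Long> connected) {
        Set<Long> voters = new HashSet<>(connected);
        voters.retainAll(votingMembers);
        return voters.size() > votingMembers.size() / 2;
    }

    public static void main(String[] args) {
        Set<Long> votingMembers = new HashSet<>(Arrays.asList(1L, 2L));
        // server.2 (the would-be leader) and server.3 (the observer) have connected.
        Set<Long> connected = new HashSet<>(Arrays.asList(2L, 3L));
        System.out.println(naiveQuorum(votingMembers, connected));
        System.out.println(participantQuorum(votingMembers, connected));
    }
}
```

With the naive check the observer tips the count over the majority threshold, so the epoch is advanced even though server.1 never connected.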

> ignore accepted epoch and LEADERINFO ack from observers when a newly elected 
> leader computes new epoch
> --
>
> Key: ZOOKEEPER-2959
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2959
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.4.10, 3.5.3
>Reporter: xiangyq000
>Assignee: Bogdan Kanivets
>Priority: Blocker
>
> Once the ZooKeeper cluster finishes the election for new leader, all learners 
> report their accepted epoch to the leader for the computation of new cluster 
> epoch.
> org.apache.zookeeper.server.quorum.Leader#getEpochToPropose
> {code:java}
> private final HashSet<Long> connectingFollowers = new HashSet<Long>();
> public long getEpochToPropose(long sid, long lastAcceptedEpoch)
>         throws InterruptedException, IOException {
>     synchronized(connectingFollowers) {
>         if (!waitingForNewEpoch) {
>             return epoch;
>         }
>         if (lastAcceptedEpoch >= epoch) {
>             epoch = lastAcceptedEpoch+1;
>         }
>         connectingFollowers.add(sid);
>         QuorumVerifier verifier = self.getQuorumVerifier();
>         if (connectingFollowers.contains(self.getId()) &&
>                 verifier.containsQuorum(connectingFollowers)) {
>             waitingForNewEpoch = false;
>             self.setAcceptedEpoch(epoch);
>             connectingFollowers.notifyAll();
>         } else {
>             long start = Time.currentElapsedTime();
>             long cur = start;
>             long end = start + self.getInitLimit()*self.getTickTime();
>             while(waitingForNewEpoch && cur < end) {
>                 connectingFollowers.wait(end - cur);
>                 cur = Time.currentElapsedTime();
>             }
>             if (waitingForNewEpoch) {
>                 throw new InterruptedException("Timeout while waiting for epoch from quorum");
>             }
>         }
>         return epoch;
>     }
> }
> {code}
> The computation produces a result once:
> # The leader has called the method "getEpochToPropose"
> # The number of reporters is greater than half of the participants.
> The problem is that an observer will also send its accepted epoch to the 
> leader, and this procedure treats observers as participants.
> Supposed that the cluster consists of 

Re: Question on merge script

2018-05-09 Thread Flavio Junqueira
Thanks for the feedback, Pat. I think the wiki page with merge script 
instructions needs updating. I'll explore it a bit further and will update it.

-Flavio

> On 9 May 2018, at 20:05, Patrick Hunt  wrote:
> 
> On Wed, May 9, 2018 at 1:18 AM, Flavio Junqueira  wrote:
> 
>> Hey Michael,
>> 
>> I was trying to merge yesterday a PR generated against branch-3.5, and
>> fetching the PR branch did not give me the merge script. I ended up asking
>> the contributor to change the target branch to master so that I avoid any
>> small hacks with the merge script.
>> 
>> 
> fwiw that's not the workflow I use. I always fetch the latest repo content,
> then switch to the master and use the script to merge/push a PR. It doesn't
> matter which PR or branch you want to merge, you just run the script off
> master and it handles the rest. If the branch/PR is off 3.4 it all just
> works.
> 
> 
>> We should consider doing the following two things, and let me know if it
>> makes sense:
>> 1- Clarifying that if a change is supposed to go to both branch-3.5 and
>> master, the PR should be against master
>> 
> 
> As long as it applies cleanly to master and br35 (etc...) this is not
> really necessary. You use the merge script to merge it into the target
> branch, then after you push that change to apache git repo it will ask you
> if you want to merge to other branches. Typically I would ask the OP to
> post multiple PRs if there are conflicts. I don't usually commit to just
> one branch if the change is necessary for multiple branches and there are
> conflicts. (I wait for all the PRs covering all the branches cleanly)
> 
> 
>> 2- Perhaps merging to branch-3.5 so that I see the script when I fetch a
>> PR branch off branch-3.5. This is unusual, but it is not unreasonable that
>> we have eventually PRs for branch-3.5 only.
>> 
>> I'm focusing on 3.5, but the same reasoning applies to 3.4.
>> 
>> 
> I always just start with master checked out and run the script. Seems fine
> to me and it means we don't need to maintain multiple versions of the
> scripts and keep them in sync. What's the benefit of doing otw?
> 
> Patrick
> 
> 
>> -Flavio
>> 
>> 
>>> On 9 May 2018, at 01:49, Michael Han  wrote:
>>> 
>>> Hi Flavio,
>>> 
>>> The merge script is branch agnostic - it only cares about the pull request
>>> number. As long as in the pull request the correct target branch is
>>> specified, the merge script will do its job by merging the change to the
>>> specified target branch. I guess we could commit the same script to
>>> branch-3.5 but the current script in master should be able to do what you
>>> asked.
>>> 
>>> On Tue, May 8, 2018 at 4:06 PM, Flavio Junqueira  wrote:
>>> 
>>>> Could anyone remind me why we don't have the merge script on branch-3.5?
>>>> Say I have a change that targets branch-3.5 alone. Shouldn't I be able to
>>>> have a PR that targets branch-3.5 and use the merge script?
>>>> 
>>>> Thanks,
>>>> -Flavio
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Cheers
>>> Michael
>> 
>> 



[GitHub] zookeeper issue #377: [ZOOKEEPER-2901] TTL Nodes don't work with Server IDs ...

2018-05-09 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/377
  
Never mind. I'll create a separate PR for that.


---


[GitHub] zookeeper issue #377: [ZOOKEEPER-2901] TTL Nodes don't work with Server IDs ...

2018-05-09 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/377
  
@Randgalt Not strictly part of this PR, but I noticed that ContainerManager 
doesn't log the name of the container being deleted here:

```java
try {
  LOG.info("Attempting to delete candidate container: %s",
containerPath);
  requestProcessor.processRequest(request);
} catch (Exception e) {
   LOG.error(String.format("Could not delete container: %s" ,
  containerPath), e);
}
```

The `%s` in the `LOG.info` call should be `{}`, because the logger substitutes arguments only into `{}` placeholders.
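
To show why the container name never reaches the log, here is a simplified stand-in for `{}`-style substitution. It is not the logging library's real formatter; the class and method names are made up for illustration, and it only mimics the one behavior that matters here, namely that arguments are substituted into `{}` and a `%s` is left untouched:

```java
class PlaceholderSketch {
    // Replaces each "{}" in the pattern with the next argument, left to right.
    // Anything else in the pattern, including "%s", is copied through unchanged.
    static String format(String pattern, Object... args) {
        StringBuilder out = new StringBuilder();
        int from = 0;
        for (Object arg : args) {
            int at = pattern.indexOf("{}", from);
            if (at < 0) {
                break; // no placeholder left; remaining arguments are dropped
            }
            out.append(pattern, from, at).append(arg);
            from = at + 2;
        }
        return out.append(pattern.substring(from)).toString();
    }

    public static void main(String[] args) {
        String containerPath = "/some/container"; // hypothetical path
        System.out.println(format("Attempting to delete candidate container: %s", containerPath));
        System.out.println(format("Attempting to delete candidate container: {}", containerPath));
    }
}
```

The `LOG.error` call in the snippet is unaffected because it builds its message with `String.format` before handing it to the logger.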




---


Re: [SUGGESTION] JvmPauseMonitor in ZooKeeper

2018-05-09 Thread Patrick Hunt
On Wed, May 9, 2018 at 11:11 AM, Norbert Kalmar 
wrote:

> Thanks Patrick, great question.
> My understanding is that this tool not only shows if JVM spends too much
> time in GC, but if, for any other reason, there is a JVM pause (The tool
> only differentiates GC pause from all other pause). This could be slow
> fsync (although we do have logs for that) or even server/OS related.
>
> But again, this is just my interpretation. I will ask the source of the
> idea, what extra benefits this gives them over java GC log.
>
> I checked ZK, I don't see it enabled by default, but GC logging can be set
> with JVM parameters easily, so that shouldn't be a key factor anyway.
>
>
I think that would be a useful change regardless - to make it on by default
I mean. Also some docs wrt our recommendations, how to troubleshoot, etc...
Adding a feature is useful, but ensuring people know about it and can use
it effectively is even more so.

Regards,

Patrick


> Regards,
> Norbert
>
> On Wed, May 9, 2018 at 7:57 PM Patrick Hunt  wrote:
>
> > Do you know why they did this rather than just enabling GC logging by
> > default? Why re-invent the wheel?
> >
> > I seem to remember seeing a push to enable GC logging by default a few
> > years ago. In particular around the time when the JVM added GC log
> rolling
> > as a feature. Here's an example:
> >
> > https://batmat.net/2016/10/17/always-enable-gc-logs-and-how-to-enable-logs-rotation-with-hotspot/
> > My understanding is that the overhead is so low that it's feasible to do
> > this.
> >
> > Good improvement though regardless which way we go.
> >
> > Regards,
> >
> > Patrick
> >
> > On Wed, May 9, 2018 at 9:36 AM, Andor Molnar  wrote:
> >
> > > +1 cool!
> > >
> > >
> > > On Wed, May 9, 2018 at 7:59 AM, Norbert Kalmar 
> > > wrote:
> > >
> > > > Okay, thanks Ed, I created the Jira, will look into it soon :)
> > > > https://issues.apache.org/jira/browse/ZOOKEEPER-3037
> > > >
> > > > Regards,
> > > > Norbert
> > > >
> > > > On Wed, May 9, 2018 at 4:44 PM Edward Ribeiro <
> > edward.ribe...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1. Sounds really nice to have feature. Let's open a ticket and
> open
> > a
> > > > PR.
> > > > > :)
> > > > >
> > > > > Ed
> > > > >
> > > > > On Wed, May 9, 2018 at 11:15, Norbert Kalmar <nkal...@cloudera.com>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I just got a tip that we could improve on the logging in
> ZooKeeper.
> > > > > After a
> > > > > > ZK crash, or client timeout sometimes it's hard to determine from
> > the
> > > > > logs
> > > > > > what happened. Knowing if ZK was responsive at the time would
> help
> > a
> > > > lot.
> > > > > > For example, ZK might spend a lot of time waiting on GC (there is
> > > still
> > > > > > some misconception that ZK is a storage).
> > > > > >
> > > > > > To help detect this, HADOOP already has a great tool called JVM
> > Pause
> > > > > > Monitor. (As the name suggest, it can be also used for
> monitoring,
> > > but
> > > > it
> > > > > > also helps post-mortem in a lot of cases). Basically it has a
> > daemon
> > > > that
> > > > > > sleeps for one second, and if the sleep time exceeds the 1s by
> more
> > > > than
> > > > > > the threshold (1s: INFO, 10s: WARN by default - this can be
> > > > configurable
> > > > > in
> > > > > > our case, see below), it will alert/make a log entry. It can also
> > > > monitor
> > > > > > the time GC took.
> > > > > >
> > > > > > Now, this class is in the HADOOP-common. I wouldn't want to
> depend
> > on
> > > > > > Hadoop-common because of this one feature/class (it is actually a
> > > > single
> > > > > > class). Since this is a straightforward implementation, and in
> the
> > > past
> > > > > > five years the few commits it had is nothing really serious, I
> > think
> > > we
> > > > > > could just copy this class in ZooKeeper, and introduce it as a
> > > > > configurable
> > > > > > feature, by default it can be off.
> > > > > >
> > > > > > The class:
> > > > > >
> > > > > >
> > > > > > > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
> > > > > >
> > > > > > What do You think?
> > > > > >
> > > > > > Regards,
> > > > > > Norbert
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Discover LEADER from JMX

2018-05-09 Thread Patrick Hunt
iiuc the information you are interested in is already available. The
beans have a "state" attribute which indicates following vs leading.

Try attaching a jconsole to the running servers, use the "mbeans" tab and
open org.apache.ZooKeeperService -> replicatedserver -> replica ->
attributes, you'll see the "state" attribute there.
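
The same lookup can be scripted with the standard javax.management API. The sketch below is only an outline under stated assumptions: the class and method names are hypothetical, the domain pattern is taken from the jconsole path above, and the exact attribute name ("State" is assumed here) should be checked against what jconsole shows. To keep it runnable on its own it queries the in-process platform MBean server, where no ZooKeeper beans exist; a real check would obtain an MBeanServerConnection to the ZooKeeper JVM instead:

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.AttributeNotFoundException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

class JmxStateSketch {
    // Reads one attribute from every MBean matching the pattern.
    // Beans that do not expose the attribute are skipped.
    static Map<String, Object> readAttribute(MBeanServerConnection conn,
                                             String pattern,
                                             String attribute) throws Exception {
        Map<String, Object> result = new TreeMap<>();
        for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
            try {
                result.put(name.getCanonicalName(), conn.getAttribute(name, attribute));
            } catch (AttributeNotFoundException e) {
                // This bean has no such attribute; move on to the next one.
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // In-process server, used only so that the sketch runs without a ZooKeeper ensemble.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        // Assumed pattern and attribute name; adjust to what the running server exposes.
        System.out.println(readAttribute(conn, "org.apache.ZooKeeperService:*", "State"));
    }
}
```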

Patrick

On Wed, May 9, 2018 at 8:02 AM, Enrico Olivelli  wrote:

> Thank you Edward
>
> I will pack all together and send out a patch as soon as I have time.
> I am running 3.5 in production and given that an RC for 3.5.4 is going to
> be cut soon I will have to wait for 3.5.5 and I assume it won't be
> immediate.
>
> Cheers
> Enrico
>
> On Wed, May 9, 2018 at 14:37, Edward Ribeiro <edward.ribe...@gmail.com>
> wrote:
>
> > Sent before finishing the previous email. Only to complement, the
> > findLeader() could have been as below, but this change is only a nitty
> > detail and totally irrelevant to the questions you are asking. :)
> >
> > /**
> >  * Returns the address of the node we think is the leader.
> >  */
> > protected QuorumServer findLeader() {
> >
> >     // Find the leader by id
> >     long currentLeader = self.getCurrentVote().getId();
> >
> >     QuorumServer leaderServer = self.getView().get(currentLeader);
> >
> >     if (leaderServer == null) {
> >         LOG.warn("Couldn't find the leader with id = {}", currentLeader);
> >     }
> >     return leaderServer;
> > }
> >
> > Edward
> >
> > On Wed, May 9, 2018 at 9:29 AM, Edward Ribeiro  >
> > wrote:
> >
> > > Hi Enrico,
> > >
> > > Well, I am not an expert on QuorumPeer either (not an expert on
> anything,
> > > really), but maybe it's the variable and method below?
> > >
> > > - QuorumPeer --
> > >
> > > /**
> > >  * This is who I think the leader currently is.
> > >  */
> > > volatile private Vote currentVote;
> > >
> > > public synchronized Vote getCurrentVote(){
> > >     return currentVote;
> > > }
> > >
> > > ---
> > >
> > >
> > > Then it's a matter of calling quorumPeer.getCurrentVote().getId() and
> > > quorumPeer.getServerState()?
> > >
> > > Btw, the Learner class has this handy method below (self is a
> > QuorumPeer):
> > >
> > >  Learner 
> > >
> > > /**
> > >  * Returns the address of the node we think is the leader.
> > >  */
> > > protected QuorumServer findLeader() {
> > >     QuorumServer leaderServer = null;
> > >     // Find the leader by id
> > >     Vote current = self.getCurrentVote();
> > >     for (QuorumServer s : self.getView().values()) {
> > >         if (s.id == current.getId()) {
> > >             leaderServer = s;
> > >             break;
> > >         }
> > >     }
> > >     if (leaderServer == null) {
> > >         LOG.warn("Couldn't find the leader with id = "
> > >                 + current.getId());
> > >     }
> > >     return leaderServer;
> > > }
> > >
> > > ---
> > >
> > > By the way, as a side note, the map traversal could be changed by:
> > >
> > > 
> > >
> > > if (self.getView().containsKey(current.getId())) {
> > >
> > > }
> > >
> > > ---
> > >
> > >
> > >
> > > As you can see above, quorumPeer.getView() returns a
> > > Map<Long, QuorumServer>, as below:
> > >
> > > -QuorumPeer -
> > >
> > > /**
> > >  * A 'view' is a node's current opinion of the membership of the entire
> > >  * ensemble.
> > >  */
> > > public Map<Long, QuorumServer> getView() {
> > >     return Collections.unmodifiableMap(getQuorumVerifier().getAllMembers());
> > > }
> > >
> > > -
> > >
> > >
> > > And then it retrieves the QuorumServer, which has much more information
> > > about the node besides the sid (InetSocketAddress, hostname, etc). :)
> > >
> > >
> > > Cheers,
> > > Edward
> > >
> > > On Wed, May 9, 2018 at 8:50 AM, Enrico Olivelli 
> > > wrote:
> > >
> > >> So I am trying to create a patch in order to expose on JMX the id of the
> > >> current "leader" (on the JVM of a follower)
> > >>
> > >> I am trying to find in ZK which variable holds the ID of the
> > >> current leader.
> > >> I am new to the internals of QuorumPeer
> > >>
> > >> Can someone give me a hint?
> > >>
> > >> Enrico
> > >>
> > >> On Tue, May 8, 2018 at 10:08, Ansel Zandegran <ansel.zandeg...@infor.com>
> > >> wrote:
> > >>
> > >> > Hi,
> > >> > That is possible with 4 letter commands. We are using it now. In
> 3.5.x
> > >> it
> > >> > is going to be removed in favour of admin server (embedded web
> > server).
> > >> > We are running in an environment where it’s not possible to run JMX
> or
> > >> > embedded web servers.
> > >> >
> > >> > So I am wondering if there is another way? It would be nice to have
> > this
> > >> > info as a znode.
> > >> >
> > >> > Best regards,
> > >> > Ansel
> > >> >
> > >> > > On 8 May 2018, at 09:55, Flavio Junqueira

Re: [SUGGESTION] JvmPauseMonitor in ZooKeeper

2018-05-09 Thread Norbert Kalmar
Thanks Patrick, great question.
My understanding is that this tool not only shows if JVM spends too much
time in GC, but if, for any other reason, there is a JVM pause (The tool
only differentiates GC pause from all other pause). This could be slow
fsync (although we do have logs for that) or even server/OS related.

But again, this is just my interpretation. I will ask the source of the
idea, what extra benefits this gives them over java GC log.

I checked ZK, I don't see it enabled by default, but GC logging can be set
with JVM parameters easily, so that shouldn't be a key factor anyway.
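
For readers who want to see the mechanism, here is a minimal sketch of the sleep-and-compare idea described in the quoted proposal further down. It is not Hadoop's JvmPauseMonitor: the class name, the constants, and the "OK"/"INFO"/"WARN" labels are assumptions for illustration, with the 1s sleep and the 1s/10s thresholds taken from that description:

```java
class PauseMonitorSketch {
    static final long SLEEP_MS = 1000;            // how long each probe sleeps
    static final long INFO_THRESHOLD_MS = 1000;   // extra delay that triggers an INFO entry
    static final long WARN_THRESHOLD_MS = 10000;  // extra delay that triggers a WARN entry

    // Maps the extra time beyond the requested sleep to a log level label.
    static String classify(long extraMs) {
        if (extraMs > WARN_THRESHOLD_MS) {
            return "WARN";
        }
        if (extraMs > INFO_THRESHOLD_MS) {
            return "INFO";
        }
        return "OK";
    }

    // Sleeps for sleepMs and returns how much longer than requested the sleep took.
    // A large value means the whole JVM (or the host) was stalled, whatever the cause.
    static long measureOnce(long sleepMs) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(sleepMs);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        return Math.max(0, elapsedMs - sleepMs);
    }

    public static void main(String[] args) throws InterruptedException {
        // A real monitor would loop forever on a daemon thread; three probes are enough here.
        for (int i = 0; i < 3; i++) {
            long extraMs = measureOnce(SLEEP_MS);
            System.out.println(classify(extraMs) + ": slept " + extraMs + " ms longer than requested");
        }
    }
}
```

Because the probe only measures wall-clock drift, it reports pauses from any source, which is the difference from GC logging discussed above.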

Regards,
Norbert

On Wed, May 9, 2018 at 7:57 PM Patrick Hunt  wrote:

> Do you know why they did this rather than just enabling GC logging by
> default? Why re-invent the wheel?
>
> I seem to remember seeing a push to enable GC logging by default a few
> years ago. In particular around the time when the JVM added GC log rolling
> as a feature. Here's an example:
>
> https://batmat.net/2016/10/17/always-enable-gc-logs-and-how-to-enable-logs-rotation-with-hotspot/
> My understanding is that the overhead is so low that it's feasible to do
> this.
>
> Good improvement though regardless which way we go.
>
> Regards,
>
> Patrick
>
> On Wed, May 9, 2018 at 9:36 AM, Andor Molnar  wrote:
>
> > +1 cool!
> >
> >
> > On Wed, May 9, 2018 at 7:59 AM, Norbert Kalmar 
> > wrote:
> >
> > > Okay, thanks Ed, I created the Jira, will look into it soon :)
> > > https://issues.apache.org/jira/browse/ZOOKEEPER-3037
> > >
> > > Regards,
> > > Norbert
> > >
> > > On Wed, May 9, 2018 at 4:44 PM Edward Ribeiro <
> edward.ribe...@gmail.com>
> > > wrote:
> > >
> > > > +1. Sounds really nice to have feature. Let's open a ticket and open
> a
> > > PR.
> > > > :)
> > > >
> > > > Ed
> > > >
> > > > On Wed, May 9, 2018 at 11:15, Norbert Kalmar
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I just got a tip that we could improve on the logging in ZooKeeper.
> > > > After a
> > > > > ZK crash, or client timeout sometimes it's hard to determine from
> the
> > > > logs
> > > > > what happened. Knowing if ZK was responsive at the time would help
> a
> > > lot.
> > > > > For example, ZK might spend a lot of time waiting on GC (there is
> > still
> > > > > some misconception that ZK is a storage).
> > > > >
> > > > > To help detect this, HADOOP already has a great tool called JVM
> Pause
> > > > > Monitor. (As the name suggest, it can be also used for monitoring,
> > but
> > > it
> > > > > also helps post-mortem in a lot of cases). Basically it has a
> daemon
> > > that
> > > > > sleeps for one second, and if the sleep time exceeds the 1s by more
> > > than
> > > > > the threshold (1s: INFO, 10s: WARN by default - this can be
> > > configurable
> > > > in
> > > > > our case, see below), it will alert/make a log entry. It can also
> > > monitor
> > > > > the time GC took.
> > > > >
> > > > > Now, this class is in the HADOOP-common. I wouldn't want to depend
> on
> > > > > Hadoop-common because of this one feature/class (it is actually a
> > > single
> > > > > class). Since this is a straightforward implementation, and in the
> > past
> > > > > five years the few commits it had is nothing really serious, I
> think
> > we
> > > > > could just copy this class in ZooKeeper, and introduce it as a
> > > > configurable
> > > > > feature, by default it can be off.
> > > > >
> > > > > The class:
> > > > >
> > > > >
> > > > > > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
> > > > >
> > > > > What do You think?
> > > > >
> > > > > Regards,
> > > > > Norbert
> > > > >
> > > >
> > >
> >
>


Re: Question on merge script

2018-05-09 Thread Patrick Hunt
On Wed, May 9, 2018 at 1:18 AM, Flavio Junqueira  wrote:

> Hey Michael,
>
> I was trying to merge yesterday a PR generated against branch-3.5, and
> fetching the PR branch did not give me the merge script. I ended up asking
> the contributor to change the target branch to master so that I avoid any
> small hacks with the merge script.
>
>
fwiw that's not the workflow I use. I always fetch the latest repo content,
then switch to the master and use the script to merge/push a PR. It doesn't
matter which PR or branch you want to merge, you just run the script off
master and it handles the rest. If the branch/PR is off 3.4 it all just
works.


> We should consider doing the following two things, and let me know if it
> makes sense:
> 1- Clarifying that if a change is supposed to go to both branch-3.5 and
> master, the PR should be against master
>

As long as it applies cleanly to master and br35 (etc...) this is not
really necessary. You use the merge script to merge it into the target
branch, then after you push that change to apache git repo it will ask you
if you want to merge to other branches. Typically I would ask the OP to
post multiple PRs if there are conflicts. I don't usually commit to just
one branch if the change is necessary for multiple branches and there are
conflicts. (I wait for all the PRs covering all the branches cleanly)


> 2- Perhaps merging to branch-3.5 so that I see the script when I fetch a
> PR branch off branch-3.5. This is unusual, but it is not unreasonable that
> we have eventually PRs for branch-3.5 only.
>
> I'm focusing on 3.5, but the same reasoning applies to 3.4.
>
>
I always just start with master checked out and run the script. Seems fine
to me and it means we don't need to maintain multiple versions of the
scripts and keep them in sync. What's the benefit of doing otw?

Patrick


> -Flavio
>
>
> > On 9 May 2018, at 01:49, Michael Han  wrote:
> >
> > Hi Flavio,
> >
> > The merge script is branch agnostic - it only cares about the pull
> request
> > number. As long as in the pull request the correct target branch is
> > specified, the merge script will do its job by merging the change to the
> > specified target branch. I guess we could commit the same script to
> > branch-3.5 but the current script in master should be able to do what you
> > asked.
> >
> > On Tue, May 8, 2018 at 4:06 PM, Flavio Junqueira  wrote:
> >
> >> Could anyone remind me why we don't have the merge script on branch-3.5?
> >> Say I have a change that targets branch-3.5 alone. Shouldn't I be able
> to
> >> have a PR that targets branch-3.5 and use the merge script?
> >>
> >> Thanks,
> >> -Flavio
> >
> >
> >
> >
> > --
> > Cheers
> > Michael
>
>


Re: [SUGGESTION] JvmPauseMonitor in ZooKeeper

2018-05-09 Thread Patrick Hunt
Do you know why they did this rather than just enabling GC logging by
default? Why re-invent the wheel?

I seem to remember seeing a push to enable GC logging by default a few
years ago. In particular around the time when the JVM added GC log rolling
as a feature. Here's an example:
https://batmat.net/2016/10/17/always-enable-gc-logs-and-how-to-enable-logs-rotation-with-hotspot/
My understanding is that the overhead is so low that it's feasible to do
this.

Good improvement though regardless which way we go.

Regards,

Patrick

On Wed, May 9, 2018 at 9:36 AM, Andor Molnar  wrote:

> +1 cool!
>
>
> On Wed, May 9, 2018 at 7:59 AM, Norbert Kalmar 
> wrote:
>
> > Okay, thanks Ed, I created the Jira, will look into it soon :)
> > https://issues.apache.org/jira/browse/ZOOKEEPER-3037
> >
> > Regards,
> > Norbert
> >
> > On Wed, May 9, 2018 at 4:44 PM Edward Ribeiro 
> > wrote:
> >
> > > +1. Sounds really nice to have feature. Let's open a ticket and open a
> > PR.
> > > :)
> > >
> > > Ed
> > >
> > > On Wed, May 9, 2018 at 11:15, Norbert Kalmar 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I just got a tip that we could improve on the logging in ZooKeeper.
> > > After a
> > > > ZK crash, or client timeout sometimes it's hard to determine from the
> > > logs
> > > > what happened. Knowing if ZK was responsive at the time would help a
> > lot.
> > > > For example, ZK might spend a lot of time waiting on GC (there is
> still
> > > > some misconception that ZK is a storage).
> > > >
> > > > To help detect this, HADOOP already has a great tool called JVM Pause
> > > > Monitor. (As the name suggest, it can be also used for monitoring,
> but
> > it
> > > > also helps post-mortem in a lot of cases). Basically it has a daemon
> > that
> > > > sleeps for one second, and if the sleep time exceeds the 1s by more
> > than
> > > > the threshold (1s: INFO, 10s: WARN by default - this can be
> > configurable
> > > in
> > > > our case, see below), it will alert/make a log entry. It can also
> > monitor
> > > > the time GC took.
> > > >
> > > > Now, this class is in the HADOOP-common. I wouldn't want to depend on
> > > > Hadoop-common because of this one feature/class (it is actually a
> > single
> > > > class). Since this is a straightforward implementation, and in the
> past
> > > > five years the few commits it had is nothing really serious, I think
> we
> > > > could just copy this class in ZooKeeper, and introduce it as a
> > > configurable
> > > > feature, by default it can be off.
> > > >
> > > > The class:
> > > >
> > > >
> > > > > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
> > > >
> > > > What do You think?
> > > >
> > > > Regards,
> > > > Norbert
> > > >
> > >
> >
>


Re: Txn logs and snapshots in git repo

2018-05-09 Thread Edward Ribeiro
Oh, nice. Thanks, Andor!

Ed

On Wed, May 9, 2018 at 1:52 PM, Andor Molnar  wrote:

> Hi Ed,
>
> Static data used by unit tests.
>
> Andor
>
>
>
> On Wed, May 9, 2018 at 9:46 AM, Edward Ribeiro 
> wrote:
>
> > I am updating my local repo and noticed some transaction logs and
> snapshots
> > files in src/java/test/data/invalidsnap/version-2. Are those files
> static
> > data used by unit tests or just artifacts accidentally pushed to the
> repo?
> >
> > ls -lah src/java/test/data/invalidsnap/version-2/
> >
> > total 936
> > drwxr-xr-x  11 edward  staff   374B May  9 13:34 .
> > drwxr-xr-x   3 edward  staff   102B Oct  8  2016 ..
> > -rw-r--r--   1 edward  staff    58K Oct  8  2016 log.1
> > -rw-r--r--   1 edward  staff    89K Oct  8  2016 log.274
> > -rw-r--r--   1 edward  staff   184B May  9 13:34 log.42
> > -rw-r--r--   1 edward  staff    48K Oct  8  2016 log.63b
> > -rw-r--r--   1 edward  staff   296B Oct  8  2016 snapshot.0
> > -rw-r--r--   1 edward  staff    55K Oct  8  2016 snapshot.272
> > -rw-r--r--   1 edward  staff    55K Oct  8  2016 snapshot.273
> > -rw-r--r--   1 edward  staff   140K Oct  8  2016 snapshot.639
> > -rw-r--r--   1 edward  staff   4.7K Oct  8  2016 snapshot.83f
> >
> > By the way, they are on master, branch-3.4 and branch-3.5
> >
> > Cheers,
> > Edward
> >
>


Re: Txn logs and snapshots in git repo

2018-05-09 Thread Andor Molnar
Hi Ed,

Static data used by unit tests.

Andor



On Wed, May 9, 2018 at 9:46 AM, Edward Ribeiro 
wrote:

> I am updating my local repo and noticed some transaction logs and snapshots
> files in src/java/test/data/invalidsnap/version-2. Are those files static
> data used by unit tests or just artifacts accidentally pushed to the repo?
>
> ls -lah src/java/test/data/invalidsnap/version-2/
>
> total 936
> drwxr-xr-x  11 edward  staff   374B May  9 13:34 .
> drwxr-xr-x   3 edward  staff   102B Oct  8  2016 ..
> -rw-r--r--   1 edward  staff    58K Oct  8  2016 log.1
> -rw-r--r--   1 edward  staff    89K Oct  8  2016 log.274
> -rw-r--r--   1 edward  staff   184B May  9 13:34 log.42
> -rw-r--r--   1 edward  staff    48K Oct  8  2016 log.63b
> -rw-r--r--   1 edward  staff   296B Oct  8  2016 snapshot.0
> -rw-r--r--   1 edward  staff    55K Oct  8  2016 snapshot.272
> -rw-r--r--   1 edward  staff    55K Oct  8  2016 snapshot.273
> -rw-r--r--   1 edward  staff   140K Oct  8  2016 snapshot.639
> -rw-r--r--   1 edward  staff   4.7K Oct  8  2016 snapshot.83f
>
> By the way, they are on master, branch-3.4 and branch-3.5
>
> Cheers,
> Edward
>


Txn logs and snapshots in git repo

2018-05-09 Thread Edward Ribeiro
I am updating my local repo and noticed some transaction log and snapshot
files in src/java/test/data/invalidsnap/version-2. Are those files static
data used by unit tests or just artifacts accidentally pushed to the repo?

ls -lah src/java/test/data/invalidsnap/version-2/

total 936
drwxr-xr-x  11 edward  staff   374B May  9 13:34 .
drwxr-xr-x   3 edward  staff   102B Oct  8  2016 ..
-rw-r--r--   1 edward  staff    58K Oct  8  2016 log.1
-rw-r--r--   1 edward  staff    89K Oct  8  2016 log.274
-rw-r--r--   1 edward  staff   184B May  9 13:34 log.42
-rw-r--r--   1 edward  staff    48K Oct  8  2016 log.63b
-rw-r--r--   1 edward  staff   296B Oct  8  2016 snapshot.0
-rw-r--r--   1 edward  staff    55K Oct  8  2016 snapshot.272
-rw-r--r--   1 edward  staff    55K Oct  8  2016 snapshot.273
-rw-r--r--   1 edward  staff   140K Oct  8  2016 snapshot.639
-rw-r--r--   1 edward  staff   4.7K Oct  8  2016 snapshot.83f

By the way, they are on master, branch-3.4 and branch-3.5

Cheers,
Edward


Re: [SUGGESTION] JvmPauseMonitor in ZooKeeper

2018-05-09 Thread Andor Molnar
+1 cool!


On Wed, May 9, 2018 at 7:59 AM, Norbert Kalmar  wrote:

> Okay, thanks Ed, I created the Jira, will look into it soon :)
> https://issues.apache.org/jira/browse/ZOOKEEPER-3037
>
> Regards,
> Norbert
>
> On Wed, May 9, 2018 at 4:44 PM Edward Ribeiro 
> wrote:
>
> > +1. Sounds really nice to have feature. Let's open a ticket and open a PR.
> > :)
> >
> > Ed
> >
> > On Wed, May 9, 2018 at 11:15, Norbert Kalmar wrote:
> >
> > > Hi,
> > >
> > > I just got a tip that we could improve on the logging in ZooKeeper. After a
> > > ZK crash or client timeout, it's sometimes hard to determine from the logs
> > > what happened. Knowing if ZK was responsive at the time would help a lot.
> > > For example, ZK might spend a lot of time waiting on GC (there is still
> > > some misconception that ZK is a storage system).
> > >
> > > To help detect this, Hadoop already has a great tool called JVM Pause
> > > Monitor. (As the name suggests, it can also be used for monitoring, but it
> > > also helps post-mortem in a lot of cases.) Basically it has a daemon that
> > > sleeps for one second, and if the sleep time exceeds the 1s by more than
> > > the threshold (1s: INFO, 10s: WARN by default - this can be configurable in
> > > our case, see below), it will alert/make a log entry. It can also monitor
> > > the time GC took.
> > >
> > > Now, this class is in hadoop-common. I wouldn't want to depend on
> > > hadoop-common because of this one feature/class (it is actually a single
> > > class). Since this is a straightforward implementation, and the few commits
> > > it has had in the past five years were nothing serious, I think we could
> > > just copy this class into ZooKeeper and introduce it as a configurable
> > > feature that is off by default.
> > >
> > > The class:
> > >
> > > https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
> > >
> > > What do You think?
> > >
> > > Regards,
> > > Norbert
> > >
> >
>


[jira] [Updated] (ZOOKEEPER-3037) Add JvmPauseMonitor to ZooKeeper

2018-05-09 Thread Norbert Kalmar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norbert Kalmar updated ZOOKEEPER-3037:
--
Description: 
After a ZK crash, or client timeout sometimes it's hard to determine from the 
logs what happened. Knowing if ZK was responsive at the time would help a lot. 
For example, ZK might spend a lot of time waiting on GC (there is still some 
misconception that ZK is a storage). 

To help detect this, HADOOP already has a great tool called JVM Pause Monitor. 
(As the name suggest, it can be also used for monitoring, but it also helps 
post-mortem in a lot of cases). Basically it has a daemon that sleeps for one 
second, and if the sleep time exceeds the 1s by more than the threshold (1s: 
INFO, 10s: WARN by default - this can be configurable in our case, see below), 
it will alert/make a log entry. It can also monitor the time GC took.

The class implementing this is in HADOOP-common, but ZK should not depend on 
this package. Since this is a straightforward implementation, and in the past 
five years the few commits it had is nothing really serious, I think we could 
just copy this class in ZooKeeper, and introduce it as a configurable feature, 
by default it can be off.

The class:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java

Task:
- Create a class in ZK under contrib called JvmPauseMonitor. 
- Make feature configurable, by default: OFF
- Make sleep time and threshold time configurable

  was:
After a ZK crash, or client timeout sometimes it's hard to determine from the 
logs what happened. Knowing if ZK was responsive at the time would help a lot. 
For example, ZK might spend a lot of time waiting on GC (there is still some 
misconception that ZK is a storage). 

To help detect this, HADOOP already has a great tool called JVM Pause Monitor. 
(As the name suggest, it can be also used for monitoring, but it also helps 
post-mortem in a lot of cases). Basically it has a daemon that sleeps for one 
second, and if the sleep time exceeds the 1s by more than the threshold (1s: 
INFO, 10s: WARN by default - this can be configurable in our case, see below), 
it will alert/make a log entry. It can also monitor the time GC took.

The class implementing this is in HADOOP-common, but ZK should not depend on 
this package. Since this is a straightforward implementation, and in the past 
five years the few commits it had is nothing really serious, I think we could 
just copy this class in ZooKeeper, and introduce it as a configurable feature, 
by default it can be off.

The class:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java

Task:
- Create a class in ZK under contrib called JvmPauseMonitor. 
- Make feature configurable, by default: OFF
- ?Make sleep time and threshold time configurable?


> Add JvmPauseMonitor to ZooKeeper
> 
>
> Key: ZOOKEEPER-3037
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3037
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 3.5.3, 3.4.12
>Reporter: Norbert Kalmar
>Assignee: Norbert Kalmar
>Priority: Minor
>
> After a ZK crash, or client timeout sometimes it's hard to determine from the 
> logs what happened. Knowing if ZK was responsive at the time would help a 
> lot. For example, ZK might spend a lot of time waiting on GC (there is still 
> some misconception that ZK is a storage). 
> To help detect this, HADOOP already has a great tool called JVM Pause 
> Monitor. (As the name suggest, it can be also used for monitoring, but it 
> also helps post-mortem in a lot of cases). Basically it has a daemon that 
> sleeps for one second, and if the sleep time exceeds the 1s by more than the 
> threshold (1s: INFO, 10s: WARN by default - this can be configurable in our 
> case, see below), it will alert/make a log entry. It can also monitor the 
> time GC took.
> The class implementing this is in HADOOP-common, but ZK should not depend on 
> this package. Since this is a straightforward implementation, and in the past 
> five years the few commits it had is nothing really serious, I think we could 
> just copy this class in ZooKeeper, and introduce it as a configurable 
> feature, by default it can be off.
> The class:
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
> Task:
> - Create a class in ZK under contrib called JvmPauseMonitor. 
> - Make feature configurable, by default: OFF
> - Make sleep time and threshold time configurable





Re: Discover LEADER from JMX

2018-05-09 Thread Enrico Olivelli
Thank you Edward

I will pack all together and send out a patch as soon as I have time.
I am running 3.5 in production and given that an RC for 3.5.4 is going to
be cut soon I will have to wait for 3.5.5 and I assume it won't be
immediate.

Cheers
Enrico

On Wed, May 9, 2018 at 14:37, Edward Ribeiro <edward.ribe...@gmail.com> wrote:

> Sent before finishing the previous email. Only to complement, the
> findLeader() could have been as below, but this change is only a nitty
> detail and totally irrelevant to the questions you are asking. :)
>
> /**
>  * Returns the address of the node we think is the leader.
>  */
> protected QuorumServer findLeader() {
>
>     // Find the leader by id
>     long currentLeader = self.getCurrentVote().getId();
>
>     QuorumServer leaderServer = self.getView().get(currentLeader);
>
>     if (leaderServer == null) {
>         LOG.warn("Couldn't find the leader with id = {}", currentLeader);
>     }
>     return leaderServer;
> }
>
> Edward
>
> On Wed, May 9, 2018 at 9:29 AM, Edward Ribeiro 
> wrote:
>
> > Hi Enrico,
> >
> > Well, I am not an expert on QuorumPeer either (not an expert on anything,
> > really), but maybe it's the variable and method below?
> >
> > - QuorumPeer --
> >
> > /**
> >  * This is who I think the leader currently is.
> >  */
> > volatile private Vote currentVote;
> >
> > public synchronized Vote getCurrentVote() {
> >     return currentVote;
> > }
> >
> > ---
> >
> >
> > Then it's a matter of calling quorumPeer.getCurrentVote().getId() and
> > quorumPeer.getServerState()?
> >
> > Btw, the Learner class has this handy method below (self is a QuorumPeer):
> >
> >  Learner 
> >
> > /**
> >  * Returns the address of the node we think is the leader.
> >  */
> > protected QuorumServer findLeader() {
> >     QuorumServer leaderServer = null;
> >     // Find the leader by id
> >     Vote current = self.getCurrentVote();
> >     for (QuorumServer s : self.getView().values()) {
> >         if (s.id == current.getId()) {
> >             leaderServer = s;
> >             break;
> >         }
> >     }
> >     if (leaderServer == null) {
> >         LOG.warn("Couldn't find the leader with id = "
> >                 + current.getId());
> >     }
> >     return leaderServer;
> > }
> >
> > ---
> >
> > By the way, as a side note, the map traversal could be changed by:
> >
> > 
> >
> > if (self.getView().containsKey(current.getId())) {
> >
> > }
> >
> > ---
> >
> >
> >
> > As you can see in the method above, quorumPeer.getView() returns a
> > Map<Long, QuorumServer>, as below:
> >
> > -QuorumPeer -
> >
> > /**
> >  * A 'view' is a node's current opinion of the membership of the entire
> >  * ensemble.
> >  */
> > public Map<Long, QuorumServer> getView() {
> >     return Collections.unmodifiableMap(getQuorumVerifier().getAllMembers());
> > }
> >
> > -
> >
> >
> > And then it retrieves the QuorumServer, which has much more information
> > about the node besides the sid (InetSocketAddress, hostname, etc). :)
> >
> >
> > Cheers,
> > Edward
> >
> > On Wed, May 9, 2018 at 8:50 AM, Enrico Olivelli 
> > wrote:
> >
> >> So I am trying to create a patch in order to expose on JMX the id of the
> >> current "leader" (on the JVM of a follower)
> >>
> >> I am trying to find in ZK which is the variable which holds the ID of
> the
> >> current leader.
> >> I am new to the internal of QuorumPeer
> >>
> >> Can someone give me some hint ?
> >>
> >> Enrico
> >>
> >> On Tue, May 8, 2018 at 10:08, Ansel Zandegran <ansel.zandeg...@infor.com> wrote:
> >>
> >> > Hi,
> >> > That is possible with 4 letter commands. We are using it now. In 3.5.x
> >> it
> >> > is going to be removed in favour of admin server (embedded web
> server).
> >> > We are running in an environment where it’s not possible to run JMX or
> >> > embedded web servers.
> >> >
> >> > So I am wondering if there is another way? It would be nice to have
> this
> >> > info as a znode.
> >> >
> >> > Best regards,
> >> > Ansel
> >> >
> >> > > On 8 May 2018, at 09:55, Flavio Junqueira  wrote:
> >> > >
> >> > > Hi Enrico,
> >> > >
> >> > > You can determine the state of a server via 4-letter commands. Would
> >> > > that work for you?
> >> > >
> >> > > -Flavio
> >> > >
> >> > >> On 8 May 2018, at 09:09, Enrico Olivelli 
> >> wrote:
> >> > >>
> >> > >> Hi,
> >> > >> is there any way to see in JMX which is the leader of a ZooKeeper
> >> > cluster?
> >> > >>
> >> > >> My problem is: given access to any of the nodes of the cluster I
> >> want to
> >> > >> know from JMX which is the current leader.
> >> > >> It seems to me that this information is not available, you can know
> >> > only if
> >> > >> the local node is Leader or Follower.
> >> > >>
> >> > >> Cheers
> >> > >> Enrico
> >> > >
> >> >

Re: Name resolution in StaticHostProvider

2018-05-09 Thread Flavio Junqueira
I'm actually now wondering whether we should be using an unchecked exception 
instead. A lot of things have changed with exception handling since we wrote 
this code base initially. An unchecked exception would actually better match my 
current mental model of what that signature should look like.

-Flavio

> On 9 May 2018, at 16:44, Flavio Junqueira  wrote:
> 
> I like the idea of indicating to the application that there is something 
> wrong with the list of servers so that it has a chance to look into it. With 
> the current code in `ClientCnxn`, we will log at warn level and hope that 
> someone sees it, but we are not really stopping the client. Throwing might 
> actually be an improvement as it will output a log message, but I'm now 
> wondering if we should propagate it all the way to the application. 
> Responding to myself, one reason for not doing it is that it is not a fatal 
> error unless no server can be resolved.
> 
> -Flavio
> 
>> On 8 May 2018, at 16:06, Andor Molnar  wrote:
>> 
>> Hi,
>> 
>> Updating this thread, because the PR is still being reviewed on GitHub.
>> 
>> So, the reason why I refactored the original behaviour of
>> StaticHostProvider is that I believe that it's trying to do something which
>> is not its responsibility. Please tell me if there's a good historical
>> reason for that.
>> 
>> My approach is to give the user the following two options:
>> 1- Use static IP addresses if you don't want to deal with DNS resolution
>> at all - we guarantee that no DNS logic will be involved in this case.
>> 2- Use DNS hostnames if you have a reliable DNS service for resolution
>> (with HA, secondary servers, backups, etc.) - we must use DNS in the right
>> way in this case, e.g. do NOT cache an IP address for longer than the DNS
>> server allows, and re-resolve after the TTL expires, because the protocol
>> requires it.
>> 
>> My 2 cents here:
>> - the fix which was originally posted for re-resolution is a workaround and
>> doesn't satisfy the requirement for #2,
>> - the solution is already built-in in JDK and DNS clients in the right way
>> - can't see a reason why we shouldn't use that
>> 
>> I checked this in some other projects as well and found very similar
>> approach in hadoop-common's SecurityUtil.java. It has 2 built-in plugins
>> for that:
>> - Standard resolver uses java's built-in getByName().
>> - Qualified resolver still uses getByName(), but adds some logic to avoid
>> incorrect re-resolutions and reverse IP lookups.
>> 
>> Please let me know your thoughts.
>> 
>> Regards,
>> Andor
>> 
>> 
>> 
>> 
>> 
>> 
>> On Tue, Mar 6, 2018 at 8:12 AM, Andor Molnar  wrote:
>> 
>>> Hi Abe,
>>> 
>>> Unfortunately we haven't got any feedback yet. What do you think of
>>> implementing Option #3?
>>> 
>>> Regards,
>>> Andor
>>> 
>>> 
>>> On Thu, Feb 22, 2018 at 6:06 PM, Andor Molnar  wrote:
>>> 
 Did anybody happen to take a quick look by any chance?
 
 I don't want to push this too hard, because I know it's a time consuming
 topic to think about, but this is a blocker in 3.5 which has been hanging
 around for a while and any feedback would be extremely helpful to close it
 quickly.
 
 Thanks,
 Andor
 
 
 
 On Mon, Feb 19, 2018 at 12:18 PM, Andor Molnar 
 wrote:
 
> Hi all,
> 
> We need more eyes and brains on the following PR:
> 
> https://github.com/apache/zookeeper/pull/451
> 
> I added a comment a few days ago about the way we currently do DNS name
> resolution in this class and a suggestion on how we could simplify things a
> little bit. We talked about it with Abe Fine, but we're a little bit unsure
> and cannot reach a conclusion. It would be extremely handy to get more
> feedback from you.
> 
> To add some colour to it, let me elaborate on the situation here:
> 
> In general, the task that StaticHostProvider does is to get a list of
> potentially unresolved InetSocketAddress objects, resolve them and iterate
> over the resolved objects by calling next() method.
> 
> *Option #1 (current logic)*
> - Resolve addresses with getAllByName() which returns a list of IP
> addresses associated with the address.
> - Cache all these IP's, shuffle them and iterate over.
> - If client is unable to connect to an IP, remove all IPs from the list
> which the original servername was resolved to and re-resolve it.
> 
> *Option #2 (getByName())*
> - Resolve address with getByName() instead which returns only the first
> IP address of the name,
> - Do not cache IPs,
> - Shuffle the *names* and resolve with getByName() *every time* when
> next() is called,
> - JDK's built-in caching will prevent name servers from being flooded
> and will do the re-resolution automatically when cache expires,
> - Names with multiple IPs will be handled by DNS servers which (if
> configured properly) return IPs 

[jira] [Assigned] (ZOOKEEPER-3037) Add JvmPauseMonitor to ZooKeeper

2018-05-09 Thread Norbert Kalmar (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Norbert Kalmar reassigned ZOOKEEPER-3037:
-

Assignee: Norbert Kalmar

> Add JvmPauseMonitor to ZooKeeper
> 
>
> Key: ZOOKEEPER-3037
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3037
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: contrib
>Affects Versions: 3.5.3, 3.4.12
>Reporter: Norbert Kalmar
>Assignee: Norbert Kalmar
>Priority: Minor
>
> After a ZK crash, or client timeout sometimes it's hard to determine from the 
> logs what happened. Knowing if ZK was responsive at the time would help a 
> lot. For example, ZK might spend a lot of time waiting on GC (there is still 
> some misconception that ZK is a storage). 
> To help detect this, HADOOP already has a great tool called JVM Pause 
> Monitor. (As the name suggest, it can be also used for monitoring, but it 
> also helps post-mortem in a lot of cases). Basically it has a daemon that 
> sleeps for one second, and if the sleep time exceeds the 1s by more than the 
> threshold (1s: INFO, 10s: WARN by default - this can be configurable in our 
> case, see below), it will alert/make a log entry. It can also monitor the 
> time GC took.
> The class implementing this is in HADOOP-common, but ZK should not depend on 
> this package. Since this is a straightforward implementation, and in the past 
> five years the few commits it had is nothing really serious, I think we could 
> just copy this class in ZooKeeper, and introduce it as a configurable 
> feature, by default it can be off.
> The class:
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
> Task:
> - Create a class in ZK under contrib called JvmPauseMonitor. 
> - Make feature configurable, by default: OFF
> - ?Make sleep time and threshold time configurable?





Re: [SUGGESTION] JvmPauseMonitor in ZooKeeper

2018-05-09 Thread Norbert Kalmar
Okay, thanks Ed, I created the Jira, will look into it soon :)
https://issues.apache.org/jira/browse/ZOOKEEPER-3037

Regards,
Norbert

On Wed, May 9, 2018 at 4:44 PM Edward Ribeiro 
wrote:

> +1. Sounds really nice to have feature. Let's open a ticket and open a PR.
> :)
>
> Ed
>
> On Wed, May 9, 2018 at 11:15, Norbert Kalmar wrote:
>
> > Hi,
> >
> > I just got a tip that we could improve on the logging in ZooKeeper. After a
> > ZK crash or client timeout, it's sometimes hard to determine from the logs
> > what happened. Knowing if ZK was responsive at the time would help a lot.
> > For example, ZK might spend a lot of time waiting on GC (there is still
> > some misconception that ZK is a storage system).
> >
> > To help detect this, Hadoop already has a great tool called JVM Pause
> > Monitor. (As the name suggests, it can also be used for monitoring, but it
> > also helps post-mortem in a lot of cases.) Basically it has a daemon that
> > sleeps for one second, and if the sleep time exceeds the 1s by more than
> > the threshold (1s: INFO, 10s: WARN by default - this can be configurable in
> > our case, see below), it will alert/make a log entry. It can also monitor
> > the time GC took.
> >
> > Now, this class is in hadoop-common. I wouldn't want to depend on
> > hadoop-common because of this one feature/class (it is actually a single
> > class). Since this is a straightforward implementation, and the few commits
> > it has had in the past five years were nothing serious, I think we could
> > just copy this class into ZooKeeper and introduce it as a configurable
> > feature that is off by default.
> >
> > The class:
> >
> >
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
> >
> > What do You think?
> >
> > Regards,
> > Norbert
> >
>


[jira] [Created] (ZOOKEEPER-3037) Add JvmPauseMonitor to ZooKeeper

2018-05-09 Thread Norbert Kalmar (JIRA)
Norbert Kalmar created ZOOKEEPER-3037:
-

 Summary: Add JvmPauseMonitor to ZooKeeper
 Key: ZOOKEEPER-3037
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3037
 Project: ZooKeeper
  Issue Type: Improvement
  Components: contrib
Affects Versions: 3.4.12, 3.5.3
Reporter: Norbert Kalmar


After a ZK crash or client timeout, it's sometimes hard to determine from the 
logs what happened. Knowing if ZK was responsive at the time would help a lot. 
For example, ZK might spend a lot of time waiting on GC (there is still some 
misconception that ZK is a storage system). 

To help detect this, Hadoop already has a great tool called JVM Pause Monitor. 
(As the name suggests, it can also be used for monitoring, but it also helps 
post-mortem in a lot of cases.) Basically it has a daemon that sleeps for one 
second, and if the sleep time exceeds the 1s by more than the threshold (1s: 
INFO, 10s: WARN by default - this can be configurable in our case, see below), 
it will alert/make a log entry. It can also monitor the time GC took.

The class implementing this is in hadoop-common, but ZK should not depend on 
this package. Since this is a straightforward implementation, and the few 
commits it has had in the past five years were nothing serious, I think we 
could just copy this class into ZooKeeper and introduce it as a configurable 
feature that is off by default.

The class:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java

Task:
- Create a class in ZK under contrib called JvmPauseMonitor. 
- Make feature configurable, by default: OFF
- ?Make sleep time and threshold time configurable?





Re: Name resolution in StaticHostProvider

2018-05-09 Thread Flavio Junqueira
I like the idea of indicating to the application that there is something wrong 
with the list of servers so that it has a chance to look into it. With the 
current code in `ClientCnxn`, we will log at warn level and hope that someone 
sees it, but we are not really stopping the client. Throwing might actually be 
an improvement as it will output a log message, but I'm now wondering if we 
should propagate it all the way to the application. Responding to myself, one 
reason for not doing it is that it is not a fatal error unless no server can be 
resolved.

-Flavio
 
> On 8 May 2018, at 16:06, Andor Molnar  wrote:
> 
> Hi,
> 
> Updating this thread, because the PR is still being reviewed on GitHub.
> 
> So, the reason why I refactored the original behaviour of
> StaticHostProvider is that I believe that it's trying to do something which
> is not its responsibility. Please tell me if there's a good historical
> reason for that.
> 
> My approach is to give the user the following two options:
> 1- Use static IP addresses if you don't want to deal with DNS resolution
> at all - we guarantee that no DNS logic will be involved in this case.
> 2- Use DNS hostnames if you have a reliable DNS service for resolution
> (with HA, secondary servers, backups, etc.) - we must use DNS in the right
> way in this case, e.g. do NOT cache an IP address for longer than the DNS
> server allows, and re-resolve after the TTL expires, because the protocol
> requires it.
> 
> My 2 cents here:
> - the fix which was originally posted for re-resolution is a workaround and
> doesn't satisfy the requirement for #2,
> - the solution is already built-in in JDK and DNS clients in the right way
> - can't see a reason why we shouldn't use that
> 
> I checked this in some other projects as well and found very similar
> approach in hadoop-common's SecurityUtil.java. It has 2 built-in plugins
> for that:
> - Standard resolver uses java's built-in getByName().
> - Qualified resolver still uses getByName(), but adds some logic to avoid
> incorrect re-resolutions and reverse IP lookups.
> 
> Please let me know your thoughts.
> 
> Regards,
> Andor
> 
> 
> 
> 
> 
> 
> On Tue, Mar 6, 2018 at 8:12 AM, Andor Molnar  wrote:
> 
>> Hi Abe,
>> 
>> Unfortunately we haven't got any feedback yet. What do you think of
>> implementing Option #3?
>> 
>> Regards,
>> Andor
>> 
>> 
>> On Thu, Feb 22, 2018 at 6:06 PM, Andor Molnar  wrote:
>> 
>>> Did anybody happen to take a quick look by any chance?
>>> 
>>> I don't want to push this too hard, because I know it's a time consuming
>>> topic to think about, but this is a blocker in 3.5 which has been hanging
>>> around for a while and any feedback would be extremely helpful to close it
>>> quickly.
>>> 
>>> Thanks,
>>> Andor
>>> 
>>> 
>>> 
>>> On Mon, Feb 19, 2018 at 12:18 PM, Andor Molnar 
>>> wrote:
>>> 
 Hi all,
 
 We need more eyes and brains on the following PR:
 
 https://github.com/apache/zookeeper/pull/451
 
 I added a comment a few days ago about the way we currently do DNS name
 resolution in this class and a suggestion on how we could simplify things a
 little bit. We talked about it with Abe Fine, but we're a little bit unsure
 and cannot reach a conclusion. It would be extremely handy to get more
 feedback from you.
 
 To add some colour to it, let me elaborate on the situation here:
 
 In general, the task that StaticHostProvider does is to get a list of
 potentially unresolved InetSocketAddress objects, resolve them and iterate
 over the resolved objects by calling next() method.
 
 *Option #1 (current logic)*
 - Resolve addresses with getAllByName() which returns a list of IP
 addresses associated with the address.
 - Cache all these IP's, shuffle them and iterate over.
 - If client is unable to connect to an IP, remove all IPs from the list
 which the original servername was resolved to and re-resolve it.
 
 *Option #2 (getByName())*
 - Resolve address with getByName() instead which returns only the first
 IP address of the name,
 - Do not cache IPs,
 - Shuffle the *names* and resolve with getByName() *every time* when
 next() is called,
 - JDK's built-in caching will prevent name servers from being flooded
 and will do the re-resolution automatically when cache expires,
 - Names with multiple IPs will be handled by DNS servers which (if
 configured properly) return IPs in different order - this is called DNS
 Round Robin -, so getByName() will return different IP on each call.
 
 *Option #3*
 - There's a small problem with option #2: if the DNS server is not configured
 properly and handles the round-robin case in a way that it always returns
 the IP list in the same order, getByName() will never return the next IP,
 - In order to overcome that, use getAllByName() instead, shuffle the
 list and return the first IP.
 
 All feedback
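
A minimal sketch of Option #3 from the quoted proposal, in Java. It only
illustrates the "resolve everything, shuffle, take the first" idea; the class
and method names (ResolveSketch, pickOne, resolve) are hypothetical and are not
taken from StaticHostProvider or from the PR. It assumes that re-resolving on
every call is acceptable because the JDK keeps its own address cache.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of ZooKeeper.
class ResolveSketch {

    /**
     * Shuffles a copy of the candidates and returns the first element.
     * Kept separate from the DNS call so it can be exercised without a
     * network. The caller must pass a non-empty list.
     */
    static <T> T pickOne(List<T> candidates) {
        List<T> copy = new ArrayList<>(candidates);
        Collections.shuffle(copy);
        return copy.get(0);
    }

    /**
     * Resolves the hostname on every call (no caching of our own) and
     * returns one of its addresses at random, so the result does not
     * depend on the order in which the DNS server lists them.
     */
    static InetAddress resolve(String hostname) throws UnknownHostException {
        // getAllByName either returns at least one address or throws.
        InetAddress[] all = InetAddress.getAllByName(hostname);
        return pickOne(Arrays.asList(all));
    }
}
```

A next() implementation along these lines would shuffle the configured host
names once and call resolve() for the current name each time it is asked for
an address.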

Re: [SUGGESTION] JvmPauseMonitor in ZooKeeper

2018-05-09 Thread Edward Ribeiro
+1. Sounds like a really nice-to-have feature. Let's open a ticket and open a PR.
:)

Ed

On Wed, May 9, 2018 at 11:15, Norbert Kalmar wrote:

> Hi,
>
> I just got a tip that we could improve on the logging in ZooKeeper. After a
> ZK crash, or client timeout sometimes it's hard to determine from the logs
> what happened. Knowing if ZK was responsive at the time would help a lot.
> For example, ZK might spend a lot of time waiting on GC (there is still
> some misconception that ZK is a storage).
>
> To help detect this, HADOOP already has a great tool called JVM Pause
> Monitor. (As the name suggest, it can be also used for monitoring, but it
> also helps post-mortem in a lot of cases). Basically it has a daemon that
> sleeps for one second, and if the sleep time exceeds the 1s by more than
> the threshold (1s: INFO, 10s: WARN by default - this can be configurable in
> our case, see below), it will alert/make a log entry. It can also monitor
> the time GC took.
>
> Now, this class is in the HADOOP-common. I wouldn't want to depend on
> Hadoop-common because of this one feature/class (it is actually a single
> class). Since this is a straightforward implementation, and in the past
> five years the few commits it had is nothing really serious, I think we
> could just copy this class in ZooKeeper, and introduce it as a configurable
> feature, by default it can be off.
>
> The class:
>
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java
>
> What do You think?
>
> Regards,
> Norbert
>


[SUGGESTION] JvmPauseMonitor in ZooKeeper

2018-05-09 Thread Norbert Kalmar
Hi,

I just got a tip that we could improve on the logging in ZooKeeper. After a
ZK crash or client timeout, it's sometimes hard to determine from the logs
what happened. Knowing if ZK was responsive at the time would help a lot.
For example, ZK might spend a lot of time waiting on GC (there is still
some misconception that ZK is a storage system).

To help detect this, Hadoop already has a great tool called JVM Pause
Monitor. (As the name suggests, it can also be used for monitoring, but it
also helps post-mortem in a lot of cases.) Basically it has a daemon that
sleeps for one second, and if the sleep time exceeds the 1s by more than
the threshold (1s: INFO, 10s: WARN by default - this can be configurable in
our case, see below), it will alert/make a log entry. It can also monitor
the time GC took.
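
The mechanism described above can be sketched roughly as follows. This is an
illustration only, not Hadoop's code: the class name PauseMonitorSketch, its
method names and its constants are hypothetical, and the values simply mirror
the numbers quoted in this mail. GC time reporting is left out.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a sleep-based pause detector.
class PauseMonitorSketch {
    // Values taken from the description in this mail, not from any real config.
    static final long SLEEP_MS = 1000;
    static final long INFO_THRESHOLD_MS = 1000;
    static final long WARN_THRESHOLD_MS = 10000;

    /** Maps the extra time spent asleep to a log level, or null if below both thresholds. */
    static String levelFor(long extraMs, long infoMs, long warnMs) {
        if (extraMs > warnMs) {
            return "WARN";
        }
        if (extraMs > infoMs) {
            return "INFO";
        }
        return null;
    }

    /** Sleeps, then returns how much longer than requested the sleep took. */
    static long measureExtraSleepMs(long sleepMs) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(sleepMs);
        long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return elapsedMs - sleepMs;
    }

    /** Starts the daemon thread that repeats the measurement until interrupted. */
    static Thread start() {
        Thread monitor = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    long extra = measureExtraSleepMs(SLEEP_MS);
                    String level = levelFor(extra, INFO_THRESHOLD_MS, WARN_THRESHOLD_MS);
                    if (level != null) {
                        // A real implementation would go through the logging framework.
                        System.out.println(level + ": detected pause of about " + extra + "ms");
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "PauseMonitorSketch");
        monitor.setDaemon(true);
        monitor.start();
        return monitor;
    }
}
```

If the JVM (or the whole host) stalls while the thread is asleep, the measured
sleep overshoots, and the overshoot is what gets compared to the thresholds.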

Now, this class is in hadoop-common. I wouldn't want to depend on
hadoop-common because of this one feature/class (it is actually a single
class). Since this is a straightforward implementation, and the few commits
it has had in the past five years were nothing serious, I think we could
just copy this class into ZooKeeper and introduce it as a configurable
feature that is off by default.

The class:
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/JvmPauseMonitor.java

What do you think?

Regards,
Norbert


Re: Discover LEADER from JMX

2018-05-09 Thread Edward Ribeiro
Sent before finishing the previous email. Just to complement it: findLeader()
could have been written as below, but this change is only a nitpicky detail
and totally irrelevant to the questions you are asking. :)

/**
 * Returns the address of the node we think is the leader.
 */
protected QuorumServer findLeader() {

    // Find the leader by id
    long currentLeader = self.getCurrentVote().getId();

    QuorumServer leaderServer = self.getView().get(currentLeader);

    if (leaderServer == null) {
        LOG.warn("Couldn't find the leader with id = {}", currentLeader);
    }
    return leaderServer;
}

Edward
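
For anyone who wants to try the idea outside of ZooKeeper, here is a small
self-contained sketch comparing the two approaches. The Server class is a
hypothetical stand-in for QuorumServer that models only the id field, and the
sketch assumes the view map is keyed by that same id.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone illustration; none of these types are ZooKeeper's.
class FindLeaderSketch {

    /** Stand-in for QuorumServer; only the id is modeled. */
    static final class Server {
        final long id;

        Server(long id) {
            this.id = id;
        }
    }

    /** Walks all values and compares ids, like the existing loop. */
    static Server byTraversal(Map<Long, Server> view, long leaderId) {
        for (Server s : view.values()) {
            if (s.id == leaderId) {
                return s;
            }
        }
        return null;
    }

    /** Uses the map key directly; returns null when the id is unknown. */
    static Server byLookup(Map<Long, Server> view, long leaderId) {
        return view.get(leaderId);
    }

    public static void main(String[] args) {
        Map<Long, Server> view = new HashMap<>();
        view.put(1L, new Server(1L));
        view.put(2L, new Server(2L));
        // Both approaches find the same object for a known id.
        System.out.println(byTraversal(view, 2L) == byLookup(view, 2L));
    }
}
```

Both methods return null for an id that is not in the view, so the warning
branch in findLeader() behaves the same either way.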

On Wed, May 9, 2018 at 9:29 AM, Edward Ribeiro 
wrote:

> Hi Enrico,
>
> Well, I am not an expert on QuorumPeer either (not an expert on anything,
> really), but maybe it's the variable and method below?
>
> - QuorumPeer --
>
> /**
>  * This is who I think the leader currently is.
>  */
> volatile private Vote currentVote;
>
> public synchronized Vote getCurrentVote(){
> return currentVote;
> }
>
> ---
>
>
> Then it's a matter of calling quorumPeer.getCurrentVote().getId() and
> quorumPeer.getServerState()?
>
> Btw, the Learner class has this handy method below (self is a QuorumPeer):
>
>  Learner 
>
> /**
>  * Returns the address of the node we think is the leader.
>  */
> protected QuorumServer findLeader() {
> QuorumServer leaderServer = null;
> // Find the leader by id
> Vote current = self.getCurrentVote();
> for (QuorumServer s : self.getView().values()) {
> if (s.id == current.getId()) {
> leaderServer = s;
> break;
> }
> }
> if (leaderServer == null) {
> LOG.warn("Couldn't find the leader with id = "
> + current.getId());
> }
> return leaderServer;
> }
>
> ---
>
> By the way, as a side note, the map traversal could be changed by:
>
> 
>
> if (self.getView().containsKey(current.getId())) {
>
> }
>
> ---
>
>
>
> As you can see in the method above, quorumPeer.getView() returns a Map<Long, QuorumServer>, as below:
>
> -QuorumPeer -
>
> /**
>  * A 'view' is a node's current opinion of the membership of the entire
>  * ensemble.
>  */
> public Map getView() {
> return Collections.unmodifiableMap(getQuorumVerifier().
> getAllMembers());
> }
>
> -
>
>
> And then it retrieves the QuorumServer that has many more information
> about the node besides the sid (InetSocketAddress, hostname, etc). :)
>
>
> Cheers,
> Edward
>
> On Wed, May 9, 2018 at 8:50 AM, Enrico Olivelli 
> wrote:
>
>> So I am trying to create a patch in order to expose on JMX the id of the
>> current "leader" (on the JVM of a follower)
>>
>> I am trying to find in ZK which is the variable which holds the ID of the
>> current leader.
>> I am new to the internal of QuorumPeer
>>
>> Can someone give me some hint ?
>>
>> Enrico
>>
>> On Tue, 8 May 2018 at 10:08, Ansel Zandegran <
>> ansel.zandeg...@infor.com> wrote:
>>
>> > Hi,
>> > That is possible with 4 letter commands. We are using it now. In 3.5.x
>> it
>> > is going to be removed in favour of admin server (embedded web server).
>> > We are running in an environment where it’s not possible to run JMX or
>> > embedded web servers.
>> >
>> > So I am wondering if there is another way? It would be nice to have this
>> > info as a znode.
>> >
>> > Best regards,
>> > Ansel
>> >
>> > > On 8 May 2018, at 09:55, Flavio Junqueira  wrote:
>> > >
>> > > Hi Enrico,
>> > >
>> > > You can determine the state of a server via 4-letter commands.
>> Would
>> > that work for you?
>> > >
>> > > -Flavio
>> > >
>> > >> On 8 May 2018, at 09:09, Enrico Olivelli 
>> wrote:
>> > >>
>> > >> Hi,
>> > >> is there any way to see in JMX which is the leader of a ZooKeeper
>> > cluster?
>> > >>
>> > >> My problem is: given access to any of the nodes of the cluster I
>> want to
>> > >> know from JMX which is the current leader.
>> > >> It seems to me that this information is not available, you can know
>> > only if
>> > >> the local node is Leader or Follower.
>> > >>
>> > >> Cheers
>> > >> Enrico
>> > >
>> >
>> >
>>
>
>


Re: Discover LEADER from JMX

2018-05-09 Thread Edward Ribeiro
Hi Enrico,

Well, I am not an expert on QuorumPeer either (not an expert on anything,
really), but maybe it's the variable and method below?

- QuorumPeer --

/**
 * This is who I think the leader currently is.
 */
volatile private Vote currentVote;

public synchronized Vote getCurrentVote(){
    return currentVote;
}

---


Then it's a matter of calling quorumPeer.getCurrentVote().getId() and
quorumPeer.getServerState()?
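
As a rough illustration of how such a value could then be published on JMX,
here is a minimal, self-contained sketch using only the standard
javax.management API. Everything ZooKeeper-specific is an assumption: the
names LeaderJmxSketch, LeaderInfoMXBean and LeaderInfo and the object name
"example.sketch:type=LeaderInfo" are made up, and a LongSupplier stands in
for the call to quorumPeer.getCurrentVote().getId(). A real patch would more
likely add a getter to one of the existing ZooKeeper MBeans.

```java
import java.lang.management.ManagementFactory;
import java.util.function.LongSupplier;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical sketch: exposing a "current leader id" attribute over JMX. */
final class LeaderJmxSketch {

    /** MXBean interface; JMX derives the attribute name "LeaderId" from the getter. */
    public interface LeaderInfoMXBean {
        long getLeaderId();
    }

    /** Reads the id lazily, so each JMX query sees the current value. */
    public static final class LeaderInfo implements LeaderInfoMXBean {
        private final LongSupplier leaderIdSource;

        public LeaderInfo(LongSupplier leaderIdSource) {
            this.leaderIdSource = leaderIdSource;
        }

        @Override
        public long getLeaderId() {
            return leaderIdSource.getAsLong();
        }
    }

    /** Registers the bean, reads the attribute back through JMX, then unregisters it. */
    static long registerAndRead(LongSupplier leaderIdSource) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            // Made-up object name, used only for this example.
            ObjectName name = new ObjectName("example.sketch:type=LeaderInfo");
            server.registerMBean(new LeaderInfo(leaderIdSource), name);
            try {
                return (Long) server.getAttribute(name, "LeaderId");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (JMException e) {
            throw new IllegalStateException("JMX registration or lookup failed", e);
        }
    }
}
```

In a follower, the supplier would be something like
() -> quorumPeer.getCurrentVote().getId(), assuming the current vote is
non-null at that point.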

Btw, the Learner class has this handy method below (self is a QuorumPeer):

 Learner 

/**
 * Returns the address of the node we think is the leader.
 */
protected QuorumServer findLeader() {
    QuorumServer leaderServer = null;
    // Find the leader by id
    Vote current = self.getCurrentVote();
    for (QuorumServer s : self.getView().values()) {
        if (s.id == current.getId()) {
            leaderServer = s;
            break;
        }
    }
    if (leaderServer == null) {
        LOG.warn("Couldn't find the leader with id = "
                + current.getId());
    }
    return leaderServer;
}

---

By the way, as a side note, the map traversal could be replaced by a
lookup on the key:



if (self.getView().containsKey(current.getId())) {

}

---
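
To make the side note concrete, the sketch below compares the traversal with
a direct key lookup on a plain java.util.Map. It does not use the ZooKeeper
classes: Server is a made-up stand-in for QuorumServer, and the sample view
is invented. The two approaches are only equivalent under the assumption that
the map is keyed by the same id that each value carries.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch comparing value traversal with a key lookup. */
final class FindLeaderSketch {

    /** Stand-in for QuorumServer: just an id and a host name. */
    static final class Server {
        final long id;
        final String host;

        Server(long id, String host) {
            this.id = id;
            this.host = host;
        }
    }

    /** Scans every value until one with a matching id is found. */
    static Server findByTraversal(Map<Long, Server> view, long leaderId) {
        for (Server s : view.values()) {
            if (s.id == leaderId) {
                return s;
            }
        }
        return null;
    }

    /** Looks the server up by key; returns null when the id is unknown. */
    static Server findByLookup(Map<Long, Server> view, long leaderId) {
        return view.get(leaderId);
    }

    /** Invented three-node view, keyed by server id. */
    static Map<Long, Server> sampleView() {
        Map<Long, Server> view = new LinkedHashMap<>();
        view.put(1L, new Server(1L, "zk1.example"));
        view.put(2L, new Server(2L, "zk2.example"));
        view.put(3L, new Server(3L, "zk3.example"));
        return view;
    }
}
```

Since get() already returns null for a missing key, a separate containsKey()
check is only needed when the map may legitimately hold null values.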



As you can see in the method above, quorumPeer.getView() returns a Map<Long, QuorumServer>, as below:

-QuorumPeer -

/**
 * A 'view' is a node's current opinion of the membership of the entire
 * ensemble.
 */
public Map<Long, QuorumServer> getView() {
    return Collections.unmodifiableMap(getQuorumVerifier().getAllMembers());
}

-


And then it retrieves the QuorumServer, which has much more information
about the node besides the sid (InetSocketAddress, hostname, etc). :)


Cheers,
Edward

On Wed, May 9, 2018 at 8:50 AM, Enrico Olivelli  wrote:

> So I am trying to create a patch in order to expose on JMX the id of the
> current "leader" (on the JVM of a follower)
>
> I am trying to find in ZK which is the variable which holds the ID of the
> current leader.
> I am new to the internal of QuorumPeer
>
> Can someone give me some hint ?
>
> Enrico
>
> On Tue, 8 May 2018 at 10:08, Ansel Zandegran <
> ansel.zandeg...@infor.com> wrote:
>
> > Hi,
> > That is possible with 4 letter commands. We are using it now. In 3.5.x it
> > is going to be removed in favour of admin server (embedded web server).
> > We are running in an environment where it’s not possible to run JMX or
> > embedded web servers.
> >
> > So I am wondering if there is another way? It would be nice to have this
> > info as a znode.
> >
> > Best regards,
> > Ansel
> >
> > > On 8 May 2018, at 09:55, Flavio Junqueira  wrote:
> > >
> > > Hi Enrico,
> > >
> > > You can determine the state of a server via 4-letter commands. Would
> > that work for you?
> > >
> > > -Flavio
> > >
> > >> On 8 May 2018, at 09:09, Enrico Olivelli  wrote:
> > >>
> > >> Hi,
> > >> is there any way to see in JMX which is the leader of a ZooKeeper
> > cluster?
> > >>
> > >> My problem is: given access to any of the nodes of the cluster I want
> to
> > >> know from JMX which is the current leader.
> > >> It seems to me that this information is not available, you can know
> > only if
> > >> the local node is Leader or Follower.
> > >>
> > >> Cheers
> > >> Enrico
> > >
> >
> >
>


Re: Discover LEADER from JMX

2018-05-09 Thread Enrico Olivelli
So I am trying to create a patch to expose on JMX the id of the
current "leader" (on the JVM of a follower).

I am trying to find which variable in ZK holds the ID of the
current leader.
I am new to the internals of QuorumPeer.

Can someone give me a hint?

Enrico

On Tue, 8 May 2018 at 10:08, Ansel Zandegran <
ansel.zandeg...@infor.com> wrote:

> Hi,
> That is possible with 4 letter commands. We are using it now. In 3.5.x it
> is going to be removed in favour of admin server (embedded web server).
> We are running in an environment where it’s not possible to run JMX or
> embedded web servers.
>
> So I am wondering if there is another way? It would be nice to have this
> info as a znode.
>
> Best regards,
> Ansel
>
> > On 8 May 2018, at 09:55, Flavio Junqueira  wrote:
> >
> > Hi Enrico,
> >
> > You can determine the state of a server via 4-letter commands. Would
> that work for you?
> >
> > -Flavio
> >
> >> On 8 May 2018, at 09:09, Enrico Olivelli  wrote:
> >>
> >> Hi,
> >> is there any way to see in JMX which is the leader of a ZooKeeper
> cluster?
> >>
> >> My problem is: given access to any of the nodes of the cluster I want to
> >> know from JMX which is the current leader.
> >> It seems to me that this information is not available, you can know
> only if
> >> the local node is Leader or Follower.
> >>
> >> Cheers
> >> Enrico
> >
>
>


ZooKeeper_branch35_jdk8 - Build # 952 - Failure

2018-05-09 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk8/952/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 7.76 KB...]
[ivy:retrieve]  found commons-cli#commons-cli;1.2 in maven2
[ivy:retrieve]  found log4j#log4j;1.2.17 in maven2
[ivy:retrieve]  found org.apache.yetus#audience-annotations;0.5.0 in maven2
[ivy:retrieve]  found io.netty#netty;3.10.6.Final in maven2
[ivy:retrieve] :: resolution report :: resolve 615ms :: artifacts dl 33ms
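---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   16  |   0   |   0   |   0   ||   16  |   0   |
---------------------------------------------------------------------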
[ivy:retrieve] :: retrieving :: org.apache.zookeeper#zookeeper
[ivy:retrieve]  confs: [default]
[ivy:retrieve]  16 artifacts copied, 0 already retrieved (4344kB/172ms)

clover.setup:

clover.info:

clover:

ivy-retrieve-javacc:
[mkdir] Created dir: 
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch35_jdk8/build/javacc/lib
[ivy:retrieve] :: resolving dependencies :: 
org.apache.zookeeper#zookeeper;3.5.4-beta-SNAPSHOT
[ivy:retrieve]  confs: [javacc]
[ivy:retrieve]  found net.java.dev.javacc#javacc;5.0 in maven2
[ivy:retrieve] :: resolution report :: resolve 47ms :: artifacts dl 0ms
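---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      javacc      |   1   |   0   |   0   |   0   ||   1   |   0   |
---------------------------------------------------------------------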
[ivy:retrieve] :: retrieving :: org.apache.zookeeper#zookeeper
[ivy:retrieve]  confs: [javacc]
[ivy:retrieve]  1 artifacts copied, 0 already retrieved (291kB/3ms)

generate_jute_parser:
[mkdir] Created dir: 
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch35_jdk8/build/jute_compiler/org/apache/jute/compiler/generated
[ivy:artifactproperty] DEPRECATED: 'ivy.conf.file' is deprecated, use 
'ivy.settings.file' instead
[ivy:artifactproperty] :: loading settings :: file = 
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch35_jdk8/ivysettings.xml
 [move] Moving 1 file to 
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch35_jdk8/build/javacc/lib
   [javacc] Error occurred during initialization of VM
   [javacc] java.lang.OutOfMemoryError: unable to create new native thread
   [javacc] at java.lang.Thread.start0(Native Method)
   [javacc] at java.lang.Thread.start(Thread.java:717)
   [javacc] at java.lang.ref.Finalizer.(Finalizer.java:233)
   [javacc] 

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch35_jdk8/build.xml:311: 
/usr/local/asfpackages/java/jdk1.8.0_172/jre/bin/java failed with return code 1

Total time: 3 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
ERROR: Step 'Publish JUnit test result report' failed: No test report files 
were found. Configuration error?
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Created] (ZOOKEEPER-3036) Unexpected exception in zookeeper

2018-05-09 Thread Oded (JIRA)
Oded created ZOOKEEPER-3036:
---

 Summary: Unexpected exception in zookeeper
 Key: ZOOKEEPER-3036
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3036
 Project: ZooKeeper
  Issue Type: Bug
  Components: jmx
Affects Versions: 3.4.10
 Environment: 3 Zookeepers, 5 kafka servers
Reporter: Oded


We hit an issue with one of the ZooKeeper servers (the Leader), causing the 
entire Kafka cluster to fail:

2018-05-09 02:29:01,730 [myid:3] - ERROR 
[LearnerHandler-/192.168.0.91:42490:LearnerHandler@648] - Unexpected exception 
causing shutdown while sock still open
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at 
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
    at 
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
    at 
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
    at 
org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
2018-05-09 02:29:01,730 [myid:3] - WARN  
[LearnerHandler-/192.168.0.91:42490:LearnerHandler@661] - *** GOODBYE 
/192.168.0.91:42490 

 

We would expect ZooKeeper to elect another Leader and the Kafka cluster to 
continue working as expected, but that was not the case.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Question on merge script

2018-05-09 Thread Flavio Junqueira
Hey Michael,

Yesterday I was trying to merge a PR opened against branch-3.5, and fetching 
the PR branch did not give me the merge script. I ended up asking the 
contributor to change the target branch to master so that I could avoid any 
small hacks with the merge script.

We should consider doing the following two things; let me know if they make 
sense:
1- Clarifying that if a change is supposed to go to both branch-3.5 and master, 
the PR should be against master
2- Perhaps merging the script to branch-3.5 so that I see it when I fetch a PR 
branch off branch-3.5. This is unusual, but it is not unreasonable that we will 
eventually have PRs for branch-3.5 only.

I'm focusing on 3.5, but the same reasoning applies to 3.4.

-Flavio

 
> On 9 May 2018, at 01:49, Michael Han  wrote:
> 
> Hi Flavio,
> 
> The merge script is branch agnostic - it only cares about the pull request
> number. As long as in the pull request the correct target branch is
> specified, the merge script will do its job by merging the change to the
> specified target branch. I guess we could commit the same script to
> branch-3.5 but the current script in master should be able to do what you
> asked.
> 
> On Tue, May 8, 2018 at 4:06 PM, Flavio Junqueira  wrote:
> 
>> Could anyone remind me why we don't have the merge script on branch-3.5?
>> Say I have a change that targets branch-3.5 alone. Shouldn't I be able to
>> have a PR that targets branch-3.5 and use the merge script?
>> 
>> Thanks,
>> -Flavio
> 
> 
> 
> 
> -- 
> Cheers
> Michael