[jira] [Updated] (ZOOKEEPER-2197) non-ascii character in FinalRequestProcessor.java
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michi Mutsuzaki updated ZOOKEEPER-2197:
Attachment: ZOOKEEPER-2197.patch

non-ascii character in FinalRequestProcessor.java

Key: ZOOKEEPER-2197
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2197
Project: ZooKeeper
Issue Type: Bug
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
Priority: Minor
Fix For: 3.5.1, 3.6.0
Attachments: ZOOKEEPER-2197.patch

src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java:134: error: unmappable character for encoding ASCII
[javac] // was not being queued ??? ZOOKEEPER-558) properly. This happens, for example,

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2163) Introduce new ZNode type: container
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566149#comment-14566149 ] Hadoop QA commented on ZOOKEEPER-2163:

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12736363/zookeeper-2163.11.patch
against trunk revision 1682623.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2724//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2724//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2724//console

This message is automatically generated.

Introduce new ZNode type: container

Key: ZOOKEEPER-2163
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2163
Project: ZooKeeper
Issue Type: New Feature
Components: c client, java client, server
Affects Versions: 3.5.0
Reporter: Jordan Zimmerman
Assignee: Jordan Zimmerman
Fix For: 3.6.0
Attachments: zookeeper-2163.10.patch, zookeeper-2163.11.patch, zookeeper-2163.3.patch, zookeeper-2163.5.patch, zookeeper-2163.6.patch, zookeeper-2163.7.patch, zookeeper-2163.8.patch, zookeeper-2163.9.patch

BACKGROUND

A recurring problem for ZooKeeper users is garbage collection of parent nodes. Many recipes (e.g. locks, leaders, etc.)
call for the creation of a parent node under which participants create sequential nodes. When the participant is done, it deletes its node. In practice, the ZooKeeper tree begins to fill up with orphaned parent nodes that are no longer needed. The ZooKeeper APIs don't provide a way to clean these up. Over time, ZooKeeper can become unstable due to the number of these nodes.

CURRENT SOLUTIONS

Apache Curator has a workaround for this in its Reaper class, which runs in the background looking for orphaned parent nodes and deleting them. This isn't ideal; it would be better if ZooKeeper supported this directly.

PROPOSAL

ZOOKEEPER-723 and ZOOKEEPER-834 proposed allowing EPHEMERAL nodes to contain child nodes. This is not optimal, as EPHEMERALs are tied to a session while the general use case for parent nodes calls for PERSISTENT nodes. This proposal adds a new node type, CONTAINER. A CONTAINER node is the same as a PERSISTENT node, with the additional property that it is deleted when its last child is deleted (and empty CONTAINER nodes recursively up the tree are deleted as well).

CANONICAL USAGE

{code}
while (true) { // or some reasonable limit
    try {
        zk.create(path, ...);
        break;
    } catch (KeeperException.NoNodeException e) {
        try {
            zk.createContainer(containerPath, ...);
        } catch (KeeperException.NodeExistsException ignore) {
        }
    }
}
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
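The retry loop in the canonical usage guards against a race: the container parent can be garbage-collected (or not yet exist) between a client's attempts, so the child create is retried after (re)creating the container. Below is a self-contained sketch of that pattern. Note the `Zk` class here is a toy in-memory stand-in for the real ZooKeeper client (the real API's data, ACL, and create-mode arguments are elided in the proposal too), and `createContainer` is the method name used in this proposal, which may differ in the final API.

```java
import java.util.HashSet;
import java.util.Set;

public class ContainerUsage {
    static class NoNodeException extends Exception {}
    static class NodeExistsException extends Exception {}

    // Toy stand-in for the ZooKeeper client: create() fails with
    // NoNodeException while the parent node is missing, mirroring
    // the behavior the retry loop in the proposal is written for.
    static class Zk {
        final Set<String> nodes = new HashSet<>();

        void create(String path) throws NoNodeException {
            String parent = path.substring(0, path.lastIndexOf('/'));
            if (!parent.isEmpty() && !nodes.contains(parent)) {
                throw new NoNodeException();
            }
            nodes.add(path);
        }

        void createContainer(String path) throws NodeExistsException {
            if (!nodes.add(path)) {
                throw new NodeExistsException();
            }
        }
    }

    // The canonical pattern: try the child create, and on NoNodeException
    // create the container parent and retry.
    static void createUnderContainer(Zk zk, String containerPath, String path) {
        while (true) { // or some reasonable limit
            try {
                zk.create(path);
                break;
            } catch (NoNodeException e) {
                try {
                    zk.createContainer(containerPath);
                } catch (NodeExistsException ignore) {
                    // another client created the container first; just retry
                }
            }
        }
    }

    public static void main(String[] args) {
        Zk zk = new Zk();
        createUnderContainer(zk, "/locks", "/locks/lock-0001");
        System.out.println(zk.nodes.contains("/locks/lock-0001")); // prints: true
    }
}
```

The loop is idempotent from the client's point of view: whichever client wins the race to create the container, every participant eventually succeeds in creating its child node.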
Success: ZOOKEEPER-2163 PreCommit Build #2724
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2163
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2724/

### LAST 60 LINES OF THE CONSOLE ###
[...truncated 371198 lines...]
[exec] +1 overall. Here are the results of testing the latest attachment
[exec] http://issues.apache.org/jira/secure/attachment/12736363/zookeeper-2163.11.patch
[exec] against trunk revision 1682623.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec] +1 tests included. The patch appears to include 5 new or modified tests.
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec] +1 core tests. The patch passed core unit tests.
[exec] +1 contrib tests. The patch passed contrib unit tests.
[exec]
[exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2724//testReport/
[exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2724//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
[exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2724//console
[exec]
[exec] This message is automatically generated.
[exec]
[exec] Adding comment to Jira.
[exec] Comment added.
[exec] 6d7ac2dcefac26fcc81ef3e8bcb0b204b6ab875f logged out
[exec] Finished build.

BUILD SUCCESSFUL
Total time: 13 minutes 30 seconds
Archiving artifacts
Sending artifact delta relative to PreCommit-ZOOKEEPER-Build #2723
Archived 24 artifacts
Archive block size is 32768
Received 6 blocks and 33460976 bytes
Compression is 0.6%
Took 11 sec
Recording test results
Description set: ZOOKEEPER-2163
Email was triggered for: Success
Sending email for trigger: Success

### FAILED TESTS (if any) ###
All tests passed
[jira] [Updated] (ZOOKEEPER-2163) Introduce new ZNode type: container
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan Zimmerman updated ZOOKEEPER-2163:
Attachment: zookeeper-2163.11.patch

Last minute nits/reformats

Introduce new ZNode type: container

Key: ZOOKEEPER-2163
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2163
Project: ZooKeeper
Issue Type: New Feature
Components: c client, java client, server
Affects Versions: 3.5.0
Reporter: Jordan Zimmerman
Assignee: Jordan Zimmerman
Fix For: 3.6.0
Attachments: zookeeper-2163.10.patch, zookeeper-2163.11.patch, zookeeper-2163.3.patch, zookeeper-2163.5.patch, zookeeper-2163.6.patch, zookeeper-2163.7.patch, zookeeper-2163.8.patch, zookeeper-2163.9.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
ZooKeeper-trunk-openjdk7 - Build # 826 - Still Failing
See https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/826/

### LAST 60 LINES OF THE CONSOLE ###
[...truncated 369155 lines...]
[junit] at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:877)
[junit] at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:598)
[junit] 2015-05-30 19:56:48,046 [myid:] - WARN [LearnerHandler-/127.0.0.1:41797:LearnerHandler@879] - Ignoring unexpected exception
[junit] java.lang.InterruptedException
[junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
[junit] at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
[junit] at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
[junit] at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:877)
[junit] at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:598)
[junit] 2015-05-30 19:56:48,045 [myid:] - WARN [LearnerHandler-/127.0.0.1:41810:LearnerHandler@879] - Ignoring unexpected exception
[junit] java.lang.InterruptedException
[junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
[junit] at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
[junit] at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
[junit] at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:877)
[junit] at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:598)
[junit] 2015-05-30 19:56:48,047 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method
[junit] 2015-05-30 19:56:48,046 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:14053:NIOServerCnxnFactory$AcceptThread@219] - accept thread exitted run method
[junit] 2015-05-30 19:56:48,046 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-1:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method
[junit] 2015-05-30 19:56:48,048 [myid:] - INFO [QuorumPeer[myid=5](plain=/0:0:0:0:0:0:0:0:14053)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id5,name1=replica.5,name2=Leader]
[junit] 2015-05-30 19:56:48,048 [myid:] - INFO [/127.0.0.1:14055:QuorumCnxManager$Listener@659] - Leaving listener
[junit] 2015-05-30 19:56:48,048 [myid:] - WARN [QuorumPeer[myid=5](plain=/0:0:0:0:0:0:0:0:14053)(secure=disabled):QuorumPeer@1039] - Unexpected exception
[junit] java.lang.InterruptedException
[junit] at java.lang.Object.wait(Native Method)
[junit] at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:559)
[junit] at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1036)
[junit] 2015-05-30 19:56:48,048 [myid:] - INFO [QuorumPeer[myid=5](plain=/0:0:0:0:0:0:0:0:14053)(secure=disabled):Leader@613] - Shutting down
[junit] 2015-05-30 19:56:48,048 [myid:] - WARN [QuorumPeer[myid=5](plain=/0:0:0:0:0:0:0:0:14053)(secure=disabled):QuorumPeer@1070] - PeerState set to LOOKING
[junit] 2015-05-30 19:56:48,048 [myid:] - WARN [QuorumPeer[myid=5](plain=/0:0:0:0:0:0:0:0:14053)(secure=disabled):QuorumPeer@1052] - QuorumPeer main thread exited
[junit] 2015-05-30 19:56:48,048 [myid:] - INFO [main:QuorumUtil@254] - Shutting down leader election QuorumPeer[myid=5](plain=/0:0:0:0:0:0:0:0:14053)(secure=disabled)
[junit] 2015-05-30 19:56:48,049 [myid:] - INFO [QuorumPeer[myid=5](plain=/0:0:0:0:0:0:0:0:14053)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id5]
[junit] 2015-05-30 19:56:48,049 [myid:] - INFO [main:QuorumUtil@259] - Waiting for QuorumPeer[myid=5](plain=/0:0:0:0:0:0:0:0:14053)(secure=disabled) to exit thread
[junit] 2015-05-30 19:56:48,049 [myid:] - INFO [QuorumPeer[myid=5](plain=/0:0:0:0:0:0:0:0:14053)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id5,name1=replica.5]
[junit] 2015-05-30 19:56:48,049 [myid:] - INFO [QuorumPeer[myid=5](plain=/0:0:0:0:0:0:0:0:14053)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id5,name1=replica.1]
[junit] 2015-05-30 19:56:48,049 [myid:] - INFO [QuorumPeer[myid=5](plain=/0:0:0:0:0:0:0:0:14053)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id5,name1=replica.2]
[junit] 2015-05-30
Re: [VOTE] Apache ZooKeeper release 3.5.1-alpha candidate 1
Ok, since the vote didn't pass anyway, let's fix these problems:

1. Change the default test.junit.threads to 1. Chris, could you submit a patch for this?
2. Fix the comment in FinalRequestProcessor.java. I'll submit a patch.

Let me know if you guys have seen any other problems. Also, please let me know if the voting period of 2 weeks was too short. I'd like to make sure everybody gets enough time to vote.

On Sat, May 30, 2015 at 8:55 AM, Flavio Junqueira fpjunque...@yahoo.com.invalid wrote:

Another thing that is possibly not a reason to drop the config, but I'm getting this with this RC:

[javac] /home/fpj/code/zookeeper-3.5.1-alpha/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java:134: error: unmappable character for encoding ASCII
[javac] // was not being queued ??? ZOOKEEPER-558) properly. This happens, for example,

It is a trivial problem to solve, but it does generate a compilation error for me. -Flavio

On 30 May 2015, at 15:26, Flavio Junqueira fpjunque...@yahoo.com.INVALID wrote:

I don't see a reason to -1 the release just because of the number of threads junit is using. I've been a bit distracted with other things, but I'm coming back to the release candidate now. -Flavio

On 23 May 2015, at 22:09, Michi Mutsuzaki mutsuz...@gmail.com wrote:

I can go either way. Flavio, do you think we should set the default test.junit.threads to 1 and create another release candidate?

On Fri, May 22, 2015 at 5:08 PM, Chris Nauroth cnaur...@hortonworks.com wrote:

I haven't been able to repro this locally.
Here are the details on my Ubuntu VM:

uname -a
Linux ubuntu 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

java -version
java version 1.8.0_45
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)

ant -version
Apache Ant(TM) version 1.9.4 compiled on April 29 2014

I'm getting 100% passing test runs with multiple concurrent JUnit processes, including the tests that you mentioned were failing in your environment. I don't have any immediate ideas for what to try next. Everything has been working well on Jenkins and multiple dev machines, so it seems like there is some subtle environmental difference in this VM that I didn't handle in the ZOOKEEPER-2183 patch.

Is this problematic for the release candidate? If so, then I recommend doing a quick change to set the default test.junit.threads to 1 in build.xml. That would restore the old single-process testing behavior. We can change test-patch.sh to pass -Dtest.junit.threads=8 on the command line, so we'll still get speedy pre-commit runs on Jenkins where it is working well. We all can do the same when we run ant locally too. Let me know if this is important, and I can put together a patch quickly. Thanks!

--Chris Nauroth

From: Flavio Junqueira fpjunque...@yahoo.com
Date: Friday, May 22, 2015 at 3:37 PM
To: Chris Nauroth cnaur...@hortonworks.com
Cc: Zookeeper dev@zookeeper.apache.org
Subject: Re: [VOTE] Apache ZooKeeper release 3.5.1-alpha candidate 1

That's the range I get in the vm. I also checked the load from log test and the port it was trying to bind to is 11222. -Flavio

On 22 May 2015, at 23:14, Chris Nauroth cnaur...@hortonworks.com wrote:

No worries on the delay. Thank you for sharing. That's interesting.
The symptoms look similar to something we had seen from an earlier iteration of the ZOOKEEPER-2183 patch that was assigning ports from the ephemeral port range. This would cause a brief (but noticeable) window in which the OS could assign the same ephemeral port to a client socket while a server test still held onto that port assignment. It was particularly noticeable for tests that stop and restart a server on the same port, such as tests covering client reconnect logic. In the final committed version of the ZOOKEEPER-2183 patch, I excluded the ephemeral port range from use by port assignment. Typically, that's 32768 - 61000 on Linux. Is it possible that this VM is configured to use a different ephemeral port range? Here is what I get from recent stock Ubuntu and CentOS installs:

cat /proc/sys/net/ipv4/ip_local_port_range
32768 61000

--Chris Nauroth

From: Flavio Junqueira fpjunque...@yahoo.com
Date: Friday, May 22, 2015 at 2:47 PM
To: Chris Nauroth cnaur...@hortonworks.com
Cc: Zookeeper dev@zookeeper.apache.org
Subject: Re: [VOTE] Apache ZooKeeper release 3.5.1-alpha candidate 1

Sorry about the delay, here are the logs: http://people.apache.org/~fpj/logs-3.5.1-rc1/ The load test is giving bind exceptions. -Flavio

On 21 May 2015, at
[jira] [Created] (ZOOKEEPER-2197) non-ascii character in FinalRequestProcessor.java
Michi Mutsuzaki created ZOOKEEPER-2197:

Summary: non-ascii character in FinalRequestProcessor.java
Key: ZOOKEEPER-2197
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2197
Project: ZooKeeper
Issue Type: Bug
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
Priority: Minor
Fix For: 3.5.1, 3.6.0

src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java:134: error: unmappable character for encoding ASCII
[javac] // was not being queued ??? ZOOKEEPER-558) properly. This happens, for example,

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
Failed: ZOOKEEPER-2197 PreCommit Build #2725
Jira: https://issues.apache.org/jira/browse/ZOOKEEPER-2197
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2725/

### LAST 60 LINES OF THE CONSOLE ###
[...truncated 374520 lines...]
[exec] -1 tests included. The patch doesn't appear to include any new or modified tests.
[exec] Please justify why no new tests are needed for this patch.
[exec] Also please list what manual steps were performed to verify this patch.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec] +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec] +1 core tests. The patch passed core unit tests.
[exec] +1 contrib tests. The patch passed contrib unit tests.
[exec]
[exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2725//testReport/
[exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2725//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
[exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2725//console
[exec]
[exec] This message is automatically generated.
[exec]
[exec] Adding comment to Jira.
[exec] Comment added.
[exec] 6d2b4da58ce2891a2db5450a4c33242e44b2b170 logged out
[exec] Finished build.

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/build.xml:1782: exec returned: 1
Total time: 13 minutes 19 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Sending artifact delta relative to PreCommit-ZOOKEEPER-Build #2724
Archived 24 artifacts
Archive block size is 32768
Received 4 blocks and 33817476 bytes
Compression is 0.4%
Took 12 sec
Recording test results
Description set: ZOOKEEPER-2197
Email was triggered for: Failure
Sending email for trigger: Failure

### FAILED TESTS (if any) ###
All tests passed
[jira] [Commented] (ZOOKEEPER-2197) non-ascii character in FinalRequestProcessor.java
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566189#comment-14566189 ] Hadoop QA commented on ZOOKEEPER-2197:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12736371/ZOOKEEPER-2197.patch
against trunk revision 1682623.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2725//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2725//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2725//console

This message is automatically generated.

non-ascii character in FinalRequestProcessor.java

Key: ZOOKEEPER-2197
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2197
Project: ZooKeeper
Issue Type: Bug
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
Priority: Minor
Fix For: 3.5.1, 3.6.0
Attachments: ZOOKEEPER-2197.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2189) multiple leaders can be elected when configs conflict
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566120#comment-14566120 ] Hongchao Deng commented on ZOOKEEPER-2189:

Hi [~suda]. I committed ZOOKEEPER-2098 but mistakenly wrote the commit message as ZOOKEEPER-2189. Would you mind opening another JIRA and granting this JIRA number to me? Thanks!

multiple leaders can be elected when configs conflict

Key: ZOOKEEPER-2189
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2189
Project: ZooKeeper
Issue Type: Bug
Components: leaderElection
Affects Versions: 3.5.0
Reporter: Akihiro Suda

This sequence leads the ensemble to a split-brain state:

* Start server 1 (config=1:participant, 2:participant, 3:participant)
* Start server 2 (config=1:participant, 2:participant, 3:participant)
* 1 and 2 believe 2 is the leader
* Start server 3 (config=1:observer, 2:observer, 3:participant)
* 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader

Such a split-brain ensemble is very unstable. Znodes can be lost easily:

* Create some znodes on 2
* Restart 1 and 2
* 1, 2 and 3 can think 3 is the leader
* znodes created on 2 are lost, as 1 and 2 sync with 3

I consider this behavior a bug: ZK should fail gracefully if a participant is listed as an observer in the config. In the current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications only to voting members and hence there is no message from the observers (1 and 2) to the new voter (3). I think FastLeaderElection.sendNotification() should send notifications to all the members and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts?

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2189) multiple leaders can be elected when configs conflict
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566164#comment-14566164 ] Alexander Shraer commented on ZOOKEEPER-2189:

Verifying acks as proposed in this JIRA will not solve this issue. Acks from observers are not required to elect a leader. If standaloneEnabled=false, server 3 can be elected without seeing any other messages. Also, suppose you wrote the wrong ports for the other servers? It seems that to fix such errors one needs some kind of config registry.

multiple leaders can be elected when configs conflict

Key: ZOOKEEPER-2189
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2189
Project: ZooKeeper
Issue Type: Bug
Components: leaderElection
Affects Versions: 3.5.0
Reporter: Akihiro Suda

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (ZOOKEEPER-2198) Set default test.junit.threads to 1.
Chris Nauroth created ZOOKEEPER-2198:

Summary: Set default test.junit.threads to 1.
Key: ZOOKEEPER-2198
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2198
Project: ZooKeeper
Issue Type: Bug
Components: build
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor
Fix For: 3.5.1, 3.6.0

Some systems are seeing test failures under concurrent execution. This issue proposes to change the default {{test.junit.threads}} to 1 so that those environments continue to get consistent test runs. Jenkins and individual developer environments can set multiple threads with a command line argument, so most environments will still get the benefit of faster test runs.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
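For reference, the change described above would amount to an Ant property default along these lines. This is only a sketch: the property name test.junit.threads is taken from the thread, but where exactly the property lives in ZooKeeper's build.xml is an assumption.

```xml
<!-- Default to a single JUnit process for consistent runs; individual
     environments can override with: ant -Dtest.junit.threads=8 -->
<property name="test.junit.threads" value="1"/>
```

Because Ant properties are immutable once set, a -D value passed on the command line takes precedence over the default declared in build.xml, which is what makes the per-environment override work.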
[jira] [Commented] (ZOOKEEPER-2197) non-ascii character in FinalRequestProcessor.java
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566313#comment-14566313 ] Michi Mutsuzaki commented on ZOOKEEPER-2197:

That sounds fine. I guess we can set the encoding in build.xml?

non-ascii character in FinalRequestProcessor.java

Key: ZOOKEEPER-2197
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2197
Project: ZooKeeper
Issue Type: Bug
Reporter: Michi Mutsuzaki
Assignee: Michi Mutsuzaki
Priority: Minor
Fix For: 3.5.1, 3.6.0
Attachments: ZOOKEEPER-2197.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
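Setting the encoding in build.xml would look roughly like the following. The encoding attribute is the standard Ant javac attribute; the srcdir and destdir values shown here are placeholders, not ZooKeeper's actual property names.

```xml
<!-- Declare the source encoding explicitly so javac does not fall back to
     the platform default (e.g. ASCII), which is what produced the
     "unmappable character for encoding ASCII" error in this issue. -->
<javac srcdir="${src.dir}" destdir="${build.classes}" encoding="UTF-8"/>
```

With the encoding pinned, the build no longer depends on the LANG/LC_ALL settings of whichever machine runs the compile.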
[jira] [Commented] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566302#comment-14566302 ] Michi Mutsuzaki commented on ZOOKEEPER-2193:

Thank you for the patch.

reconfig command completes even if parameter is wrong obviously

Key: ZOOKEEPER-2193
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2193
Project: ZooKeeper
Issue Type: Bug
Components: leaderElection, server
Affects Versions: 3.5.0
Environment: CentOS7 + Java7
Reporter: Yasuhito Fukuda
Assignee: Yasuhito Fukuda
Attachments: ZOOKEEPER-2193-v2.patch, ZOOKEEPER-2193-v3.patch, ZOOKEEPER-2193.patch

It was confirmed that the reconfig command completes even if a parameter is obviously wrong. Refer to the following.

- Ensemble consists of four nodes

{noformat}
[zk: vm-101:2181(CONNECTED) 0] config
server.1=192.168.100.101:2888:3888:participant
server.2=192.168.100.102:2888:3888:participant
server.3=192.168.100.103:2888:3888:participant
server.4=192.168.100.104:2888:3888:participant
version=1
{noformat}

- Add a node with the reconfig command

{noformat}
[zk: vm-101:2181(CONNECTED) 9] reconfig -add server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
Committed new configuration:
server.1=192.168.100.101:2888:3888:participant
server.2=192.168.100.102:2888:3888:participant
server.3=192.168.100.103:2888:3888:participant
server.4=192.168.100.104:2888:3888:participant
server.5=192.168.100.104:2888:3888:participant;0.0.0.0:2181
version=30007
{noformat}

The IP addresses of server.4 and server.5 are duplicates. In this state, leader election will not work properly, and the ensemble can end up in an undesirable state. I think parameter validation is needed during reconfig.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michi Mutsuzaki updated ZOOKEEPER-2193:
Comment: was deleted (was: Thank you for the patch)

reconfig command completes even if parameter is wrong obviously

Key: ZOOKEEPER-2193
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2193
Project: ZooKeeper
Issue Type: Bug
Components: leaderElection, server
Affects Versions: 3.5.0
Environment: CentOS7 + Java7
Reporter: Yasuhito Fukuda
Assignee: Yasuhito Fukuda
Attachments: ZOOKEEPER-2193-v2.patch, ZOOKEEPER-2193-v3.patch, ZOOKEEPER-2193.patch

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2193) reconfig command completes even if parameter is wrong obviously
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566329#comment-14566329 ] Alexander Shraer commented on ZOOKEEPER-2193: - 3 more minor comments: 1) I'm not sure that my == null can ever happen, both because of the checks in the calling function and because the exclude... function also excludes null. 2) Perhaps rename existing to something else, since it covers not only the existing servers but also the joiners that were processed before, for example if someone is adding multiple servers with the same command. Similarly, the message in the thrown exception shouldn't say that the conflict is with one of the existing servers, because it may be with one of the new ones. 3) Consider making the message in the exception more specific, such as: port x of server #y conflicts with port x of server #z. reconfig command completes even if parameter is wrong obviously --- Key: ZOOKEEPER-2193 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2193 Project: ZooKeeper Issue Type: Bug Components: leaderElection, server Affects Versions: 3.5.0 Environment: CentOS7 + Java7 Reporter: Yasuhito Fukuda Assignee: Yasuhito Fukuda Attachments: ZOOKEEPER-2193-v2.patch, ZOOKEEPER-2193-v3.patch, ZOOKEEPER-2193.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
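The duplicate-endpoint validation discussed in this review can be illustrated with a small, self-contained sketch. Everything below is hypothetical (ReconfigValidator and checkDuplicates are invented names, not ZooKeeper's reconfig code); it only shows the shape of a check that compares each server's quorum and election endpoints against the ones already seen and reports both conflicting server ids, as comment 3 suggests.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the validation discussed above. Only the quorum
// (2888) and election (3888) ports of each "host:port:port:role" spec are
// checked; the error message names both conflicting servers.
public class ReconfigValidator {
    // Returns an error message naming both conflicting servers, or null if ok.
    public static String checkDuplicates(Map<Integer, String> servers) {
        Map<String, Integer> seen = new HashMap<>(); // "host:port" -> server id
        for (Map.Entry<Integer, String> e : servers.entrySet()) {
            String[] parts = e.getValue().split(":");
            String host = parts[0];
            for (int i = 1; i <= 2 && i < parts.length; i++) {
                String endpoint = host + ":" + parts[i];
                Integer prev = seen.put(endpoint, e.getKey());
                if (prev != null && !prev.equals(e.getKey())) {
                    return "port " + parts[i] + " of server #" + e.getKey()
                            + " conflicts with server #" + prev;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<Integer, String> cfg = new LinkedHashMap<>();
        cfg.put(4, "192.168.100.104:2888:3888:participant");
        cfg.put(5, "192.168.100.104:2888:3888:participant");
        System.out.println(checkDuplicates(cfg)); // reports the server.4/server.5 clash
    }
}
```

Run against the configuration from the bug report, this flags the reused 192.168.100.104:2888 endpoint before the new config is committed.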
[jira] [Commented] (ZOOKEEPER-2197) non-ascii character in FinalRequestProcessor.java
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566305#comment-14566305 ] Raul Gutierrez Segales commented on ZOOKEEPER-2197: --- [~michim], [~fpj]: hmm, how about using -Dfile.encoding=utf8? non-ascii character in FinalRequestProcessor.java - Key: ZOOKEEPER-2197 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2197 Project: ZooKeeper Issue Type: Bug Reporter: Michi Mutsuzaki Assignee: Michi Mutsuzaki Priority: Minor Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2197.patch src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java:134: error: unmappable character for encoding ASCII [javac] // was not being queued ??? ZOOKEEPER-558) properly. This happens, for example, -- This message was sent by Atlassian JIRA (v6.3.4#6332)
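One way to locate the offending character without touching the build is to scan the file for its first non-ASCII byte. This is a standalone illustration, not part of the ZooKeeper source; the class name and command-line interface are invented.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Standalone illustration: report the first non-ASCII byte in a file, the
// kind of character that makes javac fail with "unmappable character for
// encoding ASCII" when the build's encoding is not UTF-8.
public class NonAsciiScanner {
    // Returns the 1-based offset of the first non-ASCII byte, or -1 if clean.
    public static long firstNonAscii(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0xFF) > 0x7F) {
                return i + 1L;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        long pos = firstNonAscii(data);
        System.out.println(pos < 0 ? "ASCII-clean" : "non-ASCII byte at offset " + pos);
    }
}
```

Pointed at FinalRequestProcessor.java, this would pinpoint the byte javac rejects; fixing the character in place or setting the build's encoding, as suggested above, are the two ways out.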
[jira] [Updated] (ZOOKEEPER-2198) Set default test.junit.threads to 1.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated ZOOKEEPER-2198: - Summary: Set default test.junit.threads to 1. (was: Set default test,junit.threads to 1.) Set default test.junit.threads to 1. Key: ZOOKEEPER-2198 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2198 Project: ZooKeeper Issue Type: Bug Components: build Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2198.001.patch Some systems are seeing test failures under concurrent execution. This issue proposes to change the default {{test.junit.threads}} to 1 so that those environments continue to get consistent test runs. Jenkins and individual developer environments can set multiple threads with a command line argument, so most environments will still get the benefit of faster test runs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (ZOOKEEPER-2198) Set default test,junit.threads to 1.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Nauroth updated ZOOKEEPER-2198: - Attachment: ZOOKEEPER-2198.001.patch Set default test,junit.threads to 1. Key: ZOOKEEPER-2198 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2198 Project: ZooKeeper Issue Type: Bug Components: build Reporter: Chris Nauroth Assignee: Chris Nauroth Priority: Minor Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2198.001.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [VOTE] Apache ZooKeeper release 3.5.1-alpha candidate 1
Thank you, Michi. I filed a patch for this on ZOOKEEPER-2198. --Chris Nauroth On 5/30/15, 1:19 PM, Michi Mutsuzaki mutsuz...@gmail.com wrote: Ok, since the vote didn't pass anyways, let's fix these problems: 1. Change the default test.junit.thread to 1. Chris, could you submit a patch for this? 2. Fix the comment in FinalRequestProcessor.java. I'll submit a patch. Let me know if you guys have seen any other problems. Also, please let me know if the voting period of 2 weeks was too short. I'd like to make sure everybody gets enough time to vote. On Sat, May 30, 2015 at 8:55 AM, Flavio Junqueira fpjunque...@yahoo.com.invalid wrote: Another thing that is possibly not a reason to drop the config, but I'm getting this with this RC: [javac] /home/fpj/code/zookeeper-3.5.1-alpha/src/java/main/org/apache/zookeeper/s erver/FinalRequestProcessor.java:134: error: unmappable character for encoding ASCII [javac] // was not being queued ??? ZOOKEEPER-558) properly. This happens, for example, It is a trivial problem to solve, but it does generate a compilation error for me. -Flavio On 30 May 2015, at 15:26, Flavio Junqueira fpjunque...@yahoo.com.INVALID wrote: I don't see a reason to -1 the release just because of the number of threads junit is using. I've been a bit distracted with other things, but I'm coming back to the release candidate now. -Flavio On 23 May 2015, at 22:09, Michi Mutsuzaki mutsuz...@gmail.com wrote: I can go either way. Flavio, do you think we should set the default test.junit.threads to 1 and create another release candidate? On Fri, May 22, 2015 at 5:08 PM, Chris Nauroth cnaur...@hortonworks.com wrote: I haven't been able to repro this locally. 
Here are the details on my Ubuntu VM: uname -a Linux ubuntu 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux java -version java version 1.8.0_45 Java(TM) SE Runtime Environment (build 1.8.0_45-b14) Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode) ant -version Apache Ant(TM) version 1.9.4 compiled on April 29 2014 I'm getting 100% passing test runs with multiple concurrent JUnit processes, including the tests that you mentioned were failing in your environment. I don't have any immediate ideas for what to try next. Everything has been working well on Jenkins and multiple dev machines, so it seems like there is some subtle environmental difference in this VM that I didn't handle in the ZOOKEEPER-2183 patch. Is this problematic for the release candidate? If so, then I recommend doing a quick change to set the default test.junit.threads to 1 in build.xml. That would restore the old single-process testing behavior. We can change test-patch.sh to pass -Dtest.junit.threads=8 on the command line, so we'll still get speedy pre-commit runs on Jenkins where it is working well. We all can do the same when we run ant locally too. Let me know if this is important, and I can put together a patch quickly. Thanks! --Chris Nauroth From: Flavio Junqueira fpjunque...@yahoo.com Date: Friday, May 22, 2015 at 3:37 PM To: Chris Nauroth cnaur...@hortonworks.com Cc: Zookeeper dev@zookeeper.apache.org Subject: Re: [VOTE] Apache ZooKeeper release 3.5.1-alpha candidate 1 That's the range I get in the vm. I also checked the load from log test and the port it was trying to bind to is 11222. -Flavio On 22 May 2015, at 23:14, Chris Nauroth cnaur...@hortonworks.com wrote: No worries on the delay. Thank you for sharing. That's interesting.
The symptoms look similar to something we had seen from an earlier iteration of the ZOOKEEPER-2183 patch that was assigning ports from the ephemeral port range. This would cause a brief (but noticeable) window in which the OS could assign the same ephemeral port to a client socket while a server test still held onto that port assignment. It was particularly noticeable for tests that stop and restart a server on the same port, such as tests covering client reconnect logic. In the final committed version of the ZOOKEEPER-2183 patch, I excluded the ephemeral port range from use by port assignment. Typically, that's 32768 - 61000 on Linux. Is it possible that this VM is configured to use a different ephemeral port range? Here is what I get from recent stock Ubuntu and CentOS installs: cat /proc/sys/net/ipv4/ip_local_port_range 32768 61000 --Chris Nauroth From: Flavio Junqueira fpjunque...@yahoo.com Date: Friday, May 22, 2015 at 2:47 PM To: Chris Nauroth cnaur...@hortonworks.com Cc: Zookeeper dev@zookeeper.apache.org Subject: Re: [VOTE] Apache ZooKeeper release 3.5.1-alpha candidate 1 Sorry about the delay, here are the logs: http://people.apache.org/~fpj/logs-3.5.1-rc1/
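The ephemeral-range check described above can be sketched in a few lines. This is an illustration of the idea only, not the ZOOKEEPER-2183 code; reading /proc is Linux-specific, so the sketch falls back to the common 32768-61000 default elsewhere.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustration of the idea above: read the kernel's ephemeral port range and
// check whether a candidate test port falls inside it. Names are invented.
public class EphemeralRange {
    public static int[] localPortRange() {
        try {
            String s = new String(Files.readAllBytes(
                    Paths.get("/proc/sys/net/ipv4/ip_local_port_range"))).trim();
            String[] parts = s.split("\\s+");
            return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
        } catch (IOException | RuntimeException e) {
            return new int[] { 32768, 61000 }; // typical Linux default
        }
    }

    public static boolean isEphemeral(int port, int[] range) {
        return port >= range[0] && port <= range[1];
    }

    public static void main(String[] args) {
        int[] r = localPortRange();
        System.out.println("ephemeral range: " + r[0] + "-" + r[1]);
        System.out.println("port 11222 ephemeral? " + isEphemeral(11222, r));
    }
}
```

On a stock install, port 11222 (the one the failing load test bound to) lies outside the default range, which is consistent with the VM in question having an unusual configuration or another cause entirely.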
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566340#comment-14566340 ] Michi Mutsuzaki commented on ZOOKEEPER-2172: I'm guessing node1 is hitting this case? https://github.com/apache/zookeeper/blob/76bb6747c8250f28157636cf4011b78e7569727a/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L332 In this case we don't log the message that gets sent out. Cluster crashes when reconfig a new node as a participant - Key: ZOOKEEPER-2172 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172 Project: ZooKeeper Issue Type: Bug Components: leaderElection, quorum, server Affects Versions: 3.5.0 Environment: Ubuntu 12.04 + java 7 Reporter: Ziyou Wang Priority: Critical Attachments: node-1.log, node-2.log, node-3.log, zoo.cfg.dynamic.1005d, zoo.cfg.dynamic.next, zookeeper-1.log, zookeeper-2.log, zookeeper-3.log The operations are quite simple: start three zk servers one by one, then reconfig the cluster to add the new one as a participant. When I add the third one, the zk cluster may enter a weird state and cannot recover. I found “2015-04-20 12:53:48,236 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@547] - Incremental reconfig” in the node-1 log. So the first node received the reconfig cmd at 12:53:48. Later, it logged “2015-04-20 12:53:52,230 [myid:1] - ERROR [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception causing shutdown while sock still open” and “2015-04-20 12:53:52,231 [myid:1] - WARN [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE /10.0.0.2:55890 ”. From then on, the first and second nodes rejected all client connections and the third node didn’t join the cluster as a participant. The whole cluster was down. When the problem happened, all three nodes used the same dynamic config file zoo.cfg.dynamic.1005d, which only contained the first two nodes.
But there was another unused dynamic config file in node-1 directory zoo.cfg.dynamic.next which already contained three nodes. When I extended the waiting time between starting the third node and reconfiguring the cluster, the problem didn’t show again. So it should be a race condition problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565874#comment-14565874 ] Alexander Shraer commented on ZOOKEEPER-2172: - Can you post the logs from the run you mention where the client doesn't disconnect? Cluster crashes when reconfig a new node as a participant - Key: ZOOKEEPER-2172 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172 Project: ZooKeeper Issue Type: Bug Components: leaderElection, quorum, server Affects Versions: 3.5.0 Environment: Ubuntu 12.04 + java 7 Reporter: Ziyou Wang Priority: Critical Attachments: node-1.log, node-2.log, node-3.log, zoo.cfg.dynamic.1005d, zoo.cfg.dynamic.next, zookeeper-1.log, zookeeper-2.log, zookeeper-3.log -- This message was sent by Atlassian JIRA (v6.3.4#6332)
ZooKeeper-trunk - Build # 2707 - Failure
See https://builds.apache.org/job/ZooKeeper-trunk/2707/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 372051 lines...] [junit] 2015-05-30 10:40:16,900 [myid:] - WARN [LearnerHandler-/127.0.0.1:58393:LearnerHandler@879] - Ignoring unexpected exception [junit] java.lang.InterruptedException [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) [junit] at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) [junit] at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) [junit] at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:877) [junit] at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:598) [junit] 2015-05-30 10:40:16,902 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-1:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method [junit] 2015-05-30 10:40:16,902 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method [junit] 2015-05-30 10:40:16,903 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:11358:NIOServerCnxnFactory$AcceptThread@219] - accept thread exitted run method [junit] 2015-05-30 10:40:16,903 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:11358)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.3,name2=Leader] [junit] 2015-05-30 10:40:16,904 [myid:] - WARN [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:11358)(secure=disabled):QuorumPeer@1039] - Unexpected exception [junit] java.lang.InterruptedException [junit] at java.lang.Object.wait(Native Method) [junit] at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:559) [junit] at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1036) [junit] 2015-05-30 10:40:16,904 [myid:] - INFO 
[localhost/127.0.0.1:11365:QuorumCnxManager$Listener@659] - Leaving listener [junit] 2015-05-30 10:40:16,904 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:11358)(secure=disabled):Leader@613] - Shutting down [junit] 2015-05-30 10:40:16,904 [myid:] - INFO [main:QuorumUtil@254] - Shutting down leader election QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:11358)(secure=disabled) [junit] 2015-05-30 10:40:16,904 [myid:] - WARN [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:11358)(secure=disabled):QuorumPeer@1070] - PeerState set to LOOKING [junit] 2015-05-30 10:40:16,904 [myid:] - WARN [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:11358)(secure=disabled):QuorumPeer@1052] - QuorumPeer main thread exited [junit] 2015-05-30 10:40:16,904 [myid:] - INFO [main:QuorumUtil@259] - Waiting for QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:11358)(secure=disabled) to exit thread [junit] 2015-05-30 10:40:16,905 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:11358)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3] [junit] 2015-05-30 10:40:16,905 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:11358)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.3] [junit] 2015-05-30 10:40:16,905 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:11358)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.1] [junit] 2015-05-30 10:40:16,905 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:11358)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.2] [junit] 2015-05-30 10:40:16,905 [myid:] - INFO [main:FourLetterWordMain@63] - connecting to 127.0.0.1 11352 [junit] 2015-05-30 10:40:16,906 [myid:] - INFO [main:QuorumUtil@243] - 127.0.0.1:11352 is no longer accepting client connections [junit] 
2015-05-30 10:40:16,906 [myid:] - INFO [main:FourLetterWordMain@63] - connecting to 127.0.0.1 11355 [junit] 2015-05-30 10:40:16,906 [myid:] - INFO [main:QuorumUtil@243] - 127.0.0.1:11355 is no longer accepting client connections [junit] 2015-05-30 10:40:16,906 [myid:] - INFO [main:FourLetterWordMain@63] - connecting to 127.0.0.1 11358 [junit] 2015-05-30 10:40:16,906 [myid:] - INFO [main:QuorumUtil@243] - 127.0.0.1:11358 is no longer accepting client connections [junit] 2015-05-30 10:40:16,908 [myid:] - INFO [main:ZKTestCase$1@65] - SUCCEEDED testPortChange [junit] 2015-05-30 10:40:16,908 [myid:] - INFO [main:ZKTestCase$1@60] - FINISHED
[jira] [Commented] (ZOOKEEPER-2189) multiple leaders can be elected when configs conflict
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565932#comment-14565932 ] Hudson commented on ZOOKEEPER-2189: --- FAILURE: Integrated in ZooKeeper-trunk #2707 (See [https://builds.apache.org/job/ZooKeeper-trunk/2707/]) ZOOKEEPER-2189: QuorumCnxManager: use BufferedOutputStream for initial msg (Raul Gutierrez Segales via hdeng) (hdeng: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1682558) * /zookeeper/trunk/CHANGES.txt * /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java multiple leaders can be elected when configs conflict - Key: ZOOKEEPER-2189 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2189 Project: ZooKeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.5.0 Reporter: Akihiro Suda This sequence leads the ensemble to a split-brain state: * Start server 1 (config=1:participant, 2:participant, 3:participant) * Start server 2 (config=1:participant, 2:participant, 3:participant) * 1 and 2 believe 2 is the leader * Start server 3 (config=1:observer, 2:observer, 3:participant) * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader Such a split-brain ensemble is very unstable. Znodes can be lost easily: * Create some znodes on 2 * Restart 1 and 2 * 1, 2 and 3 can think 3 is the leader * znodes created on 2 are lost, as 1 and 2 sync with 3 I consider this behavior a bug: ZK should fail gracefully if a participant is listed as an observer in the config. In the current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications only to voting members, and hence there is no message from the observers (1 and 2) to the new voter (3). I think FastLeaderElection.sendNotification() should send notifications to all the members, and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts? 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
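The config-conflict detection proposed in ZOOKEEPER-2189 amounts to comparing two servers' views of each member's role. Below is a minimal, hypothetical sketch of that comparison; the RoleConflict class is invented for illustration and is not ZooKeeper's verification code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the cross-config sanity check proposed above: given
// two servers' views of the ensemble, list every server id whose declared
// role differs between them. A non-empty result means the configs conflict
// and the joiner should fail fast instead of electing itself.
public class RoleConflict {
    // Each config maps server id -> declared role ("participant" or "observer").
    public static List<Integer> conflictingIds(Map<Integer, String> a,
                                               Map<Integer, String> b) {
        List<Integer> conflicts = new ArrayList<>();
        for (Map.Entry<Integer, String> e : a.entrySet()) {
            String other = b.get(e.getKey());
            if (other != null && !other.equals(e.getValue())) {
                conflicts.add(e.getKey());
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        Map<Integer, String> s1 = new TreeMap<>(); // config seen by servers 1 and 2
        s1.put(1, "participant"); s1.put(2, "participant"); s1.put(3, "participant");
        Map<Integer, String> s3 = new TreeMap<>(); // conflicting config seen by server 3
        s3.put(1, "observer"); s3.put(2, "observer"); s3.put(3, "participant");
        System.out.println("conflicting ids: " + conflictingIds(s1, s3));
    }
}
```

For the scenario in the report, the two configs disagree on the roles of servers 1 and 2, which is exactly the mismatch that lets server 3 elect itself.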
[jira] [Commented] (ZOOKEEPER-2179) Typo in Watcher.java
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565933#comment-14565933 ] Hudson commented on ZOOKEEPER-2179: --- FAILURE: Integrated in ZooKeeper-trunk #2707 (See [https://builds.apache.org/job/ZooKeeper-trunk/2707/]) ZOOKEEPER-2179: Typo in Watcher.java (Archana T via rgs) (rgs: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1682539) * /zookeeper/trunk/CHANGES.txt * /zookeeper/trunk/src/java/main/org/apache/zookeeper/Watcher.java Typo in Watcher.java Key: ZOOKEEPER-2179 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2179 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.4.5, 3.5.0 Reporter: Eunchan Kim Priority: Trivial Fix For: 3.4.7, 3.5.0, 3.6.0 Attachments: ZOOKEEPER-2179.patch at zookeeper/src/java/main/org/apache/zookeeper/Watcher.java, * implement. A ZooKeeper client will get various events from the ZooKeepr should be fixed to * implement. A ZooKeeper client will get various events from the ZooKeeper. (Zookeepr - Zookeeper) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2187) remove duplicated code between CreateRequest{,2}
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14565931#comment-14565931 ] Hudson commented on ZOOKEEPER-2187: --- FAILURE: Integrated in ZooKeeper-trunk #2707 (See [https://builds.apache.org/job/ZooKeeper-trunk/2707/]) ZOOKEEPER-2187: remove duplicated code between CreateRequest{,2} (Raul Gutierrez Segales via hdeng) (hdeng: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1682521) * /zookeeper/trunk/CHANGES.txt * /zookeeper/trunk/src/c/src/zookeeper.c * /zookeeper/trunk/src/java/main/org/apache/zookeeper/MultiTransactionRecord.java * /zookeeper/trunk/src/java/main/org/apache/zookeeper/ZooKeeper.java * /zookeeper/trunk/src/java/main/org/apache/zookeeper/server/PrepRequestProcessor.java * /zookeeper/trunk/src/zookeeper.jute remove duplicated code between CreateRequest{,2} Key: ZOOKEEPER-2187 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2187 Project: ZooKeeper Issue Type: Bug Components: c client, java client, server Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Priority: Minor Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2187.patch To avoid cargo-culting and reduce duplicated code, we can merge most of CreateRequest and CreateRequest2, given that only the Response object is actually different. This will improve readability of the code and make it less confusing for people adding new opcodes in the future (i.e., copying a request definition vs. reusing what's already there, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
ZooKeeper_branch35_jdk7 - Build # 310 - Failure
See https://builds.apache.org/job/ZooKeeper_branch35_jdk7/310/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 364664 lines...] [junit] 2015-05-30 10:10:04,165 [myid:] - WARN [LearnerHandler-/127.0.0.1:2:LearnerHandler@595] - *** GOODBYE /127.0.0.1:2 [junit] 2015-05-30 10:10:04,165 [myid:] - WARN [LearnerHandler-/127.0.0.1:3:LearnerHandler@595] - *** GOODBYE /127.0.0.1:3 [junit] 2015-05-30 10:10:04,166 [myid:] - WARN [LearnerHandler-/127.0.0.1:3:LearnerHandler@879] - Ignoring unexpected exception [junit] java.lang.InterruptedException [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) [junit] at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) [junit] at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) [junit] at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:877) [junit] at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:598) [junit] 2015-05-30 10:10:04,166 [myid:] - INFO [ConnnectionExpirer:NIOServerCnxnFactory$ConnectionExpirerThread@583] - ConnnectionExpirerThread interrupted [junit] 2015-05-30 10:10:04,167 [myid:] - WARN [LearnerHandler-/127.0.0.1:2:LearnerHandler@879] - Ignoring unexpected exception [junit] java.lang.InterruptedException [junit] at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219) [junit] at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340) [junit] at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338) [junit] at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:877) [junit] at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:598) [junit] 2015-05-30 10:10:04,168 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-1:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted 
run method [junit] 2015-05-30 10:10:04,168 [myid:] - INFO [NIOServerCxnFactory.AcceptThread:0.0.0.0/0.0.0.0:19442:NIOServerCnxnFactory$AcceptThread@219] - accept thread exitted run method [junit] 2015-05-30 10:10:04,168 [myid:] - INFO [NIOServerCxnFactory.SelectorThread-0:NIOServerCnxnFactory$SelectorThread@420] - selector thread exitted run method [junit] 2015-05-30 10:10:04,168 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:19442)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.3,name2=Leader] [junit] 2015-05-30 10:10:04,168 [myid:] - INFO [/127.0.0.1:19444:QuorumCnxManager$Listener@659] - Leaving listener [junit] 2015-05-30 10:10:04,168 [myid:] - WARN [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:19442)(secure=disabled):QuorumPeer@1039] - Unexpected exception [junit] java.lang.InterruptedException [junit] at java.lang.Object.wait(Native Method) [junit] at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:559) [junit] at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1036) [junit] 2015-05-30 10:10:04,169 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:19442)(secure=disabled):Leader@613] - Shutting down [junit] 2015-05-30 10:10:04,169 [myid:] - WARN [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:19442)(secure=disabled):QuorumPeer@1070] - PeerState set to LOOKING [junit] 2015-05-30 10:10:04,169 [myid:] - WARN [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:19442)(secure=disabled):QuorumPeer@1052] - QuorumPeer main thread exited [junit] 2015-05-30 10:10:04,169 [myid:] - INFO [main:QuorumUtil@254] - Shutting down leader election QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:19442)(secure=disabled) [junit] 2015-05-30 10:10:04,170 [myid:] - INFO [main:QuorumUtil@259] - Waiting for QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:19442)(secure=disabled) to exit thread [junit] 2015-05-30 10:10:04,169 [myid:] - INFO 
[QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:19442)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3] [junit] 2015-05-30 10:10:04,170 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:19442)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.3] [junit] 2015-05-30 10:10:04,170 [myid:] - INFO [QuorumPeer[myid=3](plain=/0:0:0:0:0:0:0:0:19442)(secure=disabled):MBeanRegistry@119] - Unregister MBean [org.apache.ZooKeeperService:name0=ReplicatedServer_id3,name1=replica.1]
Re: [VOTE] Apache ZooKeeper release 3.5.1-alpha candidate 1
I don't see a reason to -1 the release just because of the number of threads junit is using. I've been a bit distracted with other things, but I'm coming back to the release candidate now. -Flavio On 23 May 2015, at 22:09, Michi Mutsuzaki mutsuz...@gmail.com wrote: I can go either way. Flavio, do you think we should set the default test.junit.threads to 1 and create another release candidate? On Fri, May 22, 2015 at 5:08 PM, Chris Nauroth cnaur...@hortonworks.com wrote: I haven't been able to repro this locally. Here are the details on my Ubuntu VM: uname -a Linux ubuntu 3.16.0-30-generic #40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux java -version java version 1.8.0_45 Java(TM) SE Runtime Environment (build 1.8.0_45-b14) Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode) ant -version Apache Ant(TM) version 1.9.4 compiled on April 29 2014 I'm getting 100% passing test runs with multiple concurrent JUnit processes, including the tests that you mentioned were failing in your environment. I don't have any immediate ideas for what to try next. Everything has been working well on Jenkins and multiple dev machines, so it seems like there is some subtle environmental difference in this VM that I didn't handle in the ZOOKEEPER-2183 patch. Is this problematic for the release candidate? If so, then I recommend doing a quick change to set the default test.junit.threads to 1 in build.xml. That would restore the old single-process testing behavior. We can change test-patch.sh to pass -Dtest.junit.threads=8 on the command line, so we'll still get speedy pre-commit runs on Jenkins where it is working well. We all can do the same when we run ant locally too. Let me know if this is important, and I can put together a patch quickly. Thanks! 
--Chris Nauroth From: Flavio Junqueira fpjunque...@yahoo.com Date: Friday, May 22, 2015 at 3:37 PM To: Chris Nauroth cnaur...@hortonworks.com Cc: Zookeeper dev@zookeeper.apache.org Subject: Re: [VOTE] Apache ZooKeeper release 3.5.1-alpha candidate 1 That's the range I get in the vm. I also checked the load from log test and the port it was trying to bind to is 11222. -Flavio On 22 May 2015, at 23:14, Chris Nauroth cnaur...@hortonworks.com wrote: No worries on the delay. Thank you for sharing. That's interesting. The symptoms look similar to something we had seen from an earlier iteration of the ZOOKEEPER-2183 patch that was assigning ports from the ephemeral port range. This would cause a brief (but noticeable) window in which the OS could assign the same ephemeral port to a client socket while a server test still held onto that port assignment. It was particularly noticeable for tests that stop and restart a server on the same port, such as tests covering client reconnect logic. In the final committed version of the ZOOKEEPER-2183 patch, I excluded the ephemeral port range from use by port assignment. Typically, that's 32768 - 61000 on Linux. Is it possible that this VM is configured to use a different ephemeral port range? Here is what I get from recent stock Ubuntu and CentOS installs: cat /proc/sys/net/ipv4/ip_local_port_range 32768 61000 --Chris Nauroth From: Flavio Junqueira fpjunque...@yahoo.com Date: Friday, May 22, 2015 at 2:47 PM To: Chris Nauroth cnaur...@hortonworks.com Cc: Zookeeper dev@zookeeper.apache.org Subject: Re: [VOTE] Apache ZooKeeper release 3.5.1-alpha candidate 1 Sorry about the delay, here are the logs: http://people.apache.org/~fpj/logs-3.5.1-rc1/ the load test is giving bind exceptions. 
-Flavio On 21 May 2015, at 23:02, Chris Nauroth cnaur...@hortonworks.commailto:cnaur...@hortonworks.com wrote: Thanks, sharing logs would be great. I'll try to repro independently with JDK8 too. --Chris Nauroth On 5/21/15, 2:30 PM, Flavio Junqueira fpjunque...@yahoo.com.INVALIDmailto:fpjunque...@yahoo.com.INVALID wrote: I accidentally removed dev from the response, bringing it back in. The tests are failing intermittently for me. In the last run, I got these failing: [junit] Tests run: 8, Failures: 0, Errors: 4, Skipped: 0, Time elapsed: 30.444 sec [junit] Test org.apache.zookeeper.test.LoadFromLogTest FAILED [junit] Tests run: 86, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 264.272 sec [junit] Test org.apache.zookeeper.test.NioNettySuiteTest FAILED Still the same setup, linux + jdk 8. I can share logs if necessary. -Flavio On Thursday, May 21, 2015 8:28 PM, Chris Nauroth cnaur...@hortonworks.commailto:cnaur...@hortonworks.com wrote: Ah, my mistake. I saw Azure and my brain jumped right to Windows. I suppose the
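As background for the port discussion in this thread: the ZOOKEEPER-2183 change avoids handing test servers ports that the OS may also assign to outgoing client sockets. A minimal sketch of that kind of exclusion follows; the class and method names are invented for illustration, and 32768-61000 is only the common Linux default range, which is exactly the environmental assumption being questioned here.

```java
public class EphemeralPortCheck {
    // Common Linux defaults; verify locally with:
    //   cat /proc/sys/net/ipv4/ip_local_port_range
    static final int EPHEMERAL_LO = 32768;
    static final int EPHEMERAL_HI = 61000;

    // A test-server port is "safe" when the OS cannot hand the same port to an
    // outgoing client socket, avoiding the bind/reconnect races described above.
    static boolean isSafeTestPort(int port) {
        return port < EPHEMERAL_LO || port > EPHEMERAL_HI;
    }

    public static void main(String[] args) {
        // 11222 is the port the failing load test tried to bind; it lies
        // outside the default ephemeral range.
        System.out.println(isSafeTestPort(11222)); // prints true
        System.out.println(isSafeTestPort(40000)); // prints false
    }
}
```

On a VM configured with a non-default ip_local_port_range, constants like these would no longer match the real range, which is one way a subtle environment difference could produce the intermittent bind failures discussed above.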
[jira] [Commented] (ZOOKEEPER-2172) Cluster crashes when reconfig a new node as a participant
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566031#comment-14566031 ] Flavio Junqueira commented on ZOOKEEPER-2172: - There are a few really weird things here. Check these notifications:
{noformat}
Notification: 2 (message format version), -9223372036854775808 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEPoch), LEADING (my state)10049 (n.config version)
{noformat}
I checked the logs of 3 and it does look like it sent this notification.
{noformat}
Sending Notification: -9223372036854775808 (n.leader), 0x0 (n.zxid), 0x1 (n.round), 1 (recipient), 3 (myid), 0x0 (n.peerEpoch)
{noformat}
The initialization of leader election here doesn't look right. And, as [~shralex] has pointed out, 2 and 3 apparently received notifications with 0x as the round of the sender.
{noformat}
Notification: 2 (message format version), 1 (n.leader), 0x0 (n.zxid), 0x (n.round), LEADING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)10049 (n.config version)
{noformat}
I found no evidence in the log of 1 that it has actually set or sent such a value. The values I'm seeing in the notifications across logs look a bit strange. Cluster crashes when reconfig a new node as a participant - Key: ZOOKEEPER-2172 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2172 Project: ZooKeeper Issue Type: Bug Components: leaderElection, quorum, server Affects Versions: 3.5.0 Environment: Ubuntu 12.04 + java 7 Reporter: Ziyou Wang Priority: Critical Attachments: node-1.log, node-2.log, node-3.log, zoo.cfg.dynamic.1005d, zoo.cfg.dynamic.next, zookeeper-1.log, zookeeper-2.log, zookeeper-3.log The operations are quite simple: start three zk servers one by one, then reconfig the cluster to add the new one as a participant. When I add the third one, the zk cluster may enter a weird state and cannot recover.
I found “2015-04-20 12:53:48,236 [myid:1] - INFO [ProcessThread(sid:1 cport:-1)::PrepRequestProcessor@547] - Incremental reconfig” in the node-1 log. So the first node received the reconfig cmd at 12:53:48. Later, it logged “2015-04-20 12:53:52,230 [myid:1] - ERROR [LearnerHandler-/10.0.0.2:55890:LearnerHandler@580] - Unexpected exception causing shutdown while sock still open” and “2015-04-20 12:53:52,231 [myid:1] - WARN [LearnerHandler-/10.0.0.2:55890:LearnerHandler@595] - *** GOODBYE /10.0.0.2:55890 ”. From then on, the first node and second node rejected all client connections and the third node didn’t join the cluster as a participant. The whole cluster was down. When the problem happened, all three nodes just used the same dynamic config file zoo.cfg.dynamic.1005d, which only contained the first two nodes. But there was another unused dynamic config file in the node-1 directory, zoo.cfg.dynamic.next, which already contained three nodes. When I extended the waiting time between starting the third node and reconfiguring the cluster, the problem didn’t show again. So it should be a race condition problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
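A side note on the notifications quoted in the comment above: the suspicious n.leader value -9223372036854775808 is exactly Long.MIN_VALUE, the usual Java sentinel for a long field that was never set, which is consistent with the suspicion that the sender's leader-election state was not initialized properly. This is a reading of the logs, not a confirmed diagnosis:

```java
public class SentinelCheck {
    public static void main(String[] args) {
        // The odd n.leader value from the notification above:
        long nLeader = -9223372036854775808L;
        // It is precisely Long.MIN_VALUE, i.e. an uninitialized-vote sentinel.
        System.out.println(nLeader == Long.MIN_VALUE); // prints true
        System.out.println(Long.MIN_VALUE);            // prints -9223372036854775808
    }
}
```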
[jira] [Commented] (ZOOKEEPER-2098) QuorumCnxManager: use BufferedOutputStream for initial msg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566038#comment-14566038 ] Raul Gutierrez Segales commented on ZOOKEEPER-2098: --- [~hdeng]: typo in the commit message, it's ZOOKEEPER-2098 not ZOOKEEPER-2198 QuorumCnxManager: use BufferedOutputStream for initial msg -- Key: ZOOKEEPER-2098 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2098 Project: ZooKeeper Issue Type: Improvement Components: quorum, server Affects Versions: 3.5.0 Reporter: Raul Gutierrez Segales Assignee: Raul Gutierrez Segales Fix For: 3.5.1, 3.6.0 Attachments: ZOOKEEPER-2098.patch, ZOOKEEPER-2098.patch Whilst writing fle-dump (a tool like [zk-dump|https://github.com/twitter/zktraffic/], but to dump FastLeaderElection messages), I noticed that QCM is using DataOutputStream (which doesn't buffer) directly. So all calls to write() are written immediately to the network, which means simple messages like two participants exchanging Votes can take a couple of RTTs! This is especially terrible for global clusters (i.e.: x-country RTTs). The solution is to use BufferedOutputStream for the initial negotiation between members of the cluster. Note that there are other places where suboptimal (but not entirely unbuffered) writes to the network still exist. I'll get those in separate tickets. After using BufferedOutputStream we get only 1 RTT for the initial message, so election time for participants to join a cluster is reduced. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
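The buffering effect described in the ticket can be illustrated with a small counting stream. The message fields below are invented and are not the real QuorumCnxManager wire format; the point is only that wrapping the DataOutputStream in a BufferedOutputStream collapses the several per-field writes into a single underlying write at flush time, which on a real socket means one packet instead of several.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class BufferedVsUnbuffered {

    // Stands in for the socket: counts how many times it is written to,
    // a rough proxy for how many packets the kernel could emit.
    static class CountingOutputStream extends OutputStream {
        int writeCalls = 0;
        @Override public void write(int b) { writeCalls++; }
        @Override public void write(byte[] b, int off, int len) { writeCalls++; }
    }

    // Writes an invented "initial message" (a protocol word, a server id, and
    // a length-prefixed address) and reports how often the sink was hit.
    static int countWrites(boolean buffered) {
        CountingOutputStream sink = new CountingOutputStream();
        DataOutputStream dout = new DataOutputStream(
                buffered ? new BufferedOutputStream(sink) : sink);
        try {
            dout.writeLong(-65536L);                 // protocol word
            dout.writeLong(3L);                      // server id
            byte[] addr = "10.0.0.3:3888".getBytes(StandardCharsets.UTF_8);
            dout.writeInt(addr.length);
            dout.write(addr);
            dout.flush(); // buffered case: everything leaves in a single write
        } catch (IOException e) {
            throw new AssertionError(e); // cannot happen for an in-memory sink
        }
        return sink.writeCalls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered underlying writes: " + countWrites(false));
        System.out.println("buffered underlying writes: " + countWrites(true));
    }
}
```

The unbuffered variant hits the sink once per field write (several times in total), while the buffered variant hits it exactly once on flush.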
[jira] [Commented] (ZOOKEEPER-2189) multiple leaders can be elected when configs conflict
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566040#comment-14566040 ] Raul Gutierrez Segales commented on ZOOKEEPER-2189: --- this message is bogus, the commit message meant to reference ZOOKEEPER-2098 multiple leaders can be elected when configs conflict - Key: ZOOKEEPER-2189 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2189 Project: ZooKeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.5.0 Reporter: Akihiro Suda This sequence leads the ensemble to a split-brain state: * Start server 1 (config=1:participant, 2:participant, 3:participant) * Start server 2 (config=1:participant, 2:participant, 3:participant) * 1 and 2 believe 2 is the leader * Start server 3 (config=1:observer, 2:observer, 3:participant) * 3 believes 3 is the leader, although 1 and 2 still believe 2 is the leader Such a split-brain ensemble is very unstable. Znodes can be lost easily: * Create some znodes on 2 * Restart 1 and 2 * 1, 2 and 3 can think 3 is the leader * znodes created on 2 are lost, as 1 and 2 sync with 3 I consider this behavior a bug: ZK should fail gracefully if a participant is listed as an observer in the config. In the current implementation, ZK cannot detect such an invalid config, as FastLeaderElection.sendNotification() sends notifications to only voting members and hence there is no message from the observers (1 and 2) to the new voter (3). I think FastLeaderElection.sendNotification() should send notifications to all the members and FastLeaderElection.Messenger.WorkerReceiver.run() should verify acks. Any thoughts? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
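The failure mode in this ticket comes from servers disagreeing about which members vote. A toy illustration of the kind of cross-check the reporter is proposing follows; the types and method are invented for this sketch, and a real detection would have to piggyback on the election notification messages themselves rather than compare config maps directly.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ConfigSanityCheck {
    enum Role { PARTICIPANT, OBSERVER }

    // Returns the server ids whose role differs between two views of the
    // ensemble config; a non-empty result means the ensemble is at risk of
    // electing conflicting leaders.
    static Set<Long> conflictingRoles(Map<Long, Role> a, Map<Long, Role> b) {
        Set<Long> conflicts = new TreeSet<>();
        for (Map.Entry<Long, Role> e : a.entrySet()) {
            Role other = b.get(e.getKey());
            if (other != null && other != e.getValue()) {
                conflicts.add(e.getKey());
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        // The config as seen by servers 1 and 2 in the ticket's scenario:
        Map<Long, Role> view12 = new HashMap<>();
        view12.put(1L, Role.PARTICIPANT);
        view12.put(2L, Role.PARTICIPANT);
        view12.put(3L, Role.PARTICIPANT);
        // Server 3's conflicting config, which demotes 1 and 2 to observers:
        Map<Long, Role> view3 = new HashMap<>();
        view3.put(1L, Role.OBSERVER);
        view3.put(2L, Role.OBSERVER);
        view3.put(3L, Role.PARTICIPANT);
        System.out.println(conflictingRoles(view12, view3)); // prints [1, 2]
    }
}
```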
[jira] [Commented] (ZOOKEEPER-2098) QuorumCnxManager: use BufferedOutputStream for initial msg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14566039#comment-14566039 ] Raul Gutierrez Segales commented on ZOOKEEPER-2098: --- err, meant not ZOOKEEPER-2189 QuorumCnxManager: use BufferedOutputStream for initial msg -- Key: ZOOKEEPER-2098 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2098 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [VOTE] Apache ZooKeeper release 3.5.1-alpha candidate 1
Another thing that is possibly not a reason to drop the candidate, but I'm getting this with this RC:

[javac] /home/fpj/code/zookeeper-3.5.1-alpha/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java:134: error: unmappable character for encoding ASCII
[javac] // was not being queued ??? ZOOKEEPER-558) properly. This happens, for example,

It is a trivial problem to solve, but it does generate a compilation error for me. -Flavio On 30 May 2015, at 15:26, Flavio Junqueira fpjunque...@yahoo.com.INVALID wrote: I don't see a reason to -1 the release just because of the number of threads junit is using. I've been a bit distracted with other things, but I'm coming back to the release candidate now. -Flavio
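The unmappable-character error reported above happens because javac decodes source files with the platform default encoding, which is ASCII in some locales. One common fix, sketched here for an Ant build (the encoding attribute is standard on Ant's javac task, but the srcdir/destdir values are illustrative, not ZooKeeper's actual build.xml), is to pin the source encoding explicitly:

```xml
<!-- Illustrative Ant javac task: the encoding attribute tells javac how to
     decode source files, so a UTF-8 comment no longer breaks a build running
     in an ASCII locale. -->
<javac srcdir="src/java/main" destdir="build/classes" encoding="UTF-8"/>
```

The equivalent on the command line is passing -encoding UTF-8 to javac; the alternative fix, which the attached ZOOKEEPER-2197 patch appears to take, is to keep the source tree pure ASCII.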