[jira] [Commented] (ZOOKEEPER-3018) Ephemeral node not deleted after session is gone

2018-04-13 Thread Daniel C (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438140#comment-16438140
 ] 

Daniel C commented on ZOOKEEPER-3018:
-

We have had another incident of the same issue reported.

This session owns this ephemeral node:

0x162a60381242e0c:
    
/consumers/cosStepExecutors/ids/cosStepExecutors_1523203806120_den01fkd.us.oracle.com-1523203806187-7327e8bb

 

Doing "stat 
/consumers/cosStepExecutors/ids/cosStepExecutors_1523203806120_den01fkd.us.oracle.com-1523203806187-7327e8bb":

czxid: 40609
mzxid: 40609
pzxid: 40609
ctime: 1523536225711 (2018-04-12T05:30:25.711-0700)
mtime: 1523536225711 (2018-04-12T05:30:25.711-0700)
version: 0
cversion: 0
aversion: 0
owner: 99824675737316876
datalen: 123
children: 0
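
Note on correlating the outputs: the "owner" field printed by stat is the owning session id in decimal, while "dump" and "cons" print session ids in hex. A minimal Java sketch of the conversion follows; the class and method names are hypothetical and only the JDK is used.

```java
// Sketch: correlate the decimal "owner" from stat with the hex session ids
// printed by the "dump" and "cons" four-letter commands.
final class SessionIds {
    /** Formats a decimal session id in the 0x... form used by dump/cons. */
    static String toHex(long owner) {
        return "0x" + Long.toHexString(owner);
    }

    /** Parses a hex session id (with or without the 0x prefix) back to a long. */
    static long fromHex(String sid) {
        String digits = sid.startsWith("0x") ? sid.substring(2) : sid;
        return Long.parseUnsignedLong(digits, 16);
    }

    public static void main(String[] args) {
        // owner value copied from the stat output above
        System.out.println(toHex(99824675737316876L)); // prints 0x162a60381242e0c
    }
}
```

The printed value matches the session id listed as the owner of the ephemeral node, so the stat output and the session listing refer to the same session.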

 

"cons" indicated the session is still there:

echo cons | nc den01fkd 2181 | grep 162a60381242e0c
 
/10.196.42.142:22166[1](queued=0,recved=549,sent=549,sid=0x162a60381242e0c,lop=PING,est=1523536225314,to=9000,lcxid=0x8f8c,lzxid=0xd8ed,lresp=1523665779797,llat=0,minlat=0,avglat=0,maxlat=1)
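
For scripted checks, a "cons" line like the one above can be split into its key=value fields. The sketch below is an assumption about how one might do that, not part of ZooKeeper: ConsLine and parse are hypothetical names, and the parser only relies on the parenthesised, comma-separated layout visible in the output.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: split one line of "cons" output into its key=value fields.
final class ConsLine {
    // Captures everything between the first "(" and the final ")" on the line.
    private static final Pattern FIELDS = Pattern.compile("\\((.*)\\)\\s*$");

    static Map<String, String> parse(String line) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = FIELDS.matcher(line);
        if (!m.find()) {
            return out; // not a connection line
        }
        for (String pair : m.group(1).split(",")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                out.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "/10.196.42.142:22166[1](queued=0,recved=549,sent=549,"
                + "sid=0x162a60381242e0c,lop=PING,est=1523536225314,to=9000,"
                + "lcxid=0x8f8c,lzxid=0xd8ed,lresp=1523665779797,llat=0,"
                + "minlat=0,avglat=0,maxlat=1)";
        Map<String, String> f = parse(line);
        // sid is the session id, to is the negotiated timeout
        System.out.println(f.get("sid") + " timeout=" + f.get("to"));
    }
}
```

Connections that have not established a session (for example the nc connection itself) carry no sid field, so a lookup of "sid" returns null for them.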

 

However, SessionTracker "dump" indicated the session is already expired:

10 expire at Sat Apr 14 00:15:08 UTC 2018:
    .
    .
    0x162a60381242e0c

 

It seems the session state within ZooKeeper is inconsistent, which is causing the issue.

[~andorm], could you please have someone take a look, as this breaks ZooKeeper's
ephemeral node behavior?

 

 

> Ephemeral node not deleted after session is gone
> 
>
> Key: ZOOKEEPER-3018
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3018
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
> Environment: Linux 4.1.12-112.14.10.el6uek.x86_64 #2 SMP x86_64 
> GNU/Linux
>Reporter: Daniel C
>Priority: Major
>
> We have a live Zookeeper environment (quorum size is 2) and observed a 
> strange behavior:
>  * Kafka created 2 ephemeral nodes /brokers/ids/822712429 and 
> /brokers/ids/707577499 on 2018-03-12 03:30:36.933
>  * The Kafka clients were long gone but as of today (20+ days after), the two 
> ephemeral nodes are still present
>  
> Troubleshooting:
> 1) List the outstanding sessions and ephemeral nodes
>  
> {noformat}
> $ echo dump | nc $SERVER1 2181
> SessionTracker dump:
> org.apache.zookeeper.server.quorum.LearnerSessionTracker@6d7fd863
> ephemeral nodes dump:
> Sessions with Ephemerals (2):
> 0x162183ea9f70003:
>    /brokers/ids/822712429
> 0x162183ea9f70002:
>    /brokers/ids/707577499
>    /controller
> {noformat}
>  
>  
> 2) stat on /brokers/ids/822712429
>  
> {noformat}
> zk> stat /brokers/ids/822712429
> czxid: 4294967344
> mzxid: 4294967344
> pzxid: 4294967344
> ctime: 1520825436933 (2018-03-11T20:30:36.933-0700)
> mtime: 1520825436933 (2018-03-11T20:30:36.933-0700)
> version: 0
> cversion: 0
> aversion: 0
> owner: 99668799174148099
> datalen: 102
> children: 0
> {noformat}
>  
>  
> 3) List full connection/session details for all clients connected
>  
> {noformat}
> $ echo cons | nc $SERVER1 2181
>  /10.247.114.70:30401[0](queued=0,recved=1,sent=0)
>  
> /10.248.88.235:40430[1](queued=0,recved=345,sent=345,sid=0x162183ea9f70c22,lop=PING,est=1522713395028,to=4,lcxid=0x12,lzxid=0x,lresp=1522717802117,llat=0,minlat=0,avglat=0,maxlat=31)
> {noformat}
>  
>  
>  
> {noformat}
> $ echo cons | nc $SERVER2 2181
>  /10.196.18.61:28173[0](queued=0,recved=1,sent=0)
>  
> /10.247.114.69:42679[1](queued=0,recved=73800,sent=73800,sid=0x262183eaa21da96,lop=PING,est=1522651352906,to=9000,lcxid=0xe49f,lzxid=0x10004683d,lresp=1522717854847,llat=0,minlat=0,avglat=0,maxlat=1235)
> {noformat}
>  
>  
> 4) health
>  
> {noformat}
> $ echo mntr | nc $SERVER1 2181
> zk_version   3.4.6-1569965, built on 02/20/2014 09:09 GMT
> zk_avg_latency  0
> zk_max_latency 443
> zk_min_latency  0
> zk_packets_received   11158019
> zk_packets_sent   11158244
> zk_num_alive_connections   2
> zk_outstanding_requests  0
> zk_server_state follower
> zk_znode_count   344
> zk_watch_count   0
> zk_ephemerals_count 3
> zk_approximate_data_size  36654
> zk_open_file_descriptor_count   33
> zk_max_file_descriptor_count 65536
> {noformat}
>  
>  
> 5) Server logs with related sessions:
> {noformat}
> Only found these logs from Server1 related to the sessions (0x162183ea9f70002 
> and 0x162183ea9f70003):
> 2018-03-12 03:28:35,127 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /10.196.18.60:26775
> 2018-03-12 03:28:35,131 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection 
> request from old client /10.196.18.60:26775; will be dropped if server 

[GitHub] zookeeper issue #487: ZOOKEEPER-2994 Tool required to recover log and snapsh...

2018-04-13 Thread phunt
Github user phunt commented on the issue:

https://github.com/apache/zookeeper/pull/487
  
Looks promising, but it doesn't seem very useful (and is potentially dangerous)
without docs. Perhaps add a troubleshooting or recovery section here?

http://zookeeper.apache.org/doc/r3.4.11/zookeeperAdmin.html#sc_dataFileManagement

The jira was originally for 3.5+; I think this would be great to get into
3.4+.


---


[jira] [Updated] (ZOOKEEPER-2994) Tool required to recover log and snapshot entries with CRC errors

2018-04-13 Thread Patrick Hunt (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt updated ZOOKEEPER-2994:

Fix Version/s: 3.4.13

> Tool required to recover log and snapshot entries with CRC errors
> -
>
> Key: ZOOKEEPER-2994
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2994
> Project: ZooKeeper
>  Issue Type: New Feature
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
> Fix For: 3.5.4, 3.6.0, 3.4.13
>
>
> In the event that the ZooKeeper transaction log or snapshot becomes corrupted 
> and fails CRC checks (preventing startup), we should have a mechanism to get 
> the cluster running again.
> Previously we achieved this by loading the broken transaction log with a 
> modified version of ZK that had the CRC check disabled, and forcing it to snapshot.
> It'd be very handy to have a tool which can do this for us. LogFormatter and 
> SnapshotFormatter have already been designed to dump log and snapshot files; 
> it'd be nice to extend their functionality and add the ability to perform such a recovery.
> Experience has shown that once you end up with a corrupt txn log, there is no way 
> to recover except by manually modifying the CRC check. That's basically why the 
> tool is needed.
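
The recovery idea described in the issue (read past entries that fail their checksum instead of refusing to start) can be illustrated with a small Java sketch. Everything here is an assumption made for illustration: LogEntry, checksum and recoverValid are hypothetical names, Adler32 from the JDK stands in for the checksum, and the real file formats handled by LogFormatter and SnapshotFormatter are not reproduced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Adler32;
import java.util.zip.Checksum;

// Sketch: keep the entries whose stored checksum still matches their payload.
final class ChecksumScan {
    /** Hypothetical in-memory entry: payload plus the checksum stored next to it. */
    static final class LogEntry {
        final byte[] payload;
        final long storedChecksum;

        LogEntry(byte[] payload, long storedChecksum) {
            this.payload = payload;
            this.storedChecksum = storedChecksum;
        }
    }

    /** Computes the checksum of a payload (Adler32 is an assumption here). */
    static long checksum(byte[] payload) {
        Checksum c = new Adler32();
        c.update(payload, 0, payload.length);
        return c.getValue();
    }

    /** Returns the entries that pass verification; mismatches are skipped, not fatal. */
    static List<LogEntry> recoverValid(List<LogEntry> entries) {
        List<LogEntry> valid = new ArrayList<>();
        for (LogEntry e : entries) {
            if (checksum(e.payload) == e.storedChecksum) {
                valid.add(e);
            }
        }
        return valid;
    }
}
```

Silently dropping entries changes the recovered state, which is one reason such a tool would need the documentation requested in the pull request review.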



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2415) SessionTest is using Thread deprecated API.

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438036#comment-16438036
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2415:
---

Github user phunt commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/497#discussion_r181526076
  
--- Diff: src/java/test/org/apache/zookeeper/test/SessionTimeoutTest.java 
---
@@ -0,0 +1,188 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.test;
+
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.DisconnectableZooKeeper;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.PortAssignment;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZKTestCase;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.data.Stat;
+import org.apache.zookeeper.server.ServerCnxnFactory;
+import org.apache.zookeeper.server.ZooKeeperServer;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+
+import static org.apache.zookeeper.test.ClientBase.CONNECTION_TIMEOUT;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertNull;
+
+public class SessionTimeoutTest extends ZKTestCase {
--- End diff --

Is there a reason why we can't use ClientBase instead of ZKTestCase?


> SessionTest is using Thread deprecated API.
> ---
>
> Key: ZOOKEEPER-2415
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2415
> Project: ZooKeeper
>  Issue Type: Test
>  Components: tests
>Affects Versions: 3.4.8, 3.5.1, 3.6.0
>Reporter: Flavio Junqueira
>Assignee: Andor Molnar
>Priority: Major
>
> The test class is using calls such as {{Thread.suspend}} and 
> {{Thread.resume}}.
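
Thread.suspend and Thread.resume are deprecated because a suspended thread keeps any locks it holds. One common replacement is a cooperative pause that the worker checks at safe points. The sketch below is a generic illustration and not the code from pull request #497; PauseGate and its methods are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch: a cooperative pause/resume gate replacing Thread.suspend/resume.
final class PauseGate {
    private final Object lock = new Object();
    private boolean paused;

    void pause() {
        synchronized (lock) {
            paused = true;
        }
    }

    void resume() {
        synchronized (lock) {
            paused = false;
            lock.notifyAll();
        }
    }

    /** Called by the worker thread at safe points; blocks while the gate is paused. */
    void awaitIfPaused() throws InterruptedException {
        synchronized (lock) {
            while (paused) {
                lock.wait();
            }
        }
    }

    /** Small self-check: the worker must block while paused and run after resume. */
    static boolean demo() {
        PauseGate gate = new PauseGate();
        CountDownLatch passed = new CountDownLatch(1);
        gate.pause();
        Thread worker = new Thread(() -> {
            try {
                gate.awaitIfPaused();
                passed.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            boolean blockedWhilePaused = !passed.await(200, TimeUnit.MILLISECONDS);
            gate.resume();
            boolean ranAfterResume = passed.await(5, TimeUnit.SECONDS);
            return blockedWhilePaused && ranAfterResume;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Unlike Thread.suspend, the worker only stops where it calls awaitIfPaused, so it never pauses while holding a lock the test needs.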





[jira] [Commented] (ZOOKEEPER-2415) SessionTest is using Thread deprecated API.

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16438034#comment-16438034
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2415:
---

Github user phunt commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/497#discussion_r181525743
  
--- Diff: src/java/test/org/apache/zookeeper/test/SessionTimeoutTest.java 
---
@@ -0,0 +1,188 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.test;
+
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.DisconnectableZooKeeper;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.PortAssignment;
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.ZKTestCase;
+import org.apache.zookeeper.ZooDefs;
+import org.apache.zookeeper.data.Stat;
+import org.apache.zookeeper.server.ServerCnxnFactory;
+import org.apache.zookeeper.server.ZooKeeperServer;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.TimeUnit;
+
+import static org.apache.zookeeper.test.ClientBase.CONNECTION_TIMEOUT;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertNull;
+
+public class SessionTimeoutTest extends ZKTestCase {
+    protected static final Logger LOG = LoggerFactory.getLogger(SessionTest.class);
--- End diff --

Should be SessionTimeoutTest rather than SessionTest.


> SessionTest is using Thread deprecated API.
> ---
>
> Key: ZOOKEEPER-2415
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2415
> Project: ZooKeeper
>  Issue Type: Test
>  Components: tests
>Affects Versions: 3.4.8, 3.5.1, 3.6.0
>Reporter: Flavio Junqueira
>Assignee: Andor Molnar
>Priority: Major
>
> The test class is using calls such as {{Thread.suspend}} and 
> {{Thread.resume}}.





Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1

2018-04-13 Thread Abraham Fine
Thanks for following up Alex.


On Fri, Apr 13, 2018, at 14:48, Alexander Shraer wrote:
> We discussed with Pat offline and agreed to go without this patch,
> especially since we need to patch 3 branches: 3.4, 3.5 and master.
> We'll prepare 3.5 and master and then commit all 3 together in time
> for the next release. So Abe, please go ahead with your release.
>
> Alex
>

Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1

2018-04-13 Thread Alexander Shraer
We discussed with Pat offline and agreed to go without this patch,
especially since we need to patch 3 branches: 3.4, 3.5 and master.
We'll prepare 3.5 and master and then commit all 3 together in time for the
next release. So Abe, please go ahead with your release.

Alex

On Fri, Apr 13, 2018 at 2:26 PM, Patrick Hunt  wrote:

> Hey folks. I've been on vacation. My 0.02 - given the release candidate is
> well underway, has sufficient votes/time to finalize, this is not a
> regression in 3.4.12 and it's not yet committed I would think we
> finalize/push 3.4.12 then quickly followup with a 3.4.13 that addresses
> this. Alex could be the RM given his interest/advocacy.
>
> Regards,
>
> Patrick
>
> On Fri, Apr 13, 2018 at 11:55 AM, Abraham Fine  wrote:
>
> > Given that the primary driver of this release is to fix an issue with the
> > misuse of dataDir and dataLogDir I would rather see this release make it
> > out the door with minimal additional changes to core functionality so
> > people can more confidently upgrade.
> >
> > What do you think Pat?
> >
> > Abe
> >
> > On Fri, Apr 13, 2018, at 11:37, Alexander Shraer wrote:
> > > Now that we have the fix, why delay it to next release?
> > >
> > > On Fri, Apr 13, 2018 at 11:09 AM Abraham Fine 
> wrote:
> > >
> > > > Let's wait until the next release to include this fix.
> > > >
> > > > On Mon, Apr 9, 2018, at 15:14, Alexander Shraer wrote:
> > > > > Hi,
> > > > >
> > > > > Please take a look on the new PR for ZK-2959:
> > > > > https://github.com/apache/zookeeper/pull/500
> > > > > If there are no further comments, I can commit it.
> > > > >
> > > > > Thanks,
> > > > > Alex
> > > > >
> > > > > On Fri, Apr 6, 2018 at 11:33 AM, Alexander Shraer <
> shra...@gmail.com
> > >
> > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > > The bug described in ZOOKEEPER-2959 is that getEpochToPropose
> > > > > > > and waitForEpochAck do not distinguish between followers and
> > > > > > > observers.
> > > > > > > This can cause a candidate leader's acceptedEpoch to be updated
> > > > > > > with support from observers only. The same applies to
> > > > > > > waitForEpochAck: passing this method allows the candidate leader
> > > > > > > to update the currentEpoch. The latter helps this server to win
> > > > > > > FLE elections continuously, and the former (acceptedEpoch) causes
> > > > > > > anyone trying to connect to the server to think that it has more
> > > > > > > up-to-date data and truncate their logs to match.
> > > > > >
> > > > > >
> > > > > > Alex
> > > > > >
> > > > > > On Fri, Apr 6, 2018 at 10:04 AM, Fangmin Lv  >
> > > > wrote:
> > > > > >
> > > > > >> Hi Alex,
> > > > > >>
> > > > > > >> Can you give more details about the data loss scenario in Jira
> > > > > > >> ZOOKEEPER-2959?
> > > > > > >> As far as I know, the leader will ignore the observers' ACKs in
> > > > > > >> waitForNewLeaderAck, so it will not start serving traffic until
> > > > > > >> it has received the actual quorum ACK. If it doesn't have enough
> > > > > > >> follower support before the timeout, it will quit leading and its
> > > > > > >> learners will re-sync with the new leader.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Fangmin
> > > > > >>
> > > > > >> On Thu, Apr 5, 2018 at 12:57 PM, Alexander Shraer <
> > shra...@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >>> Btw we actually observed the described issue (data loss),
> > thankfully
> > > > in a
> > > > > >>> test environment. So I thought this is important to share with
> > the
> > > > > >>> community.
> > > > > >>>
> > > > > >>> Unfortunately I don’t have time to run a new ZK release for
> > this, so
> > > > I’m
> > > > > >>> not going to -1 your candidate, but we are actively working on
> a
> > fix
> > > > (ie
> > > > > >>> a
> > > > > >>> test at this point) and I can commit that as soon as we have
> > that.
> > > > > >>>
> > > > > >>> It may be worth while to delay the release by a few more days,
> > but
> > > > it’s
> > > > > >>> totally up to you since you’re running it.
> > > > > >>>
> > > > > >>> Cheers
> > > > > >>> Alex
> > > > > >>> On Thu, Apr 5, 2018 at 12:47 PM Andor Molnar <
> an...@cloudera.com
> > >
> > > > wrote:
> > > > > >>>
> > > > > >>> > Got that. I still believe it's a completely valid issue which
> > has
> > > > to be
> > > > > >>> > addressed, but it's not a showstopper. I'm afraid we're not
> > going
> > > > to
> > > > > >>> > convince each other, so it's probably Abe's call if he want
> to
> > > > create
> > > > > >>> > another release candidate for the fix.
> > > > > >>> >
> > > > > >>> > I reviewed the code on github and I think it just needs to be
> > > > covered
> > > > > >>> with
> > > > > >>> > a unit test to be complete.
> > > > > >>> >
> > > > > >>> > Regards,
> > 

Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1

2018-04-13 Thread Patrick Hunt
Hey folks. I've been on vacation. My 0.02: given that the release candidate is
well underway and has sufficient votes/time to finalize, that this is not a
regression in 3.4.12, and that the fix is not yet committed, I would think we
finalize/push 3.4.12 and then quickly follow up with a 3.4.13 that addresses
this. Alex could be the RM given his interest/advocacy.

Regards,

Patrick

On Fri, Apr 13, 2018 at 11:55 AM, Abraham Fine  wrote:

> Given that the primary driver of this release is to fix an issue with the
> misuse of dataDir and dataLogDir I would rather see this release make it
> out the door with minimal additional changes to core functionality so
> people can more confidently upgrade.
>
> What do you think Pat?
>
> Abe
>
> On Fri, Apr 13, 2018, at 11:37, Alexander Shraer wrote:
> > Now that we have the fix, why delay it to next release?
> >
> > On Fri, Apr 13, 2018 at 11:09 AM Abraham Fine  wrote:
> >
> > > Let's wait until the next release to include this fix.
> > >
> > > On Mon, Apr 9, 2018, at 15:14, Alexander Shraer wrote:
> > > > Hi,
> > > >
> > > > Please take a look on the new PR for ZK-2959:
> > > > https://github.com/apache/zookeeper/pull/500
> > > > If there are no further comments, I can commit it.
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > On Fri, Apr 6, 2018 at 11:33 AM, Alexander Shraer  >
> > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > > The bug described in ZOOKEEPER-2959 is that getEpochToPropose
> > > > > > and waitForEpochAck do not distinguish between followers and
> > > > > > observers.
> > > > > > This can cause a candidate leader's acceptedEpoch to be updated
> > > > > > with support from observers only. The same applies to
> > > > > > waitForEpochAck: passing this method allows the candidate leader
> > > > > > to update the currentEpoch. The latter helps this server to win
> > > > > > FLE elections continuously, and the former (acceptedEpoch) causes
> > > > > > anyone trying to connect to the server to think that it has more
> > > > > > up-to-date data and truncate their logs to match.
> > > > >
> > > > >
> > > > > Alex
> > > > >
> > > > > On Fri, Apr 6, 2018 at 10:04 AM, Fangmin Lv 
> > > wrote:
> > > > >
> > > > >> Hi Alex,
> > > > >>
> > > > > >> Can you give more details about the data loss scenario in Jira
> > > > > >> ZOOKEEPER-2959?
> > > > > >> As far as I know, the leader will ignore the observers' ACKs in
> > > > > >> waitForNewLeaderAck, so it will not start serving traffic until
> > > > > >> it has received the actual quorum ACK. If it doesn't have enough
> > > > > >> follower support before the timeout, it will quit leading and its
> > > > > >> learners will re-sync with the new leader.
> > > > >>
> > > > >> Thanks,
> > > > >> Fangmin
> > > > >>
> > > > >> On Thu, Apr 5, 2018 at 12:57 PM, Alexander Shraer <
> shra...@gmail.com>
> > > > >> wrote:
> > > > >>
> > > > >>> Btw we actually observed the described issue (data loss),
> thankfully
> > > in a
> > > > >>> test environment. So I thought this is important to share with
> the
> > > > >>> community.
> > > > >>>
> > > > >>> Unfortunately I don’t have time to run a new ZK release for
> this, so
> > > I’m
> > > > >>> not going to -1 your candidate, but we are actively working on a
> fix
> > > (ie
> > > > >>> a
> > > > >>> test at this point) and I can commit that as soon as we have
> that.
> > > > >>>
> > > > >>> It may be worth while to delay the release by a few more days,
> but
> > > it’s
> > > > >>> totally up to you since you’re running it.
> > > > >>>
> > > > >>> Cheers
> > > > >>> Alex
> > > > >>> On Thu, Apr 5, 2018 at 12:47 PM Andor Molnar  >
> > > wrote:
> > > > >>>
> > > > >>> > Got that. I still believe it's a completely valid issue which
> has
> > > to be
> > > > >>> > addressed, but it's not a showstopper. I'm afraid we're not
> going
> > > to
> > > > >>> > convince each other, so it's probably Abe's call if he want to
> > > create
> > > > >>> > another release candidate for the fix.
> > > > >>> >
> > > > >>> > I reviewed the code on github and I think it just needs to be
> > > covered
> > > > >>> with
> > > > >>> > a unit test to be complete.
> > > > >>> >
> > > > >>> > Regards,
> > > > >>> > Andor
> > > > >>> >
> > > > >>> >
> > > > >>> >
> > > > >>> > On Thu, Apr 5, 2018 at 9:05 PM, Alexander Shraer <
> > > shra...@gmail.com>
> > > > >>> > wrote:
> > > > >>> >
> > > > >>> > > Yes sort of, FLE is finished, then enough observer's messages
> > > reach
> > > > >>> the
> > > > >>> > > leader before participant's messages do.
> > > > >>> > > Whether its rare depends on the number of observers and
> > > > >>> participants. For
> > > > >>> > > example with very few participants and many observers
> > > > >>> > > your chance of hitting this are quite high.
> > > > >>> > >
> > > > >>> > > Alex
> > > > >>> > >
> 

ZooKeeper-trunk-openjdk7 - Build # 1864 - Failure

2018-04-13 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk-openjdk7/1864/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 62.06 KB...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.749 sec, Thread: 1, Class: 
org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
31.022 sec, Thread: 6, Class: org.apache.zookeeper.test.RestoreCommittedLogTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.827 sec, Thread: 5, Class: org.apache.zookeeper.test.SaslSuperUserTest
[junit] Running org.apache.zookeeper.test.ServerCnxnTest in thread 1
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest in thread 
6
[junit] Running org.apache.zookeeper.test.SessionTest in thread 5
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.027 sec, Thread: 6, Class: org.apache.zookeeper.test.SessionInvalidationTest
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
6
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.102 sec, Thread: 6, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 6
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.455 sec, Thread: 1, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 1
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.603 sec, Thread: 1, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 1
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.247 sec, Thread: 1, Class: org.apache.zookeeper.test.StatTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 1
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.227 sec, Thread: 1, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.078 sec, Thread: 1, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 1
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.861 sec, Thread: 1, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 1
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
23.041 sec, Thread: 6, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
9.779 sec, Thread: 1, Class: org.apache.zookeeper.test.TruncateTest
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 1
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 6
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.126 sec, Thread: 6, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 6
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.989 sec, Thread: 6, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 6
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
34.415 sec, Thread: 5, Class: org.apache.zookeeper.test.SessionTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 5
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.103 sec, Thread: 5, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 5
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
11.309 sec, Thread: 5, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 5
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
22.365 sec, Thread: 1, Class: 
org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.786 sec, Thread: 5, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
31.892 sec, Thread: 6, Class: org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
392.762 sec, Thread: 3, Class: org.apache.zookeeper.test.DisconnectedWatcherTest
[junit] Tests run: 105, Failures: 0, Errors: 0, 

Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1

2018-04-13 Thread Abraham Fine
Given that the primary driver of this release is to fix an issue with the
misuse of dataDir and dataLogDir, I would rather see this release make it out
the door with minimal additional changes to core functionality so people can
upgrade more confidently.

What do you think, Pat?

Abe

On Fri, Apr 13, 2018, at 11:37, Alexander Shraer wrote:
> Now that we have the fix, why delay it to next release?
> 
> On Fri, Apr 13, 2018 at 11:09 AM Abraham Fine  wrote:
> 
> > Let's wait until the next release to include this fix.
> >
> > On Mon, Apr 9, 2018, at 15:14, Alexander Shraer wrote:
> > > Hi,
> > >
> > > Please take a look at the new PR for ZK-2959:
> > > https://github.com/apache/zookeeper/pull/500
> > > If there are no further comments, I can commit it.
> > >
> > > Thanks,
> > > Alex
> > >
> > > On Fri, Apr 6, 2018 at 11:33 AM, Alexander Shraer 
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > > The bug described in ZOOKEEPER-2959 is that getEpochToPropose and
> > > > > waitForEpochAck do not distinguish between followers and observers.
> > > > > This can cause a candidate leader's acceptedEpoch to be updated with
> > > > > support from observers only. The same holds for waitForEpochAck:
> > > > > passing this method allows the candidate leader to update the
> > > > > currentEpoch. The latter helps this server to win FLE elections
> > > > > continuously, and the former (acceptedEpoch) causes anyone trying to
> > > > > connect to the server to think that it has more up-to-date data and
> > > > > truncate their logs to match.
> > > >
> > > >
> > > > Alex
> > > >
> > > > On Fri, Apr 6, 2018 at 10:04 AM, Fangmin Lv 
> > wrote:
> > > >
> > > >> Hi Alex,
> > > >>
> > > >> Can you give more details about the data loss scenario in Jira
> > > >> ZOOKEEPER-2959  > >?
> > > >> As far as I know, the leader will ignore the observers' ACK in
> > > >> waitForNewLeaderAck, so it will not start serve traffic until it
> > received
> > > >> the actual quorum ACK, if it doesn't have enough followers support
> > before
> > > >> timeout, it will quit leading and it's learners will re-sync with new
> > > >> leader.
> > > >>
> > > >> Thanks,
> > > >> Fangmin
> > > >>
> > > >> On Thu, Apr 5, 2018 at 12:57 PM, Alexander Shraer 
> > > >> wrote:
> > > >>
> > > >>> Btw we actually observed the described issue (data loss), thankfully
> > in a
> > > >>> test environment. So I thought this is important to share with the
> > > >>> community.
> > > >>>
> > > >>> Unfortunately I don’t have time to run a new ZK release for this, so
> > I’m
> > > >>> not going to -1 your candidate, but we are actively working on a fix
> > (ie
> > > >>> a
> > > >>> test at this point) and I can commit that as soon as we have that.
> > > >>>
> > > >>> It may be worth while to delay the release by a few more days, but
> > it’s
> > > >>> totally up to you since you’re running it.
> > > >>>
> > > >>> Cheers
> > > >>> Alex
> > > >>> On Thu, Apr 5, 2018 at 12:47 PM Andor Molnar 
> > wrote:
> > > >>>
> > > >>> > Got that. I still believe it's a completely valid issue which has
> > to be
> > > >>> > addressed, but it's not a showstopper. I'm afraid we're not going
> > to
> > > >>> > convince each other, so it's probably Abe's call if he want to
> > create
> > > >>> > another release candidate for the fix.
> > > >>> >
> > > >>> > I reviewed the code on github and I think it just needs to be
> > covered
> > > >>> with
> > > >>> > a unit test to be complete.
> > > >>> >
> > > >>> > Regards,
> > > >>> > Andor
> > > >>> >
> > > >>> >
> > > >>> >
> > > >>> > On Thu, Apr 5, 2018 at 9:05 PM, Alexander Shraer <
> > shra...@gmail.com>
> > > >>> > wrote:
> > > >>> >
> > > >>> > > Yes sort of, FLE is finished, then enough observer's messages
> > reach
> > > >>> the
> > > >>> > > leader before participant's messages do.
> > > >>> > > Whether its rare depends on the number of observers and
> > > >>> participants. For
> > > >>> > > example with very few participants and many observers
> > > >>> > > your chance of hitting this are quite high.
> > > >>> > >
> > > >>> > > Alex
> > > >>> > >
> > > >>> > > On Thu, Apr 5, 2018 at 11:44 AM, Andor Molnar <
> > an...@cloudera.com>
> > > >>> > wrote:
> > > >>> > >
> > > >>> > > > Maybe I'm missing something here, but this looks like a rare
> > edge
> > > >>> case
> > > >>> > to
> > > >>> > > > me. Participants must finish the leader election successfully
> > and
> > > >>> right
> > > >>> > > > after enough followers should fail to send epoch to the
> > leader, so
> > > >>> > > > observers can take it over.
> > > >>> > > >
> > > >>> > > > Is that description accurate?
> > > >>> > > >
> > > >>> > > > Andor
> > > >>> > > >
> > > >>> > > >
> > > >>> > > > On Thu, Apr 5, 2018 at 7:35 PM, Alexander Shraer <
> > > >>> shra...@gmail.com>
> > > >>> > > 

Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1

2018-04-13 Thread Alexander Shraer
Now that we have the fix, why delay it to next release?

On Fri, Apr 13, 2018 at 11:09 AM Abraham Fine  wrote:

> Let's wait until the next release to include this fix.
>
> On Mon, Apr 9, 2018, at 15:14, Alexander Shraer wrote:
> > Hi,
> >
> > Please take a look at the new PR for ZK-2959:
> > https://github.com/apache/zookeeper/pull/500
> > If there are no further comments, I can commit it.
> >
> > Thanks,
> > Alex
> >
> > On Fri, Apr 6, 2018 at 11:33 AM, Alexander Shraer 
> wrote:
> >
> > > Hi,
> > >
> > > > The bug described in ZOOKEEPER-2959 is that getEpochToPropose and
> > > > waitForEpochAck do not distinguish between followers and observers.
> > > > This can cause a candidate leader's acceptedEpoch to be updated with
> > > > support from observers only. The same holds for waitForEpochAck:
> > > > passing this method allows the candidate leader to update the
> > > > currentEpoch. The latter helps this server to win FLE elections
> > > > continuously, and the former (acceptedEpoch) causes anyone trying to
> > > > connect to the server to think that it has more up-to-date data and
> > > > truncate their logs to match.
> > >
> > >
> > > Alex
> > >
> > > On Fri, Apr 6, 2018 at 10:04 AM, Fangmin Lv 
> wrote:
> > >
> > >> Hi Alex,
> > >>
> > >> Can you give more details about the data loss scenario in Jira
> > >> ZOOKEEPER-2959  >?
> > >> As far as I know, the leader will ignore the observers' ACK in
> > >> waitForNewLeaderAck, so it will not start serve traffic until it
> received
> > >> the actual quorum ACK, if it doesn't have enough followers support
> before
> > >> timeout, it will quit leading and it's learners will re-sync with new
> > >> leader.
> > >>
> > >> Thanks,
> > >> Fangmin
> > >>
> > >> On Thu, Apr 5, 2018 at 12:57 PM, Alexander Shraer 
> > >> wrote:
> > >>
> > >>> Btw we actually observed the described issue (data loss), thankfully
> in a
> > >>> test environment. So I thought this is important to share with the
> > >>> community.
> > >>>
> > >>> Unfortunately I don’t have time to run a new ZK release for this, so
> I’m
> > >>> not going to -1 your candidate, but we are actively working on a fix
> (ie
> > >>> a
> > >>> test at this point) and I can commit that as soon as we have that.
> > >>>
> > >>> It may be worth while to delay the release by a few more days, but
> it’s
> > >>> totally up to you since you’re running it.
> > >>>
> > >>> Cheers
> > >>> Alex
> > >>> On Thu, Apr 5, 2018 at 12:47 PM Andor Molnar 
> wrote:
> > >>>
> > >>> > Got that. I still believe it's a completely valid issue which has
> to be
> > >>> > addressed, but it's not a showstopper. I'm afraid we're not going
> to
> > >>> > convince each other, so it's probably Abe's call if he want to
> create
> > >>> > another release candidate for the fix.
> > >>> >
> > >>> > I reviewed the code on github and I think it just needs to be
> covered
> > >>> with
> > >>> > a unit test to be complete.
> > >>> >
> > >>> > Regards,
> > >>> > Andor
> > >>> >
> > >>> >
> > >>> >
> > >>> > On Thu, Apr 5, 2018 at 9:05 PM, Alexander Shraer <
> shra...@gmail.com>
> > >>> > wrote:
> > >>> >
> > >>> > > Yes sort of, FLE is finished, then enough observer's messages
> reach
> > >>> the
> > >>> > > leader before participant's messages do.
> > >>> > > Whether its rare depends on the number of observers and
> > >>> participants. For
> > >>> > > example with very few participants and many observers
> > >>> > > your chance of hitting this are quite high.
> > >>> > >
> > >>> > > Alex
> > >>> > >
> > >>> > > On Thu, Apr 5, 2018 at 11:44 AM, Andor Molnar <
> an...@cloudera.com>
> > >>> > wrote:
> > >>> > >
> > >>> > > > Maybe I'm missing something here, but this looks like a rare
> edge
> > >>> case
> > >>> > to
> > >>> > > > me. Participants must finish the leader election successfully
> and
> > >>> right
> > >>> > > > after enough followers should fail to send epoch to the
> leader, so
> > >>> > > > observers can take it over.
> > >>> > > >
> > >>> > > > Is that description accurate?
> > >>> > > >
> > >>> > > > Andor
> > >>> > > >
> > >>> > > >
> > >>> > > > On Thu, Apr 5, 2018 at 7:35 PM, Alexander Shraer <
> > >>> shra...@gmail.com>
> > >>> > > > wrote:
> > >>> > > >
> > >>> > > > > To clarify - in a deployment with observers this bug can
> > >>> potentially
> > >>> > > > cause
> > >>> > > > > data loss. A server could be elected leader based just on the
> > >>> support
> > >>> > > of
> > >>> > > > > observers, even if this servers data is stale wrt other
> > >>> followers.
> > >>> > > > >
> > >>> > > > > It is certainly a blocker, just not sure if for 3.4.11 or
> 3.4.12.
> > >>> > > > >
> > >>> > > > >
> > >>> > > > > Alex
> > >>> > > > > On Thu, Apr 5, 2018 at 10:29 AM Andor Molnar <
> an...@cloudera.com
> > >>> >
> > >>> > > wrote:
> > >>> > > > >
> 

Re: [VOTE] Apache ZooKeeper release 3.4.12 candidate 1

2018-04-13 Thread Abraham Fine
Let's wait until the next release to include this fix. 

On Mon, Apr 9, 2018, at 15:14, Alexander Shraer wrote:
> Hi,
> 
> Please take a look at the new PR for ZK-2959:
> https://github.com/apache/zookeeper/pull/500
> If there are no further comments, I can commit it.
> 
> Thanks,
> Alex
> 
> On Fri, Apr 6, 2018 at 11:33 AM, Alexander Shraer  wrote:
> 
> > Hi,
> >
> > The bug described in ZOOKEEPER-2959 is that getEpochToPropose and
> > waitForEpochAck do not distinguish between followers and observers.
> > This can cause a candidate leader's acceptedEpoch to be updated with
> > support from observers only. The same holds for waitForEpochAck: passing
> > this method allows the candidate leader to update the currentEpoch. The
> > latter helps this server to win FLE elections continuously, and the former
> > (acceptedEpoch) causes anyone trying to connect to the server to think that
> > it has more up-to-date data and truncate their logs to match.
> >
> >
> > Alex
> >
> > On Fri, Apr 6, 2018 at 10:04 AM, Fangmin Lv  wrote:
> >
> >> Hi Alex,
> >>
> >> Can you give more details about the data loss scenario in Jira
> >> ZOOKEEPER-2959?
> >> As far as I know, the leader will ignore the observers' ACK in
> >> waitForNewLeaderAck, so it will not start serving traffic until it has
> >> received the actual quorum ACK. If it doesn't have enough follower support
> >> before the timeout, it will quit leading and its learners will re-sync
> >> with the new leader.
> >>
> >> Thanks,
> >> Fangmin
> >>
> >> On Thu, Apr 5, 2018 at 12:57 PM, Alexander Shraer 
> >> wrote:
> >>
> >>> Btw we actually observed the described issue (data loss), thankfully in a
> >>> test environment. So I thought this is important to share with the
> >>> community.
> >>>
> >>> Unfortunately I don’t have time to run a new ZK release for this, so I’m
> >>> not going to -1 your candidate, but we are actively working on a fix (ie
> >>> a
> >>> test at this point) and I can commit that as soon as we have that.
> >>>
> >>> It may be worth while to delay the release by a few more days, but it’s
> >>> totally up to you since you’re running it.
> >>>
> >>> Cheers
> >>> Alex
> >>> On Thu, Apr 5, 2018 at 12:47 PM Andor Molnar  wrote:
> >>>
> >>> > Got that. I still believe it's a completely valid issue which has to be
> >>> > addressed, but it's not a showstopper. I'm afraid we're not going to
> >>> > convince each other, so it's probably Abe's call if he wants to create
> >>> > another release candidate for the fix.
> >>> >
> >>> > I reviewed the code on github and I think it just needs to be covered
> >>> with
> >>> > a unit test to be complete.
> >>> >
> >>> > Regards,
> >>> > Andor
> >>> >
> >>> >
> >>> >
> >>> > On Thu, Apr 5, 2018 at 9:05 PM, Alexander Shraer 
> >>> > wrote:
> >>> >
> >>> > > Yes, sort of: FLE is finished, then enough observers' messages reach
> >>> > > the leader before the participants' messages do.
> >>> > > Whether it's rare depends on the number of observers and
> >>> > > participants. For example, with very few participants and many
> >>> > > observers, your chance of hitting this is quite high.
> >>> > >
> >>> > > Alex
> >>> > >
> >>> > > On Thu, Apr 5, 2018 at 11:44 AM, Andor Molnar 
> >>> > wrote:
> >>> > >
> >>> > > > Maybe I'm missing something here, but this looks like a rare edge
> >>> case
> >>> > to
> >>> > > > me. Participants must finish the leader election successfully and
> >>> right
> >>> > > > after enough followers should fail to send epoch to the leader, so
> >>> > > > observers can take it over.
> >>> > > >
> >>> > > > Is that description accurate?
> >>> > > >
> >>> > > > Andor
> >>> > > >
> >>> > > >
> >>> > > > On Thu, Apr 5, 2018 at 7:35 PM, Alexander Shraer <
> >>> shra...@gmail.com>
> >>> > > > wrote:
> >>> > > >
> >>> > > > > To clarify - in a deployment with observers this bug can
> >>> > > > > potentially cause data loss. A server could be elected leader
> >>> > > > > based just on the support of observers, even if this server's
> >>> > > > > data is stale with respect to other followers.
> >>> > > > >
> >>> > > > > It is certainly a blocker, just not sure if for 3.4.11 or 3.4.12.
> >>> > > > >
> >>> > > > >
> >>> > > > > Alex
> >>> > > > > On Thu, Apr 5, 2018 at 10:29 AM Andor Molnar  >>> >
> >>> > > wrote:
> >>> > > > >
> >>> > > > > > I don't think it's a blocker.
> >>> > > > > > The Jira and PR have been open since last December, and 3.4.11
> >>> > > > > > was released without it.
> >>> > > > > >
> >>> > > > > > Although this bug is also important to fix, I believe it's more
> >>> > > > important
> >>> > > > > > to release a fix for the regression we've found in 3.4.11 asap.
> >>> > > > > >
> >>> > > > > > Abe, any thoughts?
> >>> 
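
For readers following the ZOOKEEPER-2959 discussion quoted above, here is a minimal sketch of the idea behind the fix as described in the thread: when the candidate leader collects epoch messages, only voting participants should count toward the quorum, and observers should be ignored. This is not ZooKeeper's actual Leader code and not the content of the PR. The class EpochQuorumSketch, its methods, and the server ids are hypothetical names chosen for illustration, and the majority rule used here is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch, not ZooKeeper's actual Leader code: an epoch quorum
 * check that counts only voting participants and ignores observers.
 */
public class EpochQuorumSketch {
    // Server ids of the voting members (leader and followers); assumed known up front.
    private final Set<Long> votingMembers = new HashSet<>();
    // Voting members whose epoch message has been received so far.
    private final Set<Long> acked = new HashSet<>();

    public EpochQuorumSketch(long... votingSids) {
        for (long sid : votingSids) {
            votingMembers.add(sid);
        }
    }

    /**
     * Records an epoch message from server sid. Messages from servers outside
     * the voting set (observers) are ignored for quorum purposes.
     * Returns true once a strict majority of voting members has been heard from.
     */
    public boolean recordEpochMessage(long sid) {
        if (votingMembers.contains(sid)) {
            acked.add(sid);
        }
        return hasQuorum();
    }

    /** True when more than half of the voting members have been recorded. */
    public boolean hasQuorum() {
        return acked.size() > votingMembers.size() / 2;
    }

    public static void main(String[] args) {
        // Assumed ensemble: voters 1, 2, 3 and observers 4, 5.
        EpochQuorumSketch quorum = new EpochQuorumSketch(1L, 2L, 3L);
        System.out.println(quorum.recordEpochMessage(1L)); // the leader itself
        System.out.println(quorum.recordEpochMessage(4L)); // observer, ignored
        System.out.println(quorum.recordEpochMessage(5L)); // observer, ignored
        System.out.println(quorum.recordEpochMessage(2L)); // follower completes the quorum
    }
}
```

In this assumed ensemble the first three calls print false and the last prints true: the two observers arriving before any follower do not let the candidate leader move its epoch forward, which is the race Alex describes.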

ZooKeeper_branch34_openjdk7 - Build # 1879 - Still Failing

2018-04-13 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/1879/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 39.92 KB...]
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedClientTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.575 sec
[junit] Running org.apache.zookeeper.test.SaslAuthDesignatedServerTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.515 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailDesignatedClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.903 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailNotifyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.469 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.591 sec
[junit] Running org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.468 sec
[junit] Running org.apache.zookeeper.test.SaslClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.083 sec
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.482 sec
[junit] Running org.apache.zookeeper.test.SessionTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
32.916 sec
[junit] Running org.apache.zookeeper.test.StandaloneTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.873 sec
[junit] Running org.apache.zookeeper.test.StatTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.609 sec
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.254 sec
[junit] Running org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.542 sec
[junit] Running org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
7.994 sec
[junit] Running org.apache.zookeeper.test.UpgradeTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.25 sec
[junit] Running org.apache.zookeeper.test.WatchedEventTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.092 sec
[junit] Running org.apache.zookeeper.test.WatcherFuncTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.77 sec
[junit] Running org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
28.491 sec
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
12.412 sec
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.585 sec

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/build.xml:1382:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/build.xml:1385:
 Tests failed!

Total time: 32 minutes 40 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Recording test results
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/



###
## FAILED TESTS (if any) 
##
9 tests failed.
FAILED:  org.apache.zookeeper.server.ZxidRolloverTest.testMultipleRollover

Error Message:
java.net.BindException: Address already in use

Stack Trace:
java.lang.RuntimeException: java.net.BindException: Address already in use
at org.apache.zookeeper.test.QuorumUtil.(QuorumUtil.java:116)
at org.apache.zookeeper.test.QuorumUtil.(QuorumUtil.java:121)
at 
org.apache.zookeeper.server.ZxidRolloverTest.setUp(ZxidRolloverTest.java:63)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
   

ZooKeeper_branch35_openjdk7 - Build # 910 - Failure

2018-04-13 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_openjdk7/910/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 61.29 KB...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
3.644 sec, Thread: 5, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 
5
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.115 sec, Thread: 5, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 5
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
72.053 sec, Thread: 6, Class: org.apache.zookeeper.test.QuorumZxidSyncTest
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 6
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.7 
sec, Thread: 6, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Tests run: 14, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 
80.762 sec, Thread: 8, Class: org.apache.zookeeper.test.QuorumTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 6
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.945 sec, Thread: 6, Class: org.apache.zookeeper.test.StatTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 8
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 6
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.08 sec, Thread: 6, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 6
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.797 sec, Thread: 6, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
2.696 sec, Thread: 8, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 6
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in 
thread 8
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
20.987 sec, Thread: 5, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 5
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.092 sec, Thread: 5, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 5
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
7.168 sec, Thread: 6, Class: org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.241 sec, Thread: 5, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 6
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 5
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.105 sec, Thread: 5, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in 
thread 5
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
34.248 sec, Thread: 2, Class: org.apache.zookeeper.test.SessionTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.764 sec, Thread: 2, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
7.837 sec, Thread: 5, Class: org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
22.593 sec, Thread: 8, Class: 
org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
27.998 sec, Thread: 6, Class: org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 103, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
348.639 sec, Thread: 7, Class: org.apache.zookeeper.test.NettyNettySuiteTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
256.16 sec, Thread: 1, Class: org.apache.zookeeper.test.ReconfigTest
[junit] Tests run: 103, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
361.457 sec, Thread: 4, Class: org.apache.zookeeper.test.NioNettySuiteTest
[junit] Running org.apache.zookeeper.server.quorum.StandaloneDisabledTest 
in thread 3
[junit] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 
sec, Thread: 3, Class: org.apache.zookeeper.server.quorum.StandaloneDisabledTest
[junit] Test 

Failed: ZOOKEEPER- PreCommit Build #1578

2018-04-13 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1578/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 77.28 MB...]
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 1 new Findbugs 
(version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1578//testReport/
 [exec] Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1578//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1578//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: 
https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: 
protocol_version
 [exec] mv: 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 and 
'/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess'
 are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1724:
 exec returned: 1

Total time: 18 minutes 28 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-3019
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

Failed: ZOOKEEPER- PreCommit Build #1577

2018-04-13 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1577/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 75.94 MB...]
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1577//testReport/
 [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1577//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1577//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: protocol_version
 [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1724: exec returned: 1

Total time: 18 minutes 6 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-3019
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

Failed: ZOOKEEPER- PreCommit Build #1576

2018-04-13 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1576/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 77.04 MB...]
 [exec] 
 [exec] -1 findbugs.  The patch appears to introduce 1 new Findbugs (version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] +1 core tests.  The patch passed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1576//testReport/
 [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1576//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1576//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: protocol_version
 [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1724: exec returned: 1

Total time: 18 minutes 53 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-3019
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
All tests passed

ZooKeeper_branch35_jdk7 - Build # 1355 - Failure

2018-04-13 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch35_jdk7/1355/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 60.23 KB...]
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.949 sec, Thread: 8, Class: org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.099 sec, Thread: 2, Class: org.apache.zookeeper.test.SaslClientTest
[junit] Running org.apache.zookeeper.test.SaslSuperUserTest in thread 2
[junit] Running org.apache.zookeeper.test.ServerCnxnTest in thread 8
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.552 sec, Thread: 2, Class: org.apache.zookeeper.test.SaslSuperUserTest
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.607 sec, Thread: 8, Class: org.apache.zookeeper.test.ServerCnxnTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.118 sec, Thread: 2, Class: org.apache.zookeeper.test.SessionInvalidationTest
[junit] Running org.apache.zookeeper.test.SessionTest in thread 8
[junit] Running org.apache.zookeeper.test.SessionTrackerCheckTest in thread 2
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.176 sec, Thread: 2, Class: org.apache.zookeeper.test.SessionTrackerCheckTest
[junit] Running org.apache.zookeeper.test.SessionUpgradeTest in thread 2
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 22.407 sec, Thread: 2, Class: org.apache.zookeeper.test.SessionUpgradeTest
[junit] Running org.apache.zookeeper.test.StandaloneTest in thread 2
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.713 sec, Thread: 2, Class: org.apache.zookeeper.test.StandaloneTest
[junit] Running org.apache.zookeeper.test.StatTest in thread 2
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.065 sec, Thread: 2, Class: org.apache.zookeeper.test.StatTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 177.646 sec, Thread: 3, Class: org.apache.zookeeper.test.RecoveryTest
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest in thread 3
[junit] Running org.apache.zookeeper.test.StringUtilTest in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.116 sec, Thread: 2, Class: org.apache.zookeeper.test.StringUtilTest
[junit] Running org.apache.zookeeper.test.SyncCallTest in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.051 sec, Thread: 2, Class: org.apache.zookeeper.test.SyncCallTest
[junit] Running org.apache.zookeeper.test.TruncateTest in thread 2
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.115 sec, Thread: 3, Class: org.apache.zookeeper.test.StaticHostProviderTest
[junit] Running org.apache.zookeeper.test.WatchEventWhenAutoResetTest in thread 3
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 35.705 sec, Thread: 8, Class: org.apache.zookeeper.test.SessionTest
[junit] Running org.apache.zookeeper.test.WatchedEventTest in thread 8
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.156 sec, Thread: 8, Class: org.apache.zookeeper.test.WatchedEventTest
[junit] Running org.apache.zookeeper.test.WatcherFuncTest in thread 8
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.484 sec, Thread: 8, Class: org.apache.zookeeper.test.WatcherFuncTest
[junit] Running org.apache.zookeeper.test.WatcherTest in thread 8
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 126.106 sec, Thread: 7, Class: org.apache.zookeeper.test.RestoreCommittedLogTest
[junit] Running org.apache.zookeeper.test.X509AuthTest in thread 7
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.171 sec, Thread: 7, Class: org.apache.zookeeper.test.X509AuthTest
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest in thread 7
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.811 sec, Thread: 2, Class: org.apache.zookeeper.test.TruncateTest
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest in thread 2
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.231 sec, Thread: 2, Class: org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 23.344 sec, Thread: 3, Class: org.apache.zookeeper.test.WatchEventWhenAutoResetTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
22.29 sec, Thread: 7, Class: 

Failed: ZOOKEEPER- PreCommit Build #1575

2018-04-13 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1575/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 74.32 MB...]
 [exec] 
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.
 [exec] 
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.
 [exec] 
 [exec] -1 core tests.  The patch failed core unit tests.
 [exec] 
 [exec] +1 contrib tests.  The patch passed contrib unit tests.
 [exec] 
 [exec] Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1575//testReport/
 [exec] Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1575//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
 [exec] Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/1575//console
 [exec] 
 [exec] This message is automatically generated.
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Adding comment to Jira.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: protocol_version
 [exec] 
 [exec] 
 [exec] 
==
 [exec] 
==
 [exec] Finished build.
 [exec] 
==
 [exec] 
==
 [exec] 
 [exec] 
 [exec] Unable to log in to server: https://issues.apache.org/jira/rpc/soap/jirasoapservice-v2 with user: hadoopqa.
 [exec]  Cause: ; nested exception is: 
 [exec] javax.net.ssl.SSLException: Received fatal alert: protocol_version
 [exec] mv: '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' and '/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/patchprocess' are the same file

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/PreCommit-ZOOKEEPER-github-pr-build/build.xml:1722: exec returned: 1

Total time: 13 minutes 8 seconds
Build step 'Execute shell' marked build as failure
Archiving artifacts
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Recording test results
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
[description-setter] Description set: ZOOKEEPER-2988
Putting comment on the pull request
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7
Setting JDK_1_7_LATEST__HOME=/home/jenkins/tools/java/latest1.7



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderOutOfView

Error Message:
Corrupt peer should not attempt connection to out of view leader

Stack Trace:
junit.framework.AssertionFailedError: Corrupt peer should not attempt connection to out of view leader
at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderOutOfView(QuorumPeerMainTest.java:1085)
at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)

[jira] [Commented] (ZOOKEEPER-2988) NPE triggered if server receives a vote for a server id not in their voting view

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436930#comment-16436930
 ] 

ASF GitHub Bot commented on ZOOKEEPER-2988:
---

Github user enixon commented on the issue:

https://github.com/apache/zookeeper/pull/476
  
Made all changes requested in comments. I can alter the pull requests for 
the other ZooKeeper branches once we reach agreement on this one.


> NPE triggered if server receives a vote for a server id not in their voting 
> view
> 
>
> Key: ZOOKEEPER-2988
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2988
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.5.3, 3.4.11
> Reporter: Brian Nixon
> Priority: Minor
>
> We've observed the following behavior in elections when a node is lagging 
> behind the quorum in its view of the ensemble topology.
> - Node A is operating with node B in its voting view, but without view of 
> node C.
> - B votes for C.
> - A then switches its vote to C, but throws an NPE when attempting to connect.
> This causes the QuorumPeer to spin up a Follower only to immediately have it 
> shut down by the exception.
> Ideally, A would not advertise a vote for a server that it will not follow.
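
The failure mode described above can be illustrated with a small, self-contained Java sketch. This is not ZooKeeper code and not the change proposed in the pull request: the class VotingViewSketch, its Peer type, and the methods addPeer, addressUnguarded and canFollow are hypothetical names chosen for illustration. It assumes only that a node keeps a map from server id to peer entry as its voting view, so that looking up an id outside the view yields null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a node's voting view; not ZooKeeper's actual classes.
final class VotingViewSketch {

    // Stand-in for a peer entry (server id plus the address used to connect).
    static final class Peer {
        final long id;
        final String address;

        Peer(long id, String address) {
            this.id = id;
            this.address = address;
        }
    }

    // Server id -> peer entry, as seen by this node.
    private final Map<Long, Peer> votingView = new HashMap<>();

    void addPeer(long id, String address) {
        votingView.put(id, new Peer(id, address));
    }

    // Unguarded lookup: dereferences null if leaderId is not in the view.
    String addressUnguarded(long leaderId) {
        return votingView.get(leaderId).address;
    }

    // Guard: only adopt a vote for a server this node could actually follow.
    boolean canFollow(long leaderId) {
        return votingView.containsKey(leaderId);
    }

    public static void main(String[] args) {
        VotingViewSketch nodeA = new VotingViewSketch();
        nodeA.addPeer(1L, "hostA:2888"); // node A itself
        nodeA.addPeer(2L, "hostB:2888"); // node B; node C (id 3) is missing

        long proposedLeader = 3L; // B's vote for C
        System.out.println("canFollow = " + nodeA.canFollow(proposedLeader));
        try {
            nodeA.addressUnguarded(proposedLeader);
        } catch (NullPointerException e) {
            System.out.println("unguarded lookup threw NullPointerException");
        }
    }
}
```

Under these assumptions, checking canFollow before switching votes corresponds to the behavior the description asks for: A does not advertise a vote for a server that it will not follow.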



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper issue #476: ZOOKEEPER-2988: NPE triggered if server receives a vot...

2018-04-13 Thread enixon
Github user enixon commented on the issue:

https://github.com/apache/zookeeper/pull/476
  
Made all changes requested in comments. I can alter the pull requests for 
the other ZooKeeper branches once we reach agreement on this one.


---