Re: question about approach to take

2018-11-13 Thread Ted Dunning
The general rule at Apache is that if somebody has an itch to implement
something, they do it.

I have a strong need for this feature and will be implementing it. Doing so
doesn't change what features get pushed.

Regarding votes and such, commits are normally only subject to technical
blocks. Absent a valid technical objection to using an existing
configuration format, I plan to move forward with this. I am sympathetic to
a desire for a new config and will even contribute to that effort, but I
can't wait for it.

On Tue, Nov 13, 2018, 22:25 Alexander Shraer wrote:

> This seems like a good feature for ZooKeeper to eventually have, but I
> don't see why it must make 3.4 and 3.5 while other features are pushed out
> to 3.6.
> Like any other feature, it would be subject to a vote and isn't a
> unilateral decision.
>
> On Tue, Nov 13, 2018 at 1:59 PM Ted Dunning  wrote:
>
> > I am going to push this feature out sooner rather than later. That isn't a
> > question. I and my team are going to do the work. Others are very welcome
> > to help and I am sure that there will be high value in getting reviews
> > from a wide group.
> >
> > But we are already working on the code. And we will be pushing a version
> > into both 3.4 and 3.5. I think that 3.6-ish is a great target for an
> > improved configuration syntax. Better configuration is a great goal, but
> > it isn't OK to delay other work.
> >
> >
> > On Tue, Nov 13, 2018 at 3:47 PM Alexander Shraer 
> > wrote:
> >
> > > I also wanted to get others' views on this.
> > >
> > > My opinion is that the current server configuration format
> > > (server.x=ip:port:port:role;ip:port) has run its course. There are
> > > multiple proposals for additions/changes to the server configuration
> > > that would be simplified by having a more extensible format, such as a
> > > json blob, as proposed by Brian Nixon here:
> > > https://issues.apache.org/jira/browse/ZOOKEEPER-3166
> > > It is true that such an extension hasn't happened yet; however, it may
> > > not be a good idea to continue adding individual features to the
> > > existing format instead of making this change.
> > >
> > > For longer than a year, maybe more, I've seen features pushed out to 3.6
> > > to avoid destabilizing the 3.5 release. If we follow the same logic here,
> > > this would be a 3.6 feature, so compatibility with the old format doesn't
> > > seem very important.
> > >
> > > What do others think?
> > >
> > >
> > > Thanks,
> > > Alex
> > >
> > >
> > >
> > > On Mon, Nov 12, 2018 at 11:47 PM Ted Dunning 
> > > wrote:
> > >
> > > > There is a JIRA live for the network resilience feature that I
> > > > mentioned previously.
> > > >
> > > > The design document
> > > > <https://docs.google.com/document/d/1iGVwxeHp57qogwfdodCh9b32P2_kOQaJZ2GDo7j36fI/edit?usp=sharing>
> > > > (also copied into the JIRA) has essentially converged except for two
> > > > points.
> > > >
> > > > These include:
> > > >
> > > > 1) Artem Chernatsky has pointed out an opportunity to factor out port
> > > > sets in the configuration syntax as well as an interesting interaction
> > > > with the existing behavior where the current servers already listen to
> > > > the specified ports on all NICs. The semantics of this interaction
> > > > between configuration options need to be specified rigorously, but
> > > > this doesn't appear to impact code complexity much, nor introduce any
> > > > real difficulties.
> > > >
> > > > 2) Alex Shraer seems to feel that there is a strong interaction
> > > > between this issue and a proposed refactorization of the configuration
> > > > file syntax (mentioned in a comment in 3166, but apparently doesn't
> > > > have an independent issue). In particular, he seems to think that the
> > > > syntax refactorization is a blocker for the network resilience. My own
> > > > feeling is that there is some interaction, but there is no strong
> > > > ordering between the two issues if the implementors of this issue are
> > > > willing to commit to supporting any consensus syntax change that is
> > > > adopted. Essentially, there can be an additional issue filed which is
> > > > blocked by both the syntax change issue and 3188 (network resilience)
> > > > to support any new syntax. The work for 3188 needs to support the old
> > > > syntax in any case so that we can backport changes to 3.4.
> > > >
> > > > Other open issues that are affected by configuration syntax change
> > > > include 2534, 2531, 195, and 2225. None of these has
> > > > any serious impact other than 

[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-11-13 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r233289109
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java 
---
@@ -103,71 +105,102 @@
 boolean isConnected() {
 // Assuming that isConnected() is only used to initiate connection,
 // not used by some other connection status judgement.
-return channel != null;
+connectLock.lock();
+try {
+return channel != null || connectFuture != null;
+} finally {
+connectLock.unlock();
+}
+}
+
+private Bootstrap configureBootstrapAllocator(Bootstrap bootstrap) {
+ByteBufAllocator testAllocator = TEST_ALLOCATOR.get();
+if (testAllocator != null) {
+return bootstrap.option(ChannelOption.ALLOCATOR, 
testAllocator);
+} else {
+return bootstrap;
+}
 }
 
 @Override
 void connect(InetSocketAddress addr) throws IOException {
 firstConnect = new CountDownLatch(1);
 
-ClientBootstrap bootstrap = new ClientBootstrap(channelFactory);
-
-bootstrap.setPipelineFactory(new 
ZKClientPipelineFactory(addr.getHostString(), addr.getPort()));
-bootstrap.setOption("soLinger", -1);
-bootstrap.setOption("tcpNoDelay", true);
-
-connectFuture = bootstrap.connect(addr);
-connectFuture.addListener(new ChannelFutureListener() {
-@Override
-public void operationComplete(ChannelFuture channelFuture) 
throws Exception {
-// this lock guarantees that channel won't be assgined 
after cleanup().
-connectLock.lock();
-try {
-if (!channelFuture.isSuccess() || connectFuture == 
null) {
-LOG.info("future isn't success, cause: {}", 
channelFuture.getCause());
-return;
-}
-// setup channel, variables, connection, etc.
-channel = channelFuture.getChannel();
-
-disconnected.set(false);
-initialized = false;
-lenBuffer.clear();
-incomingBuffer = lenBuffer;
-
-sendThread.primeConnection();
-updateNow();
-updateLastSendAndHeard();
-
-if (sendThread.tunnelAuthInProgress()) {
-waitSasl.drainPermits();
-needSasl.set(true);
-sendPrimePacket();
-} else {
-needSasl.set(false);
-}
+Bootstrap bootstrap = new Bootstrap()
+.group(eventLoopGroup)
+.channel(NettyUtils.nioOrEpollSocketChannel())
+.option(ChannelOption.SO_LINGER, -1)
+.option(ChannelOption.TCP_NODELAY, true)
+.handler(new ZKClientPipelineFactory(addr.getHostString(), 
addr.getPort()));
+bootstrap = configureBootstrapAllocator(bootstrap);
+bootstrap.validate();
 
-// we need to wake up on first connect to avoid 
timeout.
-wakeupCnxn();
-firstConnect.countDown();
-LOG.info("channel is connected: {}", 
channelFuture.getChannel());
-} finally {
-connectLock.unlock();
+connectLock.lock();
+try {
+connectFuture = bootstrap.connect(addr);
+connectFuture.addListener(new ChannelFutureListener() {
+@Override
+public void operationComplete(ChannelFuture channelFuture) 
throws Exception {
+// this lock guarantees that channel won't be assigned 
after cleanup().
+connectLock.lock();
+try {
+if (!channelFuture.isSuccess()) {
+LOG.info("future isn't success, cause:", 
channelFuture.cause());
+return;
+} else if (connectFuture == null) {
--- End diff --

How could `connectFuture` be null?
`connectFuture.addListener` call would have already thrown NPE in that case.
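
One possible answer to the question above, sketched under assumptions: the listener does not run at `addListener` time but later, on the I/O thread, and by then `cleanup()` may already have cleared the field. The class below is a hypothetical stand-in that uses the JDK's `CompletableFuture` instead of Netty's `ChannelFuture`; the names `ConnectFutureSketch` and `currentChannel()` are invented for illustration, and the behavior of `cleanup()` is assumed rather than taken from the PR.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the client socket; not the real ClientCnxnSocketNetty.
public class ConnectFutureSketch {
    private final ReentrantLock connectLock = new ReentrantLock();
    private CompletableFuture<String> connectFuture; // guarded by connectLock
    private String channel;                          // guarded by connectLock

    // Starts a "connection" and returns the pending future so that a caller
    // (standing in for the I/O thread) can complete it later.
    CompletableFuture<String> connect() {
        CompletableFuture<String> pending = new CompletableFuture<>();
        connectLock.lock();
        try {
            connectFuture = pending;
            pending.whenComplete((ch, err) -> {
                connectLock.lock();
                try {
                    if (err != null) {
                        return; // the connect attempt failed
                    }
                    if (connectFuture == null) {
                        return; // cleanup() ran first, so drop the late result
                    }
                    channel = ch;
                    connectFuture = null;
                } finally {
                    connectLock.unlock();
                }
            });
        } finally {
            connectLock.unlock();
        }
        return pending;
    }

    // Assumed behavior: cleanup() clears both fields under the same lock.
    void cleanup() {
        connectLock.lock();
        try {
            connectFuture = null;
            channel = null;
        } finally {
            connectLock.unlock();
        }
    }

    String currentChannel() {
        connectLock.lock();
        try {
            return channel;
        } finally {
            connectLock.unlock();
        }
    }
}
```

In this sketch the field is non-null when the listener is registered, so no NPE occurs there; the null check only matters for a completion that arrives after `cleanup()`.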


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-11-13 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r233288683
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java 
---
@@ -103,71 +105,102 @@
 boolean isConnected() {
 // Assuming that isConnected() is only used to initiate connection,
 // not used by some other connection status judgement.
-return channel != null;
+connectLock.lock();
+try {
+return channel != null || connectFuture != null;
--- End diff --

Why would you like to check `connectFuture` too?


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-11-13 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r233279457
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxnFactory.java
 ---
@@ -116,170 +116,104 @@ public void channelConnected(ChannelHandlerContext 
ctx,
 
 NettyServerCnxn cnxn = new NettyServerCnxn(channel,
 zkServer, NettyServerCnxnFactory.this);
-ctx.setAttachment(cnxn);
+ctx.channel().attr(CONNECTION_ATTRIBUTE).set(cnxn);
 
 if (secure) {
-SslHandler sslHandler = 
ctx.getPipeline().get(SslHandler.class);
-ChannelFuture handshakeFuture = sslHandler.handshake();
+SslHandler sslHandler = 
ctx.pipeline().get(SslHandler.class);
+Future<Channel> handshakeFuture = 
sslHandler.handshakeFuture();
 handshakeFuture.addListener(new 
CertificateVerifier(sslHandler, cnxn));
 } else {
-allChannels.add(ctx.getChannel());
+allChannels.add(ctx.channel());
 addCnxn(cnxn);
 }
 }
 
 @Override
-public void channelDisconnected(ChannelHandlerContext ctx,
-ChannelStateEvent e) throws Exception
-{
+public void channelInactive(ChannelHandlerContext ctx) throws 
Exception {
 if (LOG.isTraceEnabled()) {
-LOG.trace("Channel disconnected " + e);
+LOG.trace("Channel inactive {}", ctx.channel());
 }
-NettyServerCnxn cnxn = (NettyServerCnxn) ctx.getAttachment();
+allChannels.remove(ctx.channel());
+NettyServerCnxn cnxn = 
ctx.channel().attr(CONNECTION_ATTRIBUTE).getAndSet(null);
 if (cnxn != null) {
 if (LOG.isTraceEnabled()) {
-LOG.trace("Channel disconnect caused close " + e);
+LOG.trace("Channel inactive caused close {}", cnxn);
 }
 cnxn.close();
 }
 }
 
 @Override
-public void exceptionCaught(ChannelHandlerContext ctx, 
ExceptionEvent e)
-throws Exception
-{
-LOG.warn("Exception caught " + e, e.getCause());
-NettyServerCnxn cnxn = (NettyServerCnxn) ctx.getAttachment();
+public void exceptionCaught(ChannelHandlerContext ctx, Throwable 
cause) throws Exception {
+LOG.warn("Exception caught", cause);
+NettyServerCnxn cnxn = 
ctx.channel().attr(CONNECTION_ATTRIBUTE).getAndSet(null);
 if (cnxn != null) {
 if (LOG.isDebugEnabled()) {
-LOG.debug("Closing " + cnxn);
+LOG.debug("Closing {}", cnxn);
 }
 cnxn.close();
 }
 }
 
 @Override
-public void messageReceived(ChannelHandlerContext ctx, 
MessageEvent e)
-throws Exception
-{
-if (LOG.isTraceEnabled()) {
-LOG.trace("message received called " + e.getMessage());
-}
+public void userEventTriggered(ChannelHandlerContext ctx, Object 
evt) throws Exception {
 try {
-if (LOG.isDebugEnabled()) {
-LOG.debug("New message " + e.toString()
-+ " from " + ctx.getChannel());
-}
-NettyServerCnxn cnxn = 
(NettyServerCnxn)ctx.getAttachment();
-synchronized(cnxn) {
-processMessage(e, cnxn);
+if (evt == NettyServerCnxn.AutoReadEvent.ENABLE) {
+LOG.debug("Received AutoReadEvent.ENABLE");
+NettyServerCnxn cnxn = 
ctx.channel().attr(CONNECTION_ATTRIBUTE).get();
+// TODO(ilyam): Not sure if cnxn can be null here. It 
becomes null if channelInactive()
--- End diff --

Do you need to remove `cnxn` from the channel in the mentioned two events?
Null check wouldn't do any harm though.


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-11-13 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r233282137
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxnFactory.java
 ---
@@ -316,16 +251,17 @@ public void operationComplete(ChannelFuture future)
 if (KeeperException.Code.OK !=
 authProvider.handleAuthentication(cnxn, null)) 
{
 LOG.error("Authentication failed for session 0x{}",
-Long.toHexString(cnxn.sessionId));
+Long.toHexString(cnxn.getSessionId()));
 cnxn.close();
 return;
 }
 
-allChannels.add(future.getChannel());
+final Channel futureChannel = future.getNow();
--- End diff --

I think `get()` would be enough, but the check is harmless anyway.
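
On the `get()` versus `getNow()` question: in Netty 4, `Future.getNow()` returns the result without blocking and yields `null` if the future is not done yet, while `get()` blocks until completion. Inside an `operationComplete` callback the future is already complete, so both return immediately. The sketch below illustrates the difference with the JDK's `CompletableFuture` as an analogue; note that the JDK's `getNow` takes a fallback argument, unlike Netty's no-argument version, and the class and method names are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

// Analogue built on the JDK only; Netty's Future.getNow() takes no argument.
public class GetNowSketch {
    // Never blocks: returns null when the future has not completed yet.
    static String nonBlocking(CompletableFuture<String> future) {
        return future.getNow(null);
    }

    // Blocks until the future completes; returns at once if it is already done.
    static String blocking(CompletableFuture<String> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

A null check after the non-blocking call costs nothing and guards against the case where the code is ever reached before completion.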


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-11-13 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r233278670
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxnFactory.java
 ---
@@ -116,170 +116,104 @@ public void channelConnected(ChannelHandlerContext 
ctx,
 
 NettyServerCnxn cnxn = new NettyServerCnxn(channel,
 zkServer, NettyServerCnxnFactory.this);
-ctx.setAttachment(cnxn);
+ctx.channel().attr(CONNECTION_ATTRIBUTE).set(cnxn);
 
 if (secure) {
-SslHandler sslHandler = 
ctx.getPipeline().get(SslHandler.class);
-ChannelFuture handshakeFuture = sslHandler.handshake();
+SslHandler sslHandler = 
ctx.pipeline().get(SslHandler.class);
+Future<Channel> handshakeFuture = 
sslHandler.handshakeFuture();
 handshakeFuture.addListener(new 
CertificateVerifier(sslHandler, cnxn));
 } else {
-allChannels.add(ctx.getChannel());
+allChannels.add(ctx.channel());
 addCnxn(cnxn);
 }
 }
 
 @Override
-public void channelDisconnected(ChannelHandlerContext ctx,
-ChannelStateEvent e) throws Exception
-{
+public void channelInactive(ChannelHandlerContext ctx) throws 
Exception {
 if (LOG.isTraceEnabled()) {
-LOG.trace("Channel disconnected " + e);
+LOG.trace("Channel inactive {}", ctx.channel());
 }
-NettyServerCnxn cnxn = (NettyServerCnxn) ctx.getAttachment();
+allChannels.remove(ctx.channel());
+NettyServerCnxn cnxn = 
ctx.channel().attr(CONNECTION_ATTRIBUTE).getAndSet(null);
 if (cnxn != null) {
 if (LOG.isTraceEnabled()) {
-LOG.trace("Channel disconnect caused close " + e);
+LOG.trace("Channel inactive caused close {}", cnxn);
 }
 cnxn.close();
 }
 }
 
 @Override
-public void exceptionCaught(ChannelHandlerContext ctx, 
ExceptionEvent e)
-throws Exception
-{
-LOG.warn("Exception caught " + e, e.getCause());
-NettyServerCnxn cnxn = (NettyServerCnxn) ctx.getAttachment();
+public void exceptionCaught(ChannelHandlerContext ctx, Throwable 
cause) throws Exception {
--- End diff --

You remove `ctx.channel()` from `allChannels` in `channelInactive()`, which 
was not the case in the original implementation, but I think it makes perfect 
sense.

Don't you want to do the same here?
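
A minimal sketch of what that could look like, assuming both callbacks share one cleanup helper. It uses plain JDK collections and string channel ids rather than Netty types, and every name in it (`ChannelCleanupSketch`, `release`, `Cnxn`) is hypothetical:

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the factory's bookkeeping; not the real NettyServerCnxnFactory.
public class ChannelCleanupSketch {
    static final class Cnxn {
        boolean closed;
        void close() { closed = true; }
    }

    final Set<String> allChannels = ConcurrentHashMap.newKeySet();
    final Map<String, Cnxn> attachments = new ConcurrentHashMap<>();

    void register(String channelId, Cnxn cnxn) {
        allChannels.add(channelId);
        attachments.put(channelId, cnxn);
    }

    // Shared cleanup: untrack the channel, detach the connection, close it once.
    private void release(String channelId) {
        allChannels.remove(channelId);
        Cnxn cnxn = attachments.remove(channelId); // plays the role of getAndSet(null)
        if (cnxn != null) {
            cnxn.close();
        }
    }

    void channelInactive(String channelId) {
        release(channelId);
    }

    void exceptionCaught(String channelId, Throwable cause) {
        release(channelId);
    }
}
```

Because the attachment is detached atomically, whichever callback fires second finds nothing to close, which is also why a null check on the attachment is cheap insurance.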


---


[GitHub] zookeeper pull request #669: ZOOKEEPER-3152: Port ZK netty stack to netty4

2018-11-13 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/669#discussion_r233286512
  
--- Diff: 
zookeeper-server/src/test/java/org/apache/zookeeper/test/TestByteBufAllocator.java
 ---
@@ -0,0 +1,152 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.test;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+import java.util.concurrent.atomic.AtomicReference;
+
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.CompositeByteBuf;
+import io.netty.buffer.PooledByteBufAllocator;
+import io.netty.util.ResourceLeakDetector;
+
+/**
+ * This is a custom ByteBufAllocator that tracks outstanding allocations 
and
+ * crashes the program if any of them are leaked.
+ *
+ * Never use this class in production, it will cause your server to run out
+ * of memory! This is because it holds strong references to all allocated
+ * buffers and doesn't release them until checkForLeaks() is called at the
+ * end of a unit test.
+ *
+ * Note: the original code was copied from 
https://github.com/airlift/drift,
+ * with the permission and encouragement of airlift's author (dain). 
Airlift
+ * uses the same apache 2.0 license as Zookeeper so this should be ok.
+ *
+ * However, the code was modified to take advantage of Netty's built-in
+ * leak tracking and make a best effort to print details about buffer 
leaks.
+ *
+ */
+public class TestByteBufAllocator extends PooledByteBufAllocator {
+private static AtomicReference<TestByteBufAllocator> INSTANCE =
+new AtomicReference<>(null);
+
+/**
+ * Get the singleton testing allocator.
+ * @return the singleton allocator, creating it if one does not exist.
+ */
+public static TestByteBufAllocator getInstance() {
+TestByteBufAllocator result = INSTANCE.get();
+if (result == null) {
+ResourceLeakDetector.Level oldLevel = 
ResourceLeakDetector.getLevel();
+
ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
+INSTANCE.compareAndSet(null, new 
TestByteBufAllocator(oldLevel));
+result = INSTANCE.get();
+}
+return result;
+}
+
+/**
+ * Destroys the singleton testing allocator and throws an error if any 
of the
+ * buffers allocated by it have been leaked. Attempts to print leak 
details to
+ * standard error before throwing, by using netty's built-in leak 
tracking.
+ * Note that this might not always work, since it only triggers when a 
buffer
+ * is garbage-collected and calling System.gc() does not guarantee 
that a buffer
+ * will actually be GC'ed.
+ *
+ * This should be called at the end of a unit test's tearDown() method.
+ */
+public static void checkForLeaks() {
+TestByteBufAllocator result = INSTANCE.getAndSet(null);
+if (result != null) {
+result.checkInstanceForLeaks();
+}
+}
+
+private final List trackedBuffers = new ArrayList<>();
+private final ResourceLeakDetector.Level oldLevel;
+
+private TestByteBufAllocator(ResourceLeakDetector.Level oldLevel)
+{
+super(false);
+this.oldLevel = oldLevel;
+}
+
+@Override
+protected ByteBuf newHeapBuffer(int initialCapacity, int maxCapacity)
+{
+return track(super.newHeapBuffer(initialCapacity, maxCapacity));
+}
+
+@Override
+protected ByteBuf newDirectBuffer(int initialCapacity, int maxCapacity)
+{
+return track(super.newDirectBuffer(initialCapacity, maxCapacity));
+}
+
+@Override
+public CompositeByteBuf compositeHeapBuffer(int maxNumComponents)
+{
+return track(super.compositeHeapBuffer(maxNumComponents));
+}
+
+

[jira] [Commented] (ZOOKEEPER-3018) Ephemeral node not deleted after session is gone

2018-11-13 Thread Clouds Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685975#comment-16685975
 ] 

Clouds Xu commented on ZOOKEEPER-3018:
--

Yes, if the leader's clock is skewed, the SessionTrackerImpl thread will sleep 
through or skip over a period of time.
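
A minimal sketch of why a clock jump matters for session expiry, under stated assumptions: this is not the SessionTrackerImpl code, the class name `ExpirySketch` is hypothetical, and the clock is injected so the effect can be shown deterministically. If the deadline is computed from a clock that is later set backward, the session outlives its timeout; a monotonic source such as `System.nanoTime()` does not have that problem.

```java
import java.util.function.LongSupplier;

// Hypothetical expiry check; the clock is injected so a skew can be simulated.
public class ExpirySketch {
    private final LongSupplier clockMillis;
    private final long deadlineMillis;

    ExpirySketch(LongSupplier clockMillis, long timeoutMillis) {
        this.clockMillis = clockMillis;
        this.deadlineMillis = clockMillis.getAsLong() + timeoutMillis;
    }

    // The session counts as expired once the clock reaches the deadline.
    boolean isExpired() {
        return clockMillis.getAsLong() >= deadlineMillis;
    }

    // A monotonic millisecond clock: unaffected by wall-clock adjustments.
    static long monotonicMillis() {
        return System.nanoTime() / 1_000_000L;
    }
}
```

With a well-behaved clock, a 9000 ms session created at t=1000 expires at t=10000. If the wall clock is set back by 20 seconds in the meantime, the same check keeps returning false long after the real timeout has passed.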

> Ephemeral node not deleted after session is gone
> 
>
> Key: ZOOKEEPER-3018
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3018
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
> Environment: Linux 4.1.12-112.14.10.el6uek.x86_64 #2 SMP x86_64 
> GNU/Linux
>Reporter: Daniel C
>Priority: Major
> Attachments: zk-3018.zip
>
>
> We have a live Zookeeper environment (quorum size is 2) and observed a 
> strange behavior:
>  * Kafka created 2 ephemeral nodes /brokers/ids/822712429 and 
> /brokers/ids/707577499 on 2018-03-12 03:30:36.933
>  * The Kafka clients were long gone but as of today (20+ days after), the two 
> ephemeral nodes are still present
>  
> Troubleshooting:
> 1) Lists the outstanding sessions and ephemeral nodes
>  
> {noformat}
> $ echo dump | nc $SERVER1 2181
> SessionTracker dump:
> org.apache.zookeeper.server.quorum.LearnerSessionTracker@6d7fd863
> ephemeral nodes dump:
> Sessions with Ephemerals (2):
> 0x162183ea9f70003:
>    /brokers/ids/822712429
> 0x162183ea9f70002:
>    /brokers/ids/707577499
>    /controller
> {noformat}
>  
>  
> 2) stat on /brokers/ids/822712429
>  
> {noformat}
> zk> stat /brokers/ids/822712429
> czxid: 4294967344
> mzxid: 4294967344
> pzxid: 4294967344
> ctime: 1520825436933 (2018-03-11T20:30:36.933-0700)
> mtime: 1520825436933 (2018-03-11T20:30:36.933-0700)
> version: 0
> cversion: 0
> aversion: 0
> owner: 99668799174148099
> datalen: 102
> children: 0
> {noformat}
>  
>  
> 3) List full connection/session details for all clients connected
>  
> {noformat}
> $ echo cons | nc $SERVER1 2181
>  /10.247.114.70:30401[0](queued=0,recved=1,sent=0)
>  
> /10.248.88.235:40430[1](queued=0,recved=345,sent=345,sid=0x162183ea9f70c22,lop=PING,est=1522713395028,to=4,lcxid=0x12,lzxid=0x,lresp=1522717802117,llat=0,minlat=0,avglat=0,maxlat=31)
> {noformat}
>  
>  
>  
> {noformat}
> $ echo cons | nc $SERVER2 2181
>  /10.196.18.61:28173[0](queued=0,recved=1,sent=0)
>  
> /10.247.114.69:42679[1](queued=0,recved=73800,sent=73800,sid=0x262183eaa21da96,lop=PING,est=1522651352906,to=9000,lcxid=0xe49f,lzxid=0x10004683d,lresp=1522717854847,llat=0,minlat=0,avglat=0,maxlat=1235)
> {noformat}
>  
>  
> 4) health
>  
> {noformat}
> $ echo mntr | nc $SERVER1 2181
> zk_version   3.4.6-1569965, built on 02/20/2014 09:09 GMT
> zk_avg_latency  0
> zk_max_latency 443
> zk_min_latency  0
> zk_packets_received   11158019
> zk_packets_sent   11158244
> zk_num_alive_connections   2
> zk_outstanding_requests  0
> zk_server_state follower
> zk_znode_count   344
> zk_watch_count   0
> zk_ephemerals_count 3
> zk_approximate_data_size  36654
> zk_open_file_descriptor_count   33
> zk_max_file_descriptor_count 65536
> {noformat}
>  
>  
> 5) Server logs with related sessions:
> {noformat}
> Only found these logs from Server1 related to the sessions (0x162183ea9f70002 
> and 0x162183ea9f70003):
> 2018-03-12 03:28:35,127 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /10.196.18.60:26775
> 2018-03-12 03:28:35,131 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection 
> request from old client /10.196.18.60:26775; will be dropped if server is in 
> r-o mode
> 2018-03-12 03:28:35,131 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client 
> attempting to establish new session at /10.196.18.60:26775
> 2018-03-12 03:28:35,137 [myid:1] - INFO  
> [CommitProcessor:1:ZooKeeperServer@617] - Established session 
> 0x162183ea9f70002 with negotiated timeout 9000 for client /10.196.18.60:26775
>  
> 2018-03-12 03:30:36,415 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /10.247.114.70:39260
> 2018-03-12 03:30:36,422 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection 
> request from old client /10.247.114.70:39260; will be dropped if server is in 
> r-o mode
> 2018-03-12 03:30:36,423 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client 
> attempting to establish new session at /10.247.114.70:39260
> 2018-03-12 03:30:36,428 [myid:1] - INFO  
> [CommitProcessor:1:ZooKeeperServer@617] - Established session 
> 0x162183ea9f70003 with 

[jira] [Updated] (ZOOKEEPER-3190) Spell check on the Zookeeper server files

2018-11-13 Thread Dinesh Appavoo (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Appavoo updated ZOOKEEPER-3190:
--
Summary: Spell check on the Zookeeper server files  (was: Spell check the 
Zookeeper server files)

> Spell check on the Zookeeper server files
> -
>
> Key: ZOOKEEPER-3190
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3190
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: documentation, other
>Reporter: Dinesh Appavoo
>Priority: Minor
>  Labels: newbie
>
> This JIRA is to do spell check on the zookeeper server files [ 
> zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/server ]. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3190) Spell check the Zookeeper server files

2018-11-13 Thread Dinesh Appavoo (JIRA)
Dinesh Appavoo created ZOOKEEPER-3190:
-

 Summary: Spell check the Zookeeper server files
 Key: ZOOKEEPER-3190
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3190
 Project: ZooKeeper
  Issue Type: Improvement
  Components: documentation, other
Reporter: Dinesh Appavoo


This JIRA is to do spell check on the zookeeper server files [ 
zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/server ]. 






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: question about approach to take

2018-11-13 Thread Michael Han
I think there are two outstanding topics we need to converge on here.

First - does this work (ZOOKEEPER-3188) require a new configuration
subsystem (with new syntax, more powerful features etc)?
Second - what releases should ZOOKEEPER-3188 go in?

On the first, technically ZOOKEEPER-3188 is not blocked by any work related to
the new configuration subsystem, as it seems totally possible to do this on
top of existing configuration format. However, I tend to agree with Alex
that having an improved configuration system that's more extensible would
be great, which will save time and effort later for the community as we
don't have to rewrite ZOOKEEPER-3188 to make it fit into the new subsystem.
Given how much folks want a new extensible system (to support things like
secure port, follower port that hosts observers, and many more), now it
seems a good time to do this first, though there is intrinsic dependency
between this and ZOOKEEPER-3188.

Second, about release - I tend to agree with Alex and Andor that this
feature should not go to 3.4, for the reasons Andor explained very well in
his reply. For 3.5, I have mixed feelings because we are close to finally
reaching a stable release, and taking on a dependency like this does not
sound like a good idea - but I will leave the final decision to the release
manager of the upcoming stable 3.5 release. For 3.6, I think no one will
object to landing this feature once it's available.

I also made some comments on ZOOKEEPER-3188 JIRA about high level design.

On Tue, Nov 13, 2018 at 3:17 PM Andor Molnar 
wrote:

> Hi Ted,
>
> Thanks for your contribution and the accurate design of this proposal. I
> have the following comments on it:
>
> 1)
> What's the need for explicit support of multiple network interfaces? DNS
> names can resolve to multiple addresses and we could easily implement
> trying the multiple addresses of a single server in a round-robin fashion.
> Currently it's done by getByName() which always chooses the first one. What
> are the benefits of your implementation?
>
> 2)
> In terms of releases.
>
> The community already agreed that 3.4 only accepts critical bug and
> security fixes. We could always start a vote on anything, but personally I
> doubt that such a significant change in the networking code could ever make
> it into 3.4. The reason why we're doing this is that 3.5 is just about to
> be released very soon, therefore we encourage new features and enhancements
> to be implemented from 3.5 onwards.
>
> I believe it needs to get into master first. Once we have an insight on how
> significant the code change is exactly and the impact, we'll be able to
> talk about backporting it to 3.5.
>
> Don't worry about this too much. We don't want to wait another 4 years to
> release 3.6: with the current pace of patches from contributors (especially
> Facebook), we'll have another major (3.6) release very soon. Months rather
> than years.
>
> Regards,
> Andor
>
> On Tue, Nov 13, 2018 at 2:25 PM, Alexander Shraer 
> wrote:
>
> > This seems like a good feature for ZooKeeper to eventually have, but I
> > don't see why it must make 3.4 and 3.5 while other features are pushed
> > out to 3.6.
> > Like any other feature, it would be subject to a vote and isn't a
> > unilateral decision.
> >
> > On Tue, Nov 13, 2018 at 1:59 PM Ted Dunning 
> wrote:
> >
> > > I am going to push this feature out sooner rather than later. That
> > > isn't a question. I and my team are going to do the work. Others are
> > > very welcome to help and I am sure that there will be high value in
> > > getting reviews from a wide group.
> > >
> > > But we are already working on the code. And we will be pushing a
> > > version into both 3.4 and 3.5. I think that 3.6-ish is a great target
> > > for an improved configuration syntax. Better configuration is a great
> > > goal, but it isn't OK to delay other work.
> > >
> > >
> > > On Tue, Nov 13, 2018 at 3:47 PM Alexander Shraer 
> > > wrote:
> > >
> > > > I also wanted to get others' views on this.
> > > >
> > > > My opinion is that the current server configuration format
> > > > (server.x=ip:port:port:role;ip:port) has run its course. There are
> > > > multiple proposals for additions/changes to the server configuration
> > > > that would be simplified by having a more extensible format, such as
> > > > a json blob, as proposed by Brian Nixon here:
> > > > https://issues.apache.org/jira/browse/ZOOKEEPER-3166
> > > > It is true that such an extension hasn't happened yet; however, it may
> > > > not be a good idea to continue adding individual features to the
> > > > existing format instead of making this change.
> > > >
> > > > For longer than a year, maybe more, I've seen features pushed out to
> > > > 3.6 to avoid destabilizing the 3.5 release. If we follow the same
> > > > logic here, this would be a 3.6 feature, so compatibility with the old
> > > > format doesn't seem very important.
> > > >
> 

[jira] [Commented] (ZOOKEEPER-3188) Improve resilience to network

2018-11-13 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685929#comment-16685929
 ] 

Michael Han commented on ZOOKEEPER-3188:


A few comments on the high-level design:

* Did we consider the compatibility requirement here? Will the new
configuration format be backward compatible? One concrete use case: a
customer upgrades to a new version with this multiple-addresses-per-server
capability, then wants to roll back to the older version without rewriting
the config files.

* Did we evaluate the impact of this feature on the existing server-to-server
mutual authentication and authorization features (e.g. ZOOKEEPER-1045 for
Kerberos, ZOOKEEPER-236 for SSL), and also the impact on operations? For
example, how do we configure Kerberos principals and/or SSL certificates per
host, given multiple potential IP addresses and/or FQDNs per server?

* Could we provide more details on the expected level of support for the
dynamic reconfiguration feature? Examples would be great. For instance: we
would support adding, removing, or updating an address that belongs to a
given server via dynamic reconfiguration, along with the expected behavior
in each case. Adding a new address to an existing ensemble member should not
cause any disconnect / reconnect, but removing an in-use address of a server
should cause a disconnect. The dynamic reconfig API / CLI / docs will likely
need to be updated because of this.
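
The expected behavior in the last bullet can be sketched in Java as a set
difference between a server's old and new address lists. This is only an
illustration under assumptions: the class AddressChange, its fields, and the
idea of tracking a single in-use address per peer are hypothetical and are
not part of ZooKeeper's reconfig API.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper, not a ZooKeeper class: diff of one server's address set. */
final class AddressChange {
    final Set<String> added;
    final Set<String> removed;
    final boolean disconnectExpected;

    private AddressChange(Set<String> added, Set<String> removed, boolean disconnectExpected) {
        this.added = Collections.unmodifiableSet(added);
        this.removed = Collections.unmodifiableSet(removed);
        this.disconnectExpected = disconnectExpected;
    }

    /**
     * Compares the address sets before and after a reconfig.
     * inUse is the address currently carrying the connection (may be null).
     */
    static AddressChange between(Set<String> oldAddrs, Set<String> newAddrs, String inUse) {
        Set<String> added = new LinkedHashSet<>(newAddrs);
        added.removeAll(oldAddrs);
        Set<String> removed = new LinkedHashSet<>(oldAddrs);
        removed.removeAll(newAddrs);
        // Assumption from the comment above: only removing the in-use address
        // should force a disconnect; adding addresses never should.
        boolean disconnect = inUse != null && removed.contains(inUse);
        return new AddressChange(added, removed, disconnect);
    }
}
```

Under this sketch, adding an address yields a non-empty "added" set with
disconnectExpected == false, while dropping the address that is in use sets
disconnectExpected to true.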

> Improve resilience to network
> -
>
> Key: ZOOKEEPER-3188
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3188
> Project: ZooKeeper
> Issue Type: Bug
> Reporter: Ted Dunning
> Priority: Major
>
> We propose to add network level resiliency to Zookeeper. The ideas that we 
> have on the topic have been discussed on the mailing list and via a 
> specification document that is located at 
> [https://docs.google.com/document/d/1iGVwxeHp57qogwfdodCh9b32P2_kOQaJZ2GDo7j36fI/edit?usp=sharing]
> That document is copied to this issue which is being created to report the 
> results of experimental implementations.
> h1. Zookeeper Network Resilience
> h2. Background
> Zookeeper is designed to help in building distributed systems. It provides a 
> variety of operations for doing this and all of these operations have rather 
> strict guarantees on semantics. Zookeeper itself is a distributed system made 
> up of a cluster containing a leader and a number of followers. The leader is 
> designated in a process known as leader election in which a majority of all 
> nodes in the cluster must agree on a leader. All subsequent operations are 
> initiated by the leader and completed when a majority of nodes have confirmed 
> the operation. Whenever an operation cannot be confirmed by a majority or 
> whenever the leader goes missing for a time, a new leader election is 
> conducted and normal operations proceed once a new leader is confirmed.
>  
> The details of this are not important relative to this discussion. What is 
> important is that the semantics of the operations conducted by a Zookeeper 
> cluster and the semantics of how client processes communicate with the 
> cluster depend only on the basic fact that messages sent over TCP connections 
> will never appear out of order or missing. Central to the design of ZK is 
> that a server to server network connection is used as long as it works to use 
> it and a new connection is made when it appears that the old connection isn't 
> working.
>  
> As currently implemented, however, each member of a Zookeeper cluster can 
> have only a single address as viewed from some other process. This means, 
> absent network link bonding, that the loss of a single switch or a few 
> network connections could completely stop the operations of the Zookeeper 
> cluster. It is the goal of this work to address this issue by allowing each 
> server to listen on multiple network interfaces and to connect to other 
> servers at any of several addresses. The effect will be to allow servers to 
> communicate over redundant network paths to improve resiliency to network 
> failures without changing any core algorithms.
> h2. Proposed Change
> Interestingly, the correct operations of a Zookeeper cluster do not depend on 
> _how_ a TCP connection was made. There is no reason at all not to advertise 
> multiple addresses for members of a Zookeeper cluster. 
>  
> Connections between members of a Zookeeper cluster and between a client and a 
> cluster member are established by referencing a configuration file (for 
> cluster members) that specifies the addresses of all of the nodes in a cluster 
> or by using a connection string containing possible addresses of Zookeeper 
> cluster members. As soon as a connection is made, any desired authentication 
> or encryption 

[GitHub] zookeeper issue #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thread and...

2018-11-13 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/689
  
@tumativ There is a very short window in which we'll still add the dead 
watcher to the cleaner thread, but I don't expect that to add much GC 
overhead for such a small number of dead watch bits.

For your 2nd point, you can interrupt the thread and wait for it to exit 
using join, although I'm not sure it's worth it, given that this cleanup can 
be done in the background without affecting us starting a new ZK server with 
a new ZKDatabase.

So I still prefer to simply interrupt instead of making this change, both 
because of the added complexity (which also means error-prone and hard to 
maintain in the future) and because of the efficiency sacrificed here.
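
For reference, the "interrupt and wait using join" option mentioned above 
could look roughly like the sketch below. It is an illustration only: 
CleanerShutdownSketch and startCleaner() are hypothetical stand-ins and do 
not reflect the actual WatcherCleaner code in the PR.

```java
/** Hypothetical stand-in for a background cleaner thread; not the PR's code. */
final class CleanerShutdownSketch {

    /** Starts a daemon thread that loops until it is interrupted. */
    static Thread startCleaner() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(50); // pretend to clean a batch of dead watchers
                }
            } catch (InterruptedException e) {
                // Interrupted: fall through and let the thread end.
            }
        }, "CleanerSketch");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /**
     * Interrupts the worker and waits up to timeoutMs (must be > 0, since
     * join(0) waits forever). Returns true if the worker has exited.
     */
    static boolean stopAndWait(Thread worker, long timeoutMs) {
        worker.interrupt();
        try {
            worker.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
        return !worker.isAlive();
    }
}
```

Dropping the join call gives the plain "just interrupt" variant preferred 
above, at the cost of not knowing when the thread has actually finished.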


---


Re: question about approach to take

2018-11-13 Thread Andor Molnar
Hi Ted,

Thanks for your contribution and the accurate design of this proposal. I
have the following comments on it:

1)
Why do we need explicit support for multiple network interfaces? A DNS name
can resolve to multiple addresses, and we could easily implement trying the
multiple addresses of a single server in a round-robin fashion. Currently
resolution is done by getByName(), which always chooses the first one. What
are the benefits of your implementation?

2)
In terms of releases.

The community has already agreed that 3.4 only accepts critical bug and
security fixes. We could always start a vote on anything, but personally I
doubt that such a significant change in the networking code could ever make
it into 3.4. The reason we're doing this is that 3.5 is about to be
released, so we encourage new features and enhancements to be implemented
from 3.5 onwards.

I believe it needs to get into master first. Once we know exactly how
significant the code change is and what its impact is, we'll be able to
talk about backporting it to 3.5.

Don't worry about this too much. We don't want to wait another 4 years to
release 3.6: with the current pace of patches from contributors (especially
Facebook), we'll have another major (3.6) release very soon. Months rather
than years.
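
To make point 1 concrete, the round-robin idea could look roughly like the
Java sketch below. It relies only on the standard InetAddress.getAllByName()
call; the class RoundRobinResolver and its methods are hypothetical names for
illustration, and the order of the returned addresses depends on the resolver
and on JVM DNS caching.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: rotates through all addresses a host name resolves to. */
final class RoundRobinResolver {
    private final AtomicInteger next = new AtomicInteger();

    /** Picks the next candidate from an already resolved, non-empty array. */
    InetAddress pick(InetAddress[] candidates) {
        // floorMod keeps the index valid even after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), candidates.length);
        return candidates[i];
    }

    /** Resolves every address for the host, then returns the next one in rotation. */
    InetAddress resolveNext(String host) throws UnknownHostException {
        return pick(InetAddress.getAllByName(host));
    }
}
```

A caller would invoke resolveNext() before each connection attempt, so that a
failed attempt on one address is followed by an attempt on the next.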

Regards,
Andor

On Tue, Nov 13, 2018 at 2:25 PM, Alexander Shraer  wrote:

> This seems like a good feature for ZooKeeper to eventually have, but I
> don't see why it must make 3.4 and 3.5 while other features are pushed out
> to 3.6.
> Like any other feature, it would be subject to a vote and isn't a
> unilateral decision.

Re: question about approach to take

2018-11-13 Thread Alexander Shraer
This seems like a good feature for ZooKeeper to eventually have, but I
don't see why it must make 3.4 and 3.5 while other features are pushed out
to 3.6.
Like any other feature, it would be subject to a vote and isn't a
unilateral decision.

On Tue, Nov 13, 2018 at 1:59 PM Ted Dunning  wrote:

> I am going to push this feature out sooner rather than later. That isn't a
> question. I and my team are going to do the work. Others are very welcome
> to help and I am sure that there will be high value in getting reviews from
> a wide group.
>
> But we are already working on the code. And we will be pushing a version
> into both 3.4 and 3.5. I think that 3.6-ish is a great target for an
> improved configuration syntax. Better configuration is a great goal, but it
> isn't OK to delay other work.

Re: question about approach to take

2018-11-13 Thread Ted Dunning
I am going to push this feature out sooner rather than later. That isn't a
question. I and my team are going to do the work. Others are very welcome
to help and I am sure that there will be high value in getting reviews from
a wide group.

But we are already working on the code. And we will be pushing a version
into both 3.4 and 3.5. I think that 3.6-ish is a great target for an
improved configuration syntax. Better configuration is a great goal, but it
isn't OK to delay other work.


On Tue, Nov 13, 2018 at 3:47 PM Alexander Shraer  wrote:

> I also wanted to get other's views on this.
>
> My opinion is that the current server configuration format
> (server.x=ip:port:port:role;ip:port) has run its course. There are multiple
> proposals for additions/changes to the server configuration,
> that would be simplified from having a more extensible format, such as a
> json blob, as proposed by Brian Nixon here:
> https://issues.apache.org/jira/browse/ZOOKEEPER-3166
> It is true that such an extension hasn't happened yet, however it may not
> be a good idea to continue adding individual features to the existing
> format instead of making this change.
>
> For longer than a year, maybe more, I've seen features pushed out to 3.6 to
> avoid destabilizing the 3.5 release. If we follow the same logic here, this
> would be a 3.6 feature, so compatibility with the old format doesn't seem
> very important.
>
> What do others think ?
>
>
> Thanks,
> Alex


[GitHub] zookeeper issue #690: ZOOKEEPER-3179: Add snapshot compression to reduce the...

2018-11-13 Thread yisong-yue
Github user yisong-yue commented on the issue:

https://github.com/apache/zookeeper/pull/690
  
👍 Thanks for the feedback! 😄


---


[GitHub] zookeeper issue #690: ZOOKEEPER-3179: Add snapshot compression to reduce the...

2018-11-13 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/690
  
@yisong-yue 
Yeah, that's true. Perhaps it would be too much hassle and I'm trying to 
over-engineer things here. Let's just leave it as it is now and only do 
refactoring if we could benefit from it.


---


[GitHub] zookeeper issue #690: ZOOKEEPER-3179: Add snapshot compression to reduce the...

2018-11-13 Thread yisong-yue
Github user yisong-yue commented on the issue:

https://github.com/apache/zookeeper/pull/690
  
@anmolnar 
One drawback I see with this structure is that it makes it harder to 
dynamically pick the deserialization mode based on a snapshot's filename, since 
`FileSnap` now manages a whole `snapDir` directory. Maybe we should separate 
the directory management logic from snapshot file (de)serialization in the 
`FileSnap` class.
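
As a rough illustration of picking the mode from the filename, here is a 
minimal Java sketch. The class `SnapshotStreams`, the `Mode` enum, and the 
`.gz` suffix convention are assumptions made for this example and are not 
taken from the PR.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: chooses a decoding stream per snapshot file. */
final class SnapshotStreams {
    enum Mode { PLAIN, GZIP }

    /** Assumed convention: compressed snapshots carry a ".gz" suffix. */
    static Mode modeFor(String fileName) {
        return fileName.endsWith(".gz") ? Mode.GZIP : Mode.PLAIN;
    }

    /** Opens one snapshot file with the stream that matches its name. */
    static InputStream open(Path snapshot) throws IOException {
        InputStream raw = new BufferedInputStream(Files.newInputStream(snapshot));
        if (modeFor(snapshot.getFileName().toString()) == Mode.PLAIN) {
            return raw;
        }
        try {
            return new GZIPInputStream(raw);
        } catch (IOException e) {
            raw.close(); // do not leak the file handle on a bad gzip header
            throw e;
        }
    }
}
```

Because the decision is made per file, it works independently of whatever 
class owns the directory listing.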


---


ZooKeeper_branch34_openjdk7 - Build # 2121 - Failure

2018-11-13 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk7/2121/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 43.50 KB...]
[junit] Running org.apache.zookeeper.test.SaslAuthFailDesignatedClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.833 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailNotifyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.609 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.662 sec
[junit] Running org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.547 sec
[junit] Running org.apache.zookeeper.test.SaslClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.089 sec
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.576 sec
[junit] Running org.apache.zookeeper.test.SessionTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.588 sec
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.748 sec
[junit] Running org.apache.zookeeper.test.StandaloneTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.874 sec
[junit] Running org.apache.zookeeper.test.StatTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.709 sec
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.752 sec
[junit] Running org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.72 sec
[junit] Running org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.447 sec
[junit] Running org.apache.zookeeper.test.UpgradeTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.385 sec
[junit] Running org.apache.zookeeper.test.WatchedEventTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.093 sec
[junit] Running org.apache.zookeeper.test.WatcherFuncTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.057 sec
[junit] Running org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.861 sec
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.172 sec
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.705 sec
[junit] Running org.apache.jute.BinaryInputArchiveTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.084 sec

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/build.xml:1408: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk7/build.xml:1411: Tests failed!

Total time: 32 minutes 21 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Recording test results
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/
Setting OPENJDK_7_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-7-openjdk-amd64/



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderOutOfView

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError
    at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderOutOfView(QuorumPeerMainTest.java:1321)
    at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:55)

Re: question about approach to take

2018-11-13 Thread Enrico Olivelli
On Tue, Nov 13, 2018 at 4:47 PM Alexander Shraer wrote:
>
> I also wanted to get other's views on this.
>
> My opinion is that the current server configuration format
> (server.x=ip:port:port:role;ip:port) has run its course. There are multiple
> proposals for additions/changes to the server configuration,
> that would be simplified from having a more extensible format, such as a
> json blob, as proposed by Brian Nixon here:
> https://issues.apache.org/jira/browse/ZOOKEEPER-3166
> It is true that such an extension hasn't happened yet, however it may not
> be a good idea to continue adding individual features to the existing
> format instead of making this change.
>
> For longer than a year, maybe more, I've seen features pushed out to 3.6 to
> avoid destabilizing the 3.5 release. If we follow the same logic here, this
> would be a 3.6 feature, so compatibility with the old format doesn't seem
> very important.
>
> What do others think ?

Thank you Ted and Alexander for working on this important topic, I am
following your work.

Alexander, regarding
"so compatibility with the old format doesn't seem very important":

we must support compatibility; the upgrade path from 3.5 must be as
smooth as possible.

My 2c.

Enrico

>
>
> Thanks,
> Alex
>
>
>
> On Mon, Nov 12, 2018 at 11:47 PM Ted Dunning  wrote:
>
> > There is a JIRA live for the network resilience feature that I mentioned
> > previously.
> >
> > The design document
> > <
> > https://docs.google.com/document/d/1iGVwxeHp57qogwfdodCh9b32P2_kOQaJZ2GDo7j36fI/edit?usp=sharing
> > >
> > (also
> > copied into the JIRA) has essentially converged except for two points.
> >
> > These include:
> >
> > 1) Artem Chernatsky has pointed out an opportunity to factor out port sets
> > in the configuration syntax as well as an interesting interaction with the
> > existing behavior where the current servers already listen to the specified
> > ports on all NICs. The semantics of this interaction between configuration
> > options need to be specified rigorously, but this doesn't appear to impact
> > code complexity much, nor introduce any real difficulties.
> >
> > 2) Alex Shraer seems to feel that there is a strong interaction between
> > this issue and a proposed
> > refactorization of the configuration file syntax (mentioned in a comment in
> > 3166, but apparently doesn't have an independent issue). In particular, he
> > seems to think that the syntax refactorization is a blocker for the network
> > resilience. My own feeling is that there is some interaction, but there is
> > no strong ordering between the two issues if the implementors of this issue
> > are willing to commit to supporting any consensus syntax change that is
> > adopted. Essentially, there can be an additional issue filed which is
> > blocked by both the syntax change issue and 3188 (network resilience) to
> > support any new syntax. The work for 3188 needs to support the old syntax
> > in any case so that we can backport changes to 3.4.
> >
> > Other open issues that are affected by configuration syntax change include
> > 2534, 2531, 195, and 2225. None of these has
> > any serious impact other than the fact that configuration needs to be
> > abstracted as part of any change. Some appear to be quite old and may have
> > already been solved or made moot.
> >
> > My own feeling is that pushing for this issue (3188) to include a change to
> > the configuration syntax as well as the core network resilience feature
> > proposed is an unacceptable increase in scope. I have filed a new tracking
> > issue (3189)
> > capturing the intended rework after a change in configuration syntax, but I
> > can't find anywhere that the configuration change is captured in an issue to
> > add the dependency.
> >
> > I also see no particular way that configuration syntax change (as desirable
> > as it might be) blocks this feature.
> >
> > I would love to hear other opinions.
> >
> >
> > I, myself, think that the "support resilience under new syntax when
> > available" approach
> >


Re: question about approach to take

2018-11-13 Thread Alexander Shraer
I also wanted to get others' views on this.

My opinion is that the current server configuration format
(server.x=ip:port:port:role;ip:port) has run its course. There are multiple
proposals for additions/changes to the server configuration
that would be simplified by having a more extensible format, such as a
json blob, as proposed by Brian Nixon here:
https://issues.apache.org/jira/browse/ZOOKEEPER-3166
It is true that such an extension hasn't happened yet; however, it may not
be a good idea to continue adding individual features to the existing
format instead of making this change.
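
To illustrate how positional the current format is, here is a minimal Java
sketch that parses one line of the shape given above. It is not ZooKeeper's
actual parser: the class ServerSpec and its field names are hypothetical, it
assumes all four quorum fields and both client fields are present, it takes
the first port to be the quorum port and the second the election port, and
it does not handle IPv6 literals.

```java
/** Hypothetical value class for one "server.x=ip:port:port:role;ip:port" line. */
final class ServerSpec {
    final long id;
    final String host;
    final int quorumPort;
    final int electionPort;
    final String role;
    final String clientHost;
    final int clientPort;

    private ServerSpec(long id, String host, int quorumPort, int electionPort,
                       String role, String clientHost, int clientPort) {
        this.id = id;
        this.host = host;
        this.quorumPort = quorumPort;
        this.electionPort = electionPort;
        this.role = role;
        this.clientHost = clientHost;
        this.clientPort = clientPort;
    }

    static ServerSpec parse(String line) {
        String[] kv = line.trim().split("=", 2);
        if (kv.length != 2 || !kv[0].startsWith("server.")) {
            throw new IllegalArgumentException("not a server line: " + line);
        }
        long id = Long.parseLong(kv[0].substring("server.".length()));
        // Everything after '=' is positional: quorum part, ';', client part.
        String[] halves = kv[1].split(";", 2);
        String[] quorum = halves[0].split(":");
        String[] client = halves.length == 2 ? halves[1].split(":") : new String[0];
        if (quorum.length != 4 || client.length != 2) {
            throw new IllegalArgumentException("unexpected field count: " + line);
        }
        return new ServerSpec(id, quorum[0], Integer.parseInt(quorum[1]),
                Integer.parseInt(quorum[2]), quorum[3],
                client[0], Integer.parseInt(client[1]));
    }
}
```

Every new per-server attribute has to claim another position or separator in
this string, which is the extensibility concern raised above.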

For a year or more, I've seen features pushed out to 3.6 to
avoid destabilizing the 3.5 release. If we follow the same logic here, this
would be a 3.6 feature, so compatibility with the old format doesn't seem
very important.

What do others think?


Thanks,
Alex



On Mon, Nov 12, 2018 at 11:47 PM Ted Dunning  wrote:

> There is a JIRA live for the network resilience feature that I mentioned
> previously.
>
> The design document
> <
> https://docs.google.com/document/d/1iGVwxeHp57qogwfdodCh9b32P2_kOQaJZ2GDo7j36fI/edit?usp=sharing
> >
> (also
> copied into the JIRA) has essentially converged except for two points.
>
> These include:
>
> 1) Artem Chernatsky has pointed out an opportunity to factor out port sets
> in the configuration syntax as well as an interesting interaction with the
> existing behavior where the current servers already listen to the specified
> ports on all NICs. The semantics of this interaction between configuration
> options need to be specified rigorously, but this doesn't appear to impact
> code complexity much, nor introduce any real difficulties.
>
> 2) Alex Shraer seems to feel that there is a strong interaction between
> this issue and a proposed
> refactorization of the configuration file syntax (mentioned in a comment in
> 3166, but apparently doesn't have an independent issue). In particular, he
> seems to think that the syntax refactorization is a blocker for the network
> resilience. My own feeling is that there is some interaction, but there is
> no strong ordering between the two issues if the implementors of this issue
> are willing to commit to supporting any consensus syntax change that is
> adopted. Essentially, there can be an additional issue filed which is
> blocked by both the syntax change issue and 3188 (network resilience) to
> support any new syntax. The work for 3188 needs to support the old syntax
> in any case so that we can backport changes to 3.4.
>
> Other open issues that are affected by configuration syntax change include
> 2534, 2531, 195, and 2225. None of these has
> any serious impact other than the fact that configuration needs to be
> abstracted as part of any change. Some appear to be quite old and may have
> already been solved or made moot.
>
> My own feeling is that pushing for this issue (3188) to include a change to
> the configuration syntax as well as the core network resilience feature
> proposed is an unacceptable increase in scope. I have filed a new tracking
> issue (3189)
> capturing the intended rework after a change in configuration syntax, but I
> can't find anywhere that the configuration change is captured in an issue to
> add the dependency.
>
> I also see no particular way that configuration syntax change (as desirable
> as it might be) blocks this feature.
>
> I would love to hear other opinions.
>
>
> I, myself, think that the "support resilience under new syntax when
> available" approach
>


[jira] [Commented] (ZOOKEEPER-3018) Ephemeral node not deleted after session is gone

2018-11-13 Thread Andor Molnar (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685352#comment-16685352
 ] 

Andor Molnar commented on ZOOKEEPER-3018:
-

[~danielchan] [~clouds xu] Clock skew?

> Ephemeral node not deleted after session is gone
> 
>
> Key: ZOOKEEPER-3018
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3018
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.6
> Environment: Linux 4.1.12-112.14.10.el6uek.x86_64 #2 SMP x86_64 GNU/Linux
> Reporter: Daniel C
> Priority: Major
> Attachments: zk-3018.zip
>
>
> We have a live ZooKeeper environment (quorum size is 2) and observed a 
> strange behavior:
>  * Kafka created 2 ephemeral nodes /brokers/ids/822712429 and 
> /brokers/ids/707577499 on 2018-03-12 03:30:36.933
>  * The Kafka clients were long gone but as of today (20+ days after), the two 
> ephemeral nodes are still present
>  
> Troubleshooting:
> 1) List the outstanding sessions and ephemeral nodes
>  
> {noformat}
> $ echo dump | nc $SERVER1 2181
> SessionTracker dump:
> org.apache.zookeeper.server.quorum.LearnerSessionTracker@6d7fd863
> ephemeral nodes dump:
> Sessions with Ephemerals (2):
> 0x162183ea9f70003:
>    /brokers/ids/822712429
> 0x162183ea9f70002:
>    /brokers/ids/707577499
>    /controller
> {noformat}
>  
>  
> 2) stat on /brokers/ids/822712429
>  
> {noformat}
> zk> stat /brokers/ids/822712429
> czxid: 4294967344
> mzxid: 4294967344
> pzxid: 4294967344
> ctime: 1520825436933 (2018-03-11T20:30:36.933-0700)
> mtime: 1520825436933 (2018-03-11T20:30:36.933-0700)
> version: 0
> cversion: 0
> aversion: 0
> owner: 99668799174148099
> datalen: 102
> children: 0
> {noformat}
>  
>  
> 3) List full connection/session details for all clients connected
>  
> {noformat}
> $ echo cons | nc $SERVER1 2181
>  /10.247.114.70:30401[0](queued=0,recved=1,sent=0)
>  
> /10.248.88.235:40430[1](queued=0,recved=345,sent=345,sid=0x162183ea9f70c22,lop=PING,est=1522713395028,to=4,lcxid=0x12,lzxid=0x,lresp=1522717802117,llat=0,minlat=0,avglat=0,maxlat=31)
> {noformat}
>  
>  
>  
> {noformat}
> $ echo cons | nc $SERVER2 2181
>  /10.196.18.61:28173[0](queued=0,recved=1,sent=0)
>  
> /10.247.114.69:42679[1](queued=0,recved=73800,sent=73800,sid=0x262183eaa21da96,lop=PING,est=1522651352906,to=9000,lcxid=0xe49f,lzxid=0x10004683d,lresp=1522717854847,llat=0,minlat=0,avglat=0,maxlat=1235)
> {noformat}
>  
>  
> 4) health
>  
> {noformat}
> $ echo mntr | nc $SERVER1 2181
> zk_version   3.4.6-1569965, built on 02/20/2014 09:09 GMT
> zk_avg_latency  0
> zk_max_latency 443
> zk_min_latency  0
> zk_packets_received   11158019
> zk_packets_sent   11158244
> zk_num_alive_connections   2
> zk_outstanding_requests  0
> zk_server_state follower
> zk_znode_count   344
> zk_watch_count   0
> zk_ephemerals_count 3
> zk_approximate_data_size  36654
> zk_open_file_descriptor_count   33
> zk_max_file_descriptor_count 65536
> {noformat}
>  
>  
> 5) Server logs with related sessions:
> {noformat}
> Only found these logs from Server1 related to the sessions (0x162183ea9f70002 
> and 0x162183ea9f70003):
> 2018-03-12 03:28:35,127 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /10.196.18.60:26775
> 2018-03-12 03:28:35,131 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection 
> request from old client /10.196.18.60:26775; will be dropped if server is in 
> r-o mode
> 2018-03-12 03:28:35,131 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client 
> attempting to establish new session at /10.196.18.60:26775
> 2018-03-12 03:28:35,137 [myid:1] - INFO  
> [CommitProcessor:1:ZooKeeperServer@617] - Established session 
> 0x162183ea9f70002 with negotiated timeout 9000 for client /10.196.18.60:26775
>  
> 2018-03-12 03:30:36,415 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - 
> Accepted socket connection from /10.247.114.70:39260
> 2018-03-12 03:30:36,422 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection 
> request from old client /10.247.114.70:39260; will be dropped if server is in 
> r-o mode
> 2018-03-12 03:30:36,423 [myid:1] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client 
> attempting to establish new session at /10.247.114.70:39260
> 2018-03-12 03:30:36,428 [myid:1] - INFO  
> [CommitProcessor:1:ZooKeeperServer@617] - Established session 
> 0x162183ea9f70003 with negotiated timeout 9000 for client /10.247.114.70:39260
>  
> 
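
The `owner` value in the stat output of step 2 is printed in decimal, while the session IDs in the dump and in the server logs are printed in hex, and `ctime` is a millisecond Unix timestamp. The sketch below is not part of the original report; the helper names `session_id_hex` and `zk_millis_to_utc` are hypothetical and chosen only for illustration. It shows one way to line those values up when matching a znode to its session and to log lines.

```python
from datetime import datetime, timedelta, timezone


def session_id_hex(owner: int) -> str:
    """Format a znode's decimal ephemeral owner as a hex session ID."""
    return hex(owner)


def zk_millis_to_utc(ms: int) -> str:
    """Convert a millisecond Unix timestamp to an ISO-8601 UTC string."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Integer milliseconds via timedelta avoid float rounding.
    return (epoch + timedelta(milliseconds=ms)).isoformat(timespec="milliseconds")


# Values copied from the stat output for /brokers/ids/822712429.
print(session_id_hex(99668799174148099))  # 0x162183ea9f70003
print(zk_millis_to_utc(1520825436933))    # 2018-03-12T03:30:36.933+00:00
```

The first result matches session 0x162183ea9f70003 from the dump in step 1, and the second is the same instant as the 2018-03-11T20:30:36.933-0700 shown by stat, expressed in UTC.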

ZooKeeper_branch34_openjdk8 - Build # 121 - Failure

2018-11-13 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk8/121/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 43.45 KB...]
[junit] Running org.apache.zookeeper.test.SaslAuthFailDesignatedClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.767 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailNotifyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.615 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.677 sec
[junit] Running org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.614 sec
[junit] Running org.apache.zookeeper.test.SaslClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.075 sec
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.756 sec
[junit] Running org.apache.zookeeper.test.SessionTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.948 sec
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.909 sec
[junit] Running org.apache.zookeeper.test.StandaloneTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.957 sec
[junit] Running org.apache.zookeeper.test.StatTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.969 sec
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.723 sec
[junit] Running org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.663 sec
[junit] Running org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.673 sec
[junit] Running org.apache.zookeeper.test.UpgradeTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.931 sec
[junit] Running org.apache.zookeeper.test.WatchedEventTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.088 sec
[junit] Running org.apache.zookeeper.test.WatcherFuncTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.454 sec
[junit] Running org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 28.501 sec
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.099 sec
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.741 sec
[junit] Running org.apache.jute.BinaryInputArchiveTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.078 sec

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk8/build.xml:1408: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk8/build.xml:1411: Tests failed!

Total time: 40 minutes 44 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Setting OPENJDK_8_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-8-openjdk-amd64/
Recording test results
Setting OPENJDK_8_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-8-openjdk-amd64/
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting OPENJDK_8_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-8-openjdk-amd64/



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED: org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testNewFollowerRestartAfterNewEpoch

Error Message:
Waiting too long

Stack Trace:
java.lang.RuntimeException: Waiting too long
at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.waitForAll(QuorumPeerMainTest.java:449)
at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.waitForAll(QuorumPeerMainTest.java:439)
at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.LaunchServers(QuorumPeerMainTest.java:547)
at