Re: Zookeeper at Twitter by Michael

2018-10-23 Thread Edward Ribeiro
Congrats, Michael,

Really cool blog post. :)

On Mon, Oct 15, 2018 at 05:26, Norbert Kalmar wrote:

> Good blog post!
> Can't wait for the PRs ;)
> I'm very positive about ZooKeeper's future (I've heard a lot of talk lately about
> etcd overthrowing ZooKeeper - no chance :) )
>
> Norbert
>
> On Sun, Oct 14, 2018 at 10:39 PM Andor Molnár  wrote:
>
> > Great stuff Michael!
> >
> >
> > Andor
> >
> >
> >
> > On 10/14/2018 12:26 AM, Enrico Olivelli wrote:
> > > Hi Michael,
> > > I just came across this very interesting post!
> > >
> > >
> >
> https://blog.twitter.com/engineering/en_us/topics/infrastructure/2018/zookeeper-at-twitter.html
> > >
> > > Thank you
> > > Enrico
> >
> >
>


Re: improving tolerance to network failures

2018-10-23 Thread Ted Dunning
There have been several comments on the document. I will be porting discussions 
from the document back to the mailing list each day.

Alex Shraer makes a good point that with the design as stated, there is no 
provision for dealing with the rebalancing of client connections during dynamic 
reconfiguration. I am very curious whether this needs to be addressed in the 
design since it seems that if connections are redirected, the same connection 
logic should apply. I suppose the text needs an update, regardless, even if 
there is no effect. But is there something I missed here? Will there be a code 
effect?

Another comment points out that if you don't have symmetrical hardware for the 
servers (i.e. more network interfaces on some), then client connections are 
likely to be more numerous on servers with more network connections. This is 
undoubtedly true.

I have a question, however, about this. Is this situation actually important 
enough to make the first version of this change? My own experience is that 
production settings typically involve ZooKeeper servers with very consistent 
hardware where this would not be an issue.

What experience do others have, particularly in production situations?

On 2018/10/23 02:02:12, Ted Dunning  wrote: 
> ...
> I have started a collaborative document to work on the design approach.
> Once that is judged by the community to be sufficiently mature, I will move
> it to a JIRA.
> 
> That document is at
> https://docs.google.com/document/d/1iGVwxeHp57qogwfdodCh9b32P2_kOQaJZ2GDo7j36fI/edit?usp=sharing
> 
> The design document is currently open to the world for commenting so that
> anybody can suggest changes or ask questions. I will act as a bit of a
> moderator so that the document can remain completely open.
> 


Re: improving tolerance to network failures

2018-10-23 Thread Michael Han
>> Will there be a code effect?

There will be - the current rebalancing algorithm will break unless
StaticHostProvider.updateServerList is changed to make it aware that
multiple server addresses can belong to the same server. For example, currently
if we add a new server through reconfig, the rebalance will kick in. Under the
new proposal, if we add a new address to an existing server and no code
change is made to updateServerList, the rebalance will also kick in, but it
should not, because in this case no new real servers are added.
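
To make the point concrete, here is a minimal sketch of the check I have in mind. It assumes the client can group addresses by a server id; the class name, method name and map layout are hypothetical and are not the existing StaticHostProvider API.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of StaticHostProvider. */
final class ServerSetChange {
    /**
     * Returns true only when the set of server ids differs between the old
     * and the new configuration. Adding or removing an address of an
     * existing server leaves the key set unchanged, so no rebalance is
     * requested in that case.
     */
    static boolean shouldRebalance(Map<Integer, List<String>> oldServers,
                                   Map<Integer, List<String>> newServers) {
        return !oldServers.keySet().equals(newServers.keySet());
    }
}
```

With this check, a reconfig that only adds a second address to server 1 would not trigger a rebalance, while a reconfig that adds server 3 would.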

>> My own experience is that production settings typically involve
Zookeeper servers with very consistent hardware where this would not be an
issue.

I think this is generally true, but we should consider cases where a user is
upgrading hardware, which might take a while. During this time it would
be ideal if ZK offered the capability of balancing client connections across
an ensemble with heterogeneous hardware. As a user myself, I'd like to have
this feature, especially considering it does not seem hard to implement. What Alex
proposed should work. Another approach might be to assign weights to each
address (a single server has weight one), which reduces this to a
weighted random selection problem.
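
A rough sketch of the weighted approach follows. It is based on one reading of the idea, namely that the addresses of a server share a total weight of one, so a server with k addresses gives each address weight 1/k. The class and its layout are hypothetical, not existing client code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch: every server has total weight one, split evenly over its addresses. */
final class WeightedAddressPicker {
    private final List<String> addresses = new ArrayList<>();
    private final List<Double> cumulative = new ArrayList<>();
    private double total = 0.0;

    WeightedAddressPicker(Map<Integer, List<String>> serverAddresses) {
        for (List<String> addrs : serverAddresses.values()) {
            if (addrs.isEmpty()) {
                continue; // a server without addresses cannot be selected
            }
            double weight = 1.0 / addrs.size();
            for (String addr : addrs) {
                total += weight;
                addresses.add(addr);
                cumulative.add(total); // running sum used for the lookup in pick()
            }
        }
    }

    /** Picks an address with probability proportional to its weight. */
    String pick(Random rnd) {
        if (addresses.isEmpty()) {
            throw new IllegalStateException("no addresses configured");
        }
        double r = rnd.nextDouble() * total;
        for (int i = 0; i < cumulative.size(); i++) {
            if (r < cumulative.get(i)) {
                return addresses.get(i);
            }
        }
        return addresses.get(addresses.size() - 1); // guard against rounding at the upper edge
    }
}
```

Under these assumptions a server with two interfaces and a server with one interface each receive about half of the new connections, regardless of how many addresses they expose.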

Overall, I think this proposal has little impact on the server side; most of the
impact is on the client side.


On Tue, Oct 23, 2018 at 9:34 AM Ted Dunning  wrote:

> There have been several comments on the document. I will be porting
> discussions from the document back to the mailing list each day.
>
> Alex Shraer makes a good point that with the design as stated, there is no
> provision for dealing with the rebalancing of client connections during
> dynamic reconfiguration. I am very curious whether this needs to be
> addressed in the design since it seems that if connections are redirected,
> the same connection logic should apply. I suppose the text needs an update,
> regardless, even if there is no effect. But is there something I missed
> here? Will there be a code effect?
>
> Another comment points out that if you don't have symmetrical hardware for
> the servers (i.e. more network interfaces on some), then client connections
> are likely to be more numerous on servers with more network connections.
> This is undoubtedly true.
>
> I have a question, however, about this. Is this situation actually
> important enough to make the first version of this change? My own
> experience is that production settings typically involve Zookeeper servers
> with very consistent hardware where this would not be an issue.
>
> What experience do others have, particularly in production situations?
>
> On 2018/10/23 02:02:12, Ted Dunning  wrote:
> > ...
> > I have started a collaborative document to work on the design approach.
> > Once that is judged by the community to be sufficiently mature, I will
> move
> > it to a JIRA.
> >
> > That document is at
> >
> https://docs.google.com/document/d/1iGVwxeHp57qogwfdodCh9b32P2_kOQaJZ2GDo7j36fI/edit?usp=sharing
> >
> > The design document is currently open to the world for commenting so that
> > anybody can suggest changes or ask questions. I will act as a bit of a
> > moderator so that the document can remain completely open.
> >
>


Re: improving tolerance to network failures

2018-10-23 Thread Ted Dunning
Michael,

I wouldn't characterize the current proposal as broken so much as it talks
about connection balancing rather than server balancing. Other than that, I
think I agree with what you are saying.

So we have two folks with a feeling that server balancing from the client
side is significantly better than connection balancing. I had thought that
this would be desirable to defer in the interest of code simplicity. That
may not be the right balance.

The point about hardware upgrades is a very good one.




On Tue, Oct 23, 2018 at 10:21 AM Michael Han  wrote:

> >> Will there be a code effect?
>
> There will be - the current rebalancing algorithm will be broken if no code
> is done to StaticHostProvider.updateServerList to teach it aware of
> multiple server addresses belong to the same server. For example, currently
> if we add a new server through reconfig, the rebalance will kick in. In the
> new proposal, if we add a new address to the existing server, if no code
> change made to updateServerList, the rebalance will also kick in but it
> should not, as in this case no new real servers are added.
>
> >> My own experience is that production settings typically involve
> Zookeeper servers with very consistent hardware where this would not be an
> issue.
>
> I think this is generally true, but we should consider cases where user is
> upgrading hardware, which might take a while and during this time it would
> be ideal if ZK offer the capability of balanced client connections across
> ensemble with heterogeneous hardwares. As a user myself, I'd like to have
> this feature, especially consider it seems not hard to implement. What Alex
> proposed should work. Another approach might be to assign weights to each
> address (a single server has weight one), and this will reduce to a
> weighted random selection problem.
>
> Overall, I think this proposal has little impact on server side, most
> impact is on client side.
>
>
> On Tue, Oct 23, 2018 at 9:34 AM Ted Dunning  wrote:
>
> > There have been several comments on the document. I will be porting
> > discussions from the document back to the mailing list each day.
> >
> > Alex Shraer makes a good point that with the design as stated, there is
> no
> > provision for dealing with the rebalancing of client connections during
> > dynamic reconfiguration. I am very curious whether this needs to be
> > addressed in the design since it seems that if connections are
> redirected,
> > the same connection logic should apply. I suppose the text needs an
> update,
> > regardless, even if there is no effect. But is there something I missed
> > here? Will there be a code effect?
> >
> > Another comment points out that if you don't have symmetrical hardware
> for
> > the servers (i.e. more network interfaces on some), then client
> connections
> > are likely to be more numerous on servers with more network connections.
> > This is undoubtedly true.
> >
> > I have a question, however, about this. Is this situation actually
> > important enough to make the first version of this change? My own
> > experience is that production settings typically involve Zookeeper
> servers
> > with very consistent hardware where this would not be an issue.
> >
> > What experience do others have, particularly in production situations?
> >
> > On 2018/10/23 02:02:12, Ted Dunning  wrote:
> > > ...
> > > I have started a collaborative document to work on the design approach.
> > > Once that is judged by the community to be sufficiently mature, I will
> > move
> > > it to a JIRA.
> > >
> > > That document is at
> > >
> >
> https://docs.google.com/document/d/1iGVwxeHp57qogwfdodCh9b32P2_kOQaJZ2GDo7j36fI/edit?usp=sharing
> > >
> > > The design document is currently open to the world for commenting so
> that
> > > anybody can suggest changes or ask questions. I will act as a bit of a
> > > moderator so that the document can remain completely open.
> > >
> >
>


[GitHub] zookeeper issue #665: [ZOOKEEPER-3163] Use session map in the Netty to impro...

2018-10-23 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/665
  
@maoling, we can port this back to 3.4 in the same Jira, I'll send out a PR 
separately for that.


---


[GitHub] zookeeper pull request #673: [ZOOKEEPER-3177] Refactor request throttle logi...

2018-10-23 Thread lvfangmin
Github user lvfangmin commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/673#discussion_r227532671
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java 
---
@@ -1107,6 +1102,19 @@ public void processPacket(ServerCnxn cnxn, 
ByteBuffer incomingBuffer) throws IOE
 BinaryInputArchive bia = BinaryInputArchive.getArchive(bais);
 RequestHeader h = new RequestHeader();
 h.deserialize(bia, "header");
+
+// Need to increase the outstanding request count first, otherwise
+// there might be a race condition that it enabled recv after
+// processing request and then disabled when check throttling.
+//
+// It changes the semantic a bit, since when check throttling it's
--- End diff --

@eolivelli I'll try to rephrase it; meanwhile, please comment if you have 
any suggestions on how to rephrase this.


---


[GitHub] zookeeper pull request #632: [ZOOKEEPER-3150] Add tree digest check and veri...

2018-10-23 Thread lvfangmin
Github user lvfangmin commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/632#discussion_r227532976
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/command/HashCommand.java
 ---
@@ -0,0 +1,49 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server.command;
+
+import java.io.PrintWriter;
+import java.util.List;
+
+import org.apache.zookeeper.server.DataTree.ZxidDigest;
+import org.apache.zookeeper.server.ServerCnxn;
+
+/**
+ * Command used to dump the latest digest histories.
+ */
+public class HashCommand extends AbstractFourLetterCommand {
--- End diff --

That seems more consistent, will do.


---


[GitHub] zookeeper pull request #632: [ZOOKEEPER-3150] Add tree digest check and veri...

2018-10-23 Thread lvfangmin
Github user lvfangmin commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/632#discussion_r227533265
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/util/AdHash.java ---
@@ -0,0 +1,84 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server.util;
+
+/**
+ * This incremental hash is used to keep track of the hash of
+ * the data tree to that we can quickly validate that things
+ * are in sync.
+ *
+ * See the excellent paper: A New Paradigm for collision-free hashing:
+ *   Incrementality at reduced cost,  M. Bellare and D. Micciancio
+ */
+public class AdHash {
--- End diff --

I'd like to keep it as is, since it's consistent with the name in the paper.
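
For readers who have not seen the paper, the idea behind the incremental hash in the quoted Javadoc can be sketched roughly as below. This is an illustration only and not the code in the PR: the class name, the methods and the use of CRC32 as the per-node digest are assumptions, and CRC32 is not collision resistant, so a real implementation would need a stronger digest.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Illustration of an additive incremental hash; names and digest choice are assumptions. */
final class AdditiveHashSketch {
    private long hash = 0L;

    /** Per-node digest; CRC32 is used only to keep the sketch dependency-free. */
    static long digestOf(String path, String data) {
        CRC32 crc = new CRC32();
        crc.update((path + "\u0000" + data).getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Adding a node is O(1): its digest is added to the running sum (mod 2^64). */
    void add(long digest) {
        hash += digest;
    }

    /** Removing a node is O(1): its digest is subtracted again. */
    void remove(long digest) {
        hash -= digest;
    }

    long value() {
        return hash;
    }
}
```

An update is a remove of the old digest followed by an add of the new one, and because addition is commutative two replicas that applied the same changes in a different order end up with the same value.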


---


Re: [VOTE] Maven migration - separation of java files to server and client project?

2018-10-23 Thread Michael Han
+1 (for keeping server, client and common together).

We can do client / server split separately. Interestingly, there was an
outstanding JIRA about the split, which we could use after maven build is
live:  https://issues.apache.org/jira/browse/ZOOKEEPER-233

On Tue, Oct 16, 2018 at 3:30 AM Andor Molnar  wrote:

> Thanks Norbert for taking care of this.
> No surprise here in a 10+ year old project.
>
> -1 (binding) for the separation
>
> Let’s keep server + client + common together for now. We can revisit this
> later, but the pro is that we keep the release artifact structure and do not
> introduce breaking changes.
>
> Regards,
> Andor
>
>
>
> > On 2018. Oct 16., at 9:15, Enrico Olivelli  wrote:
> >
> > Yes,
> > I think it is NOT a good idea to go ahead with this separation.
> >
> > so -1 (non binding) from my side for now
> >
> > And your  patch is very good at demonstrating this.
> > We can't break compatibility in clients.
> >
> > We can move to Maven first and then re-think about separating client and
> server
> >
> > Enrico
> >
> > On Mon, Oct 15, 2018 at 23:55, Norbert Kalmar wrote:
> >>
> >> Sorry, I linked the document instead of the PR. I wanted to link the
> >> document at the beginning of the letter after "It was said here"
> >>
> >> The PR:
> >> https://github.com/apache/zookeeper/pull/670
> >>
> >> Norbert
> >>
> >> On Mon, Oct 15, 2018 at 11:49 PM Norbert Kalmar 
> >> wrote:
> >>
> >>> Hi community!
> >>>
> >>> As outlined in the start document, it was planned to separate the java
> >>> files to server and client, with common files in a separate common
> module.
> >>> It was said here:
> >>>
> >>> "Fifth iteration - move src/java/main to zk-server, which will be
> further
> >>> separated in Phase 2."
> >>>
> >>> But in order to save rebase for the contributors, I merged this into
> one
> >>> step. (I had a letter about it)
> >>> So I already created zookeeper-server, zookeeper-client and
> >>> zookeeper-common.
> >>>
> >>> But after doing the separation, I have to say... this hardly makes
> >>> any sense.
> >>> Without breaking backward compatibility by making changes in the package
> >>> structure, it just makes the code more unreadable than before. Multiple
> >>> packages have to be present in all 3 modules (as it was never the intention
> >>> to separate them, many classes are just not in their logical package, and
> >>> even inner classes are used where top-level classes would be required for the
> >>> separation). Client and server code cannot be divided to depend only on
> >>> common. Either server depends on client - which makes more sense than the
> >>> other option - or client depends on server.
> >>> (Or make common so fat that only literally a few classes remain in server and
> >>> client - which, again, makes no sense.)
> >>>
> >>> I created a pull request to illustrate what needs to be done, and this
> is
> >>> not even half complete:
> >>>
> >>>
> https://docs.google.com/document/d/1WXqhaPlCwchcWc8RCEzbCmVa4WbBDlfR3GQngikGjqc/edit?usp=sharing
> >>>
> >>> Some more detail in the description.
> >>>
> >>> My suggestion:
> >>> forget about zookeeper-client-java and zookeeper-common, and just leave
> >>> zookeeper-server.
> >>>
> >>> It just doesn't make any sense looking at the result; it only makes the
> >>> project much more complicated. The Java code is too tangled together.
> >>>
> >>> What would this mean if I just create zookeeper-common? All the files had
> >>> to be renamed anyway, so some would now have 2 renames (fortunately most of
> >>> the files are in zookeeper-server anyway), and possibly another rebase for
> >>> some PRs.
> >>>
> >>> Any input is appreciated.
> >>>
> >>> Regards,
> >>> Norbert
> >>>
> >>>
> >>>
> >>>
>
>


[GitHub] zookeeper issue #300: ZOOKEEPER-2807: Flaky test: org.apache.zookeeper.test....

2018-10-23 Thread lavacat
Github user lavacat commented on the issue:

https://github.com/apache/zookeeper/pull/300
  
@anmolnar applied the patch to latest master and ran tests 10 times with 8 
threads. The original error in testNodeDataChanged is gone, but it failed 4 times 
with
2018-10-23 09:37:31,566 [myid:] - INFO  
[main:JUnit4ZKTestRunner$LoggedInvokeMethod@98] - TEST METHOD FAILED 
testNodeDataChanged
java.util.concurrent.TimeoutException: Failed to connect to ZooKeeper 
server.
at 
org.apache.zookeeper.test.ClientBase$CountdownWatcher.waitForConnected(ClientBase.java:151)
at 
org.apache.zookeeper.test.WatchEventWhenAutoResetTest.testNodeDataChanged(WatchEventWhenAutoResetTest.java:116)

I'll investigate more


---


[GitHub] zookeeper pull request #673: [ZOOKEEPER-3177] Refactor request throttle logi...

2018-10-23 Thread eolivelli
Github user eolivelli commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/673#discussion_r227557275
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java 
---
@@ -1107,6 +1102,19 @@ public void processPacket(ServerCnxn cnxn, 
ByteBuffer incomingBuffer) throws IOE
 BinaryInputArchive bia = BinaryInputArchive.getArchive(bais);
 RequestHeader h = new RequestHeader();
 h.deserialize(bia, "header");
+
+// Need to increase the outstanding request count first, otherwise
+// there might be a race condition that it enabled recv after
+// processing request and then disabled when check throttling.
+//
+// It changes the semantic a bit, since when check throttling it's
--- End diff --

Something simpler, without comparing current code with the old one.

Like:
Beware that we are actually checking the global outstanding request before 
this
request.

How does it sound to you?


---


[GitHub] zookeeper issue #628: ZOOKEEPER-3140: Allow Followers to host Observers

2018-10-23 Thread enixon
Github user enixon commented on the issue:

https://github.com/apache/zookeeper/pull/628
  
@anmolnar : I see that you closed #660 without merging. Given that we're 
guessing it is the cause for the remaining test failures of this PR, is there 
something that I can do to help address ZOOKEEPER-2320?


---


[GitHub] zookeeper issue #628: ZOOKEEPER-3140: Allow Followers to host Observers

2018-10-23 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/628
  
I am looking at 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2486/console; 
it is not clear to me which C test failed, aside from the ``./zktest-mt': 
free(): invalid pointer: 0x2b00d6385000` at the end. Note that the console 
log has:
`[exec] Zookeeper_simpleSystem::testRemoveWatchers ZooKeeper server started 
: elapsed 4610 : OK`, so this might be a different failure from the one @anmolnar was 
fixing.

Do we know which C test case is failing here?


---


[GitHub] zookeeper issue #628: ZOOKEEPER-3140: Allow Followers to host Observers

2018-10-23 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/628
  
>> This patch does not touch the c client or the default configurations for 
those tests so I'm unsure how to proceed.

My feeling is that the failure is a flaky test and has nothing to do with this 
patch. Still, it would be good if we could identify the exact failing test case 
and rule out the possibility that it's caused by this patch (since the C client 
depends on the same Java server code).

Also, sorry for the delay in following up on my previous review. I am resuming my 
review of this patch this week.



---


[jira] [Commented] (ZOOKEEPER-3179) Add snapshot compression to reduce the disk IO

2018-10-23 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661432#comment-16661432
 ] 

Michael Han commented on ZOOKEEPER-3179:


Good feature. 

We can also consider providing an option to offload compression / decompression 
to dedicated hardware - e.g. FPGA. 

> Add snapshot compression to reduce the disk IO
> --
>
> Key: ZOOKEEPER-3179
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3179
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Fangmin Lv
>Priority: Major
> Fix For: 3.6.0
>
>
> As the snapshot grows larger, the periodic snapshot taken after a certain 
> number of txns becomes more expensive. This in turn affects the maximum 
> throughput we can support within SLA, because of the disk contention between 
> snapshot and txn when they're on the same drive.
>  
> With compression like zstd/snappy/gzip, the actual snapshot size could be 
> much smaller; the compression ratio depends on the actual data. It might make 
> the recovery time (loading from disk) faster in some cases, but it will sometimes 
> take longer because of the extra time used to compress/decompress.
>  
> Based on production traffic, the performance varies with the compression 
> method as well, which is why we provided different implementations; we 
> can select a different compression method for different use cases.
>  
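
As a rough illustration of the trade-off described above, a snapshot byte stream could be wrapped in a compressing stream before it is written to disk. The sketch below uses the JDK's gzip streams only because they need no extra dependency; the class and method names are made up, and it says nothing about how the actual patch hooks into snapshot serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical gzip round trip for snapshot bytes; not the code proposed in the JIRA. */
final class SnapshotGzipSketch {
    /** Compresses the serialized snapshot; costs CPU time but reduces bytes written. */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Decompresses on load; this is the extra work paid during recovery. */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

How much smaller the output is depends entirely on the data, which matches the remark in the description that the ratio depends on the actual data.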



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3180) Add response cache to improve the throughput of read heavy traffic

2018-10-23 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661452#comment-16661452
 ] 

Michael Han commented on ZOOKEEPER-3180:


What will we be caching here? Is it the byte buffers that hold the 
(serialized) response body that is going to be written out to the socket?

> Add response cache to improve the throughput of read heavy traffic 
> ---
>
> Key: ZOOKEEPER-3180
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3180
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Priority: Minor
> Fix For: 3.6.0
>
>
> In read-heavy use cases with large response data sizes, the serialization of 
> the response takes time and adds overhead to the GC.
> Adding a response cache helps improve the throughput we can support, which also 
> reduces the latency in general.
> This Jira is going to implement an LRU cache for the response, which shows 
> some performance gain on some of our production ensembles.
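
For reference, the general shape of an LRU cache in Java can be sketched with LinkedHashMap in access order. What the keys and values would actually be is exactly the open question in this thread, so the type parameters below are placeholders and the class name is made up; this is not the proposed implementation, and the sketch is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Generic LRU sketch; key and value types are placeholders, not the JIRA's design. */
final class LruResponseCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruResponseCacheSketch(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most-recent end
        this.capacity = capacity;
    }

    /** Called by LinkedHashMap after each insert; evicts the least recently used entry when full. */
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

A server would need external synchronization or a concurrent cache around something like this, since request processing is multi-threaded.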



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #665: [ZOOKEEPER-3163] Use session map in the Netty t...

2018-10-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/665


---


[jira] [Resolved] (ZOOKEEPER-3163) Use session map to improve the performance when closing session in Netty

2018-10-23 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3163.

Resolution: Fixed

Issue resolved by pull request 665
[https://github.com/apache/zookeeper/pull/665]

> Use session map to improve the performance when closing session in Netty
> 
>
> Key: ZOOKEEPER-3163
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3163
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Previously, closing a session required going through all the cnxns to find the 
> session to close, which is O(N), where N is the total number of connections.
> This affects the performance of closing or renewing a session if there 
> are lots of connections on the server. This JIRA is going to reuse the 
> session map code in the NIO implementation to improve the performance.
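
The difference between the scan and the map lookup can be sketched as follows. The class, the nested Conn type and the method names are hypothetical stand-ins chosen for illustration; they are not the ServerCnxnFactory API touched by the pull request.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch of a session-id-to-connection map; not the merged code. */
final class SessionMapSketch {
    /** Stand-in for a server connection. */
    static final class Conn {
        final long sessionId;
        volatile boolean closed;

        Conn(long sessionId) {
            this.sessionId = sessionId;
        }

        void close() {
            closed = true;
        }
    }

    private final ConcurrentMap<Long, Conn> bySession = new ConcurrentHashMap<>();

    /** Records the connection once its session id is known. */
    void register(Conn conn) {
        bySession.put(conn.sessionId, conn);
    }

    /** Expected O(1) lookup instead of scanning every open connection. */
    boolean closeSession(long sessionId) {
        Conn conn = bySession.remove(sessionId);
        if (conn == null) {
            return false; // unknown or already closed session
        }
        conn.close();
        return true;
    }
}
```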



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #642: ZOOKEEPER-3151: test Jenkins. Don't merge.

2018-10-23 Thread hanm
Github user hanm closed the pull request at:

https://github.com/apache/zookeeper/pull/642


---


ZooKeeper-trunk - Build # 245 - Still Failing

2018-10-23 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper-trunk/245/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 334.69 KB...]
 [exec]  : elapsed 1001 : OK
 [exec] Zookeeper_simpleSystem::testLogCallbackClearLog Message Received: 
[2018-10-24 02:19:56,718:11168(0x2b9a3ec45f40):ZOO_INFO@log_env@1080: Client 
environment:zookeeper.version=zookeeper C client 3.6.0]
 [exec] Log Message Received: [2018-10-24 
02:19:56,718:11168(0x2b9a3ec45f40):ZOO_INFO@log_env@1084: Client 
environment:host.name=asf910.gq1.ygridcore.net]
 [exec] Log Message Received: [2018-10-24 
02:19:56,718:11168(0x2b9a3ec45f40):ZOO_INFO@log_env@1091: Client 
environment:os.name=Linux]
 [exec] Log Message Received: [2018-10-24 
02:19:56,718:11168(0x2b9a3ec45f40):ZOO_INFO@log_env@1092: Client 
environment:os.arch=3.13.0-153-generic]
 [exec] Log Message Received: [2018-10-24 
02:19:56,718:11168(0x2b9a3ec45f40):ZOO_INFO@log_env@1093: Client 
environment:os.version=#203-Ubuntu SMP Thu Jun 14 08:52:28 UTC 2018]
 [exec] Log Message Received: [2018-10-24 
02:19:56,718:11168(0x2b9a3ec45f40):ZOO_INFO@log_env@1101: Client 
environment:user.name=jenkins]
 [exec] Log Message Received: [2018-10-24 
02:19:56,719:11168(0x2b9a3ec45f40):ZOO_INFO@log_env@1109: Client 
environment:user.home=/home/jenkins]
 [exec] Log Message Received: [2018-10-24 
02:19:56,719:11168(0x2b9a3ec45f40):ZOO_INFO@log_env@1121: Client 
environment:user.dir=/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build/test/test-cppunit]
 [exec] Log Message Received: [2018-10-24 
02:19:56,719:11168(0x2b9a3ec45f40):ZOO_INFO@zookeeper_init_internal@1167: 
Initiating client connection, host=127.0.0.1:22181 sessionTimeout=1 
watcher=0x4639e0 sessionId=0 sessionPasswd= context=0x7ffc4bd556a0 
flags=0]
 [exec] Log Message Received: [2018-10-24 
02:19:56,719:11168(0x2b9a40ca8700):ZOO_INFO@check_events@2454: initiated 
connection to server 127.0.0.1:22181]
 [exec] Log Message Received: [2018-10-24 
02:19:56,742:11168(0x2b9a40ca8700):ZOO_INFO@check_events@2506: session 
establishment complete on server 127.0.0.1:22181, sessionId=0x10225d7e5ee000f, 
negotiated timeout=1 ]
 [exec]  : elapsed 1001 : OK
 [exec] Zookeeper_simpleSystem::testAsyncWatcherAutoReset ZooKeeper server 
started : elapsed 10495 : OK
 [exec] Zookeeper_simpleSystem::testDeserializeString : elapsed 0 : OK
 [exec] Zookeeper_simpleSystem::testFirstServerDown : elapsed 1001 : OK
 [exec] Zookeeper_simpleSystem::testNonexistentHost : elapsed 1137 : OK
 [exec] Zookeeper_simpleSystem::testNullData : elapsed 1042 : OK
 [exec] Zookeeper_simpleSystem::testIPV6 : elapsed 1022 : OK
 [exec] Zookeeper_simpleSystem::testCreate : elapsed 1015 : OK
 [exec] Zookeeper_simpleSystem::testPath : elapsed 1058 : OK
 [exec] Zookeeper_simpleSystem::testPathValidation : elapsed 1150 : OK
 [exec] Zookeeper_simpleSystem::testPing : elapsed 17669 : OK
 [exec] Zookeeper_simpleSystem::testAcl : elapsed 1016 : OK
 [exec] Zookeeper_simpleSystem::testChroot : elapsed 3090 : OK
 [exec] Zookeeper_simpleSystem::testAuth ZooKeeper server started ZooKeeper 
server started : elapsed 31045 : OK
 [exec] Zookeeper_simpleSystem::testHangingClient : elapsed 1067 : OK
 [exec] Zookeeper_simpleSystem::testWatcherAutoResetWithGlobal ZooKeeper 
server started ZooKeeper server started ZooKeeper server started : elapsed 
15726 : OK
 [exec] Zookeeper_simpleSystem::testWatcherAutoResetWithLocal ZooKeeper 
server started ZooKeeper server started ZooKeeper server started : elapsed 
15655 : OK
 [exec] Zookeeper_simpleSystem::testGetChildren2 : elapsed 1071 : OK
 [exec] Zookeeper_simpleSystem::testLastZxid : elapsed 4537 : OK
 [exec] Zookeeper_simpleSystem::testRemoveWatchers ZooKeeper server started 
: elapsed 4698 : OK
 [exec] Zookeeper_readOnly::testReadOnly : elapsed 4126 : OK
 [exec] Zookeeper_logClientEnv::testLogClientEnv : elapsed 1 : OK
 [exec] OK (76)
 [exec] FAIL: zktest-mt
 [exec] ==
 [exec] 1 of 2 tests failed
 [exec] Please report to u...@zookeeper.apache.org
 [exec] ==
 [exec] make[1]: Leaving directory 
`/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build/test/test-cppunit'
 [exec] *** Error in `./zktest-mt': free(): invalid pointer: 
0x2b9a3ec31000 ***
 [exec] /bin/bash: line 5: 11168 Aborted 
ZKROOT=/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/zookeeper-client/zookeeper-client-c/../..
 CLASSPATH=$CLASSPATH:$CLOVER_HOME/lib/clover.jar ${dir}$tst
 [exec] make[1]: *** [check-TESTS] Error 1
 [exec] make: *** [check-am] Error 2

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper-trunk/build.xml:1490: The 
following error occurred while 

[GitHub] zookeeper issue #628: ZOOKEEPER-3140: Allow Followers to host Observers

2018-10-23 Thread enixon
Github user enixon commented on the issue:

https://github.com/apache/zookeeper/pull/628
  
The tests that fail are not entirely consistent. I've tried disabling 
TestLogClientEnv.cc, TestReadOnlyClient.cc, TestReconfigServer.cc, and 
TestWatchers.cc and this seems to work some of the time.


---


[jira] [Commented] (ZOOKEEPER-3181) ZOOKEEPER-2355 broke Curator TestingQuorumPeerMain

2018-10-23 Thread Akira Ajisaka (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661697#comment-16661697
 ] 

Akira Ajisaka commented on ZOOKEEPER-3181:
--

Recently Apache Hadoop upgraded ZooKeeper from 3.4.8 to 3.4.13 due to a security 
concern (HADOOP-15816). The ZK tests then failed because Hadoop is using Curator 
2.12.0 (YARN-8937).

> ZOOKEEPER-2355 broke Curator TestingQuorumPeerMain
> --
>
> Key: ZOOKEEPER-3181
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3181
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.3, 3.4.11
>Reporter: Akira Ajisaka
>Priority: Major
>
> ZOOKEEPER-2355 added a getQuorumPeer method to QuorumPeerMain 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerMain.java#L194].
>  TestingQuorumPeerMain has an identically named method, which is now 
> unintentionally overriding the one in the base class.
> This is fixed by CURATOR-409; however, I'd like this to be fixed in ZooKeeper 
> as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3163) Use session map to improve the performance when closing session in Netty

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661701#comment-16661701
 ] 

Hudson commented on ZOOKEEPER-3163:
---

FAILURE: Integrated in Jenkins build Zookeeper-trunk-single-thread #72 (See 
[https://builds.apache.org/job/Zookeeper-trunk-single-thread/72/])
ZOOKEEPER-3163: Use session map in the Netty to improve close session (hanm: 
rev 1ce2ca8107438d283581d18d064a25bd6b74adf7)
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxnFactory.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxn.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxnFactory.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerCnxnFactory.java


> Use session map to improve the performance when closing session in Netty
> 
>
> Key: ZOOKEEPER-3163
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3163
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Previously, closing a session required going through all the cnxns to find 
> the one to close, which is O(N), where N is the total number of connections.
> This affects the performance of closing or renewing a session if there are 
> lots of connections on this server. This JIRA reuses the session map code 
> from the NIO implementation to improve the performance.
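
The difference between the scan and the session map described in this issue can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: SketchCnxn, SessionMapSketch, and all method names are hypothetical stand-ins, and the real connection factories carry far more state.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a server-side connection; not ZooKeeper's class.
final class SketchCnxn {
    final long sessionId;
    volatile boolean closed;

    SketchCnxn(long sessionId) {
        this.sessionId = sessionId;
    }

    void close() {
        closed = true;
    }
}

// Hypothetical factory that tracks connections both as a set and by session id.
class SessionMapSketch {
    private final Set<SketchCnxn> cnxns = ConcurrentHashMap.newKeySet();
    private final Map<Long, SketchCnxn> sessionMap = new ConcurrentHashMap<>();

    void register(SketchCnxn c) {
        cnxns.add(c);
        sessionMap.put(c.sessionId, c);
    }

    // Scan approach: visits every connection until the session is found, O(N).
    boolean closeSessionByScan(long sessionId) {
        for (SketchCnxn c : cnxns) {
            if (c.sessionId == sessionId) {
                cnxns.remove(c);
                sessionMap.remove(sessionId);
                c.close();
                return true;
            }
        }
        return false;
    }

    // Map approach: one hash lookup keyed by session id, expected O(1).
    boolean closeSession(long sessionId) {
        SketchCnxn c = sessionMap.remove(sessionId);
        if (c == null) {
            return false;
        }
        cnxns.remove(c);
        c.close();
        return true;
    }

    int size() {
        return cnxns.size();
    }

    public static void main(String[] args) {
        SessionMapSketch factory = new SessionMapSketch();
        factory.register(new SketchCnxn(1L));
        factory.register(new SketchCnxn(2L));
        System.out.println("closed via map: " + factory.closeSession(1L));
        System.out.println("closed via scan: " + factory.closeSessionByScan(2L));
        System.out.println("remaining connections: " + factory.size());
    }
}
```

The map costs one extra entry per connection and must be kept in sync with the connection set on every register and close, which is the usual trade-off for avoiding the scan.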





[jira] [Created] (ZOOKEEPER-3181) ZOOKEEPER-2355 broke Curator TestingQuorumPeerMain

2018-10-23 Thread Akira Ajisaka (JIRA)
Akira Ajisaka created ZOOKEEPER-3181:


 Summary: ZOOKEEPER-2355 broke Curator TestingQuorumPeerMain
 Key: ZOOKEEPER-3181
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3181
 Project: ZooKeeper
  Issue Type: Bug
Affects Versions: 3.4.11, 3.5.3
Reporter: Akira Ajisaka


ZOOKEEPER-2355 added a getQuorumPeer method to QuorumPeerMain 
[https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/quorum/QuorumPeerMain.java#L194].
 TestingQuorumPeerMain has an identically named method, which is now 
unintentionally overriding the one in the base class.

This is fixed by CURATOR-409; however, I'd like this to be fixed in ZooKeeper 
as well.
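
For readers unfamiliar with how this kind of breakage happens in Java, here is a minimal sketch of an accidental override. The classes BaseMainSketch and TestingMainSketch and the method getPeer are hypothetical and only mirror the shape of the problem; they are not the actual QuorumPeerMain or TestingQuorumPeerMain code, and the assumption that the base class calls the new method during startup is made purely for illustration.

```java
// Hypothetical base class, standing in for a library class that gained a
// new overridable method in a later release.
class BaseMainSketch {
    protected Object peer;

    // Newly added method: assumed here to be a factory used during startup.
    protected Object getPeer() {
        return new Object();
    }

    void run() {
        // Virtual dispatch: a subclass method with the same signature wins.
        peer = getPeer();
    }
}

// Hypothetical subclass written before the base class had getPeer().
class TestingMainSketch extends BaseMainSketch {
    // Intended as a plain accessor. There is no @Override annotation, so the
    // compiler accepts it silently, yet it now replaces the base factory.
    public Object getPeer() {
        return peer;
    }
}

class AccidentalOverrideSketch {
    public static void main(String[] args) {
        BaseMainSketch base = new BaseMainSketch();
        base.run();
        System.out.println("base peer created: " + (base.peer != null));

        TestingMainSketch testing = new TestingMainSketch();
        testing.run();
        // The accessor returned the still-unset field, so nothing was created.
        System.out.println("testing peer created: " + (testing.peer != null));
    }
}
```

Renaming either method, or making the base method private or final, would avoid the collision; which option fits ZooKeeper and Curator is a question for the issue itself.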





[jira] [Commented] (ZOOKEEPER-3163) Use session map to improve the performance when closing session in Netty

2018-10-23 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661753#comment-16661753
 ] 

Hudson commented on ZOOKEEPER-3163:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #246 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/246/])
ZOOKEEPER-3163: Use session map in the Netty to improve close session (hanm: 
rev 1ce2ca8107438d283581d18d064a25bd6b74adf7)
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxn.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxnFactory.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NettyServerCnxnFactory.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerCnxnFactory.java


> Use session map to improve the performance when closing session in Netty
> 
>
> Key: ZOOKEEPER-3163
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3163
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Previously, closing a session required going through all the cnxns to find 
> the one to close, which is O(N), where N is the total number of connections.
> This affects the performance of closing or renewing a session if there are 
> lots of connections on this server. This JIRA reuses the session map code 
> from the NIO implementation to improve the performance.


