[jira] [Commented] (ZOOKEEPER-3106) Zookeeper client supports IPv6 address and document the "IPV6 feature"

2018-09-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605227#comment-16605227
 ] 

Hadoop QA commented on ZOOKEEPER-3106:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2124//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2124//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2124//console

This message is automatically generated.

> Zookeeper client supports IPv6 address and document the "IPV6 feature"
> --
>
> Key: ZOOKEEPER-3106
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3106
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: documentation, java client
>Reporter: maoling
>Assignee: maoling
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> This issue is the follow-up work of 
> [ZOOKEEPER-3057|https://issues.apache.org/jira/browse/ZOOKEEPER-3057]
> 1. The ZK server side supports IPv6 addresses written like 
> server.1=[2001:db8:1::242:ac11:2]:2888:3888, but the ZK client side expects 
> them like 2001:db8:1::242:ac11:2:2181. We need to unify them.
>  Look at the Kafka example 
> [KAFKA-1123|https://issues.apache.org/jira/browse/KAFKA-1123]: its producer 
> client also supports IPv6 in the form [2001:db8:1::242:ac11:2].
> 2. Document the "IPV6 feature" to let users know about it.
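For illustration, a minimal sketch (assumed names, not the actual patch in PR 587) of how a connect string mixing bracketed IPv6 literals and plain host:port entries could be split; the brackets are what disambiguate the port colon from the colons inside the IPv6 address:

{code:java}
import java.util.ArrayList;
import java.util.List;

public class ConnectStringSketch {
    /** Splits e.g. "[2001:db8:1::242:ac11:2]:2181,127.0.0.1:2181" into {host, port} pairs. */
    static List<String[]> parse(String connectString) {
        List<String[]> hostPorts = new ArrayList<>();
        for (String part : connectString.split(",")) {
            String host;
            String port;
            if (part.startsWith("[")) {               // bracketed IPv6 literal
                int end = part.indexOf(']');
                host = part.substring(1, end);
                port = part.substring(end + 2);       // skip "]:"
            } else {                                  // hostname or IPv4 address
                int colon = part.lastIndexOf(':');
                host = part.substring(0, colon);
                port = part.substring(colon + 1);
            }
            hostPorts.add(new String[]{host, port});
        }
        return hostPorts;
    }
}
{code}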



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3090) change continue to break

2018-09-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605072#comment-16605072
 ] 

Hudson commented on ZOOKEEPER-3090:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #178 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/178/])
ZOOKEEPER-3090: continue can be replaced with break (hanm: rev 
0b65e3d4cbb7647ea692bfea59c02e5b24bf821c)
* (edit) src/java/main/org/apache/zookeeper/server/persistence/FileTxnLog.java


> change continue to break
> 
>
> Key: ZOOKEEPER-3090
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3090
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: zhangbo
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> It's useful and sufficient to change continue to break, especially when calling 
> getLogFiles(logDir.listFiles(), 0).
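For context, a hedged sketch of the loop pattern behind the change (simplified from memory, not the exact FileTxnLog code): the log files are sorted ascending by zxid, so once one file's zxid exceeds the target, every later file's zxid will too, and the loop can stop instead of skipping.

{code:java}
import java.io.File;
import java.util.List;

class LogFileScan {
    /** Returns the zxid of the last log file starting at or before snapshotZxid. */
    static long lastQualifyingZxid(List<File> sortedLogFiles, long snapshotZxid) {
        long logZxid = 0;
        for (File f : sortedLogFiles) {              // sorted ascending by zxid
            long fzxid = zxidFromName(f.getName());  // stand-in for Util.getZxidFromName
            if (fzxid > snapshotZxid) {
                break;                               // previously: continue
            }
            logZxid = fzxid;                         // highest zxid not beyond the snapshot
        }
        return logZxid;
    }

    // Hypothetical helper: parses the hex zxid suffix of names like "log.1a2b".
    static long zxidFromName(String name) {
        return Long.parseLong(name.substring(name.indexOf('.') + 1), 16);
    }
}
{code}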



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper issue #580: ZOOKEEPER-3098: Add additional server metrics

2018-09-05 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/580
  
And yes, like others commented, please rebase and resolve conflicts, and 
reply to previous unattended comments. My previous comments that need addressing are:

* Existing unit test failures and one findbug issue.
* Are there more types of metrics going to be added, such as Gauge? 
* Would be good to have some tests around the system.


---


[GitHub] zookeeper issue #580: ZOOKEEPER-3098: Add additional server metrics

2018-09-05 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/580
  
One question about gathering metrics using this system: let's say I have a 
use case where I want to gather the number of requests processed by 
FinalRequestProcessor for each request type (e.g. Ping, CreateSession, and 
other OpCode types). How would I do this with this metrics system? I think I would 
have to add a new enum constant in ServerMetrics for each OpCode; is this correct? 
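A hedged sketch of what that could look like (names are hypothetical; the PR's actual ServerMetrics API may differ):

    import java.util.EnumMap;
    import java.util.concurrent.atomic.LongAdder;

    // One enum constant per request type, mirroring the
    // "one ServerMetrics entry per OpCode" idea in the question.
    enum RequestType { PING, CREATE_SESSION, CREATE, SET_DATA, DELETE }

    final class RequestTypeCounters {
        private final EnumMap<RequestType, LongAdder> counts =
                new EnumMap<>(RequestType.class);

        RequestTypeCounters() {
            for (RequestType t : RequestType.values()) {
                counts.put(t, new LongAdder());
            }
        }

        void record(RequestType type) {      // e.g. called from FinalRequestProcessor
            counts.get(type).increment();
        }

        long value(RequestType type) {
            return counts.get(type).sum();
        }
    }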


---


[GitHub] zookeeper issue #580: ZOOKEEPER-3098: Add additional server metrics

2018-09-05 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/580
  
We at Twitter manage large ZK clusters and we also have our own internal 
metrics system (you can't really operate things like ZK at large scale without 
metrics). Our design is similar to this one in terms of metrics collection 
and code instrumentation, but we coupled the metrics collection with 
publication. We publish metrics through Finagle (Twitter's RPC library), 
and the tight coupling caused several issues, such as circular dependencies 
between Finagle and ZooKeeper, where ZK depends on Finagle for metrics, and 
Finagle depends on ZK for ServerSets (naming resolution). 

I think the current design proposed in the community would solve this 
problem. Basically we will have two systems:
* The ZooKeeper metrics system for metrics collection.
* The pluggable system @eolivelli is working on as the backend for the 
metrics.

So IMO both pieces of work are largely orthogonal and can be done in parallel. And I 
agree with the design decisions @jtuple proposed and @eolivelli commented on 
previously.

Regarding using an external metrics type system: I think it is more 
flexible for us to define and own the metrics definitions, so it's easy to add or 
change the system without depending on third parties. 


---


[GitHub] zookeeper issue #612: [ZOOKEEPER-3131] Remove watcher when session closed in...

2018-09-05 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/612
  
Closing and reopening the pull request can also trigger a build, which might 
be more convenient than amending the commit.

For committers or those who have admin access to Apache Jenkins, just find the 
build job associated with the PR and click the "Rebuild" button.



---


[GitHub] zookeeper pull request #573: ZOOKEEPER-3090 continue can be replaced with br...

2018-09-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/573


---


[GitHub] zookeeper issue #615: ZOOKEEPER-3137: add a utility to truncate logs to a zx...

2018-09-05 Thread enixon
Github user enixon commented on the issue:

https://github.com/apache/zookeeper/pull/615
  
@anmolnar , you can find a trial integration of the chop function with 
TxnLogToolkit at 
https://github.com/enixon/zookeeper/commit/b71ce04f70873f590c6cf5c24a691a1f1af82f48
 . It's not the cleanest integration, since there's not much direct overlap 
between the dump method and the chop method, but I think we can make some useful 
decisions here that will help centralize work on the tool for future patches.

Let me know what you think!


---


[GitHub] zookeeper issue #573: ZOOKEEPER-3090 continue can be replaced with break

2018-09-05 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/573
  
I manually triggered a build and it's green now: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2122/ 

GitHub integration is still broken for this PR for some reason. This has been 
reported before, but I don't know why it's broken. 

I am committing this as it's pretty obvious and has been reviewed by multiple folks 
already.


---


[GitHub] zookeeper issue #615: ZOOKEEPER-3137: add a utility to truncate logs to a zx...

2018-09-05 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/615
  
Integrating into TxnLogToolkit is a good idea. I did not mention it as I 
thought that might require a separate JIRA, because LogFormatter can also be 
integrated (it's the least powerful tool of the three) and then deprecated. Also, 
the current place of TxnLogToolkit is a little bit weird; IMO server/util 
makes more sense than server/persistence. 


---


[jira] [Commented] (ZOOKEEPER-3137) add a utility to truncate logs to a zxid

2018-09-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605042#comment-16605042
 ] 

Hadoop QA commented on ZOOKEEPER-3137:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2123//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2123//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2123//console

This message is automatically generated.

> add a utility to truncate logs to a zxid
> 
>
> Key: ZOOKEEPER-3137
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3137
> Project: ZooKeeper
>  Issue Type: New Feature
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Add a utility that allows an admin to truncate a given transaction log to a 
> specified zxid. This can be similar to the existing LogFormatter. 
> Among the benefits, this allows an admin to put together a point-in-time view 
> of a data tree by manually mutating files from a saved backup.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper issue #615: ZOOKEEPER-3137: add a utility to truncate logs to a zx...

2018-09-05 Thread enixon
Github user enixon commented on the issue:

https://github.com/apache/zookeeper/pull/615
  
@hanm, I've made your recommended changes with the exception noted in my 
comment. 

@anmolnar , looking at TxnLogToolkit now - I think this should be easy to 
integrate  :)


---


[GitHub] zookeeper pull request #615: ZOOKEEPER-3137: add a utility to truncate logs ...

2018-09-05 Thread enixon
Github user enixon commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/615#discussion_r215446231
  
--- Diff: src/java/main/org/apache/zookeeper/server/util/LogChopper.java ---
@@ -0,0 +1,152 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.zookeeper.server.util;
+
+import org.apache.jute.BinaryInputArchive;
+import org.apache.jute.BinaryOutputArchive;
+import org.apache.jute.Record;
+import org.apache.zookeeper.server.persistence.FileHeader;
+import org.apache.zookeeper.server.persistence.FileTxnLog;
+import org.apache.zookeeper.txn.TxnHeader;
+
+import java.io.BufferedInputStream;
+import java.io.BufferedOutputStream;
+import java.io.EOFException;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.zip.Adler32;
+import java.util.zip.Checksum;
+
+/**
+ * this class will chop the log at the specified zxid
+ */
+public class LogChopper {
+    public static void main(String args[]) {
+        if (args.length != 3) {
+            System.out.println("Usage: LogChopper zxid_to_chop_to txn_log_to_chop chopped_filename");
+            System.out.println("this program will read the txn_log_to_chop file and copy all the transactions");
+            System.out.println("from it up to (and including) the given zxid into chopped_filename.");
+            System.exit(1);
+        }
+        long zxid = Long.decode(args[0]);
+        String txnLog = args[1];
+        String choppedLog = args[2];
+
+        int rc = 2;
+        try (
+            InputStream is = new BufferedInputStream(new FileInputStream(txnLog));
+            OutputStream os = new BufferedOutputStream(new FileOutputStream(choppedLog))
+        ) {
+            if (chop(is, os, zxid)) {
+                rc = 0;
+            }
+        } catch (Exception e) {
+            System.out.println("Got exception: " + e.getMessage());
+        }
+        System.exit(rc);
+    }
+
+    public static boolean chop(InputStream is, OutputStream os, long zxid) throws IOException {
+        BinaryInputArchive logStream = BinaryInputArchive.getArchive(is);
+        BinaryOutputArchive choppedStream = BinaryOutputArchive.getArchive(os);
+        FileHeader fhdr = new FileHeader();
+        fhdr.deserialize(logStream, "fileheader");
+
+        if (fhdr.getMagic() != FileTxnLog.TXNLOG_MAGIC) {
+            System.err.println("Invalid magic number in txn log file");
+            return false;
+        }
+        System.out.println("ZooKeeper Transactional Log File with dbid "
+                + fhdr.getDbid() + " txnlog format version "
+                + fhdr.getVersion());
+
+        fhdr.serialize(choppedStream, "fileheader");
+        int count = 0;
+        boolean hasZxid = false;
+        long previousZxid = -1;
+        while (true) {
+            long crcValue;
+            byte[] bytes;
+            try {
+                crcValue = logStream.readLong("crcvalue");
+                bytes = logStream.readBuffer("txnEntry");
+            } catch (EOFException e) {
+                System.out.println("EOF reached after " + count + " txns.");
+                // returning false because nothing was chopped
+                return false;
+            }
+            if (bytes.length == 0) {
+                // Since we preallocate, we define EOF to be an
+                // empty transaction
+                System.out.println("EOF reached after " + count + " txns.");
+                // returning false because nothing was chopped
+                return false;
+            }
+
+            Checksum crc = new Adler32();
+            crc.update(bytes, 0, bytes.length);
+            if

[jira] [Commented] (ZOOKEEPER-3127) Fixing potential data inconsistency due to update last processed zxid with partial multi-op txn

2018-09-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605011#comment-16605011
 ] 

Hudson commented on ZOOKEEPER-3127:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #177 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/177/])
ZOOKEEPER-3127: Fixing potential data inconsistency due to update last (hanm: 
rev e501d9cc67fbaa6e825292fd838711259b6c9789)
* (edit) src/java/main/org/apache/zookeeper/server/ZKDatabase.java
* (edit) src/java/main/org/apache/zookeeper/server/DataTree.java
* (edit) 
src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java
* (add) 
src/java/test/org/apache/zookeeper/server/quorum/FuzzySnapshotRelatedTest.java


> Fixing potential data inconsistency due to update last processed zxid with 
> partial multi-op txn
> ---
>
> Key: ZOOKEEPER-3127
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3127
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.4, 3.6.0, 3.4.13
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Found this issue while checking the code for another issue; it is a 
> relatively rare case which we haven't seen on prod so far.
> Currently, the lastProcessedZxid is updated when applying the first txn of a 
> multi-op. If there is a snapshot in progress, it's possible that the zxid 
> associated with the snapshot covers only part of the multi-op.
> When loading the snapshot, it will only load the txns after the zxid associated 
> with the snapshot file, which could cause data inconsistency due to missing sub-txns.
> To avoid this, we only update the lastProcessedZxid when the whole multi-op 
> txn is applied to the DataTree.
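For illustration, a minimal sketch of the idea (hypothetical names; the actual fix lives in the DataTree/ZKDatabase edits listed above): the watermark moves only on whole-txn boundaries, so a snapshot started mid-multi-op can never be stamped with a zxid that covers only some of its sub-txns.

{code:java}
// Hedged sketch, not the real DataTree code.
class TxnApplier {
    private long lastProcessedZxid;

    void processTxn(long zxid, Runnable applyToTree, boolean isSubTxn) {
        applyToTree.run();                  // mutate the data tree first
        if (!isSubTxn && zxid > lastProcessedZxid) {
            lastProcessedZxid = zxid;       // advance only once the whole op is applied
        }
    }
}
{code}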



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3129) Improve ZK Client resiliency by throwing a jute.maxbuffer size exception before sending a request to server

2018-09-05 Thread Karan Mehta (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605005#comment-16605005
 ] 

Karan Mehta commented on ZOOKEEPER-3129:


Thoughts anyone?

> Improve ZK Client resiliency by throwing a jute.maxbuffer size exception 
> before sending a request to server
> ---
>
> Key: ZOOKEEPER-3129
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3129
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Priority: Major
>
> ZooKeeper is mostly operated in controlled environments, and the client/server 
> properties are usually known. With this Jira, I would like to propose a new 
> property on the client side that represents the max jute buffer size the server is 
> going to accept.
> On the ZK client, in case of a multi op, the request is serialized and hence we 
> know the size of the complete packet that will be sent. We can use this new 
> property to determine if we are exceeding the limit and throw some form 
> of KeeperException. This would be a fail-fast mechanism, and the application can 
> potentially retry by chunking up the request or serializing.
> Since the same property is now present in two locations, over time, two 
> possibilities can happen.
> -- The server's jute buffer limit is higher than what is specified on the client 
> side:
> the application might end up serializing, or the zkclient can be made 
> configurable to retry even when it gets this exception.
> -- The server's jute buffer limit is lower than what is specified on the client 
> side:
> that would have failed previously as well, so there is no change in behavior.
> This would help avoid silent failures like HBASE-18549. 
> Thoughts [~apurtell] [~xucang] [~anmolnar] [~hanm]
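A minimal sketch of the proposed fail-fast check, assuming a hypothetical client-side property name (nothing below is a released ZooKeeper API):

{code:java}
import java.io.IOException;

public final class JuteSizePrecheck {
    // Property name is illustrative only; it mirrors the server's jute.maxbuffer.
    private static final int CLIENT_MAX_BUFFER =
            Integer.getInteger("zookeeper.client.jute.maxbuffer", 1024 * 1024);

    /** Fail fast before an oversized multi request ever reaches the server. */
    public static void check(byte[] serializedRequest) throws IOException {
        if (serializedRequest.length > CLIENT_MAX_BUFFER) {
            throw new IOException("Packet len " + serializedRequest.length
                    + " exceeds client-side jute.maxbuffer " + CLIENT_MAX_BUFFER);
        }
    }
}
{code}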



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper issue #612: [ZOOKEEPER-3131] Remove watcher when session closed in...

2018-09-05 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/612
  
Is there any easy way to trigger the build? It's a bit painful to 
trigger this by adding an empty commit.


---


[jira] [Resolved] (ZOOKEEPER-3111) Add socket buffer size option to tune the TCP throughput between leader and learner

2018-09-05 Thread Fangmin Lv (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fangmin Lv resolved ZOOKEEPER-3111.
---
Resolution: Not A Problem

> Add socket buffer size option to tune the TCP throughput between leader and 
> learner
> ---
>
> Key: ZOOKEEPER-3111
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3111
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Add a socket setting to let us tune the TCP receive and send window 
> sizes, to improve throughput during network congestion or when transferring a 
> large snapshot during syncing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3111) Add socket buffer size option to tune the TCP throughput between leader and learner

2018-09-05 Thread Fangmin Lv (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604962#comment-16604962
 ] 

Fangmin Lv commented on ZOOKEEPER-3111:
---

Commented on the PR; also mentioning it here for easier reference:

The performance of different buffer sizes is highly dependent on the traffic 
pattern, and a custom size may actually introduce higher latency, so it's hard to 
give a suggested buffer size.

Internally, we're not customizing the buffer size with this option anymore; 
instead we're using the system TCP auto-tuning options net.ipv4.tcp_wmem and 
net.ipv4.tcp_rmem, which are more flexible and automatically adjust 
socket buffer sizes as needed to optimally balance TCP performance and memory 
usage.

Since it's hard to configure correctly, we decided to abandon this patch.
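For reference, the auto-tuning knobs mentioned above are Linux sysctls that take min/default/max byte values; the numbers below are common distribution defaults, shown purely as an illustration rather than a recommendation:

{noformat}
# min  default  max (bytes)
net.ipv4.tcp_rmem = 4096 87380 6291456
net.ipv4.tcp_wmem = 4096 16384 4194304
{noformat}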

> Add socket buffer size option to tune the TCP throughput between leader and 
> learner
> ---
>
> Key: ZOOKEEPER-3111
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3111
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> Add a socket setting to let us tune the TCP receive and send window 
> sizes, to improve throughput during network congestion or when transferring a 
> large snapshot during syncing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #593: [ZOOKEEPER-3111] Add socket buffer size option ...

2018-09-05 Thread lvfangmin
Github user lvfangmin closed the pull request at:

https://github.com/apache/zookeeper/pull/593


---


[GitHub] zookeeper issue #606: [ZOOKEEPER-3127] Fixing potential data inconsistency d...

2018-09-05 Thread hanm
Github user hanm commented on the issue:

https://github.com/apache/zookeeper/pull/606
  
committed to master @lvfangmin @breed 


---


[jira] [Resolved] (ZOOKEEPER-3127) Fixing potential data inconsistency due to update last processed zxid with partial multi-op txn

2018-09-05 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3127.

   Resolution: Fixed
Fix Version/s: (was: 3.4.14)
   (was: 3.5.5)

Issue resolved by pull request 606
[https://github.com/apache/zookeeper/pull/606]

> Fixing potential data inconsistency due to update last processed zxid with 
> partial multi-op txn
> ---
>
> Key: ZOOKEEPER-3127
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3127
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.4, 3.6.0, 3.4.13
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Found this issue while checking the code for another issue; it is a 
> relatively rare case which we haven't seen on prod so far.
> Currently, the lastProcessedZxid is updated when applying the first txn of a 
> multi-op. If there is a snapshot in progress, it's possible that the zxid 
> associated with the snapshot covers only part of the multi-op.
> When loading the snapshot, it will only load the txns after the zxid associated 
> with the snapshot file, which could cause data inconsistency due to missing sub-txns.
> To avoid this, we only update the lastProcessedZxid when the whole multi-op 
> txn is applied to the DataTree.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #606: [ZOOKEEPER-3127] Fixing potential data inconsis...

2018-09-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/606


---


[GitHub] zookeeper issue #590: [ZOOKEEPER-1177] Add the memory optimized watch manage...

2018-09-05 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/590
  
Thanks @nkalmar for the suggestion about the bench project's position. I was 
following the same directory layout as src/java/systest for now; do you think we 
can move them together later? I'm not against moving it now.

If we want to move now, just to confirm: the directory is 
src/java/org/apache/zookeeper/bench/ without a main folder, right?


---


[GitHub] zookeeper issue #593: [ZOOKEEPER-3111] Add socket buffer size option to tune...

2018-09-05 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/593
  
@maoling, sorry for the late reply, I forgot to reply to this PR. 

We did some benchmarking when we added this code a year ago; I was trying 
to find some benchmark data from that time, but I cannot find it yet.

I've done some benchmarking internally today, and found that the performance of 
different buffer sizes is highly dependent on the traffic pattern, and a custom 
size may actually introduce higher latency, so it's hard to give a suggested 
buffer size. But I don't think the extra negotiation time during connecting is a 
problem here, since we only set the buffer size between quorum servers, which 
is very few connections, and they are stable.

Internally, we're not customizing the buffer size with this option anymore; 
instead we're using the system TCP auto-tuning options net.ipv4.tcp_wmem and 
net.ipv4.tcp_rmem, which are more flexible and automatically adjust 
socket buffer sizes as needed to optimally balance TCP performance and memory 
usage.

**As for the unit test**, the SO_SNDBUF and SO_RCVBUF options are used 
by the platform's networking code as a hint for sizing the underlying 
network I/O buffers, so it isn't guaranteed that the platform will use what you 
told it to. So I think this test is not that useful from this point of view.

Overall, I think we probably don't need this custom option anymore because 
it's hard to configure right.
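To illustrate the hint behaviour mentioned above, a small self-contained example (the clamping is platform-dependent, e.g. Linux caps unprivileged requests at net.core.rmem_max):

    import java.net.Socket;
    import java.net.SocketException;

    public class SoRcvbufHintDemo {
        public static void main(String[] args) throws SocketException {
            Socket s = new Socket();                  // unconnected socket
            s.setReceiveBufferSize(8 * 1024 * 1024);  // ask for 8 MiB
            // The kernel treats the request as a hint and may clamp it,
            // so the effective value can differ from what was requested.
            System.out.println("requested 8388608, effective SO_RCVBUF = "
                    + s.getReceiveBufferSize());
        }
    }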


---


[jira] [Commented] (ZOOKEEPER-2913) testEphemeralNodeDeletion is flaky

2018-09-05 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604742#comment-16604742
 ] 

Hudson commented on ZOOKEEPER-2913:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #176 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/176/])
ZOOKEEPER-2913: testEphemeralNodeDeletion is flaky (andor: rev 
e7ac12c952b47921bfecbad52112be51f5b9ede5)
* (edit) 
src/java/test/org/apache/zookeeper/server/quorum/EphemeralNodeDeletionTest.java


> testEphemeralNodeDeletion is flaky
> --
>
> Key: ZOOKEEPER-2913
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2913
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.4.10, 3.5.3, 3.4.11, 3.6.0
>Reporter: Patrick Hunt
>Assignee: maoling
>Priority: Major
>  Labels: flaky, pull-request-available
> Fix For: 3.6.0, 3.5.5
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> testEphemeralNodeDeletion is showing up as flakey across a number of jobs.
> 1.https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html
> 2.https://builds.apache.org/job/ZooKeeper_branch34_java9/305/testReport/junit/org.apache.zookeeper.server.quorum/EphemeralNodeDeletionTest/testEphemeralNodeDeletion/
> After session close ephemeral node must be deleted



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3138) Potential race condition with Quorum Peer mutual authentication via SASL

2018-09-05 Thread Grzegorz Grzybek (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grzegorz Grzybek updated ZOOKEEPER-3138:

Description: 
I'm in the process of reconfiguring the ensemble to use mutual quorum peer 
authentication using SASL (ZOOKEEPER-1045).

In order to understand the impact on my code, I've checked _how it works_. Now 
I'm running & debugging 
{{org.apache.zookeeper.server.quorum.auth.QuorumDigestAuthTest#testValidCredentials()}}
 test case.

I have now six threads (3 peers contacting each other):
* "QuorumConnectionThread-[myid=0]-2@1483" prio=5 tid=0x2b nid=NA runnable
* "QuorumConnectionThread-[myid=0]-3@1491" prio=5 tid=0x36 nid=NA runnable
* "QuorumConnectionThread-[myid=1]-1@1481" prio=5 tid=0x2d nid=NA runnable
* "QuorumConnectionThread-[myid=1]-4@1505" prio=5 tid=0x3c nid=NA runnable
* "QuorumConnectionThread-[myid=2]-2@1495" prio=5 tid=0x37 nid=NA runnable
* "QuorumConnectionThread-[myid=2]-4@1506" prio=5 tid=0x3d nid=NA runnable

at this point of invocation:
{noformat}
  java.lang.Thread.State: RUNNABLE
      at org.apache.zookeeper.server.quorum.auth.SaslQuorumServerCallbackHandler.handleNameCallback(SaslQuorumServerCallbackHandler.java:101)
      at org.apache.zookeeper.server.quorum.auth.SaslQuorumServerCallbackHandler.handle(SaslQuorumServerCallbackHandler.java:82)
      at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
      at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
      at org.apache.zookeeper.server.quorum.auth.SaslQuorumAuthServer.authenticate(SaslQuorumAuthServer.java:100)
      at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:467)
      at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:386)
      at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReceiverThread.run(QuorumCnxManager.java:422)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
{noformat}

which is this line:
{code:java}
private void handleNameCallback(NameCallback nc) {
    // check to see if this user is in the user password database.
    if (credentials.get(nc.getDefaultName()) == null) {
        LOG.warn("User '{}' not found in list of DIGEST-MD5 authenticateable users.",
                nc.getDefaultName());
        return;
    }
    nc.setName(nc.getDefaultName());
    /* >>> */ userName = nc.getDefaultName(); /* <<< */
}
{code}

each pair of threads is operating on a single instance of 
{{org.apache.zookeeper.server.quorum.auth.SaslQuorumServerCallbackHandler#userName}}.
 In the stack trace we have both shared and local variables/fields:
* {{o.a.z.server.quorum.QuorumCnxManager.QuorumConnectionReceiverThread#sock}} 
is thread-specific (ok)
* {{o.a.z.server.quorum.QuorumCnxManager#authServer}} is peer-specific (an 
instance of {{o.a.z.server.quorum.auth.SaslQuorumAuthServer}}) but has no 
mutable state (ok)
* {{javax.security.sasl.SaslServer}} is thread-specific (ok) - this instance is 
created to handle SASL authentication using a peer-specific JAAS 
subject (which is ok) and the peer-specific 
{{o.a.z.server.quorum.auth.SaslQuorumAuthServer#serverLogin.callbackHandler}}, 
{color:red}which is potentially a problem{color}

Each of the six threads handles a different connection, but each pair (for a given 
QuorumPeer) calls 
{{o.a.z.server.quorum.auth.SaslQuorumServerCallbackHandler#handleNameCallback()}},
 which modifies the shared (peer-specific) field {{userName}}.
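To make the hazard concrete, a stripped-down sketch (hypothetical class, not the real one) of two connection threads racing on the one peer-wide field:

{code:java}
// Two QuorumConnectionThreads share one handler per peer: thread A can set
// userName to "peer1" while thread B overwrites it with "peer2" before A
// reads it back later in the SASL exchange. Illustrative only.
final class SharedCallbackHandlerSketch {
    private String userName;                 // one field per QuorumPeer

    void handleNameCallback(String defaultName) {
        userName = defaultName;              // written concurrently by both threads
    }

    boolean handleAuthorizeCallback(String authorizationId) {
        return authorizationId.equals(userName);  // may observe the other thread's write
    }
}
{code}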

I understand that, [according to the example from 
Wiki|https://cwiki.apache.org/confluence/display/ZOOKEEPER/Server-Server+mutual+authentication],
 all peers may use the same credentials (in the simplest case).

But the "userName" comes from data sent by each peer, like this:
{noformat}
charset=utf-8,\
username="test",\
realm="zk-quorum-sasl-md5",\
nonce="iBqYWtaCrEE013S6Dv6xiOsR9uX2l/qKZcEZ1pm2",\
nc=0001,\
cnonce="LVaL9XYFjNxVBPCjPewXjEBsj9GuwIfBN/RXsKt5",\
digest-uri="zookeeper-quorum/zk-quorum-sasl-md5",\
maxbuf=65536,\
response=dd4e9e2115ec2e304484d5191f3fc771,\
qop=auth,\
authzid="test"
{noformat}

*And I can imagine a JAAS configuration for the DIGEST-MD5 SASL mechanism where 
each peer uses its own credentials and is able to validate the other peers' specific 
credentials*:
{noformat}
QuorumServer {
   org.apache.zookeeper.server.auth.DigestLoginModule required
   user_peer1="peer1";
   user_peer2="peer2";
   user_peer3="peer3";
};
QuorumLearner1 {
   org.apache.zookeeper.server.auth.DigestLoginModule required
   username="peer1"
   password="peer1";
};
...
QuorumLearner2 {
   


[jira] [Updated] (ZOOKEEPER-3138) Potential race condition with Quorum Peer mutual authentication via SASL

2018-09-05 Thread Grzegorz Grzybek (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grzegorz Grzybek updated ZOOKEEPER-3138:

Description: 
I'm in the process of reconfiguring the ensemble to use mutual quorum peer 
authentication using SASL (ZOOKEEPER-1045).

In order to understand the impact on my code, I've checked _how it works_. Now 
I'm running & debugging 
{{org.apache.zookeeper.server.quorum.auth.QuorumDigestAuthTest#testValidCredentials()}}
 test case.

I have now six threads (3 peers contacting each other):
* "QuorumConnectionThread-[myid=0]-2@1483" prio=5 tid=0x2b nid=NA runnable
* "QuorumConnectionThread-[myid=0]-3@1491" prio=5 tid=0x36 nid=NA runnable
* "QuorumConnectionThread-[myid=1]-1@1481" prio=5 tid=0x2d nid=NA runnable
* "QuorumConnectionThread-[myid=1]-4@1505" prio=5 tid=0x3c nid=NA runnable
* "QuorumConnectionThread-[myid=2]-2@1495" prio=5 tid=0x37 nid=NA runnable
* "QuorumConnectionThread-[myid=2]-4@1506" prio=5 tid=0x3d nid=NA runnable

at this point of invocation:
{noformat}
  java.lang.Thread.State: RUNNABLE
      at org.apache.zookeeper.server.quorum.auth.SaslQuorumServerCallbackHandler.handleNameCallback(SaslQuorumServerCallbackHandler.java:101)
      at org.apache.zookeeper.server.quorum.auth.SaslQuorumServerCallbackHandler.handle(SaslQuorumServerCallbackHandler.java:82)
      at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
      at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
      at org.apache.zookeeper.server.quorum.auth.SaslQuorumAuthServer.authenticate(SaslQuorumAuthServer.java:100)
      at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:467)
      at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:386)
      at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReceiverThread.run(QuorumCnxManager.java:422)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
{noformat}

which is this line:
{code:java}
private void handleNameCallback(NameCallback nc) {
    // check to see if this user is in the user password database.
    if (credentials.get(nc.getDefaultName()) == null) {
        LOG.warn("User '{}' not found in list of DIGEST-MD5 authenticateable users.",
                nc.getDefaultName());
        return;
    }
    nc.setName(nc.getDefaultName());
    /* >>> */ userName = nc.getDefaultName(); /* <<< */
}
{code}

each pair of threads is operating on a single instance of 
{{org.apache.zookeeper.server.quorum.auth.SaslQuorumServerCallbackHandler#userName}}.
 In the stack trace we have both shared and local variables/fields:
* {{o.a.z.server.quorum.QuorumCnxManager.QuorumConnectionReceiverThread#sock}} 
is thread-specific (ok)
* {{o.a.z.server.quorum.QuorumCnxManager#authServer}} is peer-specific (an 
instance of {{o.a.z.server.quorum.auth.SaslQuorumAuthServer}}) but has no 
mutable state (ok)
* {{javax.security.sasl.SaslServer}} is thread-specific (ok) - this instance is 
created to handle SASL authentication using a peer-specific JAAS 
subject (which is ok) and the peer-specific 
{{o.a.z.server.quorum.auth.SaslQuorumAuthServer#serverLogin.callbackHandler}}, 
{color:red}which is potentially a problem{color}

Each of the six threads handles a different connection, but each pair (for a given 
QuorumPeer) calls 
{{o.a.z.server.quorum.auth.SaslQuorumServerCallbackHandler#handleNameCallback()}},
 which modifies the shared (peer-specific) field {{userName}}.

I understand that [according to the example from 
Wiki|https://cwiki.apache.org/confluence/display/ZOOKEEPER/Server-Server+mutual+authentication]:
{noformat}
QuorumServer {
   org.apache.zookeeper.server.auth.DigestLoginModule required
   user_peer1="peer1";
   user_peer2="peer2";
   user_peer3="peer3";
};
QuorumLearner1 {
   org.apache.zookeeper.server.auth.DigestLoginModule required
   username="peer1"
   password="peer1";
};
...
{noformat}

All peers use the same credentials, but the "userName" comes from data sent by 
each peer, like this:
{noformat}
charset=utf-8,\
username="test",\
realm="zk-quorum-sasl-md5",\
nonce="iBqYWtaCrEE013S6Dv6xiOsR9uX2l/qKZcEZ1pm2",\
nc=0001,\
cnonce="LVaL9XYFjNxVBPCjPewXjEBsj9GuwIfBN/RXsKt5",\
digest-uri="zookeeper-quorum/zk-quorum-sasl-md5",\
maxbuf=65536,\
response=dd4e9e2115ec2e304484d5191f3fc771,\
qop=auth,\
authzid="test"
{noformat}

*And I can imagine a JAAS configuration for the DIGEST-MD5 SASL mechanism where 
each peer uses its own credentials and is able to validate the other peers' specific 
credentials.*

Isn't it a race condition? Like this 

[jira] [Created] (ZOOKEEPER-3138) Potential race condition with Quorum Peer mutual authentication via SASL

2018-09-05 Thread Grzegorz Grzybek (JIRA)
Grzegorz Grzybek created ZOOKEEPER-3138:
---

 Summary: Potential race condition with Quorum Peer mutual 
authentication via SASL
 Key: ZOOKEEPER-3138
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3138
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum
Affects Versions: 3.4.13
Reporter: Grzegorz Grzybek


I'm in the process of reconfiguring the ensemble to use mutual quorum peer 
authentication using SASL (ZOOKEEPER-1045).

In order to understand the impact on my code, I've checked _how it works_. Now 
I'm running & debugging 
{{org.apache.zookeeper.server.quorum.auth.QuorumDigestAuthTest#testValidCredentials()}}
 test case.

I have now six threads (3 peers contacting each other):
* "QuorumConnectionThread-[myid=0]-2@1483" prio=5 tid=0x2b nid=NA runnable
* "QuorumConnectionThread-[myid=0]-3@1491" prio=5 tid=0x36 nid=NA runnable
* "QuorumConnectionThread-[myid=1]-1@1481" prio=5 tid=0x2d nid=NA runnable
* "QuorumConnectionThread-[myid=1]-4@1505" prio=5 tid=0x3c nid=NA runnable
* "QuorumConnectionThread-[myid=2]-2@1495" prio=5 tid=0x37 nid=NA runnable
* "QuorumConnectionThread-[myid=2]-4@1506" prio=5 tid=0x3d nid=NA runnable

at this point of invocation:
{noformat}
      at org.apache.zookeeper.server.quorum.auth.SaslQuorumServerCallbackHandler.handleNameCallback(SaslQuorumServerCallbackHandler.java:101)
      at org.apache.zookeeper.server.quorum.auth.SaslQuorumServerCallbackHandler.handle(SaslQuorumServerCallbackHandler.java:82)
      at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
      at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
      at org.apache.zookeeper.server.quorum.auth.SaslQuorumAuthServer.authenticate(SaslQuorumAuthServer.java:100)
      at org.apache.zookeeper.server.quorum.QuorumCnxManager.handleConnection(QuorumCnxManager.java:467)
      at org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:386)
      at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReceiverThread.run(QuorumCnxManager.java:422)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
{noformat}

which is this line:
{code:java}
private void handleNameCallback(NameCallback nc) {
    // check to see if this user is in the user password database.
    if (credentials.get(nc.getDefaultName()) == null) {
        LOG.warn("User '{}' not found in list of DIGEST-MD5 authenticateable users.",
                nc.getDefaultName());
        return;
    }
    nc.setName(nc.getDefaultName());
    /* >>> */ userName = nc.getDefaultName(); /* <<< */
}
{code}

each pair of threads is operating on a single instance of 
{{org.apache.zookeeper.server.quorum.auth.SaslQuorumServerCallbackHandler#userName}}.
 In the stack trace we have both shared and local variables/fields:
* {{o.a.z.server.quorum.QuorumCnxManager.QuorumConnectionReceiverThread#sock}} 
is thread-specific (ok)
* {{o.a.z.server.quorum.QuorumCnxManager#authServer}} is peer-specific (an 
instance of {{o.a.z.server.quorum.auth.SaslQuorumAuthServer}}) but has no 
mutable state (ok)
* {{javax.security.sasl.SaslServer}} is thread-specific (ok) - this instance is 
created to handle SASL authentication using a peer-specific JAAS 
subject (which is ok) and the peer-specific 
{{o.a.z.server.quorum.auth.SaslQuorumAuthServer#serverLogin.callbackHandler}}, 
{color:red}which is potentially a problem{color}

Each of the six threads handles a different connection, but each pair (for a given 
QuorumPeer) calls 
{{o.a.z.server.quorum.auth.SaslQuorumServerCallbackHandler#handleNameCallback()}},
 which modifies the shared (peer-specific) field {{userName}}.

I understand that [according to the example from 
Wiki|https://cwiki.apache.org/confluence/display/ZOOKEEPER/Server-Server+mutual+authentication]:
{noformat}
QuorumServer {
   org.apache.zookeeper.server.auth.DigestLoginModule required
   user_peer1="peer1";
   user_peer2="peer2";
   user_peer3="peer3";
};
QuorumLearner1 {
   org.apache.zookeeper.server.auth.DigestLoginModule required
   username="peer1"
   password="peer1";
};
...
{noformat}

All peers use the same credentials, but the "userName" comes from data sent by 
each peer, like this:
{noformat}
charset=utf-8,\
username="test",\
realm="zk-quorum-sasl-md5",\
nonce="iBqYWtaCrEE013S6Dv6xiOsR9uX2l/qKZcEZ1pm2",\
nc=0001,\
cnonce="LVaL9XYFjNxVBPCjPewXjEBsj9GuwIfBN/RXsKt5",\
digest-uri="zookeeper-quorum/zk-quorum-sasl-md5",\
maxbuf=65536,\
response=dd4e9e2115ec2e304484d5191f3fc771,\
qop=auth,\
authzid="test"
{noformat}

*And I can imagine a JAAS configuration for the DIGEST-MD5 SASL mechanism where 
each peer uses its own credentials and is able to validate the other peers' specific 
credentials.*

Isn't it a 

[GitHub] zookeeper issue #587: ZOOKEEPER-3106: Zookeeper client supports IPv6 address...

2018-09-05 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/587
  
@maoling please rebase the patch


---


[GitHub] zookeeper issue #573: ZOOKEEPER-3090 continue can be replaced with break

2018-09-05 Thread maoling
Github user maoling commented on the issue:

https://github.com/apache/zookeeper/pull/573
  
@a470577391 Mr Jenkins has come back. Could you plz close and reopen this PR 
to kick him off?
ping @anmolnar 



---


[GitHub] zookeeper issue #608: ZOOKEEPER-2913:testEphemeralNodeDeletion is flaky

2018-09-05 Thread maoling
Github user maoling commented on the issue:

https://github.com/apache/zookeeper/pull/608
  
> Do you mind opening a PR to fix the other test?

@anmolnar 
Other similar tests should be analysed in their specific circumstances. I will follow 
up and hope for others' work on this issue.
Sir, if you're free, could you plz step into 
[PR-587](https://github.com/apache/zookeeper/pull/587) (grin)?


---


[jira] [Commented] (ZOOKEEPER-1011) fix Java Barrier Documentation example's race condition issue and polish up the Barrier Documentation

2018-09-05 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16604492#comment-16604492
 ] 

Hadoop QA commented on ZOOKEEPER-1011:
--

-1 overall.  GitHub Pull Request  Build
  

+1 @author.  The patch does not contain any @author tags.

+0 tests included.  The patch appears to be a documentation patch that 
doesn't require tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2118//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2118//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2118//console

This message is automatically generated.

> fix Java Barrier Documentation example's race condition issue and polish up 
> the Barrier Documentation
> -
>
> Key: ZOOKEEPER-1011
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1011
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: documentation
>Reporter: Semih Salihoglu
>Assignee: maoling
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There is a race condition in the Barrier example of the java doc: 
> http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html. It's 
> in the enter() method. Here's the original example:
> boolean enter() throws KeeperException, InterruptedException{
>     zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
>             CreateMode.EPHEMERAL_SEQUENTIAL);
>     while (true) {
>         synchronized (mutex) {
>             List list = zk.getChildren(root, true);
>             if (list.size() < size) {
>                 mutex.wait();
>             } else {
>                 return true;
>             }
>         }
>     }
> }
> Here's the race condition scenario:
> Let's say there are two machines/nodes, node1 and node2, that will use this 
> code to synchronize over ZK. Let's say the following steps take place:
> node1 calls the zk.create method, then reads the number of children, 
> sees that it's 1, and starts waiting. 
> node2 calls the zk.create method (it doesn't call the zk.getChildren method yet; 
> let's say it's very slow). 
> node1 is notified that the number of children on the znode changed; it checks 
> that the size is 2, so it enters the barrier, does its work, and then leaves 
> the barrier, deleting its node.
> node2 calls zk.getChildren and, because node1 has already left, it sees that 
> the number of children is equal to 1. Since node1 will never enter the 
> barrier again, node2 will keep waiting.
> --- End of scenario ---
> Here are Flavio's fix suggestions (copied from the email thread):
> ...
> I see two possible action points out of this discussion:
>   
> 1- State clearly in the beginning that the example discussed is not correct 
> under the assumption that a process may finish the computation before another 
> has started, and the example is there for illustration purposes;
> 2- Have another example following the current one that discusses the problem 
> and shows how to fix it. This is an interesting option that illustrates how 
> one could reason about a solution when developing with zookeeper.
> ...
> We'll go with the 2nd option.
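In the spirit of that 2nd option, a hedged sketch of one way a corrected enter() could look: a persistent "ready" marker records that the barrier was full, so a slow process can no longer miss the moment. It assumes the tutorial's Barrier fields (zk, mutex, root, name, size) and its Watcher, which calls mutex.notify() on every event; this is illustrative, not necessarily the example the docs adopted.

{code:java}
boolean enter() throws KeeperException, InterruptedException {
    zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
            CreateMode.EPHEMERAL_SEQUENTIAL);
    while (true) {
        synchronized (mutex) {
            // The persistent marker outlives early leavers.
            if (zk.exists(root + "/ready", true) != null) {
                return true;
            }
            List<String> children = zk.getChildren(root, true);
            if (children.size() >= size) {
                try {
                    zk.create(root + "/ready", new byte[0],
                            Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
                } catch (KeeperException.NodeExistsException e) {
                    // another process opened the barrier first; fine
                }
                return true;
            }
            mutex.wait();
        }
    }
}
{code}

A matching leave() would also need to delete the marker before the barrier is reused; omitted for brevity.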



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper issue #593: [ZOOKEEPER-3111] Add socket buffer size option to tune...

2018-09-05 Thread maoling
Github user maoling commented on the issue:

https://github.com/apache/zookeeper/pull/593
  
@lvfangmin 
It will be a very useful option for tuning the network performance between 
the leader and followers,
but you should give us some test evidence (smirk).
Can we draw the conclusion:

> Increasing the receive buffer size can increase the performance of 
network I/O for high-volume connection, while decreasing it can help reduce the 
backlog of incoming data.

or something else? What is the reasonable value you suggest?


---


[GitHub] zookeeper issue #593: [ZOOKEEPER-3111] Add socket buffer size option to tune...

2018-09-05 Thread maoling
Github user maoling commented on the issue:

https://github.com/apache/zookeeper/pull/593
  
> What do you mean by 'do regression' exactly?

@anmolnar like this (grin):

    int index = 0;
    while (true) {
        try {
            testSetSocketBufferSize();
        } catch (Exception e) {
            e.printStackTrace();
        }
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
        }
        System.out.println("---" + (index++) + " passed!!!");
    }




---


[GitHub] zookeeper issue #608: ZOOKEEPER-2913:testEphemeralNodeDeletion is flaky

2018-09-05 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/608
  
Committed to the 3.5 and master branches. Thanks @maoling.
Do you mind opening a PR to fix the other test?


---


[jira] [Resolved] (ZOOKEEPER-2913) testEphemeralNodeDeletion is flaky

2018-09-05 Thread Andor Molnar (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andor Molnar resolved ZOOKEEPER-2913.
-
   Resolution: Fixed
Fix Version/s: 3.5.5
   3.6.0

Issue resolved by pull request 608
[https://github.com/apache/zookeeper/pull/608]

> testEphemeralNodeDeletion is flaky
> --
>
> Key: ZOOKEEPER-2913
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2913
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Affects Versions: 3.4.10, 3.5.3, 3.4.11, 3.6.0
>Reporter: Patrick Hunt
>Assignee: maoling
>Priority: Major
>  Labels: flaky, pull-request-available
> Fix For: 3.6.0, 3.5.5
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> testEphemeralNodeDeletion is showing up as flakey across a number of jobs.
> 1.https://builds.apache.org/view/S-Z/view/ZooKeeper/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html
> 2.https://builds.apache.org/job/ZooKeeper_branch34_java9/305/testReport/junit/org.apache.zookeeper.server.quorum/EphemeralNodeDeletionTest/testEphemeralNodeDeletion/
> After session close ephemeral node must be deleted



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper pull request #608: ZOOKEEPER-2913:testEphemeralNodeDeletion is fla...

2018-09-05 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/zookeeper/pull/608


---


[jira] [Updated] (ZOOKEEPER-1011) fix Java Barrier Documentation example's race condition issue and polish up the Barrier Documentation

2018-09-05 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ZOOKEEPER-1011:
--
Labels: pull-request-available  (was: )

> fix Java Barrier Documentation example's race condition issue and polish up 
> the Barrier Documentation
> -
>
> Key: ZOOKEEPER-1011
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1011
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: documentation
>Reporter: Semih Salihoglu
>Assignee: maoling
>Priority: Major
>  Labels: pull-request-available
>
> There is a race condition in the Barrier example of the java doc: 
> http://hadoop.apache.org/zookeeper/docs/current/zookeeperTutorial.html. It's 
> in the enter() method. Here's the original example:
> boolean enter() throws KeeperException, InterruptedException{
>     zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
>             CreateMode.EPHEMERAL_SEQUENTIAL);
>     while (true) {
>         synchronized (mutex) {
>             List list = zk.getChildren(root, true);
>             if (list.size() < size) {
>                 mutex.wait();
>             } else {
>                 return true;
>             }
>         }
>     }
> }
> Here's the race condition scenario:
> Let's say there are two machines/nodes, node1 and node2, that will use this 
> code to synchronize over ZK, and the following steps take place:
> 1. node1 calls the zk.create method, then reads the number of children, sees 
> that it's 1, and starts waiting.
> 2. node2 calls the zk.create method (but doesn't call the zk.getChildren 
> method yet; let's say it's very slow).
> 3. node1 is notified that the number of children on the znode changed; it 
> checks that the size is 2, so it returns from enter(), does its work, and 
> then leaves the barrier, deleting its node.
> 4. node2 calls zk.getChildren and, because node1 has already left, sees that 
> the number of children is equal to 1. Since node1 will never enter the 
> barrier again, node2 will keep waiting forever.
> --- End of scenario ---
> Here are Flavio's fix suggestions (copied from the email thread):
> ...
> I see two possible action points out of this discussion:
>   
> 1- State clearly in the beginning that the example discussed is not correct 
> under the assumption that a process may finish the computation before another 
> has started, and the example is there for illustration purposes;
> 2- Have another example following the current one that discusses the problem 
> and shows how to fix it. This is an interesting option that illustrates how 
> one could reason about a solution when developing with zookeeper.
> ...
> We'll go with the 2nd option.
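
For illustration, here is a sketch along the lines of option 2 (an illustration
of the idea, not the committed documentation fix). It assumes the tutorial's
SyncPrimitive fields (zk, root, name, mutex, size) and its default watcher,
whose process() callback calls mutex.notify(); the "/ready" znode name is
invented for this sketch:

boolean enter() throws KeeperException, InterruptedException {
    zk.create(root + "/" + name, new byte[0], Ids.OPEN_ACL_UNSAFE,
            CreateMode.EPHEMERAL_SEQUENTIAL);
    while (true) {
        synchronized (mutex) {
            // Set a watch on /ready *before* counting children, so the
            // notification cannot be missed between the check and the wait.
            if (zk.exists(root + "/ready", true) != null) {
                return true;
            }
            List<String> children = zk.getChildren(root, false);
            if (children.size() >= size) {
                // Last to arrive: publish a persistent marker that survives
                // even after earlier processes have left the barrier.
                try {
                    zk.create(root + "/ready", new byte[0],
                            Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
                } catch (KeeperException.NodeExistsException e) {
                    // another process created it first; that's fine
                }
                return true;
            }
            mutex.wait();
        }
    }
}

A complete fix would also have to delete /ready in leave() (for example, by the
last process out) so the barrier can be reused; that part is omitted here.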



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper issue #608: ZOOKEEPER-2913:testEphemeralNodeDeletion is flaky

2018-09-05 Thread maoling
Github user maoling commented on the issue:

https://github.com/apache/zookeeper/pull/608
  
@anmolnar  Take it easy (grin).

> Do you think other flaky test could be impacted by the same issue?

I dug into another issue 
[ZOOKEEPER-3023](https://issues.apache.org/jira/browse/ZOOKEEPER-3023), which may 
have the same root cause, or its `50ms` timeout may simply be too short to see 
the updated data.
When Jenkins runs the tests repeatedly, the disk is heavy and busy, so some 
state/data can't sync from the leader to the learner in time, which may be a 
major cause of flaky tests.
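
For tests like these, one common remedy is to poll for the expected state with
a bounded deadline instead of sleeping a fixed 50ms. A minimal hypothetical
helper (names and constants assumed; not the actual patch):

import java.util.function.BooleanSupplier;

// Retry the condition until it holds or the deadline passes, giving a slow
// leader-to-learner sync time to complete before the test gives up.
static void waitFor(String failureMessage, BooleanSupplier condition,
                    long timeoutMs) throws InterruptedException {
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
        if (condition.getAsBoolean()) {
            return; // the state synced in time
        }
        Thread.sleep(50); // poll instead of a single fixed sleep
    }
    throw new AssertionError(failureMessage);
}

A test would then wrap its assertion condition in such a call rather than
sleeping once and asserting immediately.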



---


[GitHub] zookeeper issue #611: ZOOKEEPER-3131

2018-09-05 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/611
  
@enixon I think it leaks the map anyway: when a new Watcher is added, a new 
Watcher instance gets created and added to the map, while the empty one remains 
there forever.
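
For illustration, a minimal sketch of the leak-avoiding pattern under
discussion (assuming WatchManager's Map<String, Set<Watcher>> watchTable
field; not the actual patch):

synchronized boolean removeWatcher(String path, Watcher watcher) {
    Set<Watcher> list = watchTable.get(path);
    if (list == null || !list.remove(watcher)) {
        return false;
    }
    if (list.isEmpty()) {
        watchTable.remove(path); // reclaim the now-empty entry
    }
    return true;
}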

@wangchaod Would you please consider adding a quick unit test for the 
patch? Otherwise, it looks good.


---


[GitHub] zookeeper issue #593: [ZOOKEEPER-3111] Add socket buffer size option to tune...

2018-09-05 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/593
  
@maoling I've run the test a few times on Mac and CentOS, but it doesn't 
seem to be flaky for me.
What do you mean by 'do regression' exactly?


---


[GitHub] zookeeper issue #608: ZOOKEEPER-2913:testEphemeralNodeDeletion is flaky

2018-09-05 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/608
  
@maoling 
+1 for the solution. I intend to commit this anyway.
Do you think other flaky tests could be impacted by the same issue?


---


[GitHub] zookeeper issue #608: ZOOKEEPER-2913:testEphemeralNodeDeletion is flaky

2018-09-05 Thread maoling
Github user maoling commented on the issue:

https://github.com/apache/zookeeper/pull/608
  
@anmolnar 
This flaky test has happened many times; it was tracked in ZOOKEEPER-2913 and 
ZOOKEEPER-3040, both reported by Patrick Hunt.
The failed-build report URL for it is 
[here](https://builds.apache.org/job/ZooKeeper_branch34_java9/305/testReport/junit/org.apache.zookeeper.server.quorum/EphemeralNodeDeletionTest/testEphemeralNodeDeletion/),
 which is now broken.
When Jenkins runs the tests repeatedly, it will happen again. If you still have 
concerns, let this patch sit and wait for the flaky test to come back :D


---


[GitHub] zookeeper issue #608: ZOOKEEPER-2913:testEphemeralNodeDeletion is flaky

2018-09-05 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/608
  
Good job @maoling .
However, I can't see this test in the latest flaky test report: 
https://builds.apache.org/job/ZooKeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html
Is this patch still needed?


---


[GitHub] zookeeper issue #607: WIP - ZOOKEEPER-2990: Implement probabilistic tracing

2018-09-05 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/607
  
Basically, my biggest concern is that this patch shouldn't go into 3.4, 
because it's not critical.
3.5 is just around the corner, and improvements like this will encourage 
people to upgrade once a stable version is available.


---


[GitHub] zookeeper issue #607: WIP - ZOOKEEPER-2990: Implement probabilistic tracing

2018-09-05 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/607
  
Thanks for the contribution @lavacat .
Please take a look at my comments in the Jira.


---


[jira] [Commented] (ZOOKEEPER-2990) Implement probabilistic tracing

2018-09-05 Thread Andor Molnar (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604232#comment-16604232
 ] 

Andor Molnar commented on ZOOKEEPER-2990:
-

[~bkanivets]

What are the target versions of this patch?

I suggest 3.5 and 3.6 only, because it's a minor improvement. (We should only 
fix critical and security issues in 3.4.)

> Implement probabilistic tracing
> ---
>
> Key: ZOOKEEPER-2990
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2990
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Bogdan Kanivets
>Assignee: Bogdan Kanivets
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> It would be nice to have the ability to do probabilistic tracing, similar to 
> Cassandra's 
> [nodetool|https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/tools/nodetool/toolsSetTraceProbability.html].
> This will help debug issues in production systems.
> I'd like to contribute if everyone is ok with the feature.
> My suggestion is to add an extra parameter to ZooTrace to handle it. 
> Questions:
>  * should it be one global param or per each ZooTrace mask? I'm thinking per 
> mask
>  * should it be a new 4lw or part of 'stmk'? Leaning towards new word and 
> refactoring param passing to words (stmk is a special case right now).
>  * there are places in the code that use LOG.trace directly. That will have 
> to change to ZooTrace
> I can make some initial implementation for demo/review.
>  
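
For illustration only, per-mask sampling could look roughly like the following
sketch (TraceSampler and its method names are assumptions for this demo, not
ZooTrace's actual API):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

public final class TraceSampler {
    // mask -> sampling probability in [0.0, 1.0]; default 1.0 (always trace)
    private static final ConcurrentHashMap<Long, Double> PROBABILITIES =
            new ConcurrentHashMap<Long, Double>();

    public static void setTraceProbability(long mask, double p) {
        PROBABILITIES.put(mask, Math.min(1.0, Math.max(0.0, p)));
    }

    public static boolean shouldTrace(long mask) {
        double p = PROBABILITIES.getOrDefault(mask, 1.0);
        return p >= 1.0 || ThreadLocalRandom.current().nextDouble() < p;
    }
}

A trace call site would then be guarded with something like
if (TraceSampler.shouldTrace(mask)) { ... } before formatting the message.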



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper issue #590: [ZOOKEEPER-1177] Add the memory optimized watch manage...

2018-09-05 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/590
  
@lvfangmin Docs for the new caching parameter `zookeeper.bitHashCacheSize`? 


---


[GitHub] zookeeper pull request #590: [ZOOKEEPER-1177] Add the memory optimized watch...

2018-09-05 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/590#discussion_r215183702
  
--- Diff: 
src/java/main/org/apache/zookeeper/server/watch/WatchManagerOptimized.java ---
@@ -0,0 +1,355 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zookeeper.server.watch;
+
+import java.io.PrintWriter;
+import java.util.HashMap;
+import java.util.BitSet;
+import java.util.HashSet;
+import java.util.ArrayList;
+import java.util.LinkedHashMap;
+import java.util.Map;
+import java.util.Map.Entry;
+import java.util.Random;
+import java.util.Set;
+import java.util.Iterator;
+import java.lang.Iterable;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.locks.ReadWriteLock;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+
+import org.apache.zookeeper.WatchedEvent;
+import org.apache.zookeeper.Watcher;
+import org.apache.zookeeper.Watcher.Event.EventType;
+import org.apache.zookeeper.Watcher.Event.KeeperState;
+import org.apache.zookeeper.server.ServerCnxn;
+import org.apache.zookeeper.server.util.BitMap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Optimized in memory and time complexity. Compared to WatchManager, both the
+ * memory consumption and the time complexity are much improved, but it cannot
+ * efficiently remove the watcher when the session or socket is closed; for
+ * the majority of use cases this is not a problem.
+ *
+ * Changes made compared to WatchManager:
+ *
+ * - Use HashSet and BitSet to store the watchers to find a balance between
+ *   memory usage and time complexity
+ * - Use ReadWriteLock instead of synchronized to reduce lock retention
+ * - Lazily clean up the closed watchers
+ */
+public class WatchManagerOptimized
+        implements IWatchManager, DeadWatcherListener {
+
+    private static final Logger LOG =
+            LoggerFactory.getLogger(WatchManagerOptimized.class);
+
+    private final ConcurrentHashMap<String, BitHashSet> pathWatches =
+            new ConcurrentHashMap<String, BitHashSet>();
+
+    // watcher to bit id mapping
+    private final BitMap<Watcher> watcherBitIdMap = new BitMap<Watcher>();
+
+    // used to lazily remove the dead watchers
+    private final WatcherCleaner watcherCleaner;
+
+    private final ReentrantReadWriteLock addRemovePathRWLock =
+            new ReentrantReadWriteLock();
+
+    public WatchManagerOptimized() {
+        watcherCleaner = new WatcherCleaner(this);
+        watcherCleaner.start();
+    }
+
+    @Override
+    public boolean addWatch(String path, Watcher watcher) {
+        boolean result = false;
+        addRemovePathRWLock.readLock().lock();
+        try {
+            // avoid the race condition of adding an in-flight dead watcher
+            if (isDeadWatcher(watcher)) {
+                LOG.debug("Ignoring addWatch with closed cnxn");
+            } else {
+                Integer bit = watcherBitIdMap.add(watcher);
+                BitHashSet watchers = pathWatches.get(path);
+                if (watchers == null) {
+                    watchers = new BitHashSet();
+                    BitHashSet existingWatchers =
+                            pathWatches.putIfAbsent(path, watchers);
+                    if (existingWatchers != null) {
+                        watchers = existingWatchers;
+                    }
+                }
+                result = watchers.add(bit);
+            }
+        } finally {
+            addRemovePathRWLock.readLock().unlock();
+        }
+        return result;
+    }
+
+    @Override
+    public boolean containsWatcher(String path, Watcher watcher) {
+        BitHashSet watchers = pathWatches.get(path);
+        if (watchers == null ||
+                !watchers.contains(watcherBitIdMap.getBit(watcher))) {
+            return false;
+        }
+        return true;
+ 
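
The javadoc above mentions balancing HashSet and BitSet; purely for
illustration, a toy version of that idea (not the actual BitHashSet
implementation, and the switch-over threshold is invented) might look like:

import java.util.BitSet;
import java.util.HashSet;

class SparseBitSetSketch {
    private static final int THRESHOLD = 64; // assumed switch-over point
    private HashSet<Integer> sparse = new HashSet<Integer>();
    private BitSet dense; // allocated only once the set grows large

    boolean add(int bit) {
        if (dense != null) {
            boolean added = !dense.get(bit);
            dense.set(bit);
            return added;
        }
        boolean added = sparse.add(bit);
        if (added && sparse.size() > THRESHOLD) {
            // Dense enough: move to a BitSet for cheap membership checks.
            dense = new BitSet();
            for (int b : sparse) {
                dense.set(b);
            }
            sparse = null;
        }
        return added;
    }

    boolean contains(int bit) {
        return dense != null ? dense.get(bit) : sparse.contains(bit);
    }
}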

[GitHub] zookeeper pull request #590: [ZOOKEEPER-1177] Add the memory optimized watch...

2018-09-05 Thread anmolnar
Github user anmolnar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/590#discussion_r215180602
  
--- Diff: src/java/main/org/apache/zookeeper/server/watch/WatchManager.java 
---
@@ -46,15 +48,26 @@
     private final Map<Watcher, Set<String>> watch2Paths =
         new HashMap<Watcher, Set<String>>();
 
-    synchronized int size(){
+    @Override
+    public synchronized int size(){
         int result = 0;
         for (Set<Watcher> watches : watchTable.values()) {
             result += watches.size();
         }
         return result;
     }
 
-    synchronized void addWatch(String path, Watcher watcher) {
+    boolean isDeadWatcher(Watcher watcher) {
--- End diff --

Taking that into account and the jira fix version, this patch will 
definitely go into 3.5 as well.


---


[GitHub] zookeeper issue #615: ZOOKEEPER-3137: add a utility to truncate logs to a zx...

2018-09-05 Thread anmolnar
Github user anmolnar commented on the issue:

https://github.com/apache/zookeeper/pull/615
  
@enixon Your patch looks good; this is going to be a useful tool for 
ZooKeeper.
Have you considered adding this functionality to `TxnLogToolkit` (a tool 
developed by me :) instead of creating a new one?


---