[GitHub] zookeeper pull request #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thr...

2018-11-24 Thread tumativ
Github user tumativ commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/689#discussion_r236063610
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatcherCleaner.java
 ---
@@ -102,12 +103,13 @@ public void addDeadWatcher(int watcherBit) {
 totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
 try {
 RATE_LOGGER.rateLimitLog("Waiting for dead watchers 
cleaning");
-synchronized(totalDeadWatchers) {
-totalDeadWatchers.wait(100);
+synchronized(processingCompletedEvent) {
+   processingCompletedEvent.wait(100);
--- End diff --

Done
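
For readers following the diff: the change parks the producer on a dedicated monitor object instead of on the AtomicInteger counter itself. A minimal sketch of that pattern (the surrounding class, constants, and method names below are simplified assumptions, not the actual WatcherCleaner):

```java
import java.util.concurrent.atomic.AtomicInteger;

class DeadWatcherBackpressure {
    private final AtomicInteger totalDeadWatchers = new AtomicInteger();
    private final Object processingCompletedEvent = new Object();
    private final int maxInProcessingDeadWatchers = 1000; // illustrative bound

    // Producer side: block while too many dead watchers are in flight.
    void waitForCapacity() throws InterruptedException {
        while (totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
            synchronized (processingCompletedEvent) {
                // Bounded wait; the loop re-checks the condition on wake-up.
                processingCompletedEvent.wait(100);
            }
        }
    }

    // Cleaner side: after a batch is processed, wake any blocked producer.
    void onBatchProcessed(int total) {
        totalDeadWatchers.addAndGet(-total);
        synchronized (processingCompletedEvent) {
            processingCompletedEvent.notifyAll();
        }
    }
}
```

Waiting on a plain Object keeps the AtomicInteger's atomic role separate from its use as a monitor, and the bounded wait(100) plus the loop's re-check guard against a missed notification.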


---


[GitHub] zookeeper pull request #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thr...

2018-11-24 Thread tumativ
Github user tumativ commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/689#discussion_r236063614
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatcherCleaner.java
 ---
@@ -163,8 +165,8 @@ public void doWork() throws Exception {
 long latency = Time.currentElapsedTime() - 
startTime;
 LOG.info("Takes {} to process {} watches", 
latency, total);
 totalDeadWatchers.addAndGet(-total);
-synchronized(totalDeadWatchers) {
-totalDeadWatchers.notifyAll();
+synchronized(processingCompletedEvent) {
+   processingCompletedEvent.notifyAll();
--- End diff --

Done


---


[GitHub] zookeeper issue #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thread and...

2018-11-24 Thread lvfangmin
Github user lvfangmin commented on the issue:

https://github.com/apache/zookeeper/pull/689
  
@tumativ thanks for working on this, only a minor comment now. I'll merge 
this once you've updated it.


---


[GitHub] zookeeper pull request #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thr...

2018-11-24 Thread lvfangmin
Github user lvfangmin commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/689#discussion_r236063171
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatcherCleaner.java
 ---
@@ -102,12 +103,13 @@ public void addDeadWatcher(int watcherBit) {
 totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
 try {
 RATE_LOGGER.rateLimitLog("Waiting for dead watchers 
cleaning");
-synchronized(totalDeadWatchers) {
-totalDeadWatchers.wait(100);
+synchronized(processingCompletedEvent) {
+   processingCompletedEvent.wait(100);
--- End diff --

Can we indent this with 4 extra white spaces relative to the synchronized 
statement?


---


[GitHub] zookeeper pull request #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thr...

2018-11-24 Thread lvfangmin
Github user lvfangmin commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/689#discussion_r236063217
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatcherCleaner.java
 ---
@@ -102,24 +104,24 @@ public void addDeadWatcher(int watcherBit) {
 totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
 try {
 RATE_LOGGER.rateLimitLog("Waiting for dead watchers 
cleaning");
-synchronized(totalDeadWatchers) {
-totalDeadWatchers.wait(100);
-}
-} catch (InterruptedException e) {
-LOG.info("Got interrupted while waiting for dead watches " 
+
-"queue size");
-}
-}
-synchronized (this) {
-if (deadWatchers.add(watcherBit)) {
-totalDeadWatchers.incrementAndGet();
-if (deadWatchers.size() >= watcherCleanThreshold) {
-synchronized (cleanEvent) {
-cleanEvent.notifyAll();
-}
-}
-}
+   synchronized (processingCompletedEvent) {
--- End diff --

We don't have a general formatting wiki yet; we'll work on that. In general 
it's 4 white spaces of indentation after a line ending in `{`.


---


[GitHub] zookeeper pull request #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thr...

2018-11-24 Thread lvfangmin
Github user lvfangmin commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/689#discussion_r236063177
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatcherCleaner.java
 ---
@@ -163,8 +165,8 @@ public void doWork() throws Exception {
 long latency = Time.currentElapsedTime() - 
startTime;
 LOG.info("Takes {} to process {} watches", 
latency, total);
 totalDeadWatchers.addAndGet(-total);
-synchronized(totalDeadWatchers) {
-totalDeadWatchers.notifyAll();
+synchronized(processingCompletedEvent) {
+   processingCompletedEvent.notifyAll();
--- End diff --

ditto.


---


[GitHub] zookeeper pull request #692: ZOOKEEPER-3184: Use the same method to generate...

2018-11-24 Thread lvfangmin
Github user lvfangmin commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/692#discussion_r236063130
  
--- Diff: README.md ---
@@ -1,49 +1,24 @@
 ## Generating the static Apache ZooKeeper website
 
-In this directory you will find text files formatted using Markdown, with 
an `.md` suffix.
+In the `src/main/resources/markdown` directory you will find text files 
formatted using Markdown, with an `.md` suffix.
 
-Building the site requires [Jekyll](http://jekyllrb.com/docs) 3.6.2 or 
newer. 
-The easiest way to install jekyll is via a Ruby Gem. Jekyll will create a 
directory called `_site` 
-containing `index.html` as well as the rest of the compiled directories 
and files. _site should not
-be committed to git as this is the generated content.
-
-To install Jekyll and its required dependencies, execute `sudo gem install 
jekyll pygments.rb` 
-and `sudo pip install Pygments`. See the Jekyll installation page for more 
details.
+Building the site requires [Maven](http://maven.apache.org/) 3.5.0 or 
newer. 
+The easiest way to [install Maven](http://maven.apache.org/install.html) 
depends on your OS.
+The build process will create a directory called `target/html` containing 
`index.html` as well as the rest of the
+compiled directories and files. `target` should not be committed to git as 
it is generated content.
 
 You can generate the static ZooKeeper website by running:
 
-1. `jekyll build` in this directory.
-2. `cp -RP _released_docs _site/doc` - this will include the documentation 
(see "sub-dir" section below) in the generated site.
+1. `mvn clean install` in this directory.
+2. `cp -RP _released_docs _target/html` - this will include the 
documentation (see "sub-dir" section below) in the generated site.
--- End diff --

Thanks @tamaashu, can you update the README to point out how to copy those 
docs? 

One suggestion: the top and bottom layout is a bit strange, they're too wide 
compared to the main content on that page. Can we make them narrower?


---


[jira] [Commented] (ZOOKEEPER-2778) Potential server deadlock between follower sync with leader and follower receiving external connection requests.

2018-11-24 Thread Michael K. Edwards (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698085#comment-16698085
 ] 

Michael K. Edwards commented on ZOOKEEPER-2778:
---

Note that the current version of this patch also addresses ZOOKEEPER-2488.

> Potential server deadlock between follower sync with leader and follower 
> receiving external connection requests.
> 
>
> Key: ZOOKEEPER-2778
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2778
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.5.3
>Reporter: Michael Han
>Assignee: Michael K. Edwards
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
>  Time Spent: 6h 20m
>  Remaining Estimate: 0h
>
> It's possible to have a deadlock during the recovery phase. 
> Found this issue by analyzing thread dumps of the "flaky" ReconfigRecoveryTest 
> [1]. Here is a sample thread dump that illustrates the state of the 
> execution:
> {noformat}
> [junit]  java.lang.Thread.State: BLOCKED
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.getElectionAddress(QuorumPeer.java:686)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:265)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:445)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:369)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:642)
> [junit] 
> [junit]  java.lang.Thread.State: BLOCKED
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:472)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.connectNewPeers(QuorumPeer.java:1438)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.setLastSeenQuorumVerifier(QuorumPeer.java:1471)
> [junit] at  
> org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:520)
> [junit] at  
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:88)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {noformat}
> The deadlock happens between the quorum peer thread that runs the 
> follower (doing the sync-with-leader work) and the listener of the QCM of the 
> same quorum peer (doing the receive-connection work). Basically, to finish 
> syncing with the leader, the follower needs to synchronize on both QV_LOCK 
> and the QCM object it owns; while in the receiver thread, to finish setting up 
> an incoming connection, the thread needs to synchronize on both the QCM object 
> the quorum peer owns and the same QV_LOCK. It's easy to see the problem: the 
> two locks are acquired in different orders, so depending on timing / actual 
> execution order, each thread might end up acquiring one lock while holding 
> the other.
> [1] 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig
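
The two stack traces above reduce to a classic lock-order inversion. A minimal model (the field names stand in for QV_LOCK and the QuorumCnxManager; everything else here is hypothetical):

```java
class LockOrderInversion {
    private final Object qvLock = new Object(); // stands in for QV_LOCK
    private final Object qcm = new Object();    // stands in for the QuorumCnxManager

    // Quorum peer thread: syncWithLeader -> setLastSeenQuorumVerifier -> connectOne
    void followerSyncPath() {
        synchronized (qvLock) {
            synchronized (qcm) {
                // connectOne(...)
            }
        }
    }

    // QCM listener thread: receiveConnection -> initiateConnection -> getElectionAddress
    void receiveConnectionPath() {
        synchronized (qcm) {
            synchronized (qvLock) {
                // getElectionAddress(...)
            }
        }
    }
}
```

Either path alone is fine; run both concurrently and each thread can end up holding its first lock while blocking on the other's, which is exactly the BLOCKED pair in the dump.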



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] zookeeper issue #528: ZOOKEEPER-3034 Facing issues while building from sourc...

2018-11-24 Thread mkedwards
Github user mkedwards commented on the issue:

https://github.com/apache/zookeeper/pull/528
  
I'd like to see this brought back.  Not cool to drop autoconf when the 
CMake setup can't build shared libraries.  From 
`zookeeper-client/zookeeper-client-c/README`:
```
Current limitations of the CMake build system include lack of Solaris 
support,
no shared library option, no explicitly exported symbols (all are exported 
by
default), no versions on the libraries, and no documentation generation.
```


---


Re: ReconfigInProgress error

2018-11-24 Thread Michael K. Edwards
I don't often admit defeat; but I can't make heads or tails of the
error handling (or lack thereof) in the reconfiguration code paths.
If anybody wants to take a stab at explaining which parts of the
processAck -> tryToCommit -> processReconfig -> reconfigure call chain
should and shouldn't go through if the bind() call fails, maybe I can
try to write tests that verify that and modify the code under test to
behave accordingly.  I've filed ZOOKEEPER-3198 as an umbrella for this
work, and pushed what I've got to
https://github.com/mkedwards/zookeeper/tree/broken-bind-3.5, in case
somebody wants to try to take it forward from there.

In the meantime, I'm running tests in parallel inside a Docker
container (with a code state that has patches applied for all three
3.5 blocker/critical Jiras).  Nothing seems "flaky" yet.  We'll deploy
this in our QA environment next week, and throw some load at it, and
see what happens.  (And run the test suite a few hundred times, too.)

Alex (or anyone else), do you consider any of the other outstanding
Jiras to be obstacles to exercising the reconfiguration features in
3.5.x on a production cluster?  How serious is
https://issues.apache.org/jira/browse/ZOOKEEPER-2202 ?  Is it related
to https://issues.apache.org/jira/browse/ZOOKEEPER-2836 ?  And how
serious is https://issues.apache.org/jira/browse/ZOOKEEPER-1896 ?
Does mixing 3.4.x and 3.5.x in the same cluster work?  Is it best to
disable reconfig while migrating cluster members from 3.4.x to 3.5.x,
and then enable reconfig and do a rolling restart?
On Sat, Nov 24, 2018 at 12:13 PM Alexander Shraer  wrote:
>
> Hi Michael,
>
> In general, one reconfig op is allowed at a time, and this error indicates 
> that one is already in progress. If there are enough peers to form a quorum a 
> failure to connect to one of them shouldn’t be a problem. If there is not 
> enough, the leader is supposed to give up leadership. This is true in 
> general, unrelated to reconfig. A new leader will be elected and complete any 
> reconfig in progress. That’s the theory at least, there may be a bug in the 
> case you found.
>
> Some general flow is described in Sec 3.2 of our paper, 
> https://www.usenix.org/system/files/conference/atc12/atc12-final74.pdf
>
> There are also the wiki docs but they don’t talk about recovery much. 
> https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperReconfig.html
>
> Btw
>
> > robustness against
> Byzantine faults that one is led to expect from Zookeeper?
>
> ZK is not designed to handle Byzantine faults in general. It’s not to say 
> that there is no bug in the case you found.
>
> Thanks,
> Alex
>
> On Sat, Nov 24, 2018 at 11:32 AM Michael K. Edwards  
> wrote:
>>
>> I've been experimenting a bit with trying to propagate failures to
>> bind() server ports in tests up to where we can do something about it.
>> There's at least one category of test cases (callers of
>> ReconfigTest.testPortChangeToBlockedPort) where the server is supposed
>> to ride through a bind() failure, recovering on a subsequent
>> reconfiguration.  In my current code state, I'm encountering errors
>> like this:
>>
>> 2018-11-24 11:04:46,252 [myid:] - INFO  [ProcessThread(sid:3
>> cport:-1)::PrepRequestProcessor@878] - Got user-level KeeperException
>> when processing sessionid:0x1002b98aa83 type:reconfig cxid:0x1e
>> zxid:0x1002b txntype:-1 reqpath:n/a Error Path:null
>> Error:KeeperErrorCode = ReconfigInProgress
>>
>> I can hack things until this particular test passes, but it raises
>> questions about reconfiguration in general.  How exactly is the
>> cluster supposed to get out of this state?  If a cluster member drops
>> out of contact with the quorum while there is a reconfiguration in
>> flight, is there any recovery path that restores the ability to
>> process a reconfigure operation?  Is there a design doc for
>> reconfiguration that demonstrates the kind of robustness against
>> Byzantine faults that one is led to expect from Zookeeper?


[jira] [Commented] (ZOOKEEPER-3198) Handle port-binding failures in a systematic and documented fashion

2018-11-24 Thread Michael K. Edwards (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698065#comment-16698065
 ] 

Michael K. Edwards commented on ZOOKEEPER-3198:
---

An attempt (as yet, not very successful) to plumb BindExceptions up the stack 
is in https://github.com/mkedwards/zookeeper/tree/broken-bind-3.5 .  I'm 
currently foundering on test cases that call 
ReconfigTest.testPortChangeToBlockedPort().

> Handle port-binding failures in a systematic and documented fashion
> ---
>
> Key: ZOOKEEPER-3198
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3198
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.5.3, 3.6.0, 3.4.13
>Reporter: Michael K. Edwards
>Priority: Major
> Fix For: 3.6.0, 3.5.5, 3.4.14
>
>
> Many test failures appear to result from bind failures due to port conflicts. 
>  This can arise in normal use as well.  Presently the code swallows the 
> exception (with an error log) at a low level.  It would probably be useful to 
> throw the exception far enough up the stack to trigger retry with a new port 
> (in tests) or a high-level (perhaps even fatal) error message (in normal use).
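
As an illustration of the test-side retry idea in the description, a sketch of probing for a free port (the helper and its strategy are assumptions, not existing ZooKeeper code):

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

class PortRetry {
    // Try basePort, basePort+1, ... until one binds or attempts run out.
    static ServerSocket bindWithRetry(int basePort, int attempts) throws IOException {
        BindException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return new ServerSocket(basePort + i);
            } catch (BindException e) {
                last = e; // port already in use: probe the next one
            }
        }
        if (last == null) {
            throw new IOException("attempts must be > 0");
        }
        throw last; // surface the failure instead of swallowing it
    }
}
```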



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: ReconfigInProgress error

2018-11-24 Thread Alexander Shraer
Hi Michael,

In general, one reconfig op is allowed at a time, and this error indicates
that one is already in progress. If there are enough peers to form a quorum
a failure to connect to one of them shouldn’t be a problem. If there is not
enough, the leader is supposed to give up leadership. This is true in
general, unrelated to reconfig. A new leader will be elected and complete
any reconfig in progress. That’s the theory at least, there may be a bug in
the case you found.

Some general flow is described in Sec 3.2 of our paper,
https://www.usenix.org/system/files/conference/atc12/atc12-final74.pdf

There are also the wiki docs but they don’t talk about recovery much.
https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperReconfig.html

Btw

> robustness against
Byzantine faults that one is led to expect from Zookeeper?

ZK is not designed to handle Byzantine faults in general. It’s not to say
that there is no bug in the case you found.

Thanks,
Alex

On Sat, Nov 24, 2018 at 11:32 AM Michael K. Edwards 
wrote:

> I've been experimenting a bit with trying to propagate failures to
> bind() server ports in tests up to where we can do something about it.
> There's at least one category of test cases (callers of
> ReconfigTest.testPortChangeToBlockedPort) where the server is supposed
> to ride through a bind() failure, recovering on a subsequent
> reconfiguration.  In my current code state, I'm encountering errors
> like this:
>
> 2018-11-24 11:04:46,252 [myid:] - INFO  [ProcessThread(sid:3
> cport:-1)::PrepRequestProcessor@878] - Got user-level KeeperException
> when processing sessionid:0x1002b98aa83 type:reconfig cxid:0x1e
> zxid:0x1002b txntype:-1 reqpath:n/a Error Path:null
> Error:KeeperErrorCode = ReconfigInProgress
>
> I can hack things until this particular test passes, but it raises
> questions about reconfiguration in general.  How exactly is the
> cluster supposed to get out of this state?  If a cluster member drops
> out of contact with the quorum while there is a reconfiguration in
> flight, is there any recovery path that restores the ability to
> process a reconfigure operation?  Is there a design doc for
> reconfiguration that demonstrates the kind of robustness against
> Byzantine faults that one is led to expect from Zookeeper?
>
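
For completeness, a sketch of the client-side handling this implies: back off and retry while another reconfig is in flight. This assumes the 3.5 ZooKeeperAdmin API; the retry bound and backoff are illustrative only.

```java
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.admin.ZooKeeperAdmin;
import org.apache.zookeeper.data.Stat;

class ReconfigRetry {
    static byte[] reconfigWithRetry(ZooKeeperAdmin admin, String joining, String leaving)
            throws KeeperException, InterruptedException {
        for (int attempt = 0; attempt < 10; attempt++) {
            try {
                // Incremental reconfig; -1 means "don't condition on a config version".
                return admin.reconfigure(joining, leaving, null, -1, new Stat());
            } catch (KeeperException e) {
                if (e.code() != KeeperException.Code.RECONFIGINPROGRESS) {
                    throw e;       // a different failure: surface it
                }
                Thread.sleep(500); // another reconfig is running; wait and retry
            }
        }
        throw KeeperException.create(KeeperException.Code.RECONFIGINPROGRESS);
    }
}
```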


[jira] [Assigned] (ZOOKEEPER-3113) EphemeralType.get() fails to verify ephemeralOwner when currentElapsedTime() is small enough

2018-11-24 Thread Patrick Hunt (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-3113:
---

Assignee: Andor Molnar

> EphemeralType.get() fails to verify ephemeralOwner when currentElapsedTime() 
> is small enough
> 
>
> Key: ZOOKEEPER-3113
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3113
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.4, 3.6.0
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> EphemeralTypeTest.testServerIds() unit test fails on some systems where 
> System.nanoTime() is smaller than a certain value.
> The test generates the ephemeralOwner in the old way (pre ZOOKEEPER-2901) without 
> enabling the emulation flag and asserts that an exception is thrown when 
> serverId == 255. This is right: ZooKeeper should fail in this case, because 
> serverId cannot be larger than 254 if extended types are enabled, and an 
> ephemeralOwner with 0xff in the most significant byte indicates an 
> extended type.
> The logic which does the validation is in EphemeralType.get().
> It checks 2 things:
>  * the extended type byte is set: 0xff,
>  * the reserved bits (the next 2 bytes) correspond to a valid extended type.
> Here is the problem: currently we only have 1 extended type: TTL, with a value 
> of 0x in the reserved bits.
> The logic expects that if we have anything different from that in the reserved 
> bits, the ephemeralOwner is invalid and an exception should be thrown. That's 
> what the test asserts, and it works on most systems, because the timestamp 
> part of the sessionId usually has some bits set in the reserved bits as well, 
> which will eventually be larger than 0, so the value is unsupported.
> I think the problem is twofold:
>  * Either, if we add more extended types, we'll increase the possibility that 
> this logic accepts invalid sessionIds (as long as the reserved bits indicate 
> a valid extended type),
>  * Or (which happens on some systems), if the currentElapsedTime (the timestamp 
> part of the sessionId) is small enough and doesn't occupy the reserved bits, 
> this logic will accept the invalid sessionId.
> Unfortunately I cannot repro the problem yet: it constantly happens on a 
> specific Jenkins slave, but even with the same distro and same JDK version I 
> cannot reproduce the same nanoTime() values.
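
To make the described layout concrete, a validator sketched from the description (the masks, shifts, and the TTL value of 0 are inferred from the text, not copied from EphemeralType.java): the high byte 0xff marks an extended owner, and the next two "reserved" bytes must name a known extended type, of which only TTL exists today.

```java
class ExtendedOwnerCheck {
    static boolean isKnownExtendedType(long ephemeralOwner) {
        boolean extended = (ephemeralOwner >>> 56) == 0xffL; // high byte == 0xff?
        if (!extended) {
            return true; // plain session id; nothing to validate here
        }
        long reservedBits = (ephemeralOwner >>> 40) & 0xffffL;
        // Only TTL (assumed value 0) is a valid extended type so far.
        return reservedBits == 0;
    }
}
```

This also shows the failure mode: on a host where currentElapsedTime() is small, a plain session id can happen to have 0xff in its top byte and zeros in the reserved bytes, and the check wrongly accepts it.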



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3067) Optionally suppress client environment logging.

2018-11-24 Thread Patrick Hunt (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-3067:
---

Assignee: James Peach

> Optionally suppress client environment logging.
> ---
>
> Key: ZOOKEEPER-3067
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3067
> Project: ZooKeeper
>  Issue Type: Task
>  Components: c client
>Reporter: James Peach
>Assignee: James Peach
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> It would be helpful to add a {{zookeeper_init}} flag to suppress the client 
> environment logging. In our deployment, this causes LDAP lookups for the 
> current user ID, which is otherwise an unnecessary service dependency for 
> ZooKeeper clients.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3162) Broken lock semantics in C client lock-recipe

2018-11-24 Thread Patrick Hunt (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-3162:
---

Assignee: Andrea Reale

> Broken lock semantics in C client lock-recipe
> -
>
> Key: ZOOKEEPER-3162
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3162
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.0.0, 3.4.13
>Reporter: Andrea Reale
>Assignee: Andrea Reale
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5, 3.4.14
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> As reported (but never fixed) in the past by ZOOKEEPER-2409, ZOOKEEPER-2038 
> and (partly) ZOOKEEPER-2878, the C client lock-recipe implementation is 
> broken.
> I identified three issues.
> The main one (as also reported in the aforementioned reports) is that the 
> logic that goes through the lock waiting list is broken. child_floor uses 
> strcmp and compares the full node name (i.e., sessionID-sequence) rather than 
> only comparing the sequence number. This makes it possible for two different 
> clients to hold the lock at the same time: assume two clients, one associated 
> with session A, the other with session B, with A < B lexicographically. Now 
> assume that at some point a thread in B holds a lock and a thread in A tries 
> to acquire the same lock. A will manage to get the lock because of the wrong 
> comparison function, so now two guys hold the lock.
> The second issue is a possible deadlock inside zkr_lock_operation. 
> zkr_lock_operation is always called while holding the mutex associated with the 
> client lock. In some cases, zkr_lock_operation may decide to give up locking 
> and call zkr_lock_unlock to release the lock. When this happens, it will try 
> to acquire the same pthread mutex again, which will lead to a deadlock.
> The third issue relates to the return value of zkr_lock_lock. According to 
> the API docs, the function returns 0 when there are no errors; it is then up to the 
> invoker to check whether the lock is held by calling zkr_lock_isowner. However, 
> the implementation, in case of no error, returns zkr_lock_isowner. This is 
> wrong because it becomes impossible to distinguish an error condition from a 
> success (without ownership). Instead the API (as described in the docs, btw) 
> should always return 0 when no errors occur.
> Shortly I will add the link to a PR fixing the issues.
>  
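
For the first issue, the fix described amounts to ordering candidates by the numeric sequence suffix rather than by strcmp on the whole "sessionId-sequence" name. A sketch, in Java for brevity (the C recipe's child_floor would do the equivalent comparison):

```java
class LockNodeOrder {
    // ZooKeeper sequential znodes end in a fixed-width 10-digit counter.
    static int sequenceOf(String nodeName) {
        return Integer.parseInt(nodeName.substring(nodeName.length() - 10));
    }

    // Compare only the counters, so session ids never influence lock order.
    static int compareLockNodes(String a, String b) {
        return Integer.compare(sequenceOf(a), sequenceOf(b));
    }
}
```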



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3072) Race condition in throttling

2018-11-24 Thread Patrick Hunt (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-3072:
---

Assignee: Botond Hejj

> Race condition in throttling
> 
>
> Key: ZOOKEEPER-3072
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3072
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.5.4
>Reporter: Botond Hejj
>Assignee: Botond Hejj
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.4, 3.6.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> There is a race condition in the server throttling code. It is possible that 
> disableRecv() is called after enableRecv().
> Basically, the I/O work thread does this in processPacket: 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L1102]
>  
>     submitRequest(si);
>     }
>     }
>     cnxn.incrOutstandingRequests(h);
>     }
>  
> incrOutstandingRequests() checks for limit breach, and potentially turns on 
> throttling, 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L384]
>  
> submitRequest() will create a logical request and en-queue it so that 
> Processor thread can pick it up. After being de-queued by Processor thread, 
> it does necessary handling, and then calls this 
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java#L459]
>  :
>  
>     cnxn.sendResponse(hdr, rsp, "response");
>  
> and in sendResponse(), it first appends to outgoing buffer, and then checks 
> if un-throttle is needed:  
> [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L708]
>  
> However, if there is a context switch between submitRequest() and 
> cnxn.incrOutstandingRequests(), such that the Processor thread completes the 
> cnxn.sendResponse() call before the I/O thread switches back, then enableRecv() 
> happens before disableRecv(): enableRecv() fails its CAS op while disableRecv() 
> succeeds, resulting in a deadlock. Un-throttling is needed to let requests in, 
> and sendResponse() is needed to trigger un-throttling, but sendResponse() 
> requires an incoming message. From that point on, the ZK server will no longer 
> select the affected client socket for read, leading to the observed client-side 
> failure in the subject.
> If you would like to reproduce this, setting globalOutstandingLimit down to 1 
> makes it easier, as throttling starts with fewer requests. 
>  
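
A minimal model of that interleaving (the flag and method names below are hypothetical stand-ins for the NIOServerCnxn throttling state):

```java
import java.util.concurrent.atomic.AtomicBoolean;

class ThrottleRace {
    final AtomicBoolean throttled = new AtomicBoolean(false);

    // I/O thread: runs after submitRequest(); if the processor thread has
    // already tried to un-throttle, this CAS still succeeds and the
    // connection stays throttled with no response left to undo it.
    void disableRecv() {
        throttled.compareAndSet(false, true);
    }

    // Processor thread: runs at the end of sendResponse(); if it runs
    // before disableRecv(), this CAS fails and the wake-up is lost.
    void enableRecv() {
        throttled.compareAndSet(true, false);
    }
}
```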



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3061) add more details to 'Unhandled scenario for peer' log.warn message

2018-11-24 Thread Patrick Hunt (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-3061:
---

Assignee: Christine Poerschke

> add more details to 'Unhandled scenario for peer' log.warn message
> --
>
> Key: ZOOKEEPER-3061
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3061
> Project: ZooKeeper
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-3061.patch
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> A few lines earlier, the {{LOG.info("Synchronizing with Follower sid: ...}} 
> logging already contains most of the relevant details, but it would be 
> convenient to have the full details directly in the {{LOG.warn("Unhandled 
> scenario for peer sid: ...}} message itself.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3137) add a utility to truncate logs to a zxid

2018-11-24 Thread Patrick Hunt (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-3137:
---

Assignee: Brian Nixon

> add a utility to truncate logs to a zxid
> 
>
> Key: ZOOKEEPER-3137
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3137
> Project: ZooKeeper
>  Issue Type: New Feature
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Assignee: Brian Nixon
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Add a utility that allows an admin to truncate a given transaction log to a 
> specified zxid. This can be similar to the existing LogFormatter. 
> Among the benefits, this allows an admin to put together a point-in-time view 
> of a data tree by manually mutating files from a saved backup.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3142) Extend SnapshotFormatter to dump data in json format

2018-11-24 Thread Patrick Hunt (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-3142:
---

Assignee: Brian Nixon

> Extend SnapshotFormatter to dump data in json format
> 
>
> Key: ZOOKEEPER-3142
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3142
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Assignee: Brian Nixon
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> JSON output can be piped into other tools such as ncdu. Extend the 
> SnapshotFormatter functionality to dump JSON.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2018-11-24 Thread Patrick Hunt (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-2325:
---

Assignee: Andrew Grasso

> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Assignee: Andrew Grasso
>Priority: Critical
> Fix For: 3.5.4, 3.6.0
>
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the 
> result of FileSnap.deserialize, which is -1L if no valid snapshots are found. 
> Recovery proceeds with dt.lastProcessed == 0, its initial value.
> The result is that Zookeeper will process the transaction logs and then begin 
> serving requests with a different state than the rest of the ensemble.
> To reproduce:
> In a healthy zookeeper cluster of size >= 3, shut down one node.
> Either delete all snapshots for this node or change all to be empty files.
> Restart the node.
> We believe this can happen organically if a node runs out of disk space.
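
The fix implied by the report is a guard on the deserialize result. A sketch, with SnapshotSource as a hypothetical stand-in for FileSnap (whose deserialize returns -1L when no valid snapshot is found):

```java
import java.io.IOException;

interface SnapshotSource {
    long deserialize() throws IOException; // highest restored zxid, or -1L
}

class RestoreGuard {
    static long restoreOrFail(SnapshotSource snap) throws IOException {
        long zxid = snap.deserialize();
        if (zxid == -1L) {
            // Don't fall through with lastProcessedZxid == 0 and rebuild a
            // divergent data tree from the txn logs alone; fail loudly.
            throw new IOException("No valid snapshot found; refusing to start");
        }
        return zxid;
    }
}
```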



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3190) Spell check on the Zookeeper server files

2018-11-24 Thread Patrick Hunt (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Hunt reassigned ZOOKEEPER-3190:
---

Assignee: Dinesh Appavoo

> Spell check on the Zookeeper server files
> -
>
> Key: ZOOKEEPER-3190
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3190
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: documentation, other
>Reporter: Dinesh Appavoo
>Assignee: Dinesh Appavoo
>Priority: Minor
>  Labels: newbie, pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This JIRA is to do a spell check on the ZooKeeper server files [ 
> zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/server ]. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


ReconfigInProgress error

2018-11-24 Thread Michael K. Edwards
I've been experimenting a bit with trying to propagate failures to
bind() server ports in tests up to where we can do something about it.
There's at least one category of test cases (callers of
ReconfigTest.testPortChangeToBlockedPort) where the server is supposed
to ride through a bind() failure, recovering on a subsequent
reconfiguration.  In my current code state, I'm encountering errors
like this:

2018-11-24 11:04:46,252 [myid:] - INFO  [ProcessThread(sid:3
cport:-1)::PrepRequestProcessor@878] - Got user-level KeeperException
when processing sessionid:0x1002b98aa83 type:reconfig cxid:0x1e
zxid:0x1002b txntype:-1 reqpath:n/a Error Path:null
Error:KeeperErrorCode = ReconfigInProgress

I can hack things until this particular test passes, but it raises
questions about reconfiguration in general.  How exactly is the
cluster supposed to get out of this state?  If a cluster member drops
out of contact with the quorum while there is a reconfiguration in
flight, is there any recovery path that restores the ability to
process a reconfigure operation?  Is there a design doc for
reconfiguration that demonstrates the kind of robustness against
Byzantine faults that one is led to expect from Zookeeper?


[GitHub] zookeeper issue #720: add an API to get total count of recursive sub nodes o...

2018-11-24 Thread eolivelli
Github user eolivelli commented on the issue:

https://github.com/apache/zookeeper/pull/720
  
@TyqITstudent 
For the 'recursive' flag I mean:
- true: recurse the tree
- false: count only first level

About the build...does the build pass locally on your machine?
It seems you are not renaming classes/fields at every point in the code. 


---


[GitHub] zookeeper issue #720: add an API to get total count of recursive sub nodes o...

2018-11-24 Thread TyqITstudent
Github user TyqITstudent commented on the issue:

https://github.com/apache/zookeeper/pull/720
  
> At every push CI will retest your work.
> Alternatively you can close and reopen the PR

When Jenkins checks these parts, I changed nothing, but the output shows errors. 
Could you please help me solve this? (In the compile area.)
[Jenkins compile 
result.](https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2745/artifact/patchprocess/trunkJavacWarnings.txt/*view*/)



---


[GitHub] zookeeper pull request #720: add an API to get total count of recursive sub ...

2018-11-24 Thread TyqITstudent
GitHub user TyqITstudent reopened a pull request:

https://github.com/apache/zookeeper/pull/720

add an API to get total count of recursive sub nodes of one node

In a production environment, there will always be situations where a node has 
a lot of recursive sub-nodes, and we need to count the total number of them.

Now, we can only use the getChildren API, which returns the list of the first 
level of sub-nodes; we have to iterate over every sub-node to reach the deeper 
levels, which costs a lot of time.

On the ZooKeeper server side, nodes are stored in a hash map whose key is the 
path of the node, so we can iterate over the map to get the total number of 
sub-nodes of one node across all levels.
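
A sketch of that server-side counting idea (representing the node store as a set of paths is an assumption; the real server keeps richer node records):

```java
import java.util.Set;

class DescendantCount {
    // Descendants of "path" are exactly the stored paths under path + "/".
    static int countAllChildren(Set<String> nodePaths, String path) {
        String prefix = path.equals("/") ? "/" : path + "/";
        int count = 0;
        for (String p : nodePaths) {
            if (p.startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }
}
```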

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/TyqITstudent/zookeeper ZOOKEEPER-3167

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/zookeeper/pull/720.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #720


commit f21dab121f255959032148e6608b84c12ed0bd68
Author: tianyiqun <891707263@...>
Date:   2018-11-24T06:39:30Z

add an API to get total count of recursive sub nodes of one node

commit 1b527726f52499aa943de1ec63de4ce9967300cf
Author: tianyiqun <891707263@...>
Date:   2018-11-24T06:39:30Z

add an API to get total count of recursive sub nodes of one node

commit 67760fed151fce49f29fabc577eef19216cef94b
Author: tianyiqun <891707263@...>
Date:   2018-11-24T11:12:43Z

Merge branch 'ZOOKEEPER-3167' of https://github.com/TyqITstudent/zookeeper 
into ZOOKEEPER-3167




---


[GitHub] zookeeper pull request #720: add an API to get total count of recursive sub ...

2018-11-24 Thread TyqITstudent
Github user TyqITstudent closed the pull request at:

https://github.com/apache/zookeeper/pull/720


---


[GitHub] zookeeper issue #720: add an API to get total count of recursive sub nodes o...

2018-11-24 Thread TyqITstudent
Github user TyqITstudent commented on the issue:

https://github.com/apache/zookeeper/pull/720
  
> At every push CI will retest your work.
> Alternatively you can close and reopen the PR

Your advice is good, I will change some parts of my code.


---


[GitHub] zookeeper issue #720: add an API to get total count of recursive sub nodes o...

2018-11-24 Thread TyqITstudent
Github user TyqITstudent commented on the issue:

https://github.com/apache/zookeeper/pull/720
  
name the API 'countChildren'
add a flag to ask for a recursive traversal or simply have the count 
without listing
add also the async version of the method

1.  Your name is better; I can change the method name.
2.  Do you mean that the method returns the number of first-level children 
(if the flag is false)?
3.  I will add the async method.



---


ZooKeeper_branch34_openjdk8 - Build # 132 - Failure

2018-11-24 Thread Apache Jenkins Server
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk8/132/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 43.45 KB...]
[junit] Running org.apache.zookeeper.test.SaslAuthFailDesignatedClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.208 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailNotifyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.57 sec
[junit] Running org.apache.zookeeper.test.SaslAuthFailTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.691 sec
[junit] Running org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.582 sec
[junit] Running org.apache.zookeeper.test.SaslClientTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.077 sec
[junit] Running org.apache.zookeeper.test.SessionInvalidationTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.703 sec
[junit] Running org.apache.zookeeper.test.SessionTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
11.124 sec
[junit] Running org.apache.zookeeper.test.SessionTimeoutTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.954 sec
[junit] Running org.apache.zookeeper.test.StandaloneTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.941 sec
[junit] Running org.apache.zookeeper.test.StatTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.903 sec
[junit] Running org.apache.zookeeper.test.StaticHostProviderTest
[junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.711 sec
[junit] Running org.apache.zookeeper.test.SyncCallTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.655 sec
[junit] Running org.apache.zookeeper.test.TruncateTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
9.554 sec
[junit] Running org.apache.zookeeper.test.UpgradeTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.94 sec
[junit] Running org.apache.zookeeper.test.WatchedEventTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.084 sec
[junit] Running org.apache.zookeeper.test.WatcherFuncTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
1.452 sec
[junit] Running org.apache.zookeeper.test.WatcherTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
28.087 sec
[junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
10.891 sec
[junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.727 sec
[junit] Running org.apache.jute.BinaryInputArchiveTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
0.076 sec

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk8/build.xml:1408:
 The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk8/build.xml:1411:
 Tests failed!

Total time: 39 minutes 37 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Setting OPENJDK_8_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-8-openjdk-amd64/
Recording test results
Setting OPENJDK_8_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-8-openjdk-amd64/
Setting OPENJDK_8_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-8-openjdk-amd64/
Setting OPENJDK_8_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-8-openjdk-amd64/
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting OPENJDK_8_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-8-openjdk-amd64/
Setting OPENJDK_8_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-8-openjdk-amd64/



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testNewFollowerRestartAfterNewEpoch

Error Message:
Waiting too long

Stack Trace:
java.lang.RuntimeException: Waiting too long
at 
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.waitForAll(QuorumPeerMainTest.java:449)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.waitForAll(QuorumPeerMainTest.java:439)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.LaunchServers(QuorumPeerMainTest.java:547)
at 

[GitHub] zookeeper issue #720: add an API to get total count of recursive sub nodes o...

2018-11-24 Thread eolivelli
Github user eolivelli commented on the issue:

https://github.com/apache/zookeeper/pull/720
  
At every push CI will retest your work.
Alternatively you can close and reopen the PR


---


[GitHub] zookeeper pull request #720: add an API to get total count of recursive sub ...

2018-11-24 Thread TyqITstudent
Github user TyqITstudent commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/720#discussion_r236039566
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java ---
@@ -616,6 +616,14 @@ private void processEvent(Object event) {
   } else {
   cb.processResult(rc, clientPath, p.ctx, null);
   }
+  } else if (p.response instanceof 
GetAllChildrenNumberResponse) {
--- End diff --

> Also need to import GetAllChildrenNumberResponse

I have added the import. How do I re-trigger the Jenkins build to check 
the code?


---


[GitHub] zookeeper pull request #720: add an API to get total count of recursive sub ...

2018-11-24 Thread nkalmar
Github user nkalmar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/720#discussion_r236036464
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/ZooKeeper.java ---
@@ -2495,6 +2495,30 @@ public void setACL(final String path, List acl, 
int version,
 return getChildren(path, watch ? watchManager.defaultWatcher : 
null);
 }
 
+/*
+*  Get all children number of one node
+* */
+public int getAllChildrenNumber(final String path)
+throws KeeperException, InterruptedException {
+int totalNumber = 0;
+final String clientPath = path;
+PathUtils.validatePath(clientPath);
+// the watch contains the un-chroot path
+WatchRegistration wcb = null;
+final String serverPath = prependChroot(clientPath);
+RequestHeader h = new RequestHeader();
+h.setType(ZooDefs.OpCode.getAllChildrenNumber);
+GetAllChildrenNumberRequest request = new 
GetAllChildrenNumberRequest();
--- End diff --

You need to import the class GetAllChildrenNumberRequest which jute 
generates.


---


[GitHub] zookeeper pull request #720: add an API to get total count of recursive sub ...

2018-11-24 Thread nkalmar
Github user nkalmar commented on a diff in the pull request:

https://github.com/apache/zookeeper/pull/720#discussion_r236036475
  
--- Diff: 
zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java ---
@@ -616,6 +616,14 @@ private void processEvent(Object event) {
   } else {
   cb.processResult(rc, clientPath, p.ctx, null);
   }
+  } else if (p.response instanceof 
GetAllChildrenNumberResponse) {
--- End diff --

Also need to import GetAllChildrenNumberResponse 


---


[jira] [Updated] (ZOOKEEPER-3199) Unable to produce verbose logs of Zookeeper

2018-11-24 Thread Ankit Kothana (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Kothana updated ZOOKEEPER-3199:
-
Description: 
We are using Zookeeper in our system along with Apache Kafka. However, 
Zookeeper is not producing any relevant logs (even with lower log levels 
specified in log4j.properties) in the log file that could help us in 
identifying what is currently going on in ZK or Kafka cluster. 

Please let us know how to retrieve proper logs from ZK cluster.

Version of ZK : 3.4.13

  was:
We are using Zookeeper in our system along with Apache Kafka. However, 
Zookeeper is not producing any relevant logs (even with lower log levels 
specified in log4j.properties) in the log file that could help us in 
identifying what is currently going on in ZK or Kafka cluster. 

Please let us know how to retrieve proper logs from ZK cluster.

Version of ZK : 3.x


> Unable to produce verbose logs of Zookeeper
> ---
>
> Key: ZOOKEEPER-3199
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3199
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ankit Kothana
>Priority: Major
> Attachments: log4j.properties, zookeeper.log
>
>
> We are using Zookeeper in our system along with Apache Kafka. However, 
> Zookeeper is not producing any relevant logs (even with lower log levels 
> specified in log4j.properties) in the log file that could help us in 
> identifying what is currently going on in ZK or Kafka cluster. 
> Please let us know how to retrieve proper logs from ZK cluster.
> Version of ZK : 3.4.13



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3199) Unable to produce verbose logs of Zookeeper

2018-11-24 Thread Ankit Kothana (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Kothana updated ZOOKEEPER-3199:
-
Attachment: log4j.properties
zookeeper.log

> Unable to produce verbose logs of Zookeeper
> ---
>
> Key: ZOOKEEPER-3199
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3199
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ankit Kothana
>Priority: Major
> Attachments: log4j.properties, zookeeper.log
>
>
> We are using Zookeeper in our system along with Apache Kafka. However, 
> Zookeeper is not producing any relevant logs (even with lower log levels 
> specified in log4j.properties) in the log file that could help us in 
> identifying what is currently going on in ZK or Kafka cluster. 
> Please let us know how to retrieve proper logs from ZK cluster.
> Version of ZK : 3.x



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ZOOKEEPER-3199) Unable to produce verbose logs of Zookeeper

2018-11-24 Thread Ankit Kothana (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16697683#comment-16697683
 ] 

Ankit Kothana edited comment on ZOOKEEPER-3199 at 11/24/18 8:03 AM:


Took reference from https://issues.apache.org/jira/browse/ZOOKEEPER-2170 and 
updated zkEnv.sh to change the root logging level to DEBUG, but it didn't 
resolve the issue.

Attaching log4j.properties and zookeeper.log files

 


was (Author: ankit.kothana):
Took reference from https://issues.apache.org/jira/browse/ZOOKEEPER-2170 and 
updated zkEnv.sh to change the root logging level to DEBUG, but it didn't 
resolve the issue.

zookeeper.log
{quote}ZooKeeper JMX enabled by default
Using config: /usr/local/share/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
ZooKeeper JMX enabled by default
Using config: /usr/local/share/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
{quote}
log4j.properties
{quote}# Define some default values that can be overridden by system properties
zookeeper.root.logger=DEBUG, CONSOLE
zookeeper.console.threshold=DEBUG
zookeeper.log.dir=/var/log/zookeeper
zookeeper.log.file=zookeeper.log
zookeeper.log.threshold=DEBUG
zookeeper.tracelog.dir=.
zookeeper.tracelog.file=zookeeper_trace.log

#
# ZooKeeper Logging Configuration
#

# Format is "<default threshold> (, <appender>)+

# DEFAULT: console appender only
log4j.rootLogger=${zookeeper.root.logger}

# Example with rolling log file
#log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE

# Example with rolling log file and tracing
#log4j.rootLogger=TRACE, CONSOLE, ROLLINGFILE, TRACEFILE

#
# Log INFO level and above messages to the console
#
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=${zookeeper.console.threshold}
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d\{ISO8601} [myid:%X\{myid}] - 
%-5p [%t:%C\{1}@%L] - %m%n

#
# Add ROLLINGFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=${zookeeper.log.threshold}
log4j.appender.ROLLINGFILE.File=${zookeeper.log.dir}/${zookeeper.log.file}

# Max log file size of 10MB
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
# uncomment the next line to limit number of backup files
#log4j.appender.ROLLINGFILE.MaxBackupIndex=10

log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d\{ISO8601} 
[myid:%X\{myid}] - %-5p [%t:%C\{1}@%L] - %m%n


#
# Add TRACEFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.TRACEFILE=org.apache.log4j.FileAppender
log4j.appender.TRACEFILE.Threshold=TRACE
log4j.appender.TRACEFILE.File=${zookeeper.tracelog.dir}/${zookeeper.tracelog.file}

log4j.appender.TRACEFILE.layout=org.apache.log4j.PatternLayout
### Notice we are including log4j's NDC here (%x)
log4j.appender.TRACEFILE.layout.ConversionPattern=%d\{ISO8601} [myid:%X\{myid}] 
- %-5p [%t:%C\{1}@%L][%x] - %m%n
{quote}
 

> Unable to produce verbose logs of Zookeeper
> ---
>
> Key: ZOOKEEPER-3199
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3199
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ankit Kothana
>Priority: Major
>
> We are using Zookeeper in our system along with Apache Kafka. However, 
> Zookeeper is not producing any relevant logs (even with lower log levels 
> specified in log4j.properties) in the log file that could help us in 
> identifying what is currently going on in ZK or Kafka cluster. 
> Please let us know how to retrieve proper logs from ZK cluster.
> Version of ZK : 3.x



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3199) Unable to produce verbose logs of Zookeeper

2018-11-24 Thread Ankit Kothana (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16697683#comment-16697683
 ] 

Ankit Kothana commented on ZOOKEEPER-3199:
--

Took reference from https://issues.apache.org/jira/browse/ZOOKEEPER-2170 and 
updated zkEnv.sh to change the root logging level to DEBUG, but it didn't 
resolve the issue.

zookeeper.log
{quote}ZooKeeper JMX enabled by default
Using config: /usr/local/share/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
ZooKeeper JMX enabled by default
Using config: /usr/local/share/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
{quote}
log4j.properties
{quote}# Define some default values that can be overridden by system properties
zookeeper.root.logger=DEBUG, CONSOLE
zookeeper.console.threshold=DEBUG
zookeeper.log.dir=/var/log/zookeeper
zookeeper.log.file=zookeeper.log
zookeeper.log.threshold=DEBUG
zookeeper.tracelog.dir=.
zookeeper.tracelog.file=zookeeper_trace.log

#
# ZooKeeper Logging Configuration
#

# Format is "<default threshold> (, <appender>)+

# DEFAULT: console appender only
log4j.rootLogger=${zookeeper.root.logger}

# Example with rolling log file
#log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE

# Example with rolling log file and tracing
#log4j.rootLogger=TRACE, CONSOLE, ROLLINGFILE, TRACEFILE

#
# Log INFO level and above messages to the console
#
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=${zookeeper.console.threshold}
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d\{ISO8601} [myid:%X\{myid}] - 
%-5p [%t:%C\{1}@%L] - %m%n

#
# Add ROLLINGFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=${zookeeper.log.threshold}
log4j.appender.ROLLINGFILE.File=${zookeeper.log.dir}/${zookeeper.log.file}

# Max log file size of 10MB
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
# uncomment the next line to limit number of backup files
#log4j.appender.ROLLINGFILE.MaxBackupIndex=10

log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d\{ISO8601} 
[myid:%X\{myid}] - %-5p [%t:%C\{1}@%L] - %m%n


#
# Add TRACEFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.TRACEFILE=org.apache.log4j.FileAppender
log4j.appender.TRACEFILE.Threshold=TRACE
log4j.appender.TRACEFILE.File=${zookeeper.tracelog.dir}/${zookeeper.tracelog.file}

log4j.appender.TRACEFILE.layout=org.apache.log4j.PatternLayout
### Notice we are including log4j's NDC here (%x)
log4j.appender.TRACEFILE.layout.ConversionPattern=%d\{ISO8601} [myid:%X\{myid}] 
- %-5p [%t:%C\{1}@%L][%x] - %m%n
{quote}
 

> Unable to produce verbose logs of Zookeeper
> ---
>
> Key: ZOOKEEPER-3199
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3199
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ankit Kothana
>Priority: Major
>
> We are using Zookeeper in our system along with Apache Kafka. However, 
> Zookeeper is not producing any relevant logs (even with lower log levels 
> specified in log4j.properties) in the log file that could help us in 
> identifying what is currently going on in ZK or Kafka cluster. 
> Please let us know how to retrieve proper logs from ZK cluster.
> Version of ZK : 3.x



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)