[GitHub] zookeeper pull request #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thr...
Github user tumativ commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/689#discussion_r236063610

--- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatcherCleaner.java ---
@@ -102,12 +103,13 @@ public void addDeadWatcher(int watcherBit) {
                totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
            try {
                RATE_LOGGER.rateLimitLog("Waiting for dead watchers cleaning");
-               synchronized(totalDeadWatchers) {
-                   totalDeadWatchers.wait(100);
+               synchronized(processingCompletedEvent) {
+                   processingCompletedEvent.wait(100);
--- End diff --

Done

---
[GitHub] zookeeper pull request #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thr...
Github user tumativ commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/689#discussion_r236063614

--- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatcherCleaner.java ---
@@ -163,8 +165,8 @@ public void doWork() throws Exception {
            long latency = Time.currentElapsedTime() - startTime;
            LOG.info("Takes {} to process {} watches", latency, total);
            totalDeadWatchers.addAndGet(-total);
-           synchronized(totalDeadWatchers) {
-               totalDeadWatchers.notifyAll();
+           synchronized(processingCompletedEvent) {
+               processingCompletedEvent.notifyAll();
--- End diff --

Done

---
[GitHub] zookeeper issue #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thread and...
Github user lvfangmin commented on the issue: https://github.com/apache/zookeeper/pull/689

@tumativ thanks for working on this; only a minor comment now. I'll merge this once you've updated it.

---
[GitHub] zookeeper pull request #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thr...
Github user lvfangmin commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/689#discussion_r236063171

--- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatcherCleaner.java ---
@@ -102,12 +103,13 @@ public void addDeadWatcher(int watcherBit) {
                totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
            try {
                RATE_LOGGER.rateLimitLog("Waiting for dead watchers cleaning");
-               synchronized(totalDeadWatchers) {
-                   totalDeadWatchers.wait(100);
+               synchronized(processingCompletedEvent) {
+                   processingCompletedEvent.wait(100);
--- End diff --

Can we indent this with 4 extra spaces relative to the synchronized statement?

---
[GitHub] zookeeper pull request #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thr...
Github user lvfangmin commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/689#discussion_r236063217

--- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatcherCleaner.java ---
@@ -102,24 +104,24 @@ public void addDeadWatcher(int watcherBit) {
                totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
            try {
                RATE_LOGGER.rateLimitLog("Waiting for dead watchers cleaning");
-               synchronized(totalDeadWatchers) {
-                   totalDeadWatchers.wait(100);
-               }
-           } catch (InterruptedException e) {
-               LOG.info("Got interrupted while waiting for dead watches " +
-                       "queue size");
-           }
-       }
-       synchronized (this) {
-           if (deadWatchers.add(watcherBit)) {
-               totalDeadWatchers.incrementAndGet();
-               if (deadWatchers.size() >= watcherCleanThreshold) {
-                   synchronized (cleanEvent) {
-                       cleanEvent.notifyAll();
-                   }
-               }
-           }
+            synchronized (processingCompletedEvent) {
--- End diff --

We don't have a general format wiki yet; we'll work on that. In general it's 4 spaces of indentation after a { line.

---
[GitHub] zookeeper pull request #689: ZOOKEEPER-3183:Notifying the WatcherCleaner thr...
Github user lvfangmin commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/689#discussion_r236063177

--- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/server/watch/WatcherCleaner.java ---
@@ -163,8 +165,8 @@ public void doWork() throws Exception {
            long latency = Time.currentElapsedTime() - startTime;
            LOG.info("Takes {} to process {} watches", latency, total);
            totalDeadWatchers.addAndGet(-total);
-           synchronized(totalDeadWatchers) {
-               totalDeadWatchers.notifyAll();
+           synchronized(processingCompletedEvent) {
+               processingCompletedEvent.notifyAll();
--- End diff --

ditto.

---
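The change discussed in this thread replaces wait/notify on the `AtomicInteger` counter itself with a dedicated monitor object, `processingCompletedEvent`. A self-contained sketch of that pattern (names mirror `WatcherCleaner`, but this is an illustration, not the ZooKeeper source):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified sketch of the pattern from the diff above: wait/notifyAll on a
// dedicated monitor object (processingCompletedEvent) rather than on the
// AtomicInteger counter. Names mirror WatcherCleaner for readability; the
// class itself is a standalone illustration.
public class DeadWatcherBackpressure {
    private final AtomicInteger totalDeadWatchers = new AtomicInteger();
    private final Object processingCompletedEvent = new Object();
    private final int maxInProcessingDeadWatchers;

    public DeadWatcherBackpressure(int maxInProcessing) {
        this.maxInProcessingDeadWatchers = maxInProcessing;
    }

    // Producer side: block (in 100 ms slices) while too many dead watchers
    // are still being processed. The timed wait bounds the cost of a missed
    // notification, mirroring the wait(100) in the diff.
    public void addDeadWatcher() {
        while (totalDeadWatchers.get() >= maxInProcessingDeadWatchers) {
            try {
                synchronized (processingCompletedEvent) {
                    processingCompletedEvent.wait(100);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        totalDeadWatchers.incrementAndGet();
    }

    // Cleaner side: after a batch completes, decrement the counter and then
    // notify waiters on the very same monitor they wait on.
    public void batchProcessed(int total) {
        totalDeadWatchers.addAndGet(-total);
        synchronized (processingCompletedEvent) {
            processingCompletedEvent.notifyAll();
        }
    }

    public int pending() {
        return totalDeadWatchers.get();
    }
}
```

The key point of the fix is that `wait` and `notifyAll` must use the same monitor; waiting on the counter while notifying on a different object (or vice versa) means waiters only ever wake on the 100 ms timeout.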
[GitHub] zookeeper pull request #692: ZOOKEEPER-3184: Use the same method to generate...
Github user lvfangmin commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/692#discussion_r236063130 --- Diff: README.md --- @@ -1,49 +1,24 @@ ## Generating the static Apache ZooKeeper website -In this directory you will find text files formatted using Markdown, with an `.md` suffix. +In the `src/main/resources/markdown` directory you will find text files formatted using Markdown, with an `.md` suffix. -Building the site requires [Jekyll](http://jekyllrb.com/docs) 3.6.2 or newer. -The easiest way to install jekyll is via a Ruby Gem. Jekyll will create a directory called `_site` -containing `index.html` as well as the rest of the compiled directories and files. _site should not -be committed to git as this is the generated content. - -To install Jekyll and its required dependencies, execute `sudo gem install jekyll pygments.rb` -and `sudo pip install Pygments`. See the Jekyll installation page for more details. +Building the site requires [Maven](http://maven.apache.org/) 3.5.0 or newer. +The easiest way to [install Maven](http://maven.apache.org/install.html) depends on your OS. +The build process will create a directory called `target/html` containing `index.html` as well as the rest of the +compiled directories and files. `target` should not be committed to git as it is generated content. You can generate the static ZooKeeper website by running: -1. `jekyll build` in this directory. -2. `cp -RP _released_docs _site/doc` - this will include the documentation (see "sub-dir" section below) in the generated site. +1. `mvn clean install` in this directory. +2. `cp -RP _released_docs _target/html` - this will include the documentation (see "sub-dir" section below) in the generated site. --- End diff -- Thanks @tamaashu, can you update the readme to point out how to copy those doc? One suggest, the top and bottom layout is a bit strange, they're too wide comparing to the main content on that page, can we make those not that wide? ---
[jira] [Commented] (ZOOKEEPER-2778) Potential server deadlock between follower sync with leader and follower receiving external connection requests.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698085#comment-16698085 ] Michael K. Edwards commented on ZOOKEEPER-2778: --- Note that the current version of this patch also addresses ZOOKEEPER-2488. > Potential server deadlock between follower sync with leader and follower > receiving external connection requests. > > > Key: ZOOKEEPER-2778 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2778 > Project: ZooKeeper > Issue Type: Bug > Components: quorum >Affects Versions: 3.5.3 >Reporter: Michael Han >Assignee: Michael K. Edwards >Priority: Blocker > Labels: pull-request-available > Fix For: 3.6.0, 3.5.5 > > Time Spent: 6h 20m > Remaining Estimate: 0h > > It's possible to have a deadlock during recovery phase. > Found this issue by analyzing thread dumps of "flaky" ReconfigRecoveryTest > [1]. . Here is a sample thread dump that illustrates the state of the > execution: > {noformat} > [junit] java.lang.Thread.State: BLOCKED > [junit] at > org.apache.zookeeper.server.quorum.QuorumPeer.getElectionAddress(QuorumPeer.java:686) > [junit] at > org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:265) > [junit] at > org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:445) > [junit] at > org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:369) > [junit] at > org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:642) > [junit] > [junit] java.lang.Thread.State: BLOCKED > [junit] at > org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:472) > [junit] at > org.apache.zookeeper.server.quorum.QuorumPeer.connectNewPeers(QuorumPeer.java:1438) > [junit] at > org.apache.zookeeper.server.quorum.QuorumPeer.setLastSeenQuorumVerifier(QuorumPeer.java:1471) > [junit] at > 
org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:520) > [junit] at > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:88) > [junit] at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133) > {noformat} > The deadlock happens between the quorum peer thread running the follower's sync-with-leader work and the listener of the same quorum peer's QuorumCnxManager (qcm), which receives incoming connections. To finish syncing with the leader, the follower needs to synchronize on both QV_LOCK and the qcm object it owns; meanwhile, to finish setting up an incoming connection, the receiver thread needs to synchronize on both the qcm object the quorum peer owns and the same QV_LOCK. The problem is that the two threads acquire the two locks in opposite orders, so depending on timing / actual execution order, each may end up acquiring one lock while holding the other. > [1] > org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig -- This message was sent by Atlassian JIRA (v7.6.3#76005)
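The inconsistent lock order described above is the classic two-lock deadlock, and the standard remedy is to impose a single global acquisition order. A minimal standalone illustration (the lock names stand in for QV_LOCK and the QuorumCnxManager monitor; this is not the actual ZOOKEEPER-2778 patch):

```java
// Illustration of the usual fix for the lock-ordering deadlock described
// above: every code path acquires the two monitors in the same fixed order
// (qvLock first, then cnxManagerLock), so no thread can hold one while
// waiting for the other. Names are stand-ins, not the ZooKeeper patch.
public class LockOrdering {
    private final Object qvLock = new Object();
    private final Object cnxManagerLock = new Object();
    private int criticalSections = 0;

    // Both the sync-with-leader path and the receive-connection path would
    // funnel through a helper like this, fixing the acquisition order.
    public void withBothLocks(Runnable action) {
        synchronized (qvLock) {
            synchronized (cnxManagerLock) {
                criticalSections++;
                action.run();
            }
        }
    }

    public int count() {
        return criticalSections;
    }
}
```

With a fixed order, the wait-for graph between the two monitors can never form a cycle, which is exactly the condition the thread dump above violates.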
[GitHub] zookeeper issue #528: ZOOKEEPER-3034 Facing issues while building from sourc...
Github user mkedwards commented on the issue: https://github.com/apache/zookeeper/pull/528 I'd like to see this brought back. Not cool to drop autoconf when the CMake setup can't build shared libraries. From `zookeeper-client/zookeeper-client-c/README`: ``` Current limitations of the CMake build system include lack of Solaris support, no shared library option, no explicitly exported symbols (all are exported by default), no versions on the libraries, and no documentation generation. ``` ---
Re: ReconfigInProgress error
I don't often admit defeat; but I can't make heads or tails of the error handling (or lack thereof) in the reconfiguration code paths. If anybody wants to take a stab at explaining which parts of the processAck -> tryToCommit -> processReconfig -> reconfigure call chain should and shouldn't go through if the bind() call fails, maybe I can try to write tests that verify that and modify the code under test to behave accordingly. I've filed ZOOKEEPER-3198 as an umbrella for this work, and pushed what I've got to https://github.com/mkedwards/zookeeper/tree/broken-bind-3.5, in case somebody wants to try to take it forward from there. In the meantime, I'm running tests in parallel inside a Docker container (with a code state that has patches applied for all three 3.5 blocker/critical Jiras). Nothing seems "flaky" yet. We'll deploy this in our QA environment next week, and throw some load at it, and see what happens. (And run the test suite a few hundred times, too.) Alex (or anyone else), do you consider any of the other outstanding Jiras to be obstacles to exercising the reconfiguration features in 3.5.x on a production cluster? How serious is https://issues.apache.org/jira/browse/ZOOKEEPER-2202 ? Is it related to https://issues.apache.org/jira/browse/ZOOKEEPER-2836 ? And how serious is https://issues.apache.org/jira/browse/ZOOKEEPER-1896 ? Does mixing 3.4.x and 3.5.x in the same cluster work? Is it best to disable reconfig while migrating cluster members from 3.4.x to 3.5.x, and then enable reconfig and do a rolling restart? On Sat, Nov 24, 2018 at 12:13 PM Alexander Shraer wrote: > > Hi Michael, > > In general, one reconfig op is allowed at a time, and this error indicates > that one is already in progress. If there are enough peers to form a quorum a > failure to connect to one of them shouldn’t be a problem. If there is not > enough, the leader is supposed to give up leadership. This is true in > general, unrelated to reconfig. 
A new leader will be elected and complete any > reconfig in progress. That’s the theory at least, there may be a bug in the > case you found. > > Some general flow is described in Sec 3.2 of our paper, > https://www.usenix.org/system/files/conference/atc12/atc12-final74.pdf > > There are also the wiki docs but they don’t talk about recovery much. > https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperReconfig.html > > Btw > > > robustness against > Byzantine faults that one is led to expect from Zookeeper? > > ZK is not designed to handle Byzantine faults in general. It’s not to say > that there is no bug In the case you found. > > Thanks, > Alex > > On Sat, Nov 24, 2018 at 11:32 AM Michael K. Edwards > wrote: >> >> I've been experimenting a bit with trying to propagate failures to >> bind() server ports in tests up to where we can do something about it. >> There's at least one category of test cases (callers of >> ReconfigTest.testPortChangeToBlockedPort) where the server is supposed >> to ride through a bind() failure, recovering on a subsequent >> reconfiguration. In my current code state, I'm encountering errors >> like this: >> >> 2018-11-24 11:04:46,252 [myid:] - INFO [ProcessThread(sid:3 >> cport:-1)::PrepRequestProcessor@878] - Got user-level KeeperException >> when processing sessionid:0x1002b98aa83 type:reconfig cxid:0x1e >> zxid:0x1002b txntype:-1 reqpath:n/a Error Path:null >> Error:KeeperErrorCode = ReconfigInProgress >> >> I can hack things until this particular test passes, but it raises >> questions about reconfiguration in general. How exactly is the >> cluster supposed to get out of this state? If a cluster member drops >> out of contact with the quorum while there is a reconfiguration in >> flight, is there any recovery path that restores the ability to >> process a reconfigure operation? Is there a design doc for >> reconfiguration that demonstrates the kind of robustness against >> Byzantine faults that one is led to expect from Zookeeper?
[jira] [Commented] (ZOOKEEPER-3198) Handle port-binding failures in a systematic and documented fashion
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698065#comment-16698065 ] Michael K. Edwards commented on ZOOKEEPER-3198: --- An attempt (as yet, not very successful) to plumb BindExceptions up the stack is in https://github.com/mkedwards/zookeeper/tree/broken-bind-3.5 . I'm currently foundering on test cases that call ReconfigTest.testPortChangeToBlockedPort(). > Handle port-binding failures in a systematic and documented fashion > --- > > Key: ZOOKEEPER-3198 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3198 > Project: ZooKeeper > Issue Type: Improvement >Affects Versions: 3.5.3, 3.6.0, 3.4.13 >Reporter: Michael K. Edwards >Priority: Major > Fix For: 3.6.0, 3.5.5, 3.4.14 > > > Many test failures appear to result from bind failures due to port conflicts. > This can arise in normal use as well. Presently the code swallows the > exception (with an error log) at a low level. It would probably be useful to > throw the exception far enough up the stack to trigger retry with a new port > (in tests) or a high-level (perhaps even fatal) error message (in normal use). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: ReconfigInProgress error
Hi Michael, In general, one reconfig op is allowed at a time, and this error indicates that one is already in progress. If there are enough peers to form a quorum a failure to connect to one of them shouldn’t be a problem. If there is not enough, the leader is supposed to give up leadership. This is true in general, unrelated to reconfig. A new leader will be elected and complete any reconfig in progress. That’s the theory at least, there may be a bug in the case you found. Some general flow is described in Sec 3.2 of our paper, https://www.usenix.org/system/files/conference/atc12/atc12-final74.pdf There are also the wiki docs but they don’t talk about recovery much. https://zookeeper.apache.org/doc/r3.5.3-beta/zookeeperReconfig.html Btw > robustness against Byzantine faults that one is led to expect from Zookeeper? ZK is not designed to handle Byzantine faults in general. It’s not to say that there is no bug In the case you found. Thanks, Alex On Sat, Nov 24, 2018 at 11:32 AM Michael K. Edwards wrote: > I've been experimenting a bit with trying to propagate failures to > bind() server ports in tests up to where we can do something about it. > There's at least one category of test cases (callers of > ReconfigTest.testPortChangeToBlockedPort) where the server is supposed > to ride through a bind() failure, recovering on a subsequent > reconfiguration. In my current code state, I'm encountering errors > like this: > > 2018-11-24 11:04:46,252 [myid:] - INFO [ProcessThread(sid:3 > cport:-1)::PrepRequestProcessor@878] - Got user-level KeeperException > when processing sessionid:0x1002b98aa83 type:reconfig cxid:0x1e > zxid:0x1002b txntype:-1 reqpath:n/a Error Path:null > Error:KeeperErrorCode = ReconfigInProgress > > I can hack things until this particular test passes, but it raises > questions about reconfiguration in general. How exactly is the > cluster supposed to get out of this state? 
If a cluster member drops > out of contact with the quorum while there is a reconfiguration in > flight, is there any recovery path that restores the ability to > process a reconfigure operation? Is there a design doc for > reconfiguration that demonstrates the kind of robustness against > Byzantine faults that one is led to expect from Zookeeper? >
[jira] [Assigned] (ZOOKEEPER-3113) EphemeralType.get() fails to verify ephemeralOwner when currentElapsedTime() is small enough
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-3113: --- Assignee: Andor Molnar > EphemeralType.get() fails to verify ephemeralOwner when currentElapsedTime() > is small enough > > > Key: ZOOKEEPER-3113 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3113 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.5.4, 3.6.0 >Reporter: Andor Molnar >Assignee: Andor Molnar >Priority: Critical > Labels: pull-request-available > Fix For: 3.6.0, 3.5.5 > > Time Spent: 4.5h > Remaining Estimate: 0h > > The EphemeralTypeTest.testServerIds() unit test fails on some systems where > System.nanoTime() is smaller than a certain value. > The test generates the ephemeralOwner in the old way (pre ZOOKEEPER-2901) without > enabling the emulation flag and asserts that an exception is thrown when > serverId == 255. This is right: ZooKeeper should fail in this case, because > serverId cannot be larger than 254 if extended types are enabled. In this > case an ephemeralOwner with 0xff in the most significant byte indicates an > extended type. > The logic which does the validation is in EphemeralType.get(). > It checks 2 things: > * the extended type byte is set: 0xff, > * the reserved bits (next 2 bytes) correspond to a valid extended type. > Here is the problem: currently we only have 1 extended type: TTL with value > of 0x in the reserved bits. > The logic expects that if we have anything different from it in the reserved > bits, the ephemeralOwner is invalid and an exception should be thrown. That's > what the test asserts, and it works on most systems, because the timestamp > part of the sessionId usually has some bits in the reserved bits as well, > which will eventually be larger than 0, so the value is unsupported.
> I think the problem is twofold: > * Either if we have more extended types, we'll increase the possibility that > this logic will accept invalid sessionIds (as long as reserved bits indicate > a valid extended type), > * Or (which happens on some systems) if the currentElapsedTime (timestamp > part of sessionId) is small enough and doesn't occupy reserved bits, this > logic will accept the invalid sessionId. > Unfortunately I cannot repro the problem yet: it constantly happens on a > specific Jenkins slave, but even with the same distro and same JDK version I > cannot reproduce the same nanoTime() values. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
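The check under discussion can be sketched in a standalone form. The constants below follow the layout described in the issue (extended-type marker byte, then two reserved bytes); they are illustrative and do not reproduce the actual `EphemeralType.get()` source:

```java
// Simplified sketch of the ephemeralOwner validation described above: a
// most-significant byte of 0xff marks an extended type, and the next two
// bytes (the "reserved" bits) select which extended type. Only one extended
// type exists (TTL, reserved value 0), so any other reserved value is
// invalid. Constants follow the issue's description, not the actual source.
public class EphemeralOwnerCheck {
    static final long EXTENDED_MASK  = 0xff00000000000000L;
    static final long EXTENDED_BIT   = 0xff00000000000000L;
    static final long RESERVED_MASK  = 0x00ffff0000000000L;
    static final int  RESERVED_SHIFT = 40;
    static final int  TTL_TYPE       = 0;

    enum Kind { VOID, NORMAL, TTL, INVALID }

    static Kind classify(long ephemeralOwner) {
        if (ephemeralOwner == 0) {
            return Kind.VOID;
        }
        if ((ephemeralOwner & EXTENDED_MASK) != EXTENDED_BIT) {
            return Kind.NORMAL;
        }
        int reserved = (int) ((ephemeralOwner & RESERVED_MASK) >>> RESERVED_SHIFT);
        // The bug described above: if currentElapsedTime() is small enough
        // that a serverId==255 sessionId leaves the reserved bits at 0,
        // this branch reads the invalid sessionId as a valid TTL type.
        return reserved == TTL_TYPE ? Kind.TTL : Kind.INVALID;
    }
}
```

This makes the twofold problem concrete: the `INVALID` branch only fires when the timestamp happens to spill nonzero bits into the reserved bytes, so validity depends on the clock rather than on the sessionId's structure.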
[jira] [Assigned] (ZOOKEEPER-3067) Optionally suppress client environment logging.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-3067: --- Assignee: James Peach > Optionally suppress client environment logging. > --- > > Key: ZOOKEEPER-3067 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3067 > Project: ZooKeeper > Issue Type: Task > Components: c client >Reporter: James Peach >Assignee: James Peach >Priority: Minor > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > It would be helpful to add a {{zookeeper_init}} flag to suppress the client > environment logging. In our deployment, this causes LDAP lookups for the > current user ID, which is otherwise an unnecessary service dependency for > ZooKeeper clients. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ZOOKEEPER-3162) Broken lock semantics in C client lock-recipe
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-3162: --- Assignee: Andrea Reale > Broken lock semantics in C client lock-recipe > - > > Key: ZOOKEEPER-3162 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3162 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.0.0, 3.4.13 >Reporter: Andrea Reale >Assignee: Andrea Reale >Priority: Major > Labels: pull-request-available > Fix For: 3.6.0, 3.5.5, 3.4.14 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > As reported (but never fixed) in the past by ZOOKEEPER-2409, ZOOKEEPER-2038 > and (partly) ZOOKEEPER-2878, the C client lock-recipe implementation is > broken. > I identified three issues. > The main one (as also reported in the aforementioned reports) is that the > logic that goes through the lock waiting list is broken. child_floor uses > strcmp and compares the full node name (i.e., sessionID-sequence) rather than > only comparing the sequence number. This makes it possible for two different > clients to hold the lock at the same time: assume two clients, one associated > with session A, the other with session B, with A < B lexicographically. Now > assume that at some point a thread in B holds a lock and a thread in A tries > to acquire the same lock. A will manage to get the lock because of the wrong > comparison function, so now two guys hold the lock. > The second issue is a possible deadlock inside zkr_lock_operation. > zkr_lock_operation is always called by holding the mutex associated to the > client lock. In some cases, zkr_lock_operaton may decide to give-up locking > and call zkr_lock_unlock to release the lock. When this happens, it will try > to acquire again the same phtread mutex, which will lead to a deadlock. > The third issue relates to the return value of zkr_lock_lock. According to > the API docs, the functions returns 0 when no errors. 
Then it is up to the > invoker to check whether the lock is held by calling zkr_lock_isowner. However, > the implementation, in case of no error, returns zkr_lock_isowner. This is > wrong because it becomes impossible to distinguish an error condition from a > success (but not ownership). Instead the API (as described in the docs, btw) > should always return 0 when no errors occur. > Shortly I will add the link to a PR fixing the issues. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
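The first bug above, ordering the lock queue by full node name instead of by sequence number, is easy to demonstrate: sequential lock znodes carry a session-specific prefix followed by a numeric sequence suffix, and `strcmp` orders by the prefix first. A standalone sketch of the broken and corrected comparisons (written in Java for brevity, though the recipe itself is C; the node-name shape is illustrative):

```java
// The lock-queue ordering bug described above: sequential lock znodes are
// named "<prefix>-<sequence>", and comparing whole names lexicographically
// (what child_floor's strcmp did) orders by the session-specific prefix
// first, not by the sequence number. The fix compares only the trailing
// sequence. Sketched in Java for brevity; the recipe itself is C.
public class LockNodeOrder {
    // Parse the numeric sequence suffix appended by SEQUENTIAL znodes.
    static long sequenceOf(String name) {
        int dash = name.lastIndexOf('-');
        return Long.parseLong(name.substring(dash + 1));
    }

    // Broken: lexicographic comparison of the full node name.
    static int brokenCompare(String a, String b) {
        return a.compareTo(b);
    }

    // Fixed: compare by sequence number only, which reflects the real
    // queue order and so decides who actually holds the lock.
    static int fixedCompare(String a, String b) {
        return Long.compare(sequenceOf(a), sequenceOf(b));
    }
}
```

With session A < B lexicographically but B holding the smaller sequence, the broken comparison puts A first, which is exactly the "two holders" scenario described in the report.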
[jira] [Assigned] (ZOOKEEPER-3072) Race condition in throttling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-3072: --- Assignee: Botond Hejj > Race condition in throttling > > > Key: ZOOKEEPER-3072 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3072 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.5.0, 3.5.1, 3.5.2, 3.5.3, 3.5.4 >Reporter: Botond Hejj >Assignee: Botond Hejj >Priority: Major > Labels: pull-request-available > Fix For: 3.5.4, 3.6.0 > > Time Spent: 2h 40m > Remaining Estimate: 0h > > There is a race condition in the server throttling code. It is possible that > the disableRecv is called after enableRecv. > Basically, the I/O work thread does this in processPacket: > [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L1102] > > submitRequest(si); > } > } > cnxn.incrOutstandingRequests(h); > } > > incrOutstandingRequests() checks for limit breach, and potentially turns on > throttling, > [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L384] > > submitRequest() will create a logical request and en-queue it so that > Processor thread can pick it up. 
After being de-queued by the Processor thread, > it does the necessary handling, and then calls this > [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/FinalRequestProcessor.java#L459] > : > > cnxn.sendResponse(hdr, rsp, "response"); > > and in sendResponse(), it first appends to the outgoing buffer, and then checks > if un-throttle is needed: > [https://github.com/apache/zookeeper/blob/release-3.5.3/src/java/main/org/apache/zookeeper/server/NIOServerCnxn.java#L708] > > However, if there is a context switch between submitRequest() and > cnxn.incrOutstandingRequests(), such that the Processor thread completes the > cnxn.sendResponse() call before the I/O thread switches back, then enableRecv() > will happen before disableRecv(): enableRecv() will fail its CAS op > while disableRecv() will succeed, resulting in a deadlock. Un-throttle is > needed for letting in requests, and sendResponse is needed to trigger > un-throttle, but sendResponse() requires an incoming message. From that point > on, the ZK server will no longer select the affected client socket for read, > leading to the observed client-side failure in the subject. > If you would like to reproduce this, setting globalOutstandingLimit > down to 1 makes it easier to reproduce, as throttling starts with fewer > requests. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
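The interleaving described above reduces to a small deterministic sketch: a throttle flag flipped by compare-and-set from two threads, where the un-throttle can race ahead of the throttle. This illustrates the race itself, not the ZooKeeper fix; names are invented:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal illustration of the interleaving described above. "throttled"
// stands in for the connection's recv-disabled state. If the responder
// thread runs enableRecv() before the I/O thread reaches disableRecv(),
// the enable CAS fails (nothing was throttled yet), the late disable CAS
// succeeds, and the connection is left throttled with no response in
// flight to ever un-throttle it. Names are invented for the sketch.
public class ThrottleRace {
    final AtomicBoolean throttled = new AtomicBoolean(false);

    boolean disableRecv() { return throttled.compareAndSet(false, true); }
    boolean enableRecv()  { return throttled.compareAndSet(true, false); }

    // Replay the bad interleaving deterministically.
    static boolean replayBadOrder() {
        ThrottleRace c = new ThrottleRace();
        c.enableRecv();   // responder finishes first: CAS(true -> false) fails
        c.disableRecv();  // I/O thread throttles afterwards: CAS succeeds
        return c.throttled.get(); // left true: permanently throttled
    }
}
```

The sketch shows why lowering `globalOutstandingLimit` to 1 makes the bug easier to hit: throttling engages on almost every request, so the enable/disable pair races constantly.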
[jira] [Assigned] (ZOOKEEPER-3061) add more details to 'Unhandled scenario for peer' log.warn message
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-3061: --- Assignee: Christine Poerschke > add more details to 'Unhandled scenario for peer' log.warn message > -- > > Key: ZOOKEEPER-3061 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3061 > Project: ZooKeeper > Issue Type: Task >Reporter: Christine Poerschke >Assignee: Christine Poerschke >Priority: Minor > Labels: pull-request-available > Fix For: 3.6.0 > > Attachments: ZOOKEEPER-3061.patch > > Time Spent: 40m > Remaining Estimate: 0h > > A few lines earlier the {{LOG.info("Synchronizing with Follower sid: ...}} > logging already contains most relevant details but it would be convenient to > more directly have full details in the {{LOG.warn("Unhandled scenario for > peer sid: ...}} itself. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ZOOKEEPER-3137) add a utility to truncate logs to a zxid
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-3137: --- Assignee: Brian Nixon > add a utility to truncate logs to a zxid > > > Key: ZOOKEEPER-3137 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3137 > Project: ZooKeeper > Issue Type: New Feature >Affects Versions: 3.6.0 >Reporter: Brian Nixon >Assignee: Brian Nixon >Priority: Trivial > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Add a utility that allows an admin to truncate a given transaction log to a > specified zxid. This can be similar to the existent LogFormatter. > Among the benefits, this allows an admin to put together a point-in-time view > of a data tree by manually mutating files from a saved backup. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ZOOKEEPER-3142) Extend SnapshotFormatter to dump data in json format
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-3142: --- Assignee: Brian Nixon > Extend SnapshotFormatter to dump data in json format > > > Key: ZOOKEEPER-3142 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3142 > Project: ZooKeeper > Issue Type: Improvement >Affects Versions: 3.6.0 >Reporter: Brian Nixon >Assignee: Brian Nixon >Priority: Trivial > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 1h > Remaining Estimate: 0h > > Json format can be chained into other tools such as ncdu. Extend the > SnapshotFormatter functionality to dump json. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-2325: --- Assignee: Andrew Grasso > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Assignee: Andrew Grasso >Priority: Critical > Fix For: 3.5.4, 3.6.0 > > Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, > zk.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
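The failure mode above comes down to ignoring a sentinel return value: deserialization reports -1 for "no valid snapshot", and recovery proceeds anyway from a zxid of 0. A standalone sketch of the broken and corrected restore logic (interface and names are invented for illustration; the real code path is in FileTxnSnapLog):

```java
// Sketch of the bug described above: FileSnap.deserialize returns -1 when
// no valid snapshot is found, and ignoring that sentinel lets recovery
// replay transaction logs on top of an empty tree, producing a state that
// diverges from the rest of the ensemble. Interface and names are invented.
public class SnapRestoreSketch {
    // Stand-in for FileSnap.deserialize: -1 means "no usable snapshot".
    interface SnapshotSource {
        long deserialize();
    }

    // Broken: the result is discarded, so lastProcessedZxid keeps its
    // default of 0 and log replay starts from an empty data tree.
    static long brokenRestore(SnapshotSource snap) {
        snap.deserialize();
        return 0L;
    }

    // Fixed: treat -1 explicitly instead of silently serving divergent
    // state (erroring out here; a real fix could also rebuild from scratch
    // deliberately and visibly).
    static long fixedRestore(SnapshotSource snap) {
        long zxid = snap.deserialize();
        if (zxid == -1L) {
            throw new IllegalStateException(
                "No valid snapshot found; refusing to replay txn logs onto an empty tree");
        }
        return zxid;
    }
}
```

The "out of disk space" scenario in the report is why the empty-file case matters: a truncated snapshot deserializes as invalid, and the broken path then quietly rebuilds a different data tree.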
[jira] [Assigned] (ZOOKEEPER-3190) Spell check on the Zookeeper server files
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Hunt reassigned ZOOKEEPER-3190: --- Assignee: Dinesh Appavoo > Spell check on the Zookeeper server files > - > > Key: ZOOKEEPER-3190 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3190 > Project: ZooKeeper > Issue Type: Improvement > Components: documentation, other >Reporter: Dinesh Appavoo >Assignee: Dinesh Appavoo >Priority: Minor > Labels: newbie, pull-request-available > Fix For: 3.6.0 > > Time Spent: 40m > Remaining Estimate: 0h > > This JIRA is to do spell check on the zookeeper server files [ > zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/server ]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
ReconfigInProgress error
I've been experimenting a bit with trying to propagate failures to bind() server ports in tests up to where we can do something about it. There's at least one category of test cases (callers of ReconfigTest.testPortChangeToBlockedPort) where the server is supposed to ride through a bind() failure, recovering on a subsequent reconfiguration. In my current code state, I'm encountering errors like this: 2018-11-24 11:04:46,252 [myid:] - INFO [ProcessThread(sid:3 cport:-1)::PrepRequestProcessor@878] - Got user-level KeeperException when processing sessionid:0x1002b98aa83 type:reconfig cxid:0x1e zxid:0x1002b txntype:-1 reqpath:n/a Error Path:null Error:KeeperErrorCode = ReconfigInProgress I can hack things until this particular test passes, but it raises questions about reconfiguration in general. How exactly is the cluster supposed to get out of this state? If a cluster member drops out of contact with the quorum while there is a reconfiguration in flight, is there any recovery path that restores the ability to process a reconfigure operation? Is there a design doc for reconfiguration that demonstrates the kind of robustness against Byzantine faults that one is led to expect from Zookeeper?
[GitHub] zookeeper issue #720: add an API to get total count of recursive sub nodes o...
Github user eolivelli commented on the issue: https://github.com/apache/zookeeper/pull/720 @TyqITstudent For the 'recursive' flag I mean: - true: recurse the tree - false: count only the first level About the build: does it pass locally on your machine? It seems you are not renaming the classes/fields everywhere in the code ---
[GitHub] zookeeper issue #720: add an API to get total count of recursive sub nodes o...
Github user TyqITstudent commented on the issue: https://github.com/apache/zookeeper/pull/720 > At every push CI will retest your work. > Alternatively you can close and reopen the PR When Jenkins checks these parts, I changed nothing but the output shows errors. Could you please help me to solve this? (In the compile area.) [Jenkins compile result.](https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2745/artifact/patchprocess/trunkJavacWarnings.txt/*view*/) ---
[GitHub] zookeeper pull request #720: add an API to get total count of recursive sub ...
GitHub user TyqITstudent reopened a pull request: https://github.com/apache/zookeeper/pull/720 add an API to get total count of recursive sub nodes of one node In a production environment, there is often a node with a large number of recursive sub nodes, and we need to count the total. Currently the only option is the getChildren API, which returns the list of first-level sub nodes only, so we have to iterate over every sub node to reach the recursive ones, which costs a lot of time. On the ZooKeeper server side, nodes are stored in a hash map whose key is the path of the node, so we can iterate over the map to get the total number of sub nodes of one node at all levels. You can merge this pull request into a Git repository by running: $ git pull https://github.com/TyqITstudent/zookeeper ZOOKEEPER-3167 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/720.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #720 commit f21dab121f255959032148e6608b84c12ed0bd68 Author: tianyiqun <891707263@...> Date: 2018-11-24T06:39:30Z add an API to get total count of recursive sub nodes of one node commit 1b527726f52499aa943de1ec63de4ce9967300cf Author: tianyiqun <891707263@...> Date: 2018-11-24T06:39:30Z add an API to get total count of recursive sub nodes of one node commit 67760fed151fce49f29fabc577eef19216cef94b Author: tianyiqun <891707263@...> Date: 2018-11-24T11:12:43Z Merge branch 'ZOOKEEPER-3167' of https://github.com/TyqITstudent/zookeeper into ZOOKEEPER-3167 ---
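[Editor's note] The server-side idea described in the PR (a single scan over the path-keyed node map instead of a client-side getChildren walk) can be sketched roughly as follows. The class and method names here are hypothetical illustrations, not the actual ZooKeeper server code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: count all descendants of a path by scanning a
// path-keyed node map, mirroring the server-side approach the PR describes.
class ChildCounter {
    private final Map<String, Object> nodes = new ConcurrentHashMap<>();

    void add(String path) {
        nodes.put(path, new Object());
    }

    // Counts sub nodes of `path` at every level: any key strictly under `path`.
    int getAllChildrenNumber(String path) {
        String prefix = path.equals("/") ? "/" : path + "/";
        int count = 0;
        for (String key : nodes.keySet()) {
            if (!key.equals(path) && key.startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }
}
```

The appeal of this design is that it is one O(n) pass over the znode map on the server, versus many round-trips of getChildren from the client, one per subtree level.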
[GitHub] zookeeper pull request #720: add an API to get total count of recursive sub ...
Github user TyqITstudent closed the pull request at: https://github.com/apache/zookeeper/pull/720 ---
[GitHub] zookeeper issue #720: add an API to get total count of recursive sub nodes o...
Github user TyqITstudent commented on the issue: https://github.com/apache/zookeeper/pull/720 > At every push CI will retest your work. > Alternatively you can close and reopen the PR Your advice is good, I will change some parts of my code. ---
[GitHub] zookeeper issue #720: add an API to get total count of recursive sub nodes o...
Github user TyqITstudent commented on the issue: https://github.com/apache/zookeeper/pull/720 > name the API 'countChildren' > add a flag to ask for a recursive traversal or simply have the count without listing > add also the async version of the method 1. Your name is better; I can change the method name. 2. Do you mean that the method returns the number of first-level children (if the flag is false)? 3. I will add the async method. ---
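[Editor's note] The API shape being discussed here (a countChildren call with a 'recursive' flag, plus an async variant) might look roughly like the sketch below. All names and the in-memory backing are illustrative assumptions, not the final ZooKeeper API:

```java
import java.util.List;

// Hypothetical sketch of the proposed countChildren API: a 'recursive' flag
// selects first-level-only vs. all-depths counting, and an async variant
// delivers the result through a callback (as ZooKeeper's async APIs do).
class CountChildrenSketch {

    interface CountCallback {
        void processResult(int rc, String path, Object ctx, int count);
    }

    private final List<String> paths; // stand-in for the server's znode paths

    CountChildrenSketch(List<String> paths) {
        this.paths = paths;
    }

    // recursive == false: count only first-level children.
    // recursive == true: count children at every depth.
    int countChildren(String path, boolean recursive) {
        String prefix = path.equals("/") ? "/" : path + "/";
        int n = 0;
        for (String p : paths) {
            if (p.equals(path) || !p.startsWith(prefix)) {
                continue;
            }
            // A direct child has no further '/' after the parent prefix.
            if (recursive || p.indexOf('/', prefix.length()) < 0) {
                n++;
            }
        }
        return n;
    }

    // Async variant: hand the result to the callback instead of returning it.
    void countChildren(String path, boolean recursive, CountCallback cb, Object ctx) {
        cb.processResult(0, path, ctx, countChildren(path, recursive));
    }
}
```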
ZooKeeper_branch34_openjdk8 - Build # 132 - Failure
See https://builds.apache.org/job/ZooKeeper_branch34_openjdk8/132/

### LAST 60 LINES OF THE CONSOLE ###

[...truncated 43.45 KB...]
    [junit] Running org.apache.zookeeper.test.SaslAuthFailDesignatedClientTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.208 sec
    [junit] Running org.apache.zookeeper.test.SaslAuthFailNotifyTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.57 sec
    [junit] Running org.apache.zookeeper.test.SaslAuthFailTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.691 sec
    [junit] Running org.apache.zookeeper.test.SaslAuthMissingClientConfigTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.582 sec
    [junit] Running org.apache.zookeeper.test.SaslClientTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.077 sec
    [junit] Running org.apache.zookeeper.test.SessionInvalidationTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.703 sec
    [junit] Running org.apache.zookeeper.test.SessionTest
    [junit] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.124 sec
    [junit] Running org.apache.zookeeper.test.SessionTimeoutTest
    [junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.954 sec
    [junit] Running org.apache.zookeeper.test.StandaloneTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.941 sec
    [junit] Running org.apache.zookeeper.test.StatTest
    [junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.903 sec
    [junit] Running org.apache.zookeeper.test.StaticHostProviderTest
    [junit] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.711 sec
    [junit] Running org.apache.zookeeper.test.SyncCallTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.655 sec
    [junit] Running org.apache.zookeeper.test.TruncateTest
    [junit] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 9.554 sec
    [junit] Running org.apache.zookeeper.test.UpgradeTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.94 sec
    [junit] Running org.apache.zookeeper.test.WatchedEventTest
    [junit] Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.084 sec
    [junit] Running org.apache.zookeeper.test.WatcherFuncTest
    [junit] Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.452 sec
    [junit] Running org.apache.zookeeper.test.WatcherTest
    [junit] Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 28.087 sec
    [junit] Running org.apache.zookeeper.test.ZkDatabaseCorruptionTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.891 sec
    [junit] Running org.apache.zookeeper.test.ZooKeeperQuotaTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.727 sec
    [junit] Running org.apache.jute.BinaryInputArchiveTest
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.076 sec

fail.build.on.test.failure:

BUILD FAILED
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk8/build.xml:1408: The following error occurred while executing this line:
/home/jenkins/jenkins-slave/workspace/ZooKeeper_branch34_openjdk8/build.xml:1411: Tests failed!

Total time: 39 minutes 37 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Setting OPENJDK_8_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-8-openjdk-amd64/
Recording test results
Setting OPENJDK_8_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-8-openjdk-amd64/
Setting OPENJDK_8_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-8-openjdk-amd64/
Setting OPENJDK_8_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-8-openjdk-amd64/
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any
Setting OPENJDK_8_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-8-openjdk-amd64/
Setting OPENJDK_8_ON_UBUNTU_ONLY__HOME=/usr/lib/jvm/java-8-openjdk-amd64/

### FAILED TESTS (if any) ###

1 tests failed.
FAILED:  org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testNewFollowerRestartAfterNewEpoch

Error Message:
Waiting too long

Stack Trace:
java.lang.RuntimeException: Waiting too long
	at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.waitForAll(QuorumPeerMainTest.java:449)
	at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.waitForAll(QuorumPeerMainTest.java:439)
	at org.apache.zookeeper.server.quorum.QuorumPeerMainTest.LaunchServers(QuorumPeerMainTest.java:547)
	at
[GitHub] zookeeper issue #720: add an API to get total count of recursive sub nodes o...
Github user eolivelli commented on the issue: https://github.com/apache/zookeeper/pull/720 At every push CI will retest your work. Alternatively you can close and reopen the PR ---
[GitHub] zookeeper pull request #720: add an API to get total count of recursive sub ...
Github user TyqITstudent commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/720#discussion_r236039566 --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java --- @@ -616,6 +616,14 @@ private void processEvent(Object event) { } else { cb.processResult(rc, clientPath, p.ctx, null); } + } else if (p.response instanceof GetAllChildrenNumberResponse) { --- End diff -- > Also need to import GetAllChildrenNumberResponse I have added the import. How do I re-trigger the Jenkins build to check the code? ---
[GitHub] zookeeper pull request #720: add an API to get total count of recursive sub ...
Github user nkalmar commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/720#discussion_r236036464 --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/ZooKeeper.java --- @@ -2495,6 +2495,30 @@ public void setACL(final String path, List acl, int version, return getChildren(path, watch ? watchManager.defaultWatcher : null); } +/* +* Get all children number of one node +* */ +public int getAllChildrenNumber(final String path) +throws KeeperException, InterruptedException { +int totalNumber = 0; +final String clientPath = path; +PathUtils.validatePath(clientPath); +// the watch contains the un-chroot path +WatchRegistration wcb = null; +final String serverPath = prependChroot(clientPath); +RequestHeader h = new RequestHeader(); +h.setType(ZooDefs.OpCode.getAllChildrenNumber); +GetAllChildrenNumberRequest request = new GetAllChildrenNumberRequest(); --- End diff -- You need to import the class GetAllChildrenNumberRequest which jute generates. ---
[GitHub] zookeeper pull request #720: add an API to get total count of recursive sub ...
Github user nkalmar commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/720#discussion_r236036475 --- Diff: zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxn.java --- @@ -616,6 +616,14 @@ private void processEvent(Object event) { } else { cb.processResult(rc, clientPath, p.ctx, null); } + } else if (p.response instanceof GetAllChildrenNumberResponse) { --- End diff -- Also need to import GetAllChildrenNumberResponse ---
[jira] [Updated] (ZOOKEEPER-3199) Unable to produce verbose logs of Zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Kothana updated ZOOKEEPER-3199: - Description: We are using Zookeeper in our system along with Apache Kafka. However, Zookeeper is not producing any relevant logs (even with lower log levels specified in log4j.properties) in the log file that could help us in identifying what is currently going on in ZK or Kafka cluster. Please let us know how to retrieve proper logs from ZK cluster. Version of ZK : 3.4.13 was: We are using Zookeeper in our system along with Apache Kafka. However, Zookeeper is not producing any relevant logs (even with lower log levels specified in log4j.properties) in the log file that could help us in identifying what is currently going on in ZK or Kafka cluster. Please let us know how to retrieve proper logs from ZK cluster. Version of ZK : 3.x > Unable to produce verbose logs of Zookeeper > --- > > Key: ZOOKEEPER-3199 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3199 > Project: ZooKeeper > Issue Type: Bug >Reporter: Ankit Kothana >Priority: Major > Attachments: log4j.properties, zookeeper.log > > > We are using Zookeeper in our system along with Apache Kafka. However, > Zookeeper is not producing any relevant logs (even with lower log levels > specified in log4j.properties) in the log file that could help us in > identifying what is currently going on in ZK or Kafka cluster. > Please let us know how to retrieve proper logs from ZK cluster. > Version of ZK : 3.4.13 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ZOOKEEPER-3199) Unable to produce verbose logs of Zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankit Kothana updated ZOOKEEPER-3199: - Attachment: log4j.properties zookeeper.log > Unable to produce verbose logs of Zookeeper > --- > > Key: ZOOKEEPER-3199 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3199 > Project: ZooKeeper > Issue Type: Bug >Reporter: Ankit Kothana >Priority: Major > Attachments: log4j.properties, zookeeper.log > > > We are using Zookeeper in our system along with Apache Kafka. However, > Zookeeper is not producing any relevant logs (even with lower log levels > specified in log4j.properties) in the log file that could help us in > identifying what is currently going on in ZK or Kafka cluster. > Please let us know how to retrieve proper logs from ZK cluster. > Version of ZK : 3.x -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ZOOKEEPER-3199) Unable to produce verbose logs of Zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697683#comment-16697683 ] Ankit Kothana edited comment on ZOOKEEPER-3199 at 11/24/18 8:03 AM:

Took reference of https://issues.apache.org/jira/browse/ZOOKEEPER-2170 and updated zkEnv.sh to change the root level of logging to DEBUG, but it didn't resolve the issue. Attaching log4j.properties and zookeeper.log files

was (Author: ankit.kothana): Took reference of https://issues.apache.org/jira/browse/ZOOKEEPER-2170 and updated zkEnv.sh to change the root level of logging to DEBUG, but it didn't resolve the issue.

zookeeper.log
{quote}
ZooKeeper JMX enabled by default
Using config: /usr/local/share/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
ZooKeeper JMX enabled by default
Using config: /usr/local/share/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
{quote}

log4j.properties
{quote}
# Define some default values that can be overridden by system properties
zookeeper.root.logger=DEBUG, CONSOLE
zookeeper.console.threshold=DEBUG
zookeeper.log.dir=/var/log/zookeeper
zookeeper.log.file=zookeeper.log
zookeeper.log.threshold=DEBUG
zookeeper.tracelog.dir=.
zookeeper.tracelog.file=zookeeper_trace.log

#
# ZooKeeper Logging Configuration
#

# Format is "<default threshold> (, <appender>)+
# DEFAULT: console appender only
log4j.rootLogger=${zookeeper.root.logger}

# Example with rolling log file
#log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE

# Example with rolling log file and tracing
#log4j.rootLogger=TRACE, CONSOLE, ROLLINGFILE, TRACEFILE

#
# Log INFO level and above messages to the console
#
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=${zookeeper.console.threshold}
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d\{ISO8601} [myid:%X\{myid}] - %-5p [%t:%C\{1}@%L] - %m%n

#
# Add ROLLINGFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=${zookeeper.log.threshold}
log4j.appender.ROLLINGFILE.File=${zookeeper.log.dir}/${zookeeper.log.file}

# Max log file size of 10MB
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
# uncomment the next line to limit number of backup files
#log4j.appender.ROLLINGFILE.MaxBackupIndex=10

log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d\{ISO8601} [myid:%X\{myid}] - %-5p [%t:%C\{1}@%L] - %m%n

#
# Add TRACEFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.TRACEFILE=org.apache.log4j.FileAppender
log4j.appender.TRACEFILE.Threshold=TRACE
log4j.appender.TRACEFILE.File=${zookeeper.tracelog.dir}/${zookeeper.tracelog.file}

log4j.appender.TRACEFILE.layout=org.apache.log4j.PatternLayout
### Notice we are including log4j's NDC here (%x)
log4j.appender.TRACEFILE.layout.ConversionPattern=%d\{ISO8601} [myid:%X\{myid}] - %-5p [%t:%C\{1}@%L][%x] - %m%n
{quote}

> Unable to produce verbose logs of Zookeeper
> -------------------------------------------
>
>                 Key: ZOOKEEPER-3199
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3199
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Ankit Kothana
>            Priority: Major
>
> We are using Zookeeper in our system along with Apache Kafka. However, Zookeeper is not producing any relevant logs (even with lower log levels specified in log4j.properties) in the log file that could help us in identifying what is currently going on in ZK or Kafka cluster.
> Please let us know how to retrieve proper logs from ZK cluster.
> Version of ZK : 3.x

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3199) Unable to produce verbose logs of Zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697683#comment-16697683 ] Ankit Kothana commented on ZOOKEEPER-3199:

Took reference of https://issues.apache.org/jira/browse/ZOOKEEPER-2170 and updated zkEnv.sh to change the root level of logging to DEBUG, but it didn't resolve the issue.

zookeeper.log
{quote}
ZooKeeper JMX enabled by default
Using config: /usr/local/share/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED
ZooKeeper JMX enabled by default
Using config: /usr/local/share/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
{quote}

log4j.properties
{quote}
# Define some default values that can be overridden by system properties
zookeeper.root.logger=DEBUG, CONSOLE
zookeeper.console.threshold=DEBUG
zookeeper.log.dir=/var/log/zookeeper
zookeeper.log.file=zookeeper.log
zookeeper.log.threshold=DEBUG
zookeeper.tracelog.dir=.
zookeeper.tracelog.file=zookeeper_trace.log

#
# ZooKeeper Logging Configuration
#

# Format is "<default threshold> (, <appender>)+
# DEFAULT: console appender only
log4j.rootLogger=${zookeeper.root.logger}

# Example with rolling log file
#log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE

# Example with rolling log file and tracing
#log4j.rootLogger=TRACE, CONSOLE, ROLLINGFILE, TRACEFILE

#
# Log INFO level and above messages to the console
#
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=${zookeeper.console.threshold}
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d\{ISO8601} [myid:%X\{myid}] - %-5p [%t:%C\{1}@%L] - %m%n

#
# Add ROLLINGFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=${zookeeper.log.threshold}
log4j.appender.ROLLINGFILE.File=${zookeeper.log.dir}/${zookeeper.log.file}

# Max log file size of 10MB
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
# uncomment the next line to limit number of backup files
#log4j.appender.ROLLINGFILE.MaxBackupIndex=10

log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d\{ISO8601} [myid:%X\{myid}] - %-5p [%t:%C\{1}@%L] - %m%n

#
# Add TRACEFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.TRACEFILE=org.apache.log4j.FileAppender
log4j.appender.TRACEFILE.Threshold=TRACE
log4j.appender.TRACEFILE.File=${zookeeper.tracelog.dir}/${zookeeper.tracelog.file}

log4j.appender.TRACEFILE.layout=org.apache.log4j.PatternLayout
### Notice we are including log4j's NDC here (%x)
log4j.appender.TRACEFILE.layout.ConversionPattern=%d\{ISO8601} [myid:%X\{myid}] - %-5p [%t:%C\{1}@%L][%x] - %m%n
{quote}

> Unable to produce verbose logs of Zookeeper
> -------------------------------------------
>
>                 Key: ZOOKEEPER-3199
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3199
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: Ankit Kothana
>            Priority: Major
>
> We are using Zookeeper in our system along with Apache Kafka. However, Zookeeper is not producing any relevant logs (even with lower log levels specified in log4j.properties) in the log file that could help us in identifying what is currently going on in ZK or Kafka cluster.
> Please let us know how to retrieve proper logs from ZK cluster.
> Version of ZK : 3.x

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
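[Editor's note] One likely cause here (an assumption based on the quoted config, not a confirmed diagnosis) is that a log4j appender only receives output if it is named in the root logger; with `zookeeper.root.logger=DEBUG, CONSOLE`, DEBUG messages go to the console only and never reach the rolling file. A minimal sketch of the relevant lines:

```properties
# Route DEBUG output to the rolling file as well as the console.
# An appender defined below takes effect only if listed here.
zookeeper.root.logger=DEBUG, CONSOLE, ROLLINGFILE
zookeeper.log.dir=/var/log/zookeeper
zookeeper.log.threshold=DEBUG
```

Note also that on the 3.4 branch, zkServer.sh passes `-Dzookeeper.root.logger` derived from `ZOO_LOG4J_PROP` in zkEnv.sh (default `INFO,CONSOLE`), which overrides the value in log4j.properties, so exporting something like `ZOO_LOG4J_PROP="DEBUG,ROLLINGFILE"` is typically also needed.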