[jira] [Commented] (ZOOKEEPER-3456) Service temporarily unavailable due to an ongoing leader election. Please refresh
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880877#comment-16880877 ] maoling commented on ZOOKEEPER-3456: [~Mar_zieh] At the first, I think you should *ping* or *telnet* the server2 from other node to check the network issue > Service temporarily unavailable due to an ongoing leader election. Please > refresh > - > > Key: ZOOKEEPER-3456 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3456 > Project: ZooKeeper > Issue Type: Bug > Components: server > Environment: docker container with Ubuntu 16.04 >Reporter: Marzieh >Priority: Major > Fix For: 3.4.14 > > > Hi > I configured Zookeeper with four nodes for my Mesos cluster with Marathon. > When I ran Flink Json file on Marathon, it was run without problem. But, when > I entered IP of my two slaves, just one slave shew Flink UI and another slave > shew this error: > > Service temporarily unavailable due to an ongoing leader election. Please > refresh > I checked "zookeeper.out" file and it said that : > > 019-07-07 11:48:43,412 [myid:] - INFO [main:QuorumPeerConfig@136] - Reading > configuration from: /home/zookeeper-3.4.14/bin/../conf/zoo.cfg > 2019-07-07 11:48:43,421 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - > Resolved hostname: 0.0.0.0 to address: /0.0.0.0 > 2019-07-07 11:48:43,421 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - > Resolved hostname: 10.32.0.3 to address: /10.32.0.3 > 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - > Resolved hostname: 10.32.0.2 to address: /10.32.0.2 > 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - > Resolved hostname: 10.32.0.5 to address: /10.32.0.5 > 2019-07-07 11:48:43,422 [myid:] - WARN [main:QuorumPeerConfig@354] - > Non-optimial configuration, consider an odd number of servers. > 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeerConfig@398] - > Defaulting to majority quorums > 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@78] - > autopurge.snapRetainCount set to 3 > 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@79] - > autopurge.purgeInterval set to 0 > 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@101] - > Purge task is not scheduled. 
> 2019-07-07 11:48:43,432 [myid:3] - INFO [main:QuorumPeerMain@130] - Starting > quorum peer > 2019-07-07 11:48:43,437 [myid:3] - INFO [main:ServerCnxnFactory@117] - Using > org.apache.zookeeper.server.NIOServerCnxnFactory as server connect$ > 2019-07-07 11:48:43,439 [myid:3] - INFO [main:NIOServerCnxnFactory@89] - > binding to port 0.0.0.0/0.0.0.0:2181 > 2019-07-07 11:48:43,440 [myid:3] - ERROR [main:QuorumPeerMain@92] - > Unexpected exception, exiting abnormally > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:433) > at sun.nio.ch.Net.bind(Net.java:425) > at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:90) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:133) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81) > > I searched a lot and could not find the solution. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
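For context, the WARN line in the log above is ZooKeeper pointing out that four voters tolerate no more failures than three (quorum is still three servers), and the BindException means some other process, likely a second ZooKeeper instance in the same container, already holds the configured client port. A minimal zoo.cfg sketch for a healthy three-voter ensemble follows; the paths and ports are placeholders and the IPs are simply the ones from the log above.
{noformat}
# zoo.cfg sketch: odd number of voters, one server.N line per node,
# and a clientPort that nothing else on the host is already bound to.
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=10.32.0.2:2888:3888
server.2=10.32.0.3:2888:3888
server.3=10.32.0.5:2888:3888
{noformat}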
[jira] [Commented] (ZOOKEEPER-2894) Memory and completions leak on zookeeper_close
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880833#comment-16880833 ] Hudson commented on ZOOKEEPER-2894: --- SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #606 (See [https://builds.apache.org/job/ZooKeeper-trunk/606/]) ZOOKEEPER-2894: Memory and completions leak on zookeeper_close (hanm: rev f9610cc80173342bbe9766889a1aab1bfd840d1e) * (edit) zookeeper-client/zookeeper-client-c/src/zookeeper.c * (edit) zookeeper-client/zookeeper-client-c/tests/TestOperations.cc > Memory and completions leak on zookeeper_close > -- > > Key: ZOOKEEPER-2894 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2894 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.10 > Environment: Linux ubuntu 4.4.0-87-generic > gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609 > https://github.com/apache/zookeeper.git > branch-3.4 >Reporter: Alexander A. Strelets >Assignee: Alexander A. Strelets >Priority: Critical > Labels: easyfix, pull-request-available > Fix For: 3.6.0, 3.4.15, 3.5.6 > > Attachments: zk-will-free-zh-and-lose-completions.png > > Time Spent: 4h 50m > Remaining Estimate: 0h > > ZooKeeper C Client *+single thread+* build > *The problem:* > First of all, ZooKeeper C Client design allows calling _zookeeper_close()_ in > two ways: > a) from a ZooKeeper callback handler (completion or watcher) which in turn is > called through _zookeeper_process()_ > b) and from other places -- i.e., when the call-stack does not pass through > any of zookeeper mechanics prior to enter into mentioned _zookeeper_close()_ > The issue described here below is +specific only to the case (b)+. So, it's > Ok with the case (a). > When _zookeeper_close()_ is called in the (b) way, the following happens: > 1. +If there are requests waiting for responses in _zh.sent_requests_ queue+, > they all are removed from this queue and each of them is "completed" with > personal fake response having status ZCLOSING. Such fake responses are put > into _zh.completions_to_process_ queue. It's Ok > 2. But then, _zh.completions_to_process_ queue is left unhandled. *+Neither > completion callbacks are called, nor dynamic memory allocated for fake > responses is freed+* > 3. Different structures within _zh_ are dismissed and finally _zh_ is freed > This is illustrated on the screenshot attached to this ticket: you may see > that the next instruction to execute will be _free(zh)_ while > _zh.completions_to_process_ queue is not empty (see the "Variables" tab to > the right). > Alternatively, the same situation but in the case (a) is handled properly -- > i.e., all completion callback handlers are truly called with ZCLOSING and the > memory is freed, both for subcases (a.1) when there is a failure like > connection-timeout, connection-closed, etc., or (a.2) there is not failure. > The reason is that any callback handler (completion or watcher) in the case > (a) is called from the _process_completions()_ function which runs in the > loop until _zh.completions_to_process_ queue gets empty. So, this function > guarantees this queue to be completely processed even if new completions > occur during reaction on previously queued completions. > *Consequently:* > 1. At least there is definitely the +memory leak+ in the case (b) -- all the > fake responses put into _zh.completions_to_process_ queue are lost after > _free(zh)_ > 2. 
And it looks like a great misbehavior not to call completions on sent > requests in the case (b) while they are called with ZCLOSING in the case (a) > -- so, I think it's not "by design" but a +completions leak+ > +To reproduce the case (b) do the following:+ > - open ZooKeeper session, connect to a server, receive and process > connected-watch, etc. > - then somewhere +from the main events loop+ call for example _zoo_acreate()_ > with valid arguments -- it shall return ZOK > - then, +immediately after it returned+, call _zookeeper_close()_ > - note that completion callback handler for _zoo_acreate()_ *will not be > called* > +To reproduce the case (a) do the following:+ > - the same as above, open ZooKeeper session, connect to a server, receive and > process connected-watch, etc. > - the same as above, somewhere from the main events loop call for example > _zoo_acreate()_ with valid arguments -- it shall return ZOK > - but now don't call _zookeeper_close()_ immediately -- wait for completion > callback on the commenced request > - when _zoo_acreate()_ completes, +from within its completion callback > handler+, call another _zoo_acreate()_ and immediately after it returned call > _zookeeper_close()_ > - note that completion callback handler for the second
[jira] [Updated] (ZOOKEEPER-2894) Memory and completions leak on zookeeper_close
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Han updated ZOOKEEPER-2894: --- Fix Version/s: 3.5.6 3.4.15 > Memory and completions leak on zookeeper_close > -- > > Key: ZOOKEEPER-2894 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2894 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.10 > Environment: Linux ubuntu 4.4.0-87-generic > gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609 > https://github.com/apache/zookeeper.git > branch-3.4 >Reporter: Alexander A. Strelets >Assignee: Alexander A. Strelets >Priority: Critical > Labels: easyfix, pull-request-available > Fix For: 3.6.0, 3.4.15, 3.5.6 > > Attachments: zk-will-free-zh-and-lose-completions.png > > Time Spent: 4h 40m > Remaining Estimate: 0h > > ZooKeeper C Client *+single thread+* build > *The problem:* > First of all, ZooKeeper C Client design allows calling _zookeeper_close()_ in > two ways: > a) from a ZooKeeper callback handler (completion or watcher) which in turn is > called through _zookeeper_process()_ > b) and from other places -- i.e., when the call-stack does not pass through > any of zookeeper mechanics prior to enter into mentioned _zookeeper_close()_ > The issue described here below is +specific only to the case (b)+. So, it's > Ok with the case (a). > When _zookeeper_close()_ is called in the (b) way, the following happens: > 1. +If there are requests waiting for responses in _zh.sent_requests_ queue+, > they all are removed from this queue and each of them is "completed" with > personal fake response having status ZCLOSING. Such fake responses are put > into _zh.completions_to_process_ queue. It's Ok > 2. But then, _zh.completions_to_process_ queue is left unhandled. *+Neither > completion callbacks are called, nor dynamic memory allocated for fake > responses is freed+* > 3. Different structures within _zh_ are dismissed and finally _zh_ is freed > This is illustrated on the screenshot attached to this ticket: you may see > that the next instruction to execute will be _free(zh)_ while > _zh.completions_to_process_ queue is not empty (see the "Variables" tab to > the right). > Alternatively, the same situation but in the case (a) is handled properly -- > i.e., all completion callback handlers are truly called with ZCLOSING and the > memory is freed, both for subcases (a.1) when there is a failure like > connection-timeout, connection-closed, etc., or (a.2) there is not failure. > The reason is that any callback handler (completion or watcher) in the case > (a) is called from the _process_completions()_ function which runs in the > loop until _zh.completions_to_process_ queue gets empty. So, this function > guarantees this queue to be completely processed even if new completions > occur during reaction on previously queued completions. > *Consequently:* > 1. At least there is definitely the +memory leak+ in the case (b) -- all the > fake responses put into _zh.completions_to_process_ queue are lost after > _free(zh)_ > 2. And it looks like a great misbehavior not to call completions on sent > requests in the case (b) while they are called with ZCLOSING in the case (a) > -- so, I think it's not "by design" but a +completions leak+ > +To reproduce the case (b) do the following:+ > - open ZooKeeper session, connect to a server, receive and process > connected-watch, etc. 
> - then somewhere +from the main events loop+ call for example _zoo_acreate()_ > with valid arguments -- it shall return ZOK > - then, +immediately after it returned+, call _zookeeper_close()_ > - note that completion callback handler for _zoo_acreate()_ *will not be > called* > +To reproduce the case (a) do the following:+ > - the same as above, open ZooKeeper session, connect to a server, receive and > process connected-watch, etc. > - the same as above, somewhere from the main events loop call for example > _zoo_acreate()_ with valid arguments -- it shall return ZOK > - but now don't call _zookeeper_close()_ immediately -- wait for completion > callback on the commenced request > - when _zoo_acreate()_ completes, +from within its completion callback > handler+, call another _zoo_acreate()_ and immediately after it returned call > _zookeeper_close()_ > - note that completion callback handler for the second _zoo_acreate()_ *will > be called with ZCLOSING, unlike the case (b) described above* > *To fix this I propose:* > Just call _process_completions()_ from _destroy(zhandle_t *zh)_ as it is done > in _handle_error(zhandle_t *zh,int rc)_. > This is a proposed fix: https://github.com/apache/zookeeper/pull/1000 > // Previously proposed fix:
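Reproduction case (b) above can be condensed into a small program against the public single-threaded C API. This is only a sketch: the connect string, znode path and the select()-based event pump are illustrative, and the include path and library name depend on how the client was built and installed.
{code:c}
#include <stdio.h>
#include <sys/select.h>
#include <zookeeper/zookeeper.h>   /* header location depends on the install */

static int connected = 0;

static void watcher(zhandle_t *zh, int type, int state,
                    const char *path, void *ctx) {
    if (type == ZOO_SESSION_EVENT && state == ZOO_CONNECTED_STATE)
        connected = 1;
}

/* Before the fix this completion is silently lost in case (b);
 * after the fix it is invoked with rc == ZCLOSING. */
static void create_completion(int rc, const char *value, const void *data) {
    fprintf(stderr, "create completed: rc=%d (%s)\n", rc, zerror(rc));
}

/* One iteration of the single-threaded client's event loop. */
static void pump_once(zhandle_t *zh) {
    fd_set rfds, wfds, efds;
    struct timeval tv;
    int fd = -1, interest = 0, events = 0;

    FD_ZERO(&rfds); FD_ZERO(&wfds); FD_ZERO(&efds);
    if (zookeeper_interest(zh, &fd, &interest, &tv) != ZOK || fd == -1)
        return;
    if (interest & ZOOKEEPER_READ)  FD_SET(fd, &rfds);
    if (interest & ZOOKEEPER_WRITE) FD_SET(fd, &wfds);
    if (select(fd + 1, &rfds, &wfds, &efds, &tv) < 0)
        return;
    if (FD_ISSET(fd, &rfds)) events |= ZOOKEEPER_READ;
    if (FD_ISSET(fd, &wfds)) events |= ZOOKEEPER_WRITE;
    zookeeper_process(zh, events);
}

int main(void) {
    zhandle_t *zh = zookeeper_init("127.0.0.1:2181", watcher, 30000,
                                   NULL, NULL, 0);
    if (!zh) return 1;

    /* Drive the event loop until the connected watch has been delivered. */
    while (!connected)
        pump_once(zh);

    /* Case (b): issue an async create from the main loop; it returns ZOK... */
    if (zoo_acreate(zh, "/zk-2894-repro", "x", 1, &ZOO_OPEN_ACL_UNSAFE,
                    0, create_completion, NULL) != ZOK)
        return 1;

    /* ...and close immediately, before pumping the loop again. Without the
     * fix, create_completion never runs and the fake ZCLOSING response
     * queued for it is leaked when the handle is freed. */
    return zookeeper_close(zh);
}
{code}
Building against the single-threaded library (for example {{gcc repro.c -lzookeeper_st}}) matters here, because in that build the application owns the event loop, which is exactly the situation case (b) describes.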
[jira] [Resolved] (ZOOKEEPER-2894) Memory and completions leak on zookeeper_close
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Han resolved ZOOKEEPER-2894. Resolution: Fixed Fix Version/s: (was: 3.4.10) 3.6.0 Issue resolved by pull request 1000 [https://github.com/apache/zookeeper/pull/1000] > Memory and completions leak on zookeeper_close > -- > > Key: ZOOKEEPER-2894 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2894 > Project: ZooKeeper > Issue Type: Bug > Components: c client >Affects Versions: 3.4.10 > Environment: Linux ubuntu 4.4.0-87-generic > gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609 > https://github.com/apache/zookeeper.git > branch-3.4 >Reporter: Alexander A. Strelets >Assignee: Alexander A. Strelets >Priority: Critical > Labels: easyfix, pull-request-available > Fix For: 3.6.0 > > Attachments: zk-will-free-zh-and-lose-completions.png > > Time Spent: 4.5h > Remaining Estimate: 0h > > ZooKeeper C Client *+single thread+* build > *The problem:* > First of all, ZooKeeper C Client design allows calling _zookeeper_close()_ in > two ways: > a) from a ZooKeeper callback handler (completion or watcher) which in turn is > called through _zookeeper_process()_ > b) and from other places -- i.e., when the call-stack does not pass through > any of zookeeper mechanics prior to enter into mentioned _zookeeper_close()_ > The issue described here below is +specific only to the case (b)+. So, it's > Ok with the case (a). > When _zookeeper_close()_ is called in the (b) way, the following happens: > 1. +If there are requests waiting for responses in _zh.sent_requests_ queue+, > they all are removed from this queue and each of them is "completed" with > personal fake response having status ZCLOSING. Such fake responses are put > into _zh.completions_to_process_ queue. It's Ok > 2. But then, _zh.completions_to_process_ queue is left unhandled. *+Neither > completion callbacks are called, nor dynamic memory allocated for fake > responses is freed+* > 3. Different structures within _zh_ are dismissed and finally _zh_ is freed > This is illustrated on the screenshot attached to this ticket: you may see > that the next instruction to execute will be _free(zh)_ while > _zh.completions_to_process_ queue is not empty (see the "Variables" tab to > the right). > Alternatively, the same situation but in the case (a) is handled properly -- > i.e., all completion callback handlers are truly called with ZCLOSING and the > memory is freed, both for subcases (a.1) when there is a failure like > connection-timeout, connection-closed, etc., or (a.2) there is not failure. > The reason is that any callback handler (completion or watcher) in the case > (a) is called from the _process_completions()_ function which runs in the > loop until _zh.completions_to_process_ queue gets empty. So, this function > guarantees this queue to be completely processed even if new completions > occur during reaction on previously queued completions. > *Consequently:* > 1. At least there is definitely the +memory leak+ in the case (b) -- all the > fake responses put into _zh.completions_to_process_ queue are lost after > _free(zh)_ > 2. And it looks like a great misbehavior not to call completions on sent > requests in the case (b) while they are called with ZCLOSING in the case (a) > -- so, I think it's not "by design" but a +completions leak+ > +To reproduce the case (b) do the following:+ > - open ZooKeeper session, connect to a server, receive and process > connected-watch, etc. 
> - then somewhere +from the main events loop+ call for example _zoo_acreate()_ > with valid arguments -- it shall return ZOK > - then, +immediately after it returned+, call _zookeeper_close()_ > - note that completion callback handler for _zoo_acreate()_ *will not be > called* > +To reproduce the case (a) do the following:+ > - the same as above, open ZooKeeper session, connect to a server, receive and > process connected-watch, etc. > - the same as above, somewhere from the main events loop call for example > _zoo_acreate()_ with valid arguments -- it shall return ZOK > - but now don't call _zookeeper_close()_ immediately -- wait for completion > callback on the commenced request > - when _zoo_acreate()_ completes, +from within its completion callback > handler+, call another _zoo_acreate()_ and immediately after it returned call > _zookeeper_close()_ > - note that completion callback handler for the second _zoo_acreate()_ *will > be called with ZCLOSING, unlike the case (b) described above* > *To fix this I propose:* > Just call _process_completions()_ from _destroy(zhandle_t *zh)_ as it is done > in _handle_error(zhandle_t *zh,int rc)_. > This
[jira] [Commented] (ZOOKEEPER-3243) Add server side request throttling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880798#comment-16880798 ] Hudson commented on ZOOKEEPER-3243: --- SUCCESS: Integrated in Jenkins build Zookeeper-trunk-single-thread #440 (See [https://builds.apache.org/job/Zookeeper-trunk-single-thread/440/]) ZOOKEEPER-3243: Add server-side request throttling (hanm: rev 7b3de52cdb15068aa343879ae283f4e456c68f39) * (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java * (add) zookeeper-server/src/main/java/org/apache/zookeeper/server/RequestThrottler.java * (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServerBean.java * (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxn.java * (add) zookeeper-server/src/test/java/org/apache/zookeeper/server/RequestThrottlerTest.java * (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServerMXBean.java * (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerMetrics.java * (edit) zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md * (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/Request.java * (edit) zookeeper-server/src/test/java/org/apache/zookeeper/server/SessionTrackerTest.java * (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/FinalRequestProcessor.java * (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerCnxn.java > Add server side request throttling > -- > > Key: ZOOKEEPER-3243 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3243 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Reporter: Jie Huang >Assignee: Jie Huang >Priority: Minor > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 7h 20m > Remaining Estimate: 0h > > On-going performance investigation at Facebook has demonstrated that > Zookeeper is easily overwhelmed by spikes in connection rates and/or write > request rates. Zookeeper performance gets progressively worse, clients > timeout and try to reconnect (exacerbating the problem) and things enter a > death spiral. To solve this problem, we need to add load protection to > Zookeeper via rate limiting and work shedding. > This JIRA task adds a new request throttling mechanism (RequestThrottler) to > Zookeeper in hopes of preventing Zookeeper from becoming overwhelmed during > request spikes. > > When enabled, the RequestThrottler limits the number Of outstanding requests > currently submitted to the request processor pipeline. > > The throttler augments the limit imposed by the globalOutstandingLimit that > is enforced by the connection layer (NIOServerCnxn, NettyServerCnxn). The > connection layer limit applies backpressure against the TCP connection by > disabling selection on connections once the request limit is reached. > However, the connection layer always allows a connection to send at least one > request before disabling selection on that connection. Thus, in a scenario > with 4 client connections, the total number of requests inflight may be > as high as 4 even if the globalOustandingLimit was set lower. > > The RequestThrottler addresses this issue by adding additional queueing. When > enabled, client connections no longer submit requests directly to the request > processor pipeline but instead to the RequestThrottler. The RequestThrottler > is then responsible for issuing requests to the request processors, and > enforces a separate maxRequests limit. 
If the total number of outstanding > requests is higher than maxRequests, the throttler will continually stall for > stallTime milliseconds until under limit. > > The RequestThrottler can also optionally drop stale requests rather than > submit them to the processor pipeline. A stale request is a request sent by a > connection that is already closed, and/or a request whose latency will end up > being higher than its associated session timeout. > To ensure ordering guarantees, if a request is ever dropped from a connection > that connection is closed and flagged as invalid. All subsequent requests > inflight from that connection are then dropped as well. > > The notion of staleness is configurable, both connection staleness and > latency staleness can be individually enabled/disabled. Both these settings > and the various throttle settings (limit, stall time, stale drop) can be > configured via system properties as well as at runtime via JMX. > > The throttler has been tested and benchmarked at Facebook -- This message was sent by Atlassian JIRA (v7.6.3#76005)
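For reference, the limits described above surface as JVM system properties; a hedged sketch follows. The property names below are my reading of the zookeeperAdmin.md page touched by this commit and should be verified against the deployed build.
{noformat}
# Request throttling sketch (verify exact names against zookeeperAdmin.md)
-Dzookeeper.request_throttle_max_requests=400   # maxRequests; 0 leaves the throttler disabled
-Dzookeeper.request_throttle_stall_time=100     # stallTime in milliseconds
-Dzookeeper.request_throttle_drop_stale=true    # drop stale requests instead of processing them
{noformat}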
[jira] [Commented] (ZOOKEEPER-3458) ZK 3.5.5 : Dynamic SecureClientPort and Server Specs
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880770#comment-16880770 ] Brian Nixon commented on ZOOKEEPER-3458: Related: ZOOKEEPER-3166 > ZK 3.5.5 : Dynamic SecureClientPort and Server Specs > > > Key: ZOOKEEPER-3458 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3458 > Project: ZooKeeper > Issue Type: Improvement > Components: java client >Affects Versions: 3.5.5 >Reporter: Fredrick Eisele >Priority: Major > > ZK 3.5.5 : Dynamic configuration of SecureClientPort and Server Specs > The server specification is ... > {{server.<positive id> = <address1>:<port1>:<port2>[:role];[<client port address>:]<client port>}} > > The clientPort and clientPortAddress are accommodated but I do not see a > provision for secureClientPort. > > secureClientPort and secureClientPortAddress > were not made part of the dynamic configuration introduced in ZK 3.5.5 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
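For comparison, the dynamic configuration syntax documented for 3.5.x is reproduced below; the concrete server line is only an illustration. It carries a plaintext client port and address but, as noted, no secureClientPort counterpart.
{noformat}
server.<positive id>=<address1>:<port1>:<port2>[:role];[<client port address>:]<client port>
# example with placeholder addresses:
server.1=10.0.0.1:2888:3888:participant;0.0.0.0:2181
{noformat}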
[jira] [Commented] (ZOOKEEPER-3243) Add server side request throttling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880759#comment-16880759 ] Hudson commented on ZOOKEEPER-3243: --- FAILURE: Integrated in Jenkins build ZooKeeper-trunk #605 (See [https://builds.apache.org/job/ZooKeeper-trunk/605/]) ZOOKEEPER-3243: Add server-side request throttling (hanm: rev 7b3de52cdb15068aa343879ae283f4e456c68f39) * (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerCnxn.java * (edit) zookeeper-server/src/test/java/org/apache/zookeeper/server/SessionTrackerTest.java * (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/Request.java * (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServerBean.java * (edit) zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md * (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java * (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxn.java * (add) zookeeper-server/src/main/java/org/apache/zookeeper/server/RequestThrottler.java * (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/FinalRequestProcessor.java * (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerMetrics.java * (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServerMXBean.java * (add) zookeeper-server/src/test/java/org/apache/zookeeper/server/RequestThrottlerTest.java > Add server side request throttling > -- > > Key: ZOOKEEPER-3243 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3243 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Reporter: Jie Huang >Assignee: Jie Huang >Priority: Minor > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 7h 20m > Remaining Estimate: 0h > > On-going performance investigation at Facebook has demonstrated that > Zookeeper is easily overwhelmed by spikes in connection rates and/or write > request rates. Zookeeper performance gets progressively worse, clients > timeout and try to reconnect (exacerbating the problem) and things enter a > death spiral. To solve this problem, we need to add load protection to > Zookeeper via rate limiting and work shedding. > This JIRA task adds a new request throttling mechanism (RequestThrottler) to > Zookeeper in hopes of preventing Zookeeper from becoming overwhelmed during > request spikes. > > When enabled, the RequestThrottler limits the number Of outstanding requests > currently submitted to the request processor pipeline. > > The throttler augments the limit imposed by the globalOutstandingLimit that > is enforced by the connection layer (NIOServerCnxn, NettyServerCnxn). The > connection layer limit applies backpressure against the TCP connection by > disabling selection on connections once the request limit is reached. > However, the connection layer always allows a connection to send at least one > request before disabling selection on that connection. Thus, in a scenario > with 4 client connections, the total number of requests inflight may be > as high as 4 even if the globalOustandingLimit was set lower. > > The RequestThrottler addresses this issue by adding additional queueing. When > enabled, client connections no longer submit requests directly to the request > processor pipeline but instead to the RequestThrottler. The RequestThrottler > is then responsible for issuing requests to the request processors, and > enforces a separate maxRequests limit. 
If the total number of outstanding > requests is higher than maxRequests, the throttler will continually stall for > stallTime milliseconds until under limit. > > The RequestThrottler can also optionally drop stale requests rather than > submit them to the processor pipeline. A stale request is a request sent by a > connection that is already closed, and/or a request whose latency will end up > being higher than its associated session timeout. > To ensure ordering guarantees, if a request is ever dropped from a connection > that connection is closed and flagged as invalid. All subsequent requests > inflight from that connection are then dropped as well. > > The notion of staleness is configurable, both connection staleness and > latency staleness can be individually enabled/disabled. Both these settings > and the various throttle settings (limit, stall time, stale drop) can be > configured via system properties as well as at runtime via JMX. > > The throttler has been tested and benchmarked at Facebook -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ZOOKEEPER-3437) Improve sync throttling on a learner master
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Han reassigned ZOOKEEPER-3437: -- Assignee: Jie Huang > Improve sync throttling on a learner master > --- > > Key: ZOOKEEPER-3437 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3437 > Project: ZooKeeper > Issue Type: Improvement > Components: quorum >Affects Versions: 3.6.0 >Reporter: Jie Huang >Assignee: Jie Huang >Priority: Minor > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > As described in ZOOKEEPER-1928, a leader can become overloaded if it sends > too many snapshots concurrently during sync time. Sending too many diffs at > the same time can also cause the overloading issue. > In this JIRA, we will: > # add diff sync throttling in addition to snap sync throttling > # extend the protection to followers that serve observers > # improve the counting of concurrent snap syncs/diff syncs to avoid double > counting or missing counting -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ZOOKEEPER-3309) Add sync processor metrics
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Han reassigned ZOOKEEPER-3309: -- Assignee: Jie Huang > Add sync processor metrics > -- > > Key: ZOOKEEPER-3309 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3309 > Project: ZooKeeper > Issue Type: Sub-task > Components: metric system >Reporter: Jie Huang >Assignee: Jie Huang >Priority: Minor > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 7h > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ZOOKEEPER-3245) Add useful metrics for ZK pipeline and request/server states
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Han reassigned ZOOKEEPER-3245: -- Assignee: Jie Huang > Add useful metrics for ZK pipeline and request/server states > > > Key: ZOOKEEPER-3245 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3245 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Jie Huang >Assignee: Jie Huang >Priority: Minor > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 50m > Remaining Estimate: 0h > > Add metrics to track time spent in the commit processor, watch counts and > fire rates, how long a Zookeeper server is unavailable between elections, > quorum packet size and time spent in the queue, aggregate request > states/flow, request throttle, sync processor queue time, per-connection read > and write request counts, commit processor queue sizes(read/write/commit), > final request processor read/write times, watch manager cnxn/path counts, > latencies at different points in pipeline for commits/informs, split up > request type counters for more request types, export sum metrics for all > AvgMinMax counters, per-connection watch fired counts, ack latency for each > follower, percentile metrics to zeus latency counters, proposal count, number > of outstanding changes, snapshot and txns loading time during startup, > number of non-voting followers, leader unavailable time, etc. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ZOOKEEPER-3323) Add TxnSnapLog metrics
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Han reassigned ZOOKEEPER-3323: -- Assignee: Jie Huang > Add TxnSnapLog metrics > -- > > Key: ZOOKEEPER-3323 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3323 > Project: ZooKeeper > Issue Type: Sub-task > Components: metric system >Reporter: Jie Huang >Assignee: Jie Huang >Priority: Minor > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ZOOKEEPER-3321) Add metrics for Leader
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Han reassigned ZOOKEEPER-3321: -- Assignee: Jie Huang > Add metrics for Leader > -- > > Key: ZOOKEEPER-3321 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3321 > Project: ZooKeeper > Issue Type: Sub-task > Components: metric system >Reporter: Jie Huang >Assignee: Jie Huang >Priority: Major > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 3h 10m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ZOOKEEPER-3327) Add unrecoverable error count
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Han reassigned ZOOKEEPER-3327: -- Assignee: Jie Huang > Add unrecoverable error count > - > > Key: ZOOKEEPER-3327 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3327 > Project: ZooKeeper > Issue Type: Sub-task > Components: metric system >Reporter: Jie Huang >Assignee: Jie Huang >Priority: Minor > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ZOOKEEPER-3401) Fix metric PROPOSAL_ACK_CREATION_LATENCY
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Han reassigned ZOOKEEPER-3401: -- Assignee: Jie Huang > Fix metric PROPOSAL_ACK_CREATION_LATENCY > > > Key: ZOOKEEPER-3401 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3401 > Project: ZooKeeper > Issue Type: Sub-task > Components: metric system >Reporter: Jie Huang >Assignee: Jie Huang >Priority: Minor > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ZOOKEEPER-3243) Add server side request throttling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Han reassigned ZOOKEEPER-3243: -- Assignee: Jie Huang > Add server side request throttling > -- > > Key: ZOOKEEPER-3243 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3243 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Reporter: Jie Huang >Assignee: Jie Huang >Priority: Minor > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 7h 20m > Remaining Estimate: 0h > > On-going performance investigation at Facebook has demonstrated that > Zookeeper is easily overwhelmed by spikes in connection rates and/or write > request rates. Zookeeper performance gets progressively worse, clients > timeout and try to reconnect (exacerbating the problem) and things enter a > death spiral. To solve this problem, we need to add load protection to > Zookeeper via rate limiting and work shedding. > This JIRA task adds a new request throttling mechanism (RequestThrottler) to > Zookeeper in hopes of preventing Zookeeper from becoming overwhelmed during > request spikes. > > When enabled, the RequestThrottler limits the number Of outstanding requests > currently submitted to the request processor pipeline. > > The throttler augments the limit imposed by the globalOutstandingLimit that > is enforced by the connection layer (NIOServerCnxn, NettyServerCnxn). The > connection layer limit applies backpressure against the TCP connection by > disabling selection on connections once the request limit is reached. > However, the connection layer always allows a connection to send at least one > request before disabling selection on that connection. Thus, in a scenario > with 4 client connections, the total number of requests inflight may be > as high as 4 even if the globalOustandingLimit was set lower. > > The RequestThrottler addresses this issue by adding additional queueing. When > enabled, client connections no longer submit requests directly to the request > processor pipeline but instead to the RequestThrottler. The RequestThrottler > is then responsible for issuing requests to the request processors, and > enforces a separate maxRequests limit. If the total number of outstanding > requests is higher than maxRequests, the throttler will continually stall for > stallTime milliseconds until under limit. > > The RequestThrottler can also optionally drop stale requests rather than > submit them to the processor pipeline. A stale request is a request sent by a > connection that is already closed, and/or a request whose latency will end up > being higher than its associated session timeout. > To ensure ordering guarantees, if a request is ever dropped from a connection > that connection is closed and flagged as invalid. All subsequent requests > inflight from that connection are then dropped as well. > > The notion of staleness is configurable, both connection staleness and > latency staleness can be individually enabled/disabled. Both these settings > and the various throttle settings (limit, stall time, stale drop) can be > configured via system properties as well as at runtime via JMX. > > The throttler has been tested and benchmarked at Facebook -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ZOOKEEPER-3243) Add server side request throttling
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Han resolved ZOOKEEPER-3243. Resolution: Fixed Issue resolved by pull request 986 [https://github.com/apache/zookeeper/pull/986] > Add server side request throttling > -- > > Key: ZOOKEEPER-3243 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3243 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Reporter: Jie Huang >Priority: Minor > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 7h > Remaining Estimate: 0h > > On-going performance investigation at Facebook has demonstrated that > Zookeeper is easily overwhelmed by spikes in connection rates and/or write > request rates. Zookeeper performance gets progressively worse, clients > timeout and try to reconnect (exacerbating the problem) and things enter a > death spiral. To solve this problem, we need to add load protection to > Zookeeper via rate limiting and work shedding. > This JIRA task adds a new request throttling mechanism (RequestThrottler) to > Zookeeper in hopes of preventing Zookeeper from becoming overwhelmed during > request spikes. > > When enabled, the RequestThrottler limits the number Of outstanding requests > currently submitted to the request processor pipeline. > > The throttler augments the limit imposed by the globalOutstandingLimit that > is enforced by the connection layer (NIOServerCnxn, NettyServerCnxn). The > connection layer limit applies backpressure against the TCP connection by > disabling selection on connections once the request limit is reached. > However, the connection layer always allows a connection to send at least one > request before disabling selection on that connection. Thus, in a scenario > with 4 client connections, the total number of requests inflight may be > as high as 4 even if the globalOustandingLimit was set lower. > > The RequestThrottler addresses this issue by adding additional queueing. When > enabled, client connections no longer submit requests directly to the request > processor pipeline but instead to the RequestThrottler. The RequestThrottler > is then responsible for issuing requests to the request processors, and > enforces a separate maxRequests limit. If the total number of outstanding > requests is higher than maxRequests, the throttler will continually stall for > stallTime milliseconds until under limit. > > The RequestThrottler can also optionally drop stale requests rather than > submit them to the processor pipeline. A stale request is a request sent by a > connection that is already closed, and/or a request whose latency will end up > being higher than its associated session timeout. > To ensure ordering guarantees, if a request is ever dropped from a connection > that connection is closed and flagged as invalid. All subsequent requests > inflight from that connection are then dropped as well. > > The notion of staleness is configurable, both connection staleness and > latency staleness can be individually enabled/disabled. Both these settings > and the various throttle settings (limit, stall time, stale drop) can be > configured via system properties as well as at runtime via JMX. > > The throttler has been tested and benchmarked at Facebook -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3441) OWASP is flagging jackson-databind-2.9.9.jar for CVE-2019-12814
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880708#comment-16880708 ] Enrico Olivelli commented on ZOOKEEPER-3441: thanks Patrick for the heads up. Please go ahead. I will be happy to review and merge > OWASP is flagging jackson-databind-2.9.9.jar for CVE-2019-12814 > --- > > Key: ZOOKEEPER-3441 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3441 > Project: ZooKeeper > Issue Type: Task > Components: build, security >Affects Versions: 3.6.0 >Reporter: Enrico Olivelli >Assignee: Enrico Olivelli >Priority: Critical > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 40m > Remaining Estimate: 0h > > OWASP dependency checker is flagging jackson-databind-2.9.9.jar for > CVE-2019-12814 (https://nvd.nist.gov/vuln/detail/CVE-2019-12814) > We should upgrade the library but we are currently using the latest and > greatest 2.9.9. > {noformat} > A Polymorphic Typing issue was discovered in FasterXML jackson-databind 2.x > through 2.9.9. When Default Typing is enabled (either globally or for a > specific property) for an externally exposed JSON endpoint and the service > has JDOM 1.x or 2.x jar in the classpath, an attacker can send a specifically > crafted JSON message that allows them to read arbitrary local files on the > server. > {noformat} > We don't have jdom on the classpath, so we are not affected directly by this > change, but users that are using ZooKeeper Server in a custom environment > should take note of this issue > this is the issue on Jackson: > https://github.com/FasterXML/jackson-databind/issues/2341 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3441) OWASP is flagging jackson-databind-2.9.9.jar for CVE-2019-12814
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880566#comment-16880566 ] Patrick Hunt commented on ZOOKEEPER-3441: - [~eolivelli] 2.9.9.1 is now posted - do you want to submit a patch or should I? > OWASP is flagging jackson-databind-2.9.9.jar for CVE-2019-12814 > --- > > Key: ZOOKEEPER-3441 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3441 > Project: ZooKeeper > Issue Type: Task > Components: build, security >Affects Versions: 3.6.0 >Reporter: Enrico Olivelli >Assignee: Enrico Olivelli >Priority: Critical > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 40m > Remaining Estimate: 0h > > OWASP dependency checker is flagging jackson-databind-2.9.9.jar for > CVE-2019-12814 (https://nvd.nist.gov/vuln/detail/CVE-2019-12814) > We should upgrade the library but we are currently using the latest and > greatest 2.9.9. > {noformat} > A Polymorphic Typing issue was discovered in FasterXML jackson-databind 2.x > through 2.9.9. When Default Typing is enabled (either globally or for a > specific property) for an externally exposed JSON endpoint and the service > has JDOM 1.x or 2.x jar in the classpath, an attacker can send a specifically > crafted JSON message that allows them to read arbitrary local files on the > server. > {noformat} > We don't have jdom on the classpath, so we are not affected directly by this > change, but users that are using ZooKeeper Server in a custom environment > should take note of this issue > this is the issue on Jackson: > https://github.com/FasterXML/jackson-databind/issues/2341 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
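If the patch ends up being the plain version bump discussed here, it would amount to something like the following in the relevant pom.xml; this is a sketch only, and where the version is actually declared (a property or a dependencyManagement entry) depends on the build at the time.
{code:xml}
<!-- Sketch: bump the artifact flagged by OWASP to the 2.9.9.1 release mentioned above -->
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.9.9.1</version>
</dependency>
{code}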
[jira] [Created] (ZOOKEEPER-3458) ZK 3.5.5 : Dynamic SecureClientPort and Server Specs
Fredrick Eisele created ZOOKEEPER-3458: -- Summary: ZK 3.5.5 : Dynamic SecureClientPort and Server Specs Key: ZOOKEEPER-3458 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3458 Project: ZooKeeper Issue Type: Improvement Components: java client Affects Versions: 3.5.5 Reporter: Fredrick Eisele ZK 3.5.5 : Dynamic configuration of SecureClientPort and Server Specs The server specification is ... {{server.<positive id> = <address1>:<port1>:<port2>[:role];[<client port address>:]<client port>}} The clientPort and clientPortAddress are accommodated but I do not see a provision for secureClientPort. secureClientPort and secureClientPortAddress were not made part of the dynamic configuration introduced in ZK 3.5.5 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-1045) Support Quorum Peer mutual authentication via SASL
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880383#comment-16880383 ] Sunil Kumar commented on ZOOKEEPER-1045: [~rakeshr] Here is the link - https://issues.apache.org/jira/browse/ZOOKEEPER-2433 typed wrong JIRA. > Support Quorum Peer mutual authentication via SASL > -- > > Key: ZOOKEEPER-1045 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1045 > Project: ZooKeeper > Issue Type: New Feature > Components: quorum, security >Reporter: Eugene Koontz >Assignee: Rakesh R >Priority: Critical > Fix For: 3.4.10 > > Attachments: 0001-ZOOKEEPER-1045-br-3-4.patch, > 1045_failing_phunt.tar.gz, HOST_RESOLVER-ZK-1045.patch, QuorumPeer Mutual > Authentication Via Sasl Feature Doc - 2016-Nov-10.pdf, QuorumPeer Mutual > Authentication Via Sasl Feature Doc - 2016-Nov-25.pdf, QuorumPeer Mutual > Authentication Via Sasl Feature Doc - 2016-Nov-29.pdf, QuorumPeer Mutual > Authentication Via Sasl Feature Doc - 2016-Nov-30.pdf, QuorumPeer Mutual > Authentication Via Sasl Feature Doc - 2016-Sep-25.pdf, > TEST-org.apache.zookeeper.server.quorum.auth.QuorumAuthUpgradeTest.txt, > ZK-1045-test-case-failure-logs.zip, ZOOKEEPER-1045 Test Plan.pdf, > ZOOKEEPER-1045-00.patch, ZOOKEEPER-1045-Rolling Upgrade Design Proposal.pdf, > ZOOKEEPER-1045-br-3-4.patch, ZOOKEEPER-1045-br-3-4.patch, > ZOOKEEPER-1045-br-3-4.patch, ZOOKEEPER-1045-br-3-4.patch, > ZOOKEEPER-1045-br-3-4.patch, ZOOKEEPER-1045-br-3-4.patch, > ZOOKEEPER-1045-br-3-4.patch, ZOOKEEPER-1045-br-3-4.patch, > ZOOKEEPER-1045-br-3-4.patch, ZOOKEEPER-1045-br-3-4.patch, > ZOOKEEPER-1045-br-3-4.patch, ZOOKEEPER-1045-br-3-4.patch, > ZOOKEEPER-1045-br-3-4.patch, ZOOKEEPER-1045-br-3-4.patch, > ZOOKEEPER-1045TestValidationDesign.pdf, > org.apache.zookeeper.server.quorum.auth.QuorumAuthUpgradeTest.testRollingUpgrade.log > > > ZOOKEEPER-938 addresses mutual authentication between clients and servers. > This bug, on the other hand, is for authentication among quorum peers. > Hopefully much of the work done on SASL integration with Zookeeper for > ZOOKEEPER-938 can be used as a foundation for this enhancement. > Review board: https://reviews.apache.org/r/47354/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ZOOKEEPER-3456) Service temporarily unavailable due to an ongoing leader election. Please refresh
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880067#comment-16880067 ] Marzieh edited comment on ZOOKEEPER-3456 at 7/8/19 8:27 AM: Dear Maoling All of four nodes had the same error : "java.net.BindException: Address already in use" I changed client port from 2181 to 5186. Now I have this error in one node: Cannot open channel to 2 at election address /10.32.0.3:5888 java.net.ConnectException: Connection refused (Connection refused) was (Author: mar_zieh): All of four nodes had the same error : "java.net.BindException: Address already in use" I changed client port from 2181 to 5186. Now I have this error in one node: Cannot open channel to 2 at election address /10.32.0.3:5888 java.net.ConnectException: Connection refused (Connection refused) > Service temporarily unavailable due to an ongoing leader election. Please > refresh > - > > Key: ZOOKEEPER-3456 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3456 > Project: ZooKeeper > Issue Type: Bug > Components: server > Environment: docker container with Ubuntu 16.04 >Reporter: Marzieh >Priority: Major > Fix For: 3.4.14 > > > Hi > I configured Zookeeper with four nodes for my Mesos cluster with Marathon. > When I ran Flink Json file on Marathon, it was run without problem. But, when > I entered IP of my two slaves, just one slave shew Flink UI and another slave > shew this error: > > Service temporarily unavailable due to an ongoing leader election. Please > refresh > I checked "zookeeper.out" file and it said that : > > 019-07-07 11:48:43,412 [myid:] - INFO [main:QuorumPeerConfig@136] - Reading > configuration from: /home/zookeeper-3.4.14/bin/../conf/zoo.cfg > 2019-07-07 11:48:43,421 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - > Resolved hostname: 0.0.0.0 to address: /0.0.0.0 > 2019-07-07 11:48:43,421 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - > Resolved hostname: 10.32.0.3 to address: /10.32.0.3 > 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - > Resolved hostname: 10.32.0.2 to address: /10.32.0.2 > 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - > Resolved hostname: 10.32.0.5 to address: /10.32.0.5 > 2019-07-07 11:48:43,422 [myid:] - WARN [main:QuorumPeerConfig@354] - > Non-optimial configuration, consider an odd number of servers. > 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeerConfig@398] - > Defaulting to majority quorums > 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@78] - > autopurge.snapRetainCount set to 3 > 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@79] - > autopurge.purgeInterval set to 0 > 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@101] - > Purge task is not scheduled. 
> 2019-07-07 11:48:43,432 [myid:3] - INFO [main:QuorumPeerMain@130] - Starting > quorum peer > 2019-07-07 11:48:43,437 [myid:3] - INFO [main:ServerCnxnFactory@117] - Using > org.apache.zookeeper.server.NIOServerCnxnFactory as server connect$ > 2019-07-07 11:48:43,439 [myid:3] - INFO [main:NIOServerCnxnFactory@89] - > binding to port 0.0.0.0/0.0.0.0:2181 > 2019-07-07 11:48:43,440 [myid:3] - ERROR [main:QuorumPeerMain@92] - > Unexpected exception, exiting abnormally > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:433) > at sun.nio.ch.Net.bind(Net.java:425) > at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:90) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:133) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81) > > I searched a lot and could not find the solution. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3456) Service temporarily unavailable due to an ongoing leader election. Please refresh
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880067#comment-16880067 ] Marzieh commented on ZOOKEEPER-3456: All of four nodes had the same error : "java.net.BindException: Address already in use" I changed client port from 2181 to 5186. Now I have this error in one node: Cannot open channel to 2 at election address /10.32.0.3:5888 java.net.ConnectException: Connection refused (Connection refused) > Service temporarily unavailable due to an ongoing leader election. Please > refresh > - > > Key: ZOOKEEPER-3456 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3456 > Project: ZooKeeper > Issue Type: Bug > Components: server > Environment: docker container with Ubuntu 16.04 >Reporter: Marzieh >Priority: Major > Fix For: 3.4.14 > > > Hi > I configured Zookeeper with four nodes for my Mesos cluster with Marathon. > When I ran Flink Json file on Marathon, it was run without problem. But, when > I entered IP of my two slaves, just one slave shew Flink UI and another slave > shew this error: > > Service temporarily unavailable due to an ongoing leader election. Please > refresh > I checked "zookeeper.out" file and it said that : > > 019-07-07 11:48:43,412 [myid:] - INFO [main:QuorumPeerConfig@136] - Reading > configuration from: /home/zookeeper-3.4.14/bin/../conf/zoo.cfg > 2019-07-07 11:48:43,421 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - > Resolved hostname: 0.0.0.0 to address: /0.0.0.0 > 2019-07-07 11:48:43,421 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - > Resolved hostname: 10.32.0.3 to address: /10.32.0.3 > 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - > Resolved hostname: 10.32.0.2 to address: /10.32.0.2 > 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - > Resolved hostname: 10.32.0.5 to address: /10.32.0.5 > 2019-07-07 11:48:43,422 [myid:] - WARN [main:QuorumPeerConfig@354] - > Non-optimial configuration, consider an odd number of servers. > 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeerConfig@398] - > Defaulting to majority quorums > 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@78] - > autopurge.snapRetainCount set to 3 > 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@79] - > autopurge.purgeInterval set to 0 > 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@101] - > Purge task is not scheduled. 
> 2019-07-07 11:48:43,432 [myid:3] - INFO [main:QuorumPeerMain@130] - Starting > quorum peer > 2019-07-07 11:48:43,437 [myid:3] - INFO [main:ServerCnxnFactory@117] - Using > org.apache.zookeeper.server.NIOServerCnxnFactory as server connect$ > 2019-07-07 11:48:43,439 [myid:3] - INFO [main:NIOServerCnxnFactory@89] - > binding to port 0.0.0.0/0.0.0.0:2181 > 2019-07-07 11:48:43,440 [myid:3] - ERROR [main:QuorumPeerMain@92] - > Unexpected exception, exiting abnormally > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:433) > at sun.nio.ch.Net.bind(Net.java:425) > at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:90) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:133) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114) > at > org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81) > > I searched a lot and could not find the solution. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3457) Code optimization in QuorumCnxManager
tom.long created ZOOKEEPER-3457: --- Summary: Code optimization in QuorumCnxManager Key: ZOOKEEPER-3457 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3457 Project: ZooKeeper Issue Type: Improvement Components: quorum Affects Versions: 3.5.5 Reporter: tom.long Fix For: 3.5.5 Dear developer: I think the following code in line 623 of the QuorumCnxManager class can be optimized:
{code:java}
ArrayBlockingQueue<ByteBuffer> bq = new ArrayBlockingQueue<ByteBuffer>(SEND_CAPACITY);
ArrayBlockingQueue<ByteBuffer> oldq = queueSendMap.putIfAbsent(sid, bq);
if (oldq != null) {
    addToSendQueue(oldq, b);
} else {
    addToSendQueue(bq, b);
}
{code}
The optimization is as follows:
{code:java}
ArrayBlockingQueue<ByteBuffer> bq = queueSendMap.computeIfAbsent(sid, serverId -> new ArrayBlockingQueue<>(SEND_CAPACITY));
addToSendQueue(bq, b);
{code}
computeIfAbsent performs the lookup and insert as a single atomic step on the ConcurrentHashMap and only allocates the queue when no entry exists yet, whereas the original version allocates a throwaway queue on every call. -- This message was sent by Atlassian JIRA (v7.6.3#76005)