[jira] [Commented] (ZOOKEEPER-3456) Service temporarily unavailable due to an ongoing leader election. Please refresh

2019-07-08 Thread maoling (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880877#comment-16880877
 ] 

maoling commented on ZOOKEEPER-3456:


[~Mar_zieh] 

First, I think you should *ping* or *telnet* server2 from the other nodes to 
check for a network issue.
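
For example (the host and ports below are placeholders; use server2's real address 
and the client/quorum/election ports from your zoo.cfg):

{noformat}
ping -c 3 10.32.0.3
telnet 10.32.0.3 2181   # client port
telnet 10.32.0.3 2888   # quorum port
telnet 10.32.0.3 3888   # leader election port
{noformat}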

> Service temporarily unavailable due to an ongoing leader election. Please 
> refresh
> -
>
> Key: ZOOKEEPER-3456
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3456
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
> Environment: docker container with Ubuntu 16.04
>Reporter: Marzieh
>Priority: Major
> Fix For: 3.4.14
>
>
> Hi
> I configured Zookeeper with four nodes for my Mesos cluster with Marathon. 
> When I ran the Flink JSON file on Marathon, it ran without problems. But when 
> I entered the IPs of my two slaves, just one slave showed the Flink UI and the 
> other slave showed this error:
>  
> Service temporarily unavailable due to an ongoing leader election. Please 
> refresh
> I checked "zookeeper.out" file and it said that :
>  
> 2019-07-07 11:48:43,412 [myid:] - INFO [main:QuorumPeerConfig@136] - Reading 
> configuration from: /home/zookeeper-3.4.14/bin/../conf/zoo.cfg
> 2019-07-07 11:48:43,421 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - 
> Resolved hostname: 0.0.0.0 to address: /0.0.0.0
> 2019-07-07 11:48:43,421 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - 
> Resolved hostname: 10.32.0.3 to address: /10.32.0.3
> 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - 
> Resolved hostname: 10.32.0.2 to address: /10.32.0.2
> 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - 
> Resolved hostname: 10.32.0.5 to address: /10.32.0.5
> 2019-07-07 11:48:43,422 [myid:] - WARN [main:QuorumPeerConfig@354] - 
> Non-optimial configuration, consider an odd number of servers.
> 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeerConfig@398] - 
> Defaulting to majority quorums
> 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@78] - 
> autopurge.snapRetainCount set to 3
> 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@79] - 
> autopurge.purgeInterval set to 0
> 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@101] - 
> Purge task is not scheduled.
> 2019-07-07 11:48:43,432 [myid:3] - INFO [main:QuorumPeerMain@130] - Starting 
> quorum peer
> 2019-07-07 11:48:43,437 [myid:3] - INFO [main:ServerCnxnFactory@117] - Using 
> org.apache.zookeeper.server.NIOServerCnxnFactory as server connect$
> 2019-07-07 11:48:43,439 [myid:3] - INFO [main:NIOServerCnxnFactory@89] - 
> binding to port 0.0.0.0/0.0.0.0:2181
> 2019-07-07 11:48:43,440 [myid:3] - ERROR [main:QuorumPeerMain@92] - 
> Unexpected exception, exiting abnormally
> java.net.BindException: Address already in use
>  at sun.nio.ch.Net.bind0(Native Method)
>  at sun.nio.ch.Net.bind(Net.java:433)
>  at sun.nio.ch.Net.bind(Net.java:425)
>  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
>  at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:90)
>  at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:133)
>  at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>  at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>  
> I searched a lot and could not find the solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2894) Memory and completions leak on zookeeper_close

2019-07-08 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880833#comment-16880833
 ] 

Hudson commented on ZOOKEEPER-2894:
---

SUCCESS: Integrated in Jenkins build ZooKeeper-trunk #606 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/606/])
ZOOKEEPER-2894: Memory and completions leak on zookeeper_close (hanm: rev 
f9610cc80173342bbe9766889a1aab1bfd840d1e)
* (edit) zookeeper-client/zookeeper-client-c/src/zookeeper.c
* (edit) zookeeper-client/zookeeper-client-c/tests/TestOperations.cc


> Memory and completions leak on zookeeper_close
> --
>
> Key: ZOOKEEPER-2894
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2894
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.4.10
> Environment: Linux ubuntu 4.4.0-87-generic
> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
> https://github.com/apache/zookeeper.git
> branch-3.4
>Reporter: Alexander A. Strelets
>Assignee: Alexander A. Strelets
>Priority: Critical
>  Labels: easyfix, pull-request-available
> Fix For: 3.6.0, 3.4.15, 3.5.6
>
> Attachments: zk-will-free-zh-and-lose-completions.png
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> ZooKeeper C Client *+single thread+* build
> *The problem:*
> First of all, ZooKeeper C Client design allows calling _zookeeper_close()_ in 
> two ways:
> a) from a ZooKeeper callback handler (completion or watcher) which in turn is 
> called through _zookeeper_process()_
> b) and from other places -- i.e., when the call stack does not pass through 
> any of the ZooKeeper mechanics prior to entering the aforementioned _zookeeper_close()_
> The issue described here below is +specific only to the case (b)+. So, it's 
> Ok with the case (a).
> When _zookeeper_close()_ is called in the (b) way, the following happens:
> 1. +If there are requests waiting for responses in _zh.sent_requests_ queue+, 
> they all are removed from this queue and each of them is "completed" with 
> personal fake response having status ZCLOSING. Such fake responses are put 
> into _zh.completions_to_process_ queue. It's Ok
> 2. But then, _zh.completions_to_process_ queue is left unhandled. *+Neither 
> completion callbacks are called, nor dynamic memory allocated for fake 
> responses is freed+*
> 3. Different structures within _zh_ are dismissed and finally _zh_ is freed
> This is illustrated on the screenshot attached to this ticket: you may see 
> that the next instruction to execute will be _free(zh)_ while 
> _zh.completions_to_process_ queue is not empty (see the "Variables" tab to 
> the right).
> Alternatively, the same situation but in the case (a) is handled properly -- 
> i.e., all completion callback handlers are truly called with ZCLOSING and the 
> memory is freed, both for subcases (a.1) when there is a failure like 
> connection-timeout, connection-closed, etc., or (a.2) when there is no failure. 
> The reason is that any callback handler (completion or watcher) in the case 
> (a) is called from the _process_completions()_ function which runs in the 
> loop until _zh.completions_to_process_ queue gets empty. So, this function 
> guarantees this queue to be completely processed even if new completions 
> occur during reaction on previously queued completions.
> *Consequently:*
> 1. At least there is definitely the +memory leak+ in the case (b) -- all the 
> fake responses put into _zh.completions_to_process_ queue are lost after 
> _free(zh)_
> 2. And it looks like a great misbehavior not to call completions on sent 
> requests in the case (b) while they are called with ZCLOSING in the case (a) 
> -- so, I think it's not "by design" but a +completions leak+
> +To reproduce the case (b) do the following:+
> - open ZooKeeper session, connect to a server, receive and process 
> connected-watch, etc.
> - then somewhere +from the main events loop+ call for example _zoo_acreate()_ 
> with valid arguments -- it shall return ZOK
> - then, +immediately after it returned+, call _zookeeper_close()_
> - note that completion callback handler for _zoo_acreate()_ *will not be 
> called*
> +To reproduce the case (a) do the following:+
> - the same as above, open ZooKeeper session, connect to a server, receive and 
> process connected-watch, etc.
> - the same as above, somewhere from the main events loop call for example 
> _zoo_acreate()_ with valid arguments -- it shall return ZOK
> - but now don't call _zookeeper_close()_ immediately -- wait for completion 
> callback on the commenced request
> - when _zoo_acreate()_ completes, +from within its completion callback 
> handler+, call another _zoo_acreate()_ and immediately after it returned call 
> _zookeeper_close()_
> - note that completion callback handler for the second 

[jira] [Updated] (ZOOKEEPER-2894) Memory and completions leak on zookeeper_close

2019-07-08 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han updated ZOOKEEPER-2894:
---
Fix Version/s: 3.5.6
   3.4.15

> Memory and completions leak on zookeeper_close
> --
>
> Key: ZOOKEEPER-2894
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2894
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.4.10
> Environment: Linux ubuntu 4.4.0-87-generic
> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
> https://github.com/apache/zookeeper.git
> branch-3.4
>Reporter: Alexander A. Strelets
>Assignee: Alexander A. Strelets
>Priority: Critical
>  Labels: easyfix, pull-request-available
> Fix For: 3.6.0, 3.4.15, 3.5.6
>
> Attachments: zk-will-free-zh-and-lose-completions.png
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> ZooKeeper C Client *+single thread+* build
> *The problem:*
> First of all, ZooKeeper C Client design allows calling _zookeeper_close()_ in 
> two ways:
> a) from a ZooKeeper callback handler (completion or watcher) which in turn is 
> called through _zookeeper_process()_
> b) and from other places -- i.e., when the call stack does not pass through 
> any of the ZooKeeper mechanics prior to entering the aforementioned _zookeeper_close()_
> The issue described here below is +specific only to the case (b)+. So, it's 
> Ok with the case (a).
> When _zookeeper_close()_ is called in the (b) way, the following happens:
> 1. +If there are requests waiting for responses in _zh.sent_requests_ queue+, 
> they all are removed from this queue and each of them is "completed" with 
> personal fake response having status ZCLOSING. Such fake responses are put 
> into _zh.completions_to_process_ queue. It's Ok
> 2. But then, _zh.completions_to_process_ queue is left unhandled. *+Neither 
> completion callbacks are called, nor dynamic memory allocated for fake 
> responses is freed+*
> 3. Different structures within _zh_ are dismissed and finally _zh_ is freed
> This is illustrated on the screenshot attached to this ticket: you may see 
> that the next instruction to execute will be _free(zh)_ while 
> _zh.completions_to_process_ queue is not empty (see the "Variables" tab to 
> the right).
> Alternatively, the same situation but in the case (a) is handled properly -- 
> i.e., all completion callback handlers are truly called with ZCLOSING and the 
> memory is freed, both for subcases (a.1) when there is a failure like 
> connection-timeout, connection-closed, etc., or (a.2) when there is no failure. 
> The reason is that any callback handler (completion or watcher) in the case 
> (a) is called from the _process_completions()_ function which runs in the 
> loop until _zh.completions_to_process_ queue gets empty. So, this function 
> guarantees this queue to be completely processed even if new completions 
> occur during reaction on previously queued completions.
> *Consequently:*
> 1. At least there is definitely the +memory leak+ in the case (b) -- all the 
> fake responses put into _zh.completions_to_process_ queue are lost after 
> _free(zh)_
> 2. And it looks like a great misbehavior not to call completions on sent 
> requests in the case (b) while they are called with ZCLOSING in the case (a) 
> -- so, I think it's not "by design" but a +completions leak+
> +To reproduce the case (b) do the following:+
> - open ZooKeeper session, connect to a server, receive and process 
> connected-watch, etc.
> - then somewhere +from the main events loop+ call for example _zoo_acreate()_ 
> with valid arguments -- it shall return ZOK
> - then, +immediately after it returned+, call _zookeeper_close()_
> - note that completion callback handler for _zoo_acreate()_ *will not be 
> called*
> +To reproduce the case (a) do the following:+
> - the same as above, open ZooKeeper session, connect to a server, receive and 
> process connected-watch, etc.
> - the same as above, somewhere from the main events loop call for example 
> _zoo_acreate()_ with valid arguments -- it shall return ZOK
> - but now don't call _zookeeper_close()_ immediately -- wait for completion 
> callback on the commenced request
> - when _zoo_acreate()_ completes, +from within its completion callback 
> handler+, call another _zoo_acreate()_ and immediately after it returned call 
> _zookeeper_close()_
> - note that completion callback handler for the second _zoo_acreate()_ *will 
> be called with ZCLOSING, unlike the case (b) described above*
> *To fix this I propose:*
> Just call _process_completions()_ from _destroy(zhandle_t *zh)_ as it is done 
> in _handle_error(zhandle_t *zh,int rc)_.
> This is a proposed fix: https://github.com/apache/zookeeper/pull/1000
> // Previously proposed fix: 
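
To make the case (b) reproduction steps above concrete, here is a minimal sketch 
against the single-threaded C client build (the server address, znode path, and the 
simplified event loop are placeholders; error handling is omitted):

{code:c}
#include <stdio.h>
#include <zookeeper/zookeeper.h>

static int connected = 0;

/* Session watcher: just note when the session reaches CONNECTED. */
static void watcher(zhandle_t *zh, int type, int state,
                    const char *path, void *ctx) {
    if (type == ZOO_SESSION_EVENT && state == ZOO_CONNECTED_STATE)
        connected = 1;
}

/* Completion for zoo_acreate(); in case (b) it is never invoked. */
static void create_done(int rc, const char *name, const void *data) {
    printf("create completed: rc=%d name=%s\n", rc, name ? name : "(null)");
}

int main(void) {
    zhandle_t *zh = zookeeper_init("127.0.0.1:2181", watcher, 30000,
                                   NULL, NULL, 0);
    if (!zh) return 1;

    /* Single-threaded build: the application drives the event loop itself. A real
       program would select()/poll() on the fd returned by zookeeper_interest(). */
    while (!connected) {
        struct timeval tv;
        int fd, interest;
        zookeeper_interest(zh, &fd, &interest, &tv);
        zookeeper_process(zh, ZOOKEEPER_READ | ZOOKEEPER_WRITE);
    }

    /* Issue an async request from the main loop, NOT from a ZooKeeper callback. */
    zoo_acreate(zh, "/zk2894-demo", "x", 1, &ZOO_OPEN_ACL_UNSAFE, 0,
                create_done, NULL);

    /* Close immediately afterwards -- this is case (b): create_done() never
       fires, and the fake ZCLOSING completion queued internally is lost when
       zh is freed. */
    zookeeper_close(zh);
    return 0;
}
{code}

With the pre-fix client, the program exits without ever printing from create_done(), 
and the fake ZCLOSING response allocated for the pending create is leaked.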

[jira] [Resolved] (ZOOKEEPER-2894) Memory and completions leak on zookeeper_close

2019-07-08 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-2894.

   Resolution: Fixed
Fix Version/s: (was: 3.4.10)
   3.6.0

Issue resolved by pull request 1000
[https://github.com/apache/zookeeper/pull/1000]

> Memory and completions leak on zookeeper_close
> --
>
> Key: ZOOKEEPER-2894
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2894
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client
>Affects Versions: 3.4.10
> Environment: Linux ubuntu 4.4.0-87-generic
> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
> https://github.com/apache/zookeeper.git
> branch-3.4
>Reporter: Alexander A. Strelets
>Assignee: Alexander A. Strelets
>Priority: Critical
>  Labels: easyfix, pull-request-available
> Fix For: 3.6.0
>
> Attachments: zk-will-free-zh-and-lose-completions.png
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> ZooKeeper C Client *+single thread+* build
> *The problem:*
> First of all, ZooKeeper C Client design allows calling _zookeeper_close()_ in 
> two ways:
> a) from a ZooKeeper callback handler (completion or watcher) which in turn is 
> called through _zookeeper_process()_
> b) and from other places -- i.e., when the call stack does not pass through 
> any of the ZooKeeper mechanics prior to entering the aforementioned _zookeeper_close()_
> The issue described here below is +specific only to the case (b)+. So, it's 
> Ok with the case (a).
> When _zookeeper_close()_ is called in the (b) way, the following happens:
> 1. +If there are requests waiting for responses in _zh.sent_requests_ queue+, 
> they all are removed from this queue and each of them is "completed" with 
> personal fake response having status ZCLOSING. Such fake responses are put 
> into _zh.completions_to_process_ queue. It's Ok
> 2. But then, _zh.completions_to_process_ queue is left unhandled. *+Neither 
> completion callbacks are called, nor dynamic memory allocated for fake 
> responses is freed+*
> 3. Different structures within _zh_ are dismissed and finally _zh_ is freed
> This is illustrated on the screenshot attached to this ticket: you may see 
> that the next instruction to execute will be _free(zh)_ while 
> _zh.completions_to_process_ queue is not empty (see the "Variables" tab to 
> the right).
> Alternatively, the same situation but in the case (a) is handled properly -- 
> i.e., all completion callback handlers are truly called with ZCLOSING and the 
> memory is freed, both for subcases (a.1) when there is a failure like 
> connection-timeout, connection-closed, etc., or (a.2) when there is no failure. 
> The reason is that any callback handler (completion or watcher) in the case 
> (a) is called from the _process_completions()_ function which runs in the 
> loop until _zh.completions_to_process_ queue gets empty. So, this function 
> guarantees this queue to be completely processed even if new completions 
> occur during reaction on previously queued completions.
> *Consequently:*
> 1. At least there is definitely the +memory leak+ in the case (b) -- all the 
> fake responses put into _zh.completions_to_process_ queue are lost after 
> _free(zh)_
> 2. And it looks like a great misbehavior not to call completions on sent 
> requests in the case (b) while they are called with ZCLOSING in the case (a) 
> -- so, I think it's not "by design" but a +completions leak+
> +To reproduce the case (b) do the following:+
> - open ZooKeeper session, connect to a server, receive and process 
> connected-watch, etc.
> - then somewhere +from the main events loop+ call for example _zoo_acreate()_ 
> with valid arguments -- it shall return ZOK
> - then, +immediately after it returned+, call _zookeeper_close()_
> - note that completion callback handler for _zoo_acreate()_ *will not be 
> called*
> +To reproduce the case (a) do the following:+
> - the same as above, open ZooKeeper session, connect to a server, receive and 
> process connected-watch, etc.
> - the same as above, somewhere from the main events loop call for example 
> _zoo_acreate()_ with valid arguments -- it shall return ZOK
> - but now don't call _zookeeper_close()_ immediately -- wait for completion 
> callback on the commenced request
> - when _zoo_acreate()_ completes, +from within its completion callback 
> handler+, call another _zoo_acreate()_ and immediately after it returned call 
> _zookeeper_close()_
> - note that completion callback handler for the second _zoo_acreate()_ *will 
> be called with ZCLOSING, unlike the case (b) described above*
> *To fix this I propose:*
> Just call _process_completions()_ from _destroy(zhandle_t *zh)_ as it is done 
> in _handle_error(zhandle_t *zh,int rc)_.
> This 
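
For readers following along, the shape of the proposed change is roughly the sketch 
below (illustrative only -- the placement inside destroy() is my reading of the 
proposal; the real change is in the pull request referenced above):

{code:c}
/* zookeeper.c -- illustrative sketch, not the actual patch */
static void destroy(zhandle_t *zh)
{
    /* ... existing cleanup: pending entries from sent_requests have already been
       turned into fake ZCLOSING responses on the completions_to_process queue ... */

    /* Proposed addition: drain that queue, exactly as handle_error() already
       does, so each pending completion callback fires with ZCLOSING and the
       heap-allocated fake response is freed. */
    process_completions(zh);

    /* ... existing code then releases the remaining buffers and frees zh ... */
}
{code}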

[jira] [Commented] (ZOOKEEPER-3243) Add server side request throttling

2019-07-08 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880798#comment-16880798
 ] 

Hudson commented on ZOOKEEPER-3243:
---

SUCCESS: Integrated in Jenkins build Zookeeper-trunk-single-thread #440 (See 
[https://builds.apache.org/job/Zookeeper-trunk-single-thread/440/])
ZOOKEEPER-3243: Add server-side request throttling (hanm: rev 
7b3de52cdb15068aa343879ae283f4e456c68f39)
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java
* (add) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/RequestThrottler.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServerBean.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxn.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/server/RequestThrottlerTest.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServerMXBean.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerMetrics.java
* (edit) zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
* (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/Request.java
* (edit) 
zookeeper-server/src/test/java/org/apache/zookeeper/server/SessionTrackerTest.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/FinalRequestProcessor.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerCnxn.java


> Add server side request throttling
> --
>
> Key: ZOOKEEPER-3243
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3243
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Jie Huang
>Assignee: Jie Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> On-going performance investigation at Facebook has demonstrated that 
> Zookeeper is easily overwhelmed by spikes in connection rates and/or write 
> request rates. Zookeeper performance gets progressively worse, clients time 
> out and try to reconnect (exacerbating the problem), and things enter a 
> death spiral. To solve this problem, we need to add load protection to 
> Zookeeper via rate limiting and work shedding.
> This JIRA task adds a new request throttling mechanism (RequestThrottler) to 
> Zookeeper in hopes of preventing Zookeeper from becoming overwhelmed during 
> request spikes.
>  
> When enabled, the RequestThrottler limits the number of outstanding requests 
> currently submitted to the request processor pipeline. 
>  
> The throttler augments the limit imposed by the globalOutstandingLimit that 
> is enforced by the connection layer (NIOServerCnxn, NettyServerCnxn). The 
> connection layer limit applies backpressure against the TCP connection by 
> disabling selection on connections once the request limit is reached. 
> However, the connection layer always allows a connection to send at least one 
> request before disabling selection on that connection. Thus, in a scenario 
> with 4 client connections, the total number of requests inflight may be 
> as high as 4 even if the globalOutstandingLimit was set lower.
>  
> The RequestThrottler addresses this issue by adding additional queueing. When 
> enabled, client connections no longer submit requests directly to the request 
> processor pipeline but instead to the RequestThrottler. The RequestThrottler 
> is then responsible for issuing requests to the request processors, and 
> enforces a separate maxRequests limit. If the total number of outstanding 
> requests is higher than maxRequests, the throttler will continually stall for 
> stallTime milliseconds until under limit.
>  
> The RequestThrottler can also optionally drop stale requests rather than 
> submit them to the processor pipeline. A stale request is a request sent by a 
> connection that is already closed, and/or a request whose latency will end up 
> being higher than its associated session timeout.
> To ensure ordering guarantees, if a request is ever dropped from a connection 
> that connection is closed and flagged as invalid. All subsequent requests 
> inflight from that connection are then dropped as well.
>  
> The notion of staleness is configurable, both connection staleness and 
> latency staleness can be individually enabled/disabled. Both these settings 
> and the various throttle settings (limit, stall time, stale drop) can be 
> configured via system properties as well as at runtime via JMX.
>  
> The throttler has been tested and benchmarked at Facebook
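
As a rough illustration of the knobs described above: the limit, stall time, and 
stale-drop behaviour are exposed as JVM system properties (the exact property names 
below are my reading of the zookeeperAdmin.md changes in this patch -- please verify 
them against the merged docs before relying on them):

{noformat}
-Dzookeeper.request_throttle_max_requests=400
-Dzookeeper.request_throttle_stall_time=100
-Dzookeeper.request_throttle_drop_stale=true
{noformat}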



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3458) ZK 3.5.5 : Dynamic SecureClientPort and Server Specs

2019-07-08 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880770#comment-16880770
 ] 

Brian Nixon commented on ZOOKEEPER-3458:


Related: ZOOKEEPER-3166

> ZK 3.5.5 : Dynamic SecureClientPort and Server Specs
> 
>
> Key: ZOOKEEPER-3458
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3458
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: java client
>Affects Versions: 3.5.5
>Reporter: Fredrick Eisele
>Priority: Major
>
> ZK 3.5.5 : Dynamic configuration of SecureClientPort and Server Specs
> The server specification is ...
> {{server.<positive id> = <address1>:<port1>:<port2>[:role];[<client port address>:]<client port>}}
>  
> The clientPort and clientPortAddress are accommodated but I do not see a 
> provision for secureClientPort.
>  
> secureClientPort and secureClientPortAddress
> were not made part of the dynamic configuration introduced in ZK 3.5.5



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3243) Add server side request throttling

2019-07-08 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880759#comment-16880759
 ] 

Hudson commented on ZOOKEEPER-3243:
---

FAILURE: Integrated in Jenkins build ZooKeeper-trunk #605 (See 
[https://builds.apache.org/job/ZooKeeper-trunk/605/])
ZOOKEEPER-3243: Add server-side request throttling (hanm: rev 
7b3de52cdb15068aa343879ae283f4e456c68f39)
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerCnxn.java
* (edit) 
zookeeper-server/src/test/java/org/apache/zookeeper/server/SessionTrackerTest.java
* (edit) zookeeper-server/src/main/java/org/apache/zookeeper/server/Request.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServerBean.java
* (edit) zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/NIOServerCnxn.java
* (add) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/RequestThrottler.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/FinalRequestProcessor.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/ServerMetrics.java
* (edit) 
zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServerMXBean.java
* (add) 
zookeeper-server/src/test/java/org/apache/zookeeper/server/RequestThrottlerTest.java


> Add server side request throttling
> --
>
> Key: ZOOKEEPER-3243
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3243
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Jie Huang
>Assignee: Jie Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> On-going performance investigation at Facebook has demonstrated that 
> Zookeeper is easily overwhelmed by spikes in connection rates and/or write 
> request rates. Zookeeper performance gets progressively worse, clients time 
> out and try to reconnect (exacerbating the problem), and things enter a 
> death spiral. To solve this problem, we need to add load protection to 
> Zookeeper via rate limiting and work shedding.
> This JIRA task adds a new request throttling mechanism (RequestThrottler) to 
> Zookeeper in hopes of preventing Zookeeper from becoming overwhelmed during 
> request spikes.
>  
> When enabled, the RequestThrottler limits the number of outstanding requests 
> currently submitted to the request processor pipeline. 
>  
> The throttler augments the limit imposed by the globalOutstandingLimit that 
> is enforced by the connection layer (NIOServerCnxn, NettyServerCnxn). The 
> connection layer limit applies backpressure against the TCP connection by 
> disabling selection on connections once the request limit is reached. 
> However, the connection layer always allows a connection to send at least one 
> request before disabling selection on that connection. Thus, in a scenario 
> with 4 client connections, the total number of requests inflight may be 
> as high as 4 even if the globalOutstandingLimit was set lower.
>  
> The RequestThrottler addresses this issue by adding additional queueing. When 
> enabled, client connections no longer submit requests directly to the request 
> processor pipeline but instead to the RequestThrottler. The RequestThrottler 
> is then responsible for issuing requests to the request processors, and 
> enforces a separate maxRequests limit. If the total number of outstanding 
> requests is higher than maxRequests, the throttler will continually stall for 
> stallTime milliseconds until under limit.
>  
> The RequestThrottler can also optionally drop stale requests rather than 
> submit them to the processor pipeline. A stale request is a request sent by a 
> connection that is already closed, and/or a request whose latency will end up 
> being higher than its associated session timeout.
> To ensure ordering guarantees, if a request is ever dropped from a connection 
> that connection is closed and flagged as invalid. All subsequent requests 
> inflight from that connection are then dropped as well.
>  
> The notion of staleness is configurable, both connection staleness and 
> latency staleness can be individually enabled/disabled. Both these settings 
> and the various throttle settings (limit, stall time, stale drop) can be 
> configured via system properties as well as at runtime via JMX.
>  
> The throttler has been tested and benchmarked at Facebook



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3437) Improve sync throttling on a learner master

2019-07-08 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-3437:
--

Assignee: Jie Huang

> Improve sync throttling on a learner master
> ---
>
> Key: ZOOKEEPER-3437
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3437
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum
>Affects Versions: 3.6.0
>Reporter: Jie Huang
>Assignee: Jie Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> As described in ZOOKEEPER-1928, a leader can become overloaded if it sends 
> too many snapshots concurrently during sync time.  Sending too many diffs at 
> the same time can also cause the overloading issue. 
> In this JIRA, we will:
>  # add diff sync throttling in addition to snap sync throttling
>  # extend the protection to followers that serve observers
>  # improve the counting of concurrent snap syncs/diff syncs to avoid double 
> counting or missing counting



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3309) Add sync processor metrics

2019-07-08 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-3309:
--

Assignee: Jie Huang

> Add sync processor metrics
> --
>
> Key: ZOOKEEPER-3309
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3309
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: metric system
>Reporter: Jie Huang
>Assignee: Jie Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3245) Add useful metrics for ZK pipeline and request/server states

2019-07-08 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-3245:
--

Assignee: Jie Huang

> Add useful metrics for ZK pipeline and request/server states
> 
>
> Key: ZOOKEEPER-3245
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3245
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Jie Huang
>Assignee: Jie Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add metrics to track time spent in the commit processor, watch counts and 
> fire rates, how long a Zookeeper server is unavailable between elections, 
> quorum packet size and time spent in the queue, aggregate request 
> states/flow, request throttle, sync processor queue time, per-connection read 
> and write request counts, commit processor queue sizes(read/write/commit), 
> final request processor read/write times, watch manager cnxn/path counts, 
> latencies at different points in pipeline for commits/informs, split up 
> request type counters for more request types, export sum metrics for all 
> AvgMinMax counters, per-connection watch fired counts, ack latency for each 
> follower, percentile metrics to zeus latency counters, proposal count, number 
> of outstanding changes,  snapshot and txns loading time during startup, 
> number of non-voting followers, leader unavailable time, etc.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3323) Add TxnSnapLog metrics

2019-07-08 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-3323:
--

Assignee: Jie Huang

> Add TxnSnapLog metrics
> --
>
> Key: ZOOKEEPER-3323
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3323
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: metric system
>Reporter: Jie Huang
>Assignee: Jie Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3321) Add metrics for Leader

2019-07-08 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-3321:
--

Assignee: Jie Huang

> Add metrics for Leader
> --
>
> Key: ZOOKEEPER-3321
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3321
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: metric system
>Reporter: Jie Huang
>Assignee: Jie Huang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3327) Add unrecoverable error count

2019-07-08 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-3327:
--

Assignee: Jie Huang

> Add unrecoverable error count
> -
>
> Key: ZOOKEEPER-3327
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3327
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: metric system
>Reporter: Jie Huang
>Assignee: Jie Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3401) Fix metric PROPOSAL_ACK_CREATION_LATENCY

2019-07-08 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-3401:
--

Assignee: Jie Huang

> Fix metric PROPOSAL_ACK_CREATION_LATENCY
> 
>
> Key: ZOOKEEPER-3401
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3401
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: metric system
>Reporter: Jie Huang
>Assignee: Jie Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3243) Add server side request throttling

2019-07-08 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-3243:
--

Assignee: Jie Huang

> Add server side request throttling
> --
>
> Key: ZOOKEEPER-3243
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3243
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Jie Huang
>Assignee: Jie Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> On-going performance investigation at Facebook has demonstrated that 
> Zookeeper is easily overwhelmed by spikes in connection rates and/or write 
> request rates. Zookeeper performance gets progressively worse, clients time 
> out and try to reconnect (exacerbating the problem), and things enter a 
> death spiral. To solve this problem, we need to add load protection to 
> Zookeeper via rate limiting and work shedding.
> This JIRA task adds a new request throttling mechanism (RequestThrottler) to 
> Zookeeper in hopes of preventing Zookeeper from becoming overwhelmed during 
> request spikes.
>  
> When enabled, the RequestThrottler limits the number of outstanding requests 
> currently submitted to the request processor pipeline. 
>  
> The throttler augments the limit imposed by the globalOutstandingLimit that 
> is enforced by the connection layer (NIOServerCnxn, NettyServerCnxn). The 
> connection layer limit applies backpressure against the TCP connection by 
> disabling selection on connections once the request limit is reached. 
> However, the connection layer always allows a connection to send at least one 
> request before disabling selection on that connection. Thus, in a scenario 
> with 4 client connections, the total number of requests inflight may be 
> as high as 4 even if the globalOutstandingLimit was set lower.
>  
> The RequestThrottler addresses this issue by adding additional queueing. When 
> enabled, client connections no longer submit requests directly to the request 
> processor pipeline but instead to the RequestThrottler. The RequestThrottler 
> is then responsible for issuing requests to the request processors, and 
> enforces a separate maxRequests limit. If the total number of outstanding 
> requests is higher than maxRequests, the throttler will continually stall for 
> stallTime milliseconds until under limit.
>  
> The RequestThrottler can also optionally drop stale requests rather than 
> submit them to the processor pipeline. A stale request is a request sent by a 
> connection that is already closed, and/or a request whose latency will end up 
> being higher than its associated session timeout.
> To ensure ordering guarantees, if a request is ever dropped from a connection 
> that connection is closed and flagged as invalid. All subsequent requests 
> inflight from that connection are then dropped as well.
>  
> The notion of staleness is configurable, both connection staleness and 
> latency staleness can be individually enabled/disabled. Both these settings 
> and the various throttle settings (limit, stall time, stale drop) can be 
> configured via system properties as well as at runtime via JMX.
>  
> The throttler has been tested and benchmarked at Facebook



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3243) Add server side request throttling

2019-07-08 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3243.

Resolution: Fixed

Issue resolved by pull request 986
[https://github.com/apache/zookeeper/pull/986]

> Add server side request throttling
> --
>
> Key: ZOOKEEPER-3243
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3243
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Jie Huang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 7h
>  Remaining Estimate: 0h
>
> On-going performance investigation at Facebook has demonstrated that 
> Zookeeper is easily overwhelmed by spikes in connection rates and/or write 
> request rates. Zookeeper performance gets progressively worse, clients time 
> out and try to reconnect (exacerbating the problem), and things enter a 
> death spiral. To solve this problem, we need to add load protection to 
> Zookeeper via rate limiting and work shedding.
> This JIRA task adds a new request throttling mechanism (RequestThrottler) to 
> Zookeeper in hopes of preventing Zookeeper from becoming overwhelmed during 
> request spikes.
>  
> When enabled, the RequestThrottler limits the number of outstanding requests 
> currently submitted to the request processor pipeline. 
>  
> The throttler augments the limit imposed by the globalOutstandingLimit that 
> is enforced by the connection layer (NIOServerCnxn, NettyServerCnxn). The 
> connection layer limit applies backpressure against the TCP connection by 
> disabling selection on connections once the request limit is reached. 
> However, the connection layer always allows a connection to send at least one 
> request before disabling selection on that connection. Thus, in a scenario 
> with 4 client connections, the total number of requests inflight may be 
> as high as 4 even if the globalOutstandingLimit was set lower.
>  
> The RequestThrottler addresses this issue by adding additional queueing. When 
> enabled, client connections no longer submit requests directly to the request 
> processor pipeline but instead to the RequestThrottler. The RequestThrottler 
> is then responsible for issuing requests to the request processors, and 
> enforces a separate maxRequests limit. If the total number of outstanding 
> requests is higher than maxRequests, the throttler will continually stall for 
> stallTime milliseconds until under limit.
>  
> The RequestThrottler can also optionally drop stale requests rather than 
> submit them to the processor pipeline. A stale request is a request sent by a 
> connection that is already closed, and/or a request whose latency will end up 
> being higher than its associated session timeout.
> To ensure ordering guarantees, if a request is ever dropped from a connection 
> that connection is closed and flagged as invalid. All subsequent requests 
> inflight from that connection are then dropped as well.
>  
> The notion of staleness is configurable, both connection staleness and 
> latency staleness can be individually enabled/disabled. Both these settings 
> and the various throttle settings (limit, stall time, stale drop) can be 
> configured via system properties as well as at runtime via JMX.
>  
> The throttler has been tested and benchmarked at Facebook



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3441) OWASP is flagging jackson-databind-2.9.9.jar for CVE-2019-12814

2019-07-08 Thread Enrico Olivelli (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880708#comment-16880708
 ] 

Enrico Olivelli commented on ZOOKEEPER-3441:


thanks Patrick for the heads up.
Please go ahead. I will be happy to review and merge

> OWASP is flagging jackson-databind-2.9.9.jar for CVE-2019-12814
> ---
>
> Key: ZOOKEEPER-3441
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3441
> Project: ZooKeeper
>  Issue Type: Task
>  Components: build, security
>Affects Versions: 3.6.0
>Reporter: Enrico Olivelli
>Assignee: Enrico Olivelli
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> OWASP dependency checker is flagging jackson-databind-2.9.9.jar for 
> CVE-2019-12814 (https://nvd.nist.gov/vuln/detail/CVE-2019-12814) 
> We should upgrade the library but we are currently using the latest and 
> greatest 2.9.9.
> {noformat}
> A Polymorphic Typing issue was discovered in FasterXML jackson-databind 2.x 
> through 2.9.9. When Default Typing is enabled (either globally or for a 
> specific property) for an externally exposed JSON endpoint and the service 
> has JDOM 1.x or 2.x jar in the classpath, an attacker can send a specifically 
> crafted JSON message that allows them to read arbitrary local files on the 
> server.
> {noformat}
> We don't have jdom on the classpath, so we are not affected directly by this 
> change, but users that are using ZooKeeper Server in a custom environment 
> should take note of this issue
> this is the issue on Jackson: 
> https://github.com/FasterXML/jackson-databind/issues/2341



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3441) OWASP is flagging jackson-databind-2.9.9.jar for CVE-2019-12814

2019-07-08 Thread Patrick Hunt (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880566#comment-16880566
 ] 

Patrick Hunt commented on ZOOKEEPER-3441:
-

[~eolivelli] 2.9.9.1 is now posted - do you want to submit a patch or should I?

> OWASP is flagging jackson-databind-2.9.9.jar for CVE-2019-12814
> ---
>
> Key: ZOOKEEPER-3441
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3441
> Project: ZooKeeper
>  Issue Type: Task
>  Components: build, security
>Affects Versions: 3.6.0
>Reporter: Enrico Olivelli
>Assignee: Enrico Olivelli
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> OWASP dependency checker is flagging jackson-databind-2.9.9.jar for 
> CVE-2019-12814 (https://nvd.nist.gov/vuln/detail/CVE-2019-12814) 
> We should upgrade the library but we are currently using the latest and 
> greatest 2.9.9.
> {noformat}
> A Polymorphic Typing issue was discovered in FasterXML jackson-databind 2.x 
> through 2.9.9. When Default Typing is enabled (either globally or for a 
> specific property) for an externally exposed JSON endpoint and the service 
> has JDOM 1.x or 2.x jar in the classpath, an attacker can send a specifically 
> crafted JSON message that allows them to read arbitrary local files on the 
> server.
> {noformat}
> We don't have jdom on the classpath, so we are not affected directly by this 
> change, but users that are using ZooKeeper Server in a custom environment 
> should take note of this issue
> this is the issue on Jackson: 
> https://github.com/FasterXML/jackson-databind/issues/2341



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3458) ZK 3.5.5 : Dynamic SecureClientPort and Server Specs

2019-07-08 Thread Fredrick Eisele (JIRA)
Fredrick Eisele created ZOOKEEPER-3458:
--

 Summary: ZK 3.5.5 : Dynamic SecureClientPort and Server Specs
 Key: ZOOKEEPER-3458
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3458
 Project: ZooKeeper
  Issue Type: Improvement
  Components: java client
Affects Versions: 3.5.5
Reporter: Fredrick Eisele


ZK 3.5.5 : Dynamic configuration of SecureClientPort and Server Specs

The server specification is ...

{{server.<positive id> = <address1>:<port1>:<port2>[:role];[<client port address>:]<client port>}}

 
The clientPort and clientPortAddress are accommodated but I do not see a 
provision for secureClientPort.
 
secureClientPort and secureClientPortAddress
were not made part of the dynamic configuration introduced in ZK 3.5.5
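
For reference, a dynamic configuration entry using the current spec looks like this 
(addresses and ports are illustrative only):

{noformat}
server.1=10.32.0.2:2888:3888:participant;0.0.0.0:2181
{noformat}

There is no place in this syntax to declare a secureClientPort, which is the gap 
described above.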



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-1045) Support Quorum Peer mutual authentication via SASL

2019-07-08 Thread Sunil Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880383#comment-16880383
 ] 

Sunil Kumar commented on ZOOKEEPER-1045:


[~rakeshr] Here is the link - 
https://issues.apache.org/jira/browse/ZOOKEEPER-2433 (I had typed the wrong JIRA earlier).

> Support Quorum Peer mutual authentication via SASL
> --
>
> Key: ZOOKEEPER-1045
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1045
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: quorum, security
>Reporter: Eugene Koontz
>Assignee: Rakesh R
>Priority: Critical
> Fix For: 3.4.10
>
> Attachments: 0001-ZOOKEEPER-1045-br-3-4.patch, 
> 1045_failing_phunt.tar.gz, HOST_RESOLVER-ZK-1045.patch, QuorumPeer Mutual 
> Authentication Via Sasl Feature Doc - 2016-Nov-10.pdf, QuorumPeer Mutual 
> Authentication Via Sasl Feature Doc - 2016-Nov-25.pdf, QuorumPeer Mutual 
> Authentication Via Sasl Feature Doc - 2016-Nov-29.pdf, QuorumPeer Mutual 
> Authentication Via Sasl Feature Doc - 2016-Nov-30.pdf, QuorumPeer Mutual 
> Authentication Via Sasl Feature Doc - 2016-Sep-25.pdf, 
> TEST-org.apache.zookeeper.server.quorum.auth.QuorumAuthUpgradeTest.txt, 
> ZK-1045-test-case-failure-logs.zip, ZOOKEEPER-1045 Test Plan.pdf, 
> ZOOKEEPER-1045-00.patch, ZOOKEEPER-1045-Rolling Upgrade Design Proposal.pdf, 
> ZOOKEEPER-1045-br-3-4.patch, ZOOKEEPER-1045-br-3-4.patch, 
> ZOOKEEPER-1045-br-3-4.patch, ZOOKEEPER-1045-br-3-4.patch, 
> ZOOKEEPER-1045-br-3-4.patch, ZOOKEEPER-1045-br-3-4.patch, 
> ZOOKEEPER-1045-br-3-4.patch, ZOOKEEPER-1045-br-3-4.patch, 
> ZOOKEEPER-1045-br-3-4.patch, ZOOKEEPER-1045-br-3-4.patch, 
> ZOOKEEPER-1045-br-3-4.patch, ZOOKEEPER-1045-br-3-4.patch, 
> ZOOKEEPER-1045-br-3-4.patch, ZOOKEEPER-1045-br-3-4.patch, 
> ZOOKEEPER-1045TestValidationDesign.pdf, 
> org.apache.zookeeper.server.quorum.auth.QuorumAuthUpgradeTest.testRollingUpgrade.log
>
>
> ZOOKEEPER-938 addresses mutual authentication between clients and servers. 
> This bug, on the other hand, is for authentication among quorum peers. 
> Hopefully much of the work done on SASL integration with Zookeeper for 
> ZOOKEEPER-938 can be used as a foundation for this enhancement.
> Review board: https://reviews.apache.org/r/47354/
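
As context for the discussion above, enabling the feature ends up looking roughly 
like the zoo.cfg fragment below (property names quoted from memory of the 
accompanying admin guide additions -- verify them against zookeeperAdmin.md before 
use):

{noformat}
quorum.auth.enableSasl=true
quorum.auth.learnerRequireSasl=true
quorum.auth.serverRequireSasl=true
quorum.auth.learner.saslLoginContext=QuorumLearner
quorum.auth.server.saslLoginContext=QuorumServer
quorum.cnxn.threads.size=20
{noformat}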



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ZOOKEEPER-3456) Service temporarily unavailable due to an ongoing leader election. Please refresh

2019-07-08 Thread Marzieh (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880067#comment-16880067
 ] 

Marzieh edited comment on ZOOKEEPER-3456 at 7/8/19 8:27 AM:


Dear Maoling

All four nodes had the same error: "java.net.BindException: Address already 
in use"

I changed the client port from 2181 to 5186. Now I have this error on one node:

 

Cannot open channel to 2 at election address /10.32.0.3:5888
 java.net.ConnectException: Connection refused (Connection refused)


was (Author: mar_zieh):
All of four nodes had the same error : "java.net.BindException: Address already 
in use"

I changed client port from 2181 to 5186. Now I have this error in one node:

 

Cannot open channel to 2 at election address /10.32.0.3:5888
java.net.ConnectException: Connection refused (Connection refused)

> Service temporarily unavailable due to an ongoing leader election. Please 
> refresh
> -
>
> Key: ZOOKEEPER-3456
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3456
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
> Environment: docker container with Ubuntu 16.04
>Reporter: Marzieh
>Priority: Major
> Fix For: 3.4.14
>
>
> Hi
> I configured Zookeeper with four nodes for my Mesos cluster with Marathon. 
> When I ran the Flink JSON file on Marathon, it ran without problems. But when 
> I entered the IPs of my two slaves, just one slave showed the Flink UI and the 
> other slave showed this error:
>  
> Service temporarily unavailable due to an ongoing leader election. Please 
> refresh
> I checked "zookeeper.out" file and it said that :
>  
> 2019-07-07 11:48:43,412 [myid:] - INFO [main:QuorumPeerConfig@136] - Reading 
> configuration from: /home/zookeeper-3.4.14/bin/../conf/zoo.cfg
> 2019-07-07 11:48:43,421 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - 
> Resolved hostname: 0.0.0.0 to address: /0.0.0.0
> 2019-07-07 11:48:43,421 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - 
> Resolved hostname: 10.32.0.3 to address: /10.32.0.3
> 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - 
> Resolved hostname: 10.32.0.2 to address: /10.32.0.2
> 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - 
> Resolved hostname: 10.32.0.5 to address: /10.32.0.5
> 2019-07-07 11:48:43,422 [myid:] - WARN [main:QuorumPeerConfig@354] - 
> Non-optimial configuration, consider an odd number of servers.
> 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeerConfig@398] - 
> Defaulting to majority quorums
> 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@78] - 
> autopurge.snapRetainCount set to 3
> 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@79] - 
> autopurge.purgeInterval set to 0
> 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@101] - 
> Purge task is not scheduled.
> 2019-07-07 11:48:43,432 [myid:3] - INFO [main:QuorumPeerMain@130] - Starting 
> quorum peer
> 2019-07-07 11:48:43,437 [myid:3] - INFO [main:ServerCnxnFactory@117] - Using 
> org.apache.zookeeper.server.NIOServerCnxnFactory as server connect$
> 2019-07-07 11:48:43,439 [myid:3] - INFO [main:NIOServerCnxnFactory@89] - 
> binding to port 0.0.0.0/0.0.0.0:2181
> 2019-07-07 11:48:43,440 [myid:3] - ERROR [main:QuorumPeerMain@92] - 
> Unexpected exception, exiting abnormally
> java.net.BindException: Address already in use
>  at sun.nio.ch.Net.bind0(Native Method)
>  at sun.nio.ch.Net.bind(Net.java:433)
>  at sun.nio.ch.Net.bind(Net.java:425)
>  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
>  at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:90)
>  at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:133)
>  at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>  at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>  
> I searched a lot and could not find the solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3456) Service temporarily unavailable due to an ongoing leader election. Please refresh

2019-07-08 Thread Marzieh (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16880067#comment-16880067
 ] 

Marzieh commented on ZOOKEEPER-3456:


All four nodes had the same error: "java.net.BindException: Address already 
in use"

I changed the client port from 2181 to 5186. Now I have this error on one node:

 

Cannot open channel to 2 at election address /10.32.0.3:5888
java.net.ConnectException: Connection refused (Connection refused)

> Service temporarily unavailable due to an ongoing leader election. Please 
> refresh
> -
>
> Key: ZOOKEEPER-3456
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3456
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
> Environment: docker container with Ubuntu 16.04
>Reporter: Marzieh
>Priority: Major
> Fix For: 3.4.14
>
>
> Hi
> I configured Zookeeper with four nodes for my Mesos cluster with Marathon. 
> When I ran the Flink JSON file on Marathon, it ran without problems. But when 
> I entered the IPs of my two slaves, just one slave showed the Flink UI and the 
> other slave showed this error:
>  
> Service temporarily unavailable due to an ongoing leader election. Please 
> refresh
> I checked "zookeeper.out" file and it said that :
>  
> 2019-07-07 11:48:43,412 [myid:] - INFO [main:QuorumPeerConfig@136] - Reading 
> configuration from: /home/zookeeper-3.4.14/bin/../conf/zoo.cfg
> 2019-07-07 11:48:43,421 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - 
> Resolved hostname: 0.0.0.0 to address: /0.0.0.0
> 2019-07-07 11:48:43,421 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - 
> Resolved hostname: 10.32.0.3 to address: /10.32.0.3
> 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - 
> Resolved hostname: 10.32.0.2 to address: /10.32.0.2
> 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeer$QuorumServer@185] - 
> Resolved hostname: 10.32.0.5 to address: /10.32.0.5
> 2019-07-07 11:48:43,422 [myid:] - WARN [main:QuorumPeerConfig@354] - 
> Non-optimial configuration, consider an odd number of servers.
> 2019-07-07 11:48:43,422 [myid:] - INFO [main:QuorumPeerConfig@398] - 
> Defaulting to majority quorums
> 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@78] - 
> autopurge.snapRetainCount set to 3
> 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@79] - 
> autopurge.purgeInterval set to 0
> 2019-07-07 11:48:43,425 [myid:3] - INFO [main:DatadirCleanupManager@101] - 
> Purge task is not scheduled.
> 2019-07-07 11:48:43,432 [myid:3] - INFO [main:QuorumPeerMain@130] - Starting 
> quorum peer
> 2019-07-07 11:48:43,437 [myid:3] - INFO [main:ServerCnxnFactory@117] - Using 
> org.apache.zookeeper.server.NIOServerCnxnFactory as server connect$
> 2019-07-07 11:48:43,439 [myid:3] - INFO [main:NIOServerCnxnFactory@89] - 
> binding to port 0.0.0.0/0.0.0.0:2181
> 2019-07-07 11:48:43,440 [myid:3] - ERROR [main:QuorumPeerMain@92] - 
> Unexpected exception, exiting abnormally
> java.net.BindException: Address already in use
>  at sun.nio.ch.Net.bind0(Native Method)
>  at sun.nio.ch.Net.bind(Net.java:433)
>  at sun.nio.ch.Net.bind(Net.java:425)
>  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>  at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
>  at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:90)
>  at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:133)
>  at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
>  at 
> org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
>  
> I searched a lot and could not find the solution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3457) Code optimization in QuorumCnxManager

2019-07-08 Thread tom.long (JIRA)
tom.long created ZOOKEEPER-3457:
---

 Summary: Code optimization in QuorumCnxManager
 Key: ZOOKEEPER-3457
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3457
 Project: ZooKeeper
  Issue Type: Improvement
  Components: quorum
Affects Versions: 3.5.5
Reporter: tom.long
 Fix For: 3.5.5



Dear developer:
I think the following code in line 623 of the QuorumCnxManager class can be 
optimized:

{code:java}
ArrayBlockingQueue<ByteBuffer> bq = new ArrayBlockingQueue<ByteBuffer>(
        SEND_CAPACITY);
ArrayBlockingQueue<ByteBuffer> oldq = queueSendMap.putIfAbsent(sid, bq);
if (oldq != null) {
    addToSendQueue(oldq, b);
} else {
    addToSendQueue(bq, b);
}
{code}
The optimization is as follows (computeIfAbsent on the ConcurrentHashMap is atomic, 
so the put-if-absent semantics are preserved while avoiding the speculative 
ArrayBlockingQueue allocation when a queue for sid already exists):
{code:java}
ArrayBlockingQueue<ByteBuffer> bq = queueSendMap.computeIfAbsent(
        sid, serverId -> new ArrayBlockingQueue<>(SEND_CAPACITY));
addToSendQueue(bq, b);
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)