[jira] [Created] (ZOOKEEPER-3981) Flaky test MultipleAddressTest::testGetValidAddressWithNotValid

2020-10-20 Thread Michael Han (Jira)
Michael Han created ZOOKEEPER-3981:
--

 Summary: Flaky test 
MultipleAddressTest::testGetValidAddressWithNotValid
 Key: ZOOKEEPER-3981
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3981
 Project: ZooKeeper
  Issue Type: Task
  Components: tests
Reporter: Michael Han
Assignee: Michael Han


Problem:

Test MultipleAddressTest::testGetValidAddressWithNotValid can fail 
deterministically when the address it uses, 10.0.0.1, is reachable: per 
https://tools.ietf.org/html/rfc5735, 10.0.0.1 may be allocated for private 
network use. In fact, my ISP's router is assigned this IP, so this test always 
fails for me. 

Solution:

Replace the address with 240.0.0.0, which is reserved for future use and less 
likely to be reachable.
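
For illustration, a minimal standalone sketch (not part of the test itself) of why a reserved address is a safer "unreachable" fixture; the 500 ms timeout is an arbitrary choice for the example:

{code:java}
import java.net.InetAddress;

// Standalone illustration: 240.0.0.0 sits in reserved address space (RFC 5735),
// while 10.0.0.1 is private space and may belong to a real host such as a home
// router, so it can unexpectedly answer reachability probes.
public class UnreachableAddressCheck {
    public static void main(String[] args) throws Exception {
        InetAddress reserved = InetAddress.getByName("240.0.0.0");
        InetAddress privateAddr = InetAddress.getByName("10.0.0.1");
        System.out.println("240.0.0.0 reachable? " + reserved.isReachable(500));
        System.out.println("10.0.0.1  reachable? " + privateAddr.isReachable(500));
    }
}
{code}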



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-3970) Enable ZooKeeperServerController to expire session

2020-10-14 Thread Michael Han (Jira)
Michael Han created ZOOKEEPER-3970:
--

 Summary: Enable ZooKeeperServerController to expire session
 Key: ZOOKEEPER-3970
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3970
 Project: ZooKeeper
  Issue Type: Task
  Components: server, tests
Reporter: Michael Han
Assignee: Michael Han


This is a follow-up to ZOOKEEPER-3948. Here we enable ZooKeeperServerController 
to expire a global or local session. In our experience this is very useful in 
integration testing, where we want a controlled session expiration mechanism. 
It is done by having the session tracker expose both global and local session 
state, so the ZooKeeper server can expire those sessions through the 
controller. 
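
A rough sketch of the idea, with invented names (this is not the actual SessionTracker or controller API), assuming the tracker exposes its known session ids so a controller can force-expire them:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongConsumer;

// Invented names for illustration only: the tracker records which sessions are
// global vs. local, and a controller asks it to expire the ones it cares about.
final class SessionExpirySketch {
    private final Map<Long, Boolean> sessions = new ConcurrentHashMap<>(); // sessionId -> isGlobal

    void track(long sessionId, boolean isGlobal) {
        sessions.put(sessionId, isGlobal);
    }

    // Called by a controller: expire every tracked session of the requested kind,
    // delegating the actual expiration to a server-side hook.
    void expireAll(boolean global, LongConsumer expireHook) {
        sessions.forEach((id, isGlobal) -> {
            if (isGlobal == global) {
                expireHook.accept(id);
            }
        });
    }
}
{code}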



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-3967) Jetty License Update

2020-10-12 Thread Michael Han (Jira)
Michael Han created ZOOKEEPER-3967:
--

 Summary: Jetty License Update
 Key: ZOOKEEPER-3967
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3967
 Project: ZooKeeper
  Issue Type: Task
  Components: license
Reporter: Michael Han


The ZooKeeper server uses Jetty (Apache License, v2) for the admin server (and 
will use it for more in the future), but we do not include any of Jetty's 
copyright / notice / license files in the ZooKeeper distribution. This ticket 
is to figure out whether the Jetty license is indeed missing and, if so, fix it.

There were some previous discussions on the Jetty license in ZOOKEEPER-2235, 
but Jetty somehow did not end up in that patch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-3966) Model ZooKeeper data tree using RocksDB primitives to enable on disk data tree storage

2020-10-07 Thread Michael Han (Jira)
Michael Han created ZOOKEEPER-3966:
--

 Summary: Model ZooKeeper data tree using RocksDB primitives to 
enable on disk data tree storage
 Key: ZOOKEEPER-3966
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3966
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: server
Reporter: Michael Han






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-3965) Add documentation for RocksDB Snap feature

2020-10-07 Thread Michael Han (Jira)
Michael Han created ZOOKEEPER-3965:
--

 Summary: Add documentation for RocksDB Snap feature
 Key: ZOOKEEPER-3965
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3965
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: documentation
Reporter: Michael Han






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-3964) Introduce RocksDB snap and implement change data capture to enable incremental snapshot

2020-10-07 Thread Michael Han (Jira)
Michael Han created ZOOKEEPER-3964:
--

 Summary: Introduce RocksDB snap and implement change data capture 
to enable incremental snapshot
 Key: ZOOKEEPER-3964
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3964
 Project: ZooKeeper
  Issue Type: Sub-task
  Components: rocksdb, server
Reporter: Michael Han
Assignee: Michael Han


This is the first step toward enabling an on-disk storage engine for ZooKeeper, 
by extending the existing Snap interface and implementing a RocksDB-backed 
snapshot. Compared to the file-based snapshot, a RocksDB-based snapshot is 
superior for large in-memory data trees because it supports incremental 
snapshots, serializing only the data that changed between snapshots. 

High level overview:
 * Extend the Snap interface so that everything that needs to be serialized has 
a presence on the interface.
 * Implement a RocksDB-based snapshot, plus bidirectional conversion between the 
file-based snapshot and the RocksDB snapshot, for backward / forward compatibility.
 * Change data capture is implemented by buffering the transactions applied to 
the data tree and applying them to RocksDB while each transaction is processed. 
An incremental snapshot then only requires a RocksDB flush. ZK will always take 
a full snapshot when first loading the data tree during startup.
 * By default, this feature is disabled. Users opt in by explicitly specifying a 
Java system property so that RocksDBSnap is instantiated at runtime (see the 
sketch below).
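
A minimal sketch of what the opt-in might look like; the property name, the class names, and the stubbed interface are assumptions for illustration, not the actual patch:

{code:java}
import java.io.File;

// Hypothetical illustration of the opt-in: choose the snapshot implementation
// at runtime from a system property, defaulting to the existing file-based
// behavior. "zookeeper.snapshot.impl" and the stub classes are invented names.
public final class SnapFactory {
    interface Snap { /* stand-in for the real snapshot interface */ }

    static final class FileSnapStub implements Snap {
        FileSnapStub(File snapDir) { }
    }

    static final class RocksDBSnapStub implements Snap {
        RocksDBSnapStub(File snapDir) { }
    }

    public static Snap create(File snapDir) {
        String impl = System.getProperty("zookeeper.snapshot.impl", "file");
        return "rocksdb".equalsIgnoreCase(impl)
                ? new RocksDBSnapStub(snapDir)   // incremental, RocksDB-backed snapshots
                : new FileSnapStub(snapDir);     // default: full, file-based snapshots
    }
}
{code}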

This work is based on the patch attached to ZOOKEEPER-3783 (kudos to Fangmin 
and co. at FB), with some bug / test fixes and adjustments so it applies 
cleanly to the master branch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-3948) Introduce a deterministic runtime behavior injection framework for ZooKeeperServer testing

2020-09-25 Thread Michael Han (Jira)
Michael Han created ZOOKEEPER-3948:
--

 Summary: Introduce a deterministic runtime behavior injection 
framework for ZooKeeperServer testing
 Key: ZOOKEEPER-3948
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3948
 Project: ZooKeeper
  Issue Type: New Feature
  Components: server, tests
Reporter: Michael Han
Assignee: Michael Han


We'd like to understand how applications built on top of ZooKeeper behave under 
various faulty conditions, which is important for building resilient end-to-end 
solutions and for keeping ZooKeeper from becoming a single point of failure. 
We'd also like to achieve this in both unit tests (in process) and integration 
tests (in and out of process). Traditional approaches based on external fault 
injection are non-deterministic, require non-trivial setup, and are hard to 
integrate with unit tests, so here we introduce the ZooKeeperController service, 
which addresses both.

The basic idea is to create a controllable ZooKeeperServer that accepts various 
control commands (such as delay request, drop request, eat request, expire 
session, shut down, trigger leader election, and so on) and reacts to the 
incoming commands. The controllable server and the production server share the 
same underlying machinery (quorum peers, ZooKeeper server, etc.), but the code 
paths are separate, so this feature has no production impact.

This controller system is currently composed of the following pieces:

* CommandClient: a convenient HTTP client for sending control commands to the 
controller service.
* CommandListener: an embedded HTTP server that listens for incoming commands 
and dispatches them to the controller service.
* Controller Service: the service responsible for creating the controllable ZK 
server and the controller.
* ZooKeeperServerController: the controller that changes the behavior of the ZK 
server at runtime.
* Controllable Cnx / Factory: a controllable connection (and factory) that 
accepts behavior-change requests.

In the future, more control commands and controllable components can be added 
on top of this framework.

This can be used either in unit tests / integration tests as an in-process, 
embedded controllable ZooKeeper server, or as an out-of-process, standalone 
controllable ZooKeeper process.
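
As an illustration of how a test might talk to the CommandListener, here is a minimal sketch; the port, URL path, and command name are assumptions, not the framework's actual endpoints:

{code:java}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical command call: ask the controller to expire a session. The
// host/port, path, and command spelling are invented for this example.
public class ControllerCommandExample {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest expireSession = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/command/expiresession"))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(expireSession, HttpResponse.BodyHandlers.ofString());
        System.out.println("Controller replied: " + response.statusCode() + " " + response.body());
    }
}
{code}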



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-3793) Request throttling is broken when RequestThrottler is disabled or configured incorrectly.

2020-04-10 Thread Michael Han (Jira)
Michael Han created ZOOKEEPER-3793:
--

 Summary: Request throttling is broken when RequestThrottler is 
disabled or configured incorrectly.
 Key: ZOOKEEPER-3793
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3793
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.6.0
Reporter: Michael Han
Assignee: Michael Han


When the RequestThrottler is not enabled, or is enabled but configured 
incorrectly, the ZooKeeper server stops throttling entirely. This is a serious 
bug: without request throttling it is fairly easy to overwhelm ZooKeeper, which 
leads to all sorts of issues. 

This is a regression introduced in ZOOKEEPER-3243, where the total number of 
queued requests in the request processing pipeline is either not taken into 
consideration when deciding whether to throttle, or is only considered 
conditionally based on the RequestThrottler's configuration. We should always 
take the number of queued requests in the request processing pipeline into 
account before making throttling decisions.
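
A minimal sketch of the intended rule (not ZooKeeper's actual throttling code; the names and the meaning of a non-positive limit are simplified assumptions):

{code:java}
// Sketch of the rule described above: the throttling decision always includes
// the requests already queued in the processing pipeline, not just the
// RequestThrottler's own counter.
final class ThrottleCheck {
    static boolean shouldThrottle(int outstandingFromClients, int queuedInPipeline, int limit) {
        if (limit <= 0) {
            return false; // assumed convention: non-positive limit disables throttling
        }
        return outstandingFromClients + queuedInPipeline > limit;
    }
}
{code}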



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-3561) Generalize target authentication scheme for ZooKeeper authentication enforcement.

2019-09-26 Thread Michael Han (Jira)
Michael Han created ZOOKEEPER-3561:
--

 Summary: Generalize target authentication scheme for ZooKeeper 
authentication enforcement.
 Key: ZOOKEEPER-3561
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3561
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Michael Han


ZOOKEEPER-1634 introduced an option that lets users enforce authentication for 
ZooKeeper clients, but the enforced authentication scheme in the committed 
implementation was SASL only. 

This JIRA is to generalize the authentication scheme so that authentication 
enforcement on ZooKeeper clients can work with any supported authentication 
scheme.
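
A small sketch of the generalized check; the property name and data shapes are assumptions for illustration, not the committed implementation:

{code:java}
import java.util.List;

// Instead of hard-coding "sasl", compare the schemes a connection has
// authenticated with against a configurable scheme. The property name is invented.
final class EnforceAuthCheck {
    static boolean isAuthenticated(List<String> authenticatedSchemes) {
        String enforced = System.getProperty("zookeeper.enforce.auth.scheme", "sasl");
        return authenticatedSchemes.stream().anyMatch(enforced::equalsIgnoreCase);
    }
}
{code}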



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-3560) Add response cache to serve get children (2) requests.

2019-09-26 Thread Michael Han (Jira)
Michael Han created ZOOKEEPER-3560:
--

 Summary: Add response cache to serve get children (2) requests.
 Key: ZOOKEEPER-3560
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3560
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Michael Han
Assignee: Michael Han


ZOOKEEPER-3180 introduced a response cache, but it only covers getData requests. 
This JIRA extends the response cache, on the infrastructure set up by 
ZOOKEEPER-3180, so that responses to get children requests can also be served 
out of the cache. Some design decisions:

* Only OpCode.getChildren2 is supported, as OpCode.getChildren does not have an 
associated stat and the current cache infrastructure relies on stats to 
invalidate the cache.

* The children list is stored in a separate response cache object so it does 
not pollute the existing data cache serving getData requests; this separation 
also allows each cache to be tuned separately based on workload characteristics.

* As a result of the cache object separation, new server metrics are added to 
measure cache hits / misses for get children requests, separate from the 
getData metrics.
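
For illustration, a self-contained sketch of the caching idea (not the actual ResponseCache class): entries are keyed by path and invalidated when the children-list version changes. The cache size and the use of pzxid for invalidation are assumptions chosen for the sketch.

{code:java}
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative LRU cache for serialized get-children responses.
final class ChildrenResponseCache {
    private static final int MAX_ENTRIES = 400;

    private static final class Entry {
        final long pzxid;          // zxid of the last change to the children list
        final byte[] serialized;   // serialized getChildren2 response
        Entry(long pzxid, byte[] serialized) {
            this.pzxid = pzxid;
            this.serialized = serialized;
        }
    }

    private final Map<String, Entry> cache =
            new LinkedHashMap<String, Entry>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Entry> eldest) {
                    return size() > MAX_ENTRIES;
                }
            };

    synchronized void put(String path, long pzxid, byte[] serialized) {
        cache.put(path, new Entry(pzxid, Arrays.copyOf(serialized, serialized.length)));
    }

    // Returns the cached response only if the children list has not changed since it was cached.
    synchronized byte[] get(String path, long currentPzxid) {
        Entry e = cache.get(path);
        return (e != null && e.pzxid == currentPzxid) ? e.serialized : null;
    }
}
{code}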




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ZOOKEEPER-3548) Redundant zxid check in SnapStream.isValidSnapshot

2019-09-16 Thread Michael Han (Jira)
Michael Han created ZOOKEEPER-3548:
--

 Summary: Redundant zxid check in SnapStream.isValidSnapshot
 Key: ZOOKEEPER-3548
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3548
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Michael Han
Assignee: Michael Han


getZxidFromName is called twice in isValidSnapshot, and the second call is 
redundant and should be removed.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (ZOOKEEPER-3483) Flaky test: org.apache.zookeeper.server.util.RequestPathMetricsCollectorTest.testCollectStats

2019-08-01 Thread Michael Han (JIRA)
Michael Han created ZOOKEEPER-3483:
--

 Summary: Flaky test: 
org.apache.zookeeper.server.util.RequestPathMetricsCollectorTest.testCollectStats
 Key: ZOOKEEPER-3483
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3483
 Project: ZooKeeper
  Issue Type: Test
Affects Versions: 3.6.0
Reporter: Michael Han
Assignee: Michael Han


Test 
org.apache.zookeeper.server.util.RequestPathMetricsCollectorTest.testCollectStats
 consistently passes in the local dev environment but frequently fails in the 
Jenkins pre-commit build.

For now, disable the test to unblock a couple of pull requests waiting on a 
green build, until the flakiness is fully addressed.

Error for reference:

{code:java}
Error Message
expected:<845466> but was:<111>
Stacktrace
java.lang.AssertionError: expected:<845466> but was:<111>
at 
org.apache.zookeeper.server.util.RequestPathMetricsCollectorTest.testCollectStats(RequestPathMetricsCollectorTest.java:248)
{code}




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (ZOOKEEPER-3448) Introduce MessageTracker to assist debugging leader and learner connectivity issues

2019-06-28 Thread Michael Han (JIRA)
Michael Han created ZOOKEEPER-3448:
--

 Summary: Introduce MessageTracker to assist debugging leader and 
learner connectivity issues
 Key: ZOOKEEPER-3448
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3448
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Michael Han
Assignee: Michael Han


We want better insight into the state of the world when learners lose their 
connection with the leader, so we need to capture more information when that 
happens. We do this through MessageTracker, which records the last few sent and 
received messages at various protocol stages; this information is dumped to log 
files for further analysis.
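
A rough, self-contained sketch of the idea (not the actual MessageTracker API): keep bounded buffers of the most recent messages and dump them on disconnect.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Illustration only: ring buffers of the last N sent/received message
// descriptions, flushed to the log for postmortem analysis when a learner
// loses its connection to the leader.
final class MessageTrackerSketch {
    private final int capacity;
    private final Deque<String> sent = new ArrayDeque<>();
    private final Deque<String> received = new ArrayDeque<>();

    MessageTrackerSketch(int capacity) {
        this.capacity = capacity;
    }

    synchronized void trackSent(String description) {
        if (sent.size() == capacity) {
            sent.removeFirst();
        }
        sent.addLast(System.currentTimeMillis() + " " + description);
    }

    synchronized void trackReceived(String description) {
        if (received.size() == capacity) {
            received.removeFirst();
        }
        received.addLast(System.currentTimeMillis() + " " + description);
    }

    synchronized void dump(org.slf4j.Logger log) {
        sent.forEach(m -> log.info("last sent: {}", m));
        received.forEach(m -> log.info("last received: {}", m));
    }
}
{code}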



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3439) Observability improvements on client / server connection close

2019-06-21 Thread Michael Han (JIRA)
Michael Han created ZOOKEEPER-3439:
--

 Summary: Observability improvements on client / server connection 
close
 Key: ZOOKEEPER-3439
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3439
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Michael Han
Assignee: Michael Han


Currently, when the server closes a client connection, there is not enough 
information recorded (except for a few exception logs), which makes it hard to 
do postmortems. On the other hand, having a complete view of the aggregated 
connection-close reasons would provide more signals with which we can better 
operate the clusters (e.g. predict that an incident might happen based on 
trends in the connection-close reasons).
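
A minimal sketch of the kind of aggregation this could enable; the reason names are invented for illustration:

{code:java}
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Illustration only: record a reason each time the server closes a client
// connection and expose aggregate counts so operators can watch trends.
final class ConnectionCloseMetrics {
    enum Reason { CLIENT_REQUESTED, SESSION_EXPIRED, AUTH_FAILED, IO_ERROR, SERVER_SHUTDOWN }

    private final Map<Reason, LongAdder> counts = new EnumMap<>(Reason.class);

    ConnectionCloseMetrics() {
        for (Reason r : Reason.values()) {
            counts.put(r, new LongAdder());
        }
    }

    void recordClose(Reason reason) {
        counts.get(reason).increment();
    }

    long count(Reason reason) {
        return counts.get(reason).sum();
    }
}
{code}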



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3430) Observability improvement: provide top N read / write path queries

2019-06-17 Thread Michael Han (JIRA)
Michael Han created ZOOKEEPER-3430:
--

 Summary: Observability improvement: provide top N read / write 
path queries
 Key: ZOOKEEPER-3430
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3430
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Michael Han
Assignee: Michael Han


We would like a better understanding of the types of workloads that hit ZK, and 
one aspect of that is being able to answer queries for the top N read and top N 
write request paths. Knowing the hot request paths will let us better optimize 
for such workloads, for example by enabling path-specific caching or by 
changing the path structure (e.g. breaking a long path into hierarchical 
paths).
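
As an illustration of the kind of accounting involved (not the actual RequestPathMetricsCollector), a small sketch that counts hits per path and answers top-N queries:

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;

// Illustration only: count read and write requests per path and report the
// hottest paths on demand.
final class PathHitCounter {
    private final Map<String, LongAdder> readHits = new ConcurrentHashMap<>();
    private final Map<String, LongAdder> writeHits = new ConcurrentHashMap<>();

    void record(String path, boolean isWrite) {
        Map<String, LongAdder> target = isWrite ? writeHits : readHits;
        target.computeIfAbsent(path, p -> new LongAdder()).increment();
    }

    List<Map.Entry<String, Long>> topN(boolean write, int n) {
        Map<String, LongAdder> source = write ? writeHits : readHits;
        return source.entrySet().stream()
                .map(e -> Map.entry(e.getKey(), e.getValue().sum()))
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
                .limit(n)
                .collect(Collectors.toList());
    }
}
{code}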



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3427) Introduce SnapshotComparer that assists debugging with snapshots.

2019-06-11 Thread Michael Han (JIRA)
Michael Han created ZOOKEEPER-3427:
--

 Summary: Introduce SnapshotComparer that assists debugging with 
snapshots.
 Key: ZOOKEEPER-3427
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3427
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Michael Han
Assignee: Michael Han


SnapshotComparer is a tool that loads and compares two snapshots, with a 
configurable threshold and various filters. It is useful in use cases that 
involve snapshot analysis, such as offline data consistency checking and data 
trend analysis (e.g. what is growing under which zNode path, and when). 

A sample output of the tool (actual numbers removed due to sensitivity):
{code:java}
Successfully parsed options!
Deserialized snapshot in snapshot.0 in  seconds
Processed data tree in seconds
Deserialized snapshot in snapshot.1 in  seconds
Processed data tree in seconds

Node count: 
Total size: 
Max depth: 
Count of nodes at depth 1: 
Count of nodes at depth 2: 
Count of nodes at depth 3: 
Count of nodes at depth 4: 
Count of nodes at depth 5: 
Count of nodes at depth 6: 
Count of nodes at depth 7: 
Count of nodes at depth 8: 
Count of nodes at depth 9: 
Count of nodes at depth 10: 
Count of nodes at depth 11: 

Node count: 
Total size: 
Max depth: 
Count of nodes at depth 1: 
Count of nodes at depth 2: 
Count of nodes at depth 3: 
Count of nodes at depth 4: 
Count of nodes at depth 5: 
Count of nodes at depth 6: 
Count of nodes at depth 7: 
Count of nodes at depth 8: 
Count of nodes at depth 9: 
Count of nodes at depth 10: 
Count of nodes at depth 11: 

Analysis for depth 0
Analysis for depth 1
Analysis for depth 2
Analysis for depth 3
Analysis for depth 4
Analysis for depth 5
Analysis for depth 6
Analysis for depth 7
Analysis for depth 8
Analysis for depth 9
Analysis for depth 10
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3419) Backup and recovery support

2019-06-06 Thread Michael Han (JIRA)
Michael Han created ZOOKEEPER-3419:
--

 Summary: Backup and recovery support
 Key: ZOOKEEPER-3419
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3419
 Project: ZooKeeper
  Issue Type: New Feature
  Components: server
Affects Versions: 3.6.0
Reporter: Michael Han
Assignee: Michael Han


Historically, ZooKeeper has had no intrinsic support for backup and restore. 
The usual approach is custom scripts that copy data around, or third-party 
tools (Exhibitor, etc.), which introduces operational burden. 

This Jira introduces another option: direct support for backup and restore in 
ZooKeeper itself. It is completely built into ZooKeeper, supports point-in-time 
recovery of an entire tree after an oops event, supports recovery of a partial 
tree for test/dev purposes, and can help replay history for bug investigations. 
It will try to provide a generic interface so backups can be directed to 
different data storage systems (S3, Kafka, HDFS, etc.).

This same system has been in production at Twitter for X years and has proved 
quite helpful for the use cases mentioned earlier.
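
To make the "generic interface" idea concrete, here is a hedged sketch of what a pluggable backup-storage abstraction could look like; the names are assumptions, not the actual proposal:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

// Hypothetical storage abstraction: concrete implementations would target
// S3, HDFS, Kafka, local disk, and so on.
public interface BackupStorage {
    // Upload one snapshot or txn-log segment under a logical backup name.
    void upload(String backupName, InputStream data) throws IOException;

    // List available backups, e.g. to pick one for point-in-time recovery.
    List<String> list() throws IOException;

    // Open a previously uploaded backup for restore.
    InputStream download(String backupName) throws IOException;
}
{code}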



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3418) Improve quorum throughput through eager ACL checks of requests on local servers

2019-06-06 Thread Michael Han (JIRA)
Michael Han created ZOOKEEPER-3418:
--

 Summary: Improve quorum throughput through eager ACL checks of 
requests on local servers
 Key: ZOOKEEPER-3418
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3418
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Michael Han
Assignee: Michael Han


Serving write requests that change the state of the system requires quorum 
operations, and in some cases the quorum operations can be avoided when the 
requests are doomed to fail. ACL check failure is such a case. To optimize for 
it, we lift the ACL check logic and perform an eager ACL check on the local 
server (where the requests are received), failing fast before sending the 
requests to the leader. 

As with any feature, there is a feature flag that turns it on or off (default 
off). The feature is also forward compatible: any new Op code (and some 
existing Op codes we do not explicitly check against) will pass the check and 
(potentially) fail on the leader side, instead of being prematurely filtered 
out on the local server.

The end result is better throughput and stability of the quorum for certain 
workloads.
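
A simplified, self-contained sketch of the fail-fast idea; the flag name, the set of checked op codes, and the checker interface are assumptions for illustration:

{code:java}
import java.util.Set;

// Illustration only: check ACLs locally for known write op codes and reject
// doomed requests before the quorum round trip; anything else goes to the
// leader, which remains the authority.
final class EagerAclCheck {
    interface AclChecker {
        boolean hasPermission(long sessionId, String path, int permission);
    }

    // ZooDefs.OpCode values for create, delete, setData.
    private static final Set<Integer> CHECKED_WRITE_OPS = Set.of(1, 2, 5);
    private final boolean enabled = Boolean.getBoolean("zookeeper.eagerACLCheck"); // assumed flag name
    private final AclChecker checker;

    EagerAclCheck(AclChecker checker) {
        this.checker = checker;
    }

    // Returns true if the request can be rejected locally, skipping the quorum round trip.
    boolean rejectLocally(int opCode, long sessionId, String path, int permission) {
        if (!enabled || !CHECKED_WRITE_OPS.contains(opCode)) {
            return false; // unknown / unchecked ops always go to the leader
        }
        return !checker.hasPermission(sessionId, path, permission);
    }
}
{code}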



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3416) Remove redundant ServerCnxnFactoryAccessor

2019-06-05 Thread Michael Han (JIRA)
Michael Han created ZOOKEEPER-3416:
--

 Summary: Remove redundant ServerCnxnFactoryAccessor
 Key: ZOOKEEPER-3416
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3416
 Project: ZooKeeper
  Issue Type: Improvement
  Components: tests
Affects Versions: 3.6.0
Reporter: Michael Han
Assignee: Michael Han


We have two ways to access the private zkServer inside ServerCnxnFactory, and 
there is really no need to keep maintaining both. We could have removed 
ServerCnxnFactoryAccessor when we added the public accessor to 
ServerCnxnFactory in ZOOKEEPER-1346, but we did not.

The solution is to consolidate all access to the zkServer through the public 
accessor of ServerCnxnFactory. The end result is a cleaner code base and less 
confusion.

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-1000) Provide SSL in zookeeper to be able to run cross colos.

2019-05-21 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-1000.

Resolution: Duplicate

> Provide SSL in zookeeper to be able to run cross colos.
> ---
>
> Key: ZOOKEEPER-1000
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1000
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
>Priority: Major
> Fix For: 3.6.0, 3.5.6
>
>
> This jira is to track SSL for zookeeper. The inter zookeeper server 
> communication and the client to server communication should be over ssl so 
> that zookeeper can be deployed over WAN's. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-1000) Provide SSL in zookeeper to be able to run cross colos.

2019-05-21 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845426#comment-16845426
 ] 

Michael Han commented on ZOOKEEPER-1000:


Agreed, this sounds like a dup we can close for now. If one day we find the 
current plain-socket-based solution is not good enough feature- or 
performance-wise, we can revisit this issue, which is based on SSL on top of 
Netty.

> Provide SSL in zookeeper to be able to run cross colos.
> ---
>
> Key: ZOOKEEPER-1000
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1000
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
>Priority: Major
> Fix For: 3.6.0, 3.5.6
>
>
> This jira is to track SSL for zookeeper. The inter zookeeper server 
> communication and the client to server communication should be over ssl so 
> that zookeeper can be deployed over WAN's. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3399) Remove logging in getGlobalOutstandingLimit for optimal performance.

2019-05-21 Thread Michael Han (JIRA)
Michael Han created ZOOKEEPER-3399:
--

 Summary: Remove logging in getGlobalOutstandingLimit for optimal 
performance.
 Key: ZOOKEEPER-3399
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3399
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.6.0
Reporter: Michael Han
Assignee: Michael Han


Recently we moved some of our production clusters to the top of trunk. One 
issue we found is a performance regression in read and write latency on 
clusters where the quorum is also serving traffic: average read latency 
increased by 50x, and p99 read latency increased by 300x. 

The root cause is a log statement introduced in ZOOKEEPER-3177 (PR711), where 
we added a LOG.info statement in getGlobalOutstandingLimit. 
getGlobalOutstandingLimit is on the critical code path for request processing, 
and it is called twice per request (once when processing the packet, once when 
finalizing the request response). This not only degrades server performance but 
also bloats the log file when the server's QPS is high.

This only impacts clusters where the quorum (leader + followers) is serving 
traffic. For clusters where only observers serve traffic, no impact is 
observed.
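
A hedged sketch of the remedy (not the actual patch): resolve and log the configured limit once, so the hot path does no logging or property parsing. The property name matches the documented zookeeper.globalOutstandingLimit knob, but the structure here is invented.

{code:java}
// Illustration only.
final class GlobalOutstandingLimit {
    private static final int LIMIT = Integer.getInteger("zookeeper.globalOutstandingLimit", 1000);

    static {
        // Log once at class-load time instead of on every request.
        System.out.println("globalOutstandingLimit = " + LIMIT);
    }

    static int get() {
        return LIMIT; // hot path: no logging, no parsing
    }
}
{code}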

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3352) Use LevelDB For Backend

2019-04-09 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814023#comment-16814023
 ] 

Michael Han commented on ZOOKEEPER-3352:


I don't see an obvious gain from using an LSM tree backend just for the 
snapshot and txn log. For zk clients, the read path is served directly from 
memory (think of the zk data tree as a 'memtable' that never flushes); the 
write path is already sequential for both the snapshot and the txn log. Reading 
the snapshot and txn log out of an LSM tree instead of flat files might reduce 
recovery time, but I doubt the difference is substantial.

That said, having an LSM tree backend and building the zk data tree on top of 
it would make it possible to store a much larger data set per node, since we 
would no longer keep all data in memory.

> Use LevelDB For Backend
> ---
>
> Key: ZOOKEEPER-3352
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3352
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Fix For: 4.0.0
>
>
> Use LevelDB for managing data stored in ZK (transaction logs and snapshots).
> https://stackoverflow.com/questions/6779669/does-leveldb-support-java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3240) Close socket on Learner shutdown to avoid dangling socket

2019-01-13 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741714#comment-16741714
 ] 

Michael Han commented on ZOOKEEPER-3240:


[~nixon] :
bq.  so the Leader is unable to sense the change in Learner status through the 
status of the network connection

A plausible theory :)

The ping packet between the leader and learners is designed to solve this exact 
problem - detecting the liveness of the other side. Basically, for each learner, 
the leader constantly reads packets out of the socket associated with that 
learner in the corresponding LearnerHandler thread. This read has a timeout 
configured on the socket on the leader side, so even if the sockets on both 
sides are valid but there is no traffic (as in this case, where the learner 
leaks sockets by not properly closing them after shutting down), the leader's 
read should eventually time out after the sync limit check. Unless:

* The leader's socket read timeout has no effect, so the leader blocks on 
reading a socket indefinitely because there is no traffic from the learner.
* The learner process, after restarting, somehow ended up reusing the old 
leaked learner socket, so the corresponding LearnerHandler thread can't detect 
any difference (which would be expected). I am not sure how likely this case is 
in practice.

In any case, it seems that our ping mechanism failed to detect the network 
change here.

bq. the learner queue size keeps growing

Do you mind elaborating a bit on which exact queue this is and what caused it 
to grow?
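
For reference, a tiny self-contained demo of the socket read-timeout mechanism discussed above; the 1-second timeout is arbitrary and stands in for the syncLimit-based timeout on the leader side.

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// A read on a socket with SO_TIMEOUT set throws SocketTimeoutException when the
// peer sends no traffic, which is how a LearnerHandler-style thread is expected
// to notice a silent peer.
public class ReadTimeoutDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("localhost", server.getLocalPort());
             Socket accepted = server.accept()) {
            accepted.setSoTimeout(1000); // 1s read timeout for the demo
            InputStream in = accepted.getInputStream();
            try {
                in.read(); // the client never writes, so this blocks until the timeout fires
            } catch (SocketTimeoutException e) {
                System.out.println("read timed out as expected");
            }
        }
    }
}
{code}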





> Close socket on Learner shutdown to avoid dangling socket
> -
>
> Key: ZOOKEEPER-3240
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3240
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> There was a Learner that had two connections to the Leader after that Learner 
> hit an unexpected exception during flush txn to disk, which will shutdown 
> previous follower instance and restart a new one.
>  
> {quote}2018-10-26 02:31:35,568 ERROR 
> [SyncThread:3:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from 
> thread : SyncThread:3
> java.io.IOException: Input/output error
>     at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>     at 
> java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:72)
>     at 
> java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:395)
>     at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:457)
>     at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:548)
>     at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:769)
>     at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246)
>     at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:172)
> 2018-10-26 02:31:35,568 INFO  [SyncThread:3:ZooKeeperServerListenerImpl@42] - 
> Thread SyncThread:3 exits, error code 1
> 2018-10-26 02:31:35,568 INFO [SyncThread:3:SyncRequestProcessor@234] - 
> SyncRequestProcessor exited!{quote}
>  
> It is supposed to close the previous socket, but it doesn't seem to be done 
> anywhere in the code. This leaves the socket open with no one reading from 
> it, and caused the queue full and blocked on sender.
>  
> Since the LearnerHandler didn't shutdown gracefully, the learner queue size 
> keeps growing, the JVM heap size on leader keeps growing and added pressure 
> to the GC, and cause high GC time and latency in the quorum.
>  
> The simple fix is to gracefully shutdown the socket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3240) Close socket on Learner shutdown to avoid dangling socket

2019-01-11 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740161#comment-16740161
 ] 

Michael Han commented on ZOOKEEPER-3240:


[~nixon] Good catch, the fix looks reasonable. 

I've seen a similar issue in my production environment; the fix I made was on 
the leader side, where I tracked the LearnerHandler threads associated with 
server ids and made sure each server id has only a single LearnerHandler 
thread. This also works in cases where the learners don't get a chance to close 
their sockets, or they do but for some reason the TCP reset never makes it to 
the leader. In any case, it's good to fix the resource leak on the learner side.

I also wonder how we could get into such a state on the leader side in the 
first place. On the leader, we do have a socket read timeout set via 
setSoTimeout for the learner handler threads (after the socket is created via 
serverSocket.accept), and each learner handler constantly polls / tries to read 
from the socket afterwards. If a learner dies but leaves a valid socket open, I 
would expect the LearnerHandler thread on the leader side that tries to read 
from that dead learner's socket to eventually time out, throw 
SocketTimeoutException, and kill itself. That does not seem to be what I 
observed, though. Do you have any insights on this?

> Close socket on Learner shutdown to avoid dangling socket
> -
>
> Key: ZOOKEEPER-3240
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3240
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There was a Learner that had two connections to the Leader after that Learner 
> hit an unexpected exception during flush txn to disk, which will shutdown 
> previous follower instance and restart a new one.
>  
> {quote}2018-10-26 02:31:35,568 ERROR 
> [SyncThread:3:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from 
> thread : SyncThread:3
> java.io.IOException: Input/output error
>     at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>     at 
> java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:72)
>     at 
> java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:395)
>     at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:457)
>     at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:548)
>     at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:769)
>     at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246)
>     at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:172)
> 2018-10-26 02:31:35,568 INFO  [SyncThread:3:ZooKeeperServerListenerImpl@42] - 
> Thread SyncThread:3 exits, error code 1
> 2018-10-26 02:31:35,568 INFO [SyncThread:3:SyncRequestProcessor@234] - 
> SyncRequestProcessor exited!{quote}
>  
> It is supposed to close the previous socket, but it doesn't seem to be done 
> anywhere in the code. This leaves the socket open with no one reading from 
> it, and caused the queue full and blocked on sender.
>  
> Since the LearnerHandler didn't shutdown gracefully, the learner queue size 
> keeps growing, the JVM heap size on leader keeps growing and added pressure 
> to the GC, and cause high GC time and latency in the quorum.
>  
> The simple fix is to gracefully shutdown the socket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3180) Add response cache to improve the throughput of read heavy traffic

2018-12-12 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719752#comment-16719752
 ] 

Michael Han commented on ZOOKEEPER-3180:


My experience with JVM GC and ZooKeeper is that GC is rarely a real issue in 
production if tuned correctly (I ran a fairly large ZK fleet that kind of 
pushed ZK to its limit). Most GC issues I had were software bugs - such as 
leaking connections. For this cache case, the current implementation is good 
enough for my use case, though I am interested in off-heap solutions as well. 
My concern with an off-heap solution is that it is probably going to be more 
complicated and carries the overhead of serialization / deserialization between 
heap and off-heap. I'd say we get this patch landed, have more people test it 
out, then improve it with more options.

And for caching in general, it obviously depends a lot on the workload and the 
actual use case, so it's hard to provide a cache solution that works for 
everyone in the first place...

> Add response cache to improve the throughput of read heavy traffic 
> ---
>
> Key: ZOOKEEPER-3180
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3180
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Assignee: Brian Nixon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> On read heavy use case with large response data size, the serialization of 
> response takes time and added overhead to the GC.
> Add response cache helps improving the throughput we can support, which also 
> reduces the latency in general.
> This Jira is going to implement a LRU cache for the response, which shows 
> some performance gain on some of our production ensembles.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3211) zookeeper standalone mode,found a high level bug in kernel of centos7.0 ,zookeeper Server's tcp/ip socket connections(default 60 ) are CLOSE_WAIT ,this lead to zk

2018-12-12 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719736#comment-16719736
 ] 

Michael Han commented on ZOOKEEPER-3211:


{quote}Have similar defects been solved in 3.4.13?
{quote}
Previously there were reports about CLOSE_WAIT, but if I remember correctly, 
most of those cases ended with no action taken because they were hard to reproduce. 
{quote}It looks like zk Server is deadlocked
{quote}
The thread dump in the 1.log file indicates some threads are blocked, but that 
seems to be a symptom rather than the cause. If we run out of available 
sockets, then some ZooKeeper threads that involve file I/O / socket I/O will be 
blocked. 

{quote}Does this cause CLOSE_WAIT for zk?
{quote}
Most of the time, long-lived CLOSE_WAIT connections indicate an 
application-side bug rather than a kernel bug - the connection should be 
closed, but for some reason the application, after receiving the TCP reset from 
the client, cannot close it - which effectively leaks connections. The kernel 
upgrade could be a trigger, though. 

I am interested to know whether any other folks can reproduce this. I currently 
don't have an environment to reproduce it.

Also, [~yss], can you please use a zip file instead of a rar file for uploading 
log files? 

Another thing to try is to increase your limit of open file descriptors - it 
seems it is currently set to 60? If you increase it (ulimit), you may still end 
up leaking connections, but the server should stay available longer before 
running out of sockets.

> zookeeper standalone mode,found a high level bug in kernel of centos7.0 
> ,zookeeper Server's  tcp/ip socket connections(default 60 ) are CLOSE_WAIT 
> ,this lead to zk can't work for client any more
> --
>
> Key: ZOOKEEPER-3211
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3211
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.5
> Environment: 1.zoo.cfg
> server.1=127.0.0.1:2902:2903
> 2.kernel
> kernel:Linux localhost.localdomain 3.10.0-123.el7.x86_64 #1 SMP Tue Feb 12 
> 19:44:50 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
> JDK:
> java version "1.7.0_181"
> OpenJDK Runtime Environment (rhel-2.6.14.5.el7-x86_64 u181-b00)
> OpenJDK 64-Bit Server VM (build 24.181-b00, mixed mode)
> zk: 3.4.5
>Reporter: yeshuangshuang
>Priority: Blocker
> Fix For: 3.4.5
>
> Attachments: 1.log, zklog.rar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> 1.config--zoo.cfg
> server.1=127.0.0.1:2902:2903
> 2.kernel version
> version:Linux localhost.localdomain 3.10.0-123.el7.x86_64 #1 SMP Tue Feb 12 
> 19:44:50 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
> JDK:
> java version "1.7.0_181"
> OpenJDK Runtime Environment (rhel-2.6.14.5.el7-x86_64 u181-b00)
> OpenJDK 64-Bit Server VM (build 24.181-b00, mixed mode)
> zk: 3.4.5
> 3.bug details:
> Occasionally,But the recurrence probability is extremely high. At first, the 
> read-write timeout takes about 6s, and after a few minutes, all connections 
> (including long ones) will be CLOSE_WAIT state.
> 4.:Circumvention scheme: it is found that all connections become close_wait 
> to restart the zookeeper server side actively



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3211) zookeeper standalone mode,found a high level bug in kernel of centos7.0 ,zookeeper Server's tcp/ip socket connections(default 60 ) are CLOSE_WAIT ,this lead to zk

2018-12-12 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719680#comment-16719680
 ] 

Michael Han commented on ZOOKEEPER-3211:


[~yss] Have you tried newer version of stable zookeeper release (e.g. 3.4.13), 
as well as different versions of OS? 3.4.5 is a pretty old version. 

> zookeeper standalone mode,found a high level bug in kernel of centos7.0 
> ,zookeeper Server's  tcp/ip socket connections(default 60 ) are CLOSE_WAIT 
> ,this lead to zk can't work for client any more
> --
>
> Key: ZOOKEEPER-3211
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3211
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.5
> Environment: 1.zoo.cfg
> server.1=127.0.0.1:2902:2903
> 2.kernel
> kernel:Linux localhost.localdomain 3.10.0-123.el7.x86_64 #1 SMP Tue Feb 12 
> 19:44:50 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
> JDK:
> java version "1.7.0_181"
> OpenJDK Runtime Environment (rhel-2.6.14.5.el7-x86_64 u181-b00)
> OpenJDK 64-Bit Server VM (build 24.181-b00, mixed mode)
> zk: 3.4.5
>Reporter: yeshuangshuang
>Priority: Blocker
> Fix For: 3.4.5
>
> Attachments: 1.log, zklog.rar
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> 1.config--zoo.cfg
> server.1=127.0.0.1:2902:2903
> 2.kernel version
> version:Linux localhost.localdomain 3.10.0-123.el7.x86_64 #1 SMP Tue Feb 12 
> 19:44:50 EST 2019 x86_64 x86_64 x86_64 GNU/Linux
> JDK:
> java version "1.7.0_181"
> OpenJDK Runtime Environment (rhel-2.6.14.5.el7-x86_64 u181-b00)
> OpenJDK 64-Bit Server VM (build 24.181-b00, mixed mode)
> zk: 3.4.5
> 3.bug details:
> Occasionally,But the recurrence probability is extremely high. At first, the 
> read-write timeout takes about 6s, and after a few minutes, all connections 
> (including long ones) will be CLOSE_WAIT state.
> 4.:Circumvention scheme: it is found that all connections become close_wait 
> to restart the zookeeper server side actively



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3214) Flaky test: org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter

2018-12-12 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16719671#comment-16719671
 ] 

Michael Han commented on ZOOKEEPER-3214:


Thanks for reporting the flaky test. It's important to keep an eye on flaky 
tests, which are an important signal of quality.

For this specific issue, it was reported before, so I am resolving this Jira 
and moving the discussion to the original Jira.

> Flaky test: 
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter
> -
>
> Key: ZOOKEEPER-3214
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3214
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Reporter: maoling
>Priority: Minor
>
> more details in:
> https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2901/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testLeaderElectionWithDisloyalVoter/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (ZOOKEEPER-3141) testLeaderElectionWithDisloyalVoter is flaky

2018-12-12 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reopened ZOOKEEPER-3141:


Reopening this issue because this test was recently observed with a similar 
symptom, as reported in ZOOKEEPER-3124: 
[https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2901/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testLeaderElectionWithDisloyalVoter/].
 

> testLeaderElectionWithDisloyalVoter is flaky
> 
>
> Key: ZOOKEEPER-3141
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3141
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: leaderElection, server, tests
>Affects Versions: 3.6.0
>Reporter: Michael Han
>Priority: Major
>
> The unit test added in ZOOKEEPER-3109 turns out to be quite flaky.
> See 
> [https://builds.apache.org/job/zOOkeeper-Find-Flaky-Tests/511/artifact/report.html]
> Recent failure builds:
> [https://builds.apache.org/job/ZooKeeper-trunk//181] 
> [https://builds.apache.org/job/ZooKeeper-trunk//179] 
> [https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2123/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testLeaderElectionWithDisloyalVoter_stillHasMajority/]
>  
>  
> Snapshot of the failure:
> {code:java}
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter_stillHasMajority
> Error Message
> Server 0 should have joined quorum by now
> Stacktrace
> junit.framework.AssertionFailedError: Server 0 should have joined quorum by 
> now
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElection(QuorumPeerMainTest.java:1482)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter_stillHasMajority(QuorumPeerMainTest.java:1431)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3214) Flaky test: org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter

2018-12-12 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3214.

Resolution: Duplicate

> Flaky test: 
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter
> -
>
> Key: ZOOKEEPER-3214
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3214
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: tests
>Reporter: maoling
>Priority: Minor
>
> more details in:
> https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2901/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testLeaderElectionWithDisloyalVoter/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3188) Improve resilience to network

2018-12-06 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16712162#comment-16712162
 ] 

Michael Han commented on ZOOKEEPER-3188:


Appreciate the detailed reply; I agree with the replies on 1 and 2.

bq. Such changes should be handled exactly the way they are now and there 
should be no interactions with the changes to the networking stack. 

Agreed. I think I was just looking for more elaborated use cases around using 
reconfig to manipulate multiple server addresses, as the proposal does not go 
into details beyond 'support dynamic reconfiguration'. I expect dynamic 
reconfiguration will just work out of the box with proper abstractions, without 
touching too much of the reconfiguration code path, but there are some 
subtleties to consider. A couple of examples:

* Properly rebalance client connections - this was discussed on the dev mailing list.
* Avoid unnecessary leader elections during reconfig - this change will 
probably change the abstraction of server addresses (QuorumServer), and we 
should be careful about how QuorumServers are compared, to avoid unnecessary 
leader elections in cases where the server set is the same but some servers 
have new addresses.

There might be more cases to consider...

bq. The documentation, in particular, should be essentially identical except 
that an example of adding an address might be nice

I am thinking at least 
[this|https://zookeeper.apache.org/doc/r3.5.4-beta/zookeeperReconfig.html#sc_reconfig_clientport]
 should be updated to reflect the fact that 1. the config format is changed and 
2. multiple server addresses can be manipulated via reconfig.


> Improve resilience to network
> -
>
> Key: ZOOKEEPER-3188
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3188
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Dunning
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We propose to add network level resiliency to Zookeeper. The ideas that we 
> have on the topic have been discussed on the mailing list and via a 
> specification document that is located at 
> [https://docs.google.com/document/d/1iGVwxeHp57qogwfdodCh9b32P2_kOQaJZ2GDo7j36fI/edit?usp=sharing]
> That document is copied to this issue which is being created to report the 
> results of experimental implementations.
> h1. Zookeeper Network Resilience
> h2. Background
> Zookeeper is designed to help in building distributed systems. It provides a 
> variety of operations for doing this and all of these operations have rather 
> strict guarantees on semantics. Zookeeper itself is a distributed system made 
> up of cluster containing a leader and a number of followers. The leader is 
> designated in a process known as leader election in which a majority of all 
> nodes in the cluster must agree on a leader. All subsequent operations are 
> initiated by the leader and completed when a majority of nodes have confirmed 
> the operation. Whenever an operation cannot be confirmed by a majority or 
> whenever the leader goes missing for a time, a new leader election is 
> conducted and normal operations proceed once a new leader is confirmed.
>  
> The details of this are not important relative to this discussion. What is 
> important is that the semantics of the operations conducted by a Zookeeper 
> cluster and the semantics of how client processes communicate with the 
> cluster depend only on the basic fact that messages sent over TCP connections 
> will never appear out of order or missing. Central to the design of ZK is 
> that a server to server network connection is used as long as it works to use 
> it and a new connection is made when it appears that the old connection isn't 
> working.
>  
> As currently implemented, however, each member of a Zookeeper cluster can 
> have only a single address as viewed from some other process. This means, 
> absent network link bonding, that the loss of a single switch or a few 
> network connections could completely stop the operations of a the Zookeeper 
> cluster. It is the goal of this work to address this issue by allowing each 
> server to listen on multiple network interfaces and to connect to other 
> servers any of several addresses. The effect will be to allow servers to 
> communicate over redundant network paths to improve resiliency to network 
> failures without changing any core algorithms.
> h2. Proposed Change
> Interestingly, the correct operations of a Zookeeper cluster do not depend on 
> _how_ a TCP connection was made. There is no reason at all not to advertise 
> multiple addresses for members of a Zookeeper cluster. 
>  
> Connections between members of a Zookeeper cluster and between a client and a 
> cluster member are established by referencing a 

[jira] [Commented] (ZOOKEEPER-2778) Potential server deadlock between follower sync with leader and follower receiving external connection requests.

2018-11-20 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693936#comment-16693936
 ] 

Michael Han commented on ZOOKEEPER-2778:


I have to refresh my memory on this issue, but looking back now I think the 
gist is:
We want to guarantee a consistent membership view of the ensemble, so we need 
to take a lock (QV_LOCK) on the quorum peer when we access (read/write) it. 
Meanwhile, we need another lock on the QCM itself, and the order of acquiring 
both locks is not consistent across code paths, thus causing the deadlock.

The fix I proposed earlier, and PR 707, did it by removing the QV lock on the 
read path. The problem is I am not sure how to validate its correctness given 
the intertwined code paths :) - while working on this I was previously 
convinced that removing QV_LOCK on the read path of the three addresses is 
sound; now I am not sure.

We could also try removing one or both synchronized blocks on connectOne of QCM 
- it seems OK at least for the first connectOne (the one with two parameters), 
and this should fix this specific deadlock.

Another idea is to avoid the QV lock completely by abstracting the quorum 
verifier as an AtomicReference, similar to what 707 did for the three address 
fields, if that is feasible.
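
A tiny sketch of the AtomicReference idea mentioned above (QuorumVerifier is stubbed; this is not the actual QuorumPeer code):

{code:java}
import java.util.concurrent.atomic.AtomicReference;

// Readers get a consistent snapshot of the verifier without taking QV_LOCK;
// writers publish a new verifier with a single atomic swap.
final class QuorumVerifierHolder {
    interface QuorumVerifier { /* stand-in for the real interface */ }

    private final AtomicReference<QuorumVerifier> current = new AtomicReference<>();

    QuorumVerifier get() {
        return current.get();   // lock-free read path
    }

    void set(QuorumVerifier next) {
        current.set(next);      // atomic write path
    }
}
{code}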

> Potential server deadlock between follower sync with leader and follower 
> receiving external connection requests.
> 
>
> Key: ZOOKEEPER-2778
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2778
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.5.3
>Reporter: Michael Han
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> It's possible to have a deadlock during recovery phase. 
> Found this issue by analyzing thread dumps of "flaky" ReconfigRecoveryTest 
> [1]. . Here is a sample thread dump that illustrates the state of the 
> execution:
> {noformat}
> [junit]  java.lang.Thread.State: BLOCKED
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.getElectionAddress(QuorumPeer.java:686)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:265)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:445)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.receiveConnection(QuorumCnxManager.java:369)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager$Listener.run(QuorumCnxManager.java:642)
> [junit] 
> [junit]  java.lang.Thread.State: BLOCKED
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:472)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.connectNewPeers(QuorumPeer.java:1438)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.setLastSeenQuorumVerifier(QuorumPeer.java:1471)
> [junit] at  
> org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:520)
> [junit] at  
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:88)
> [junit] at  
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1133)
> {noformat}
> The dead lock happens between the quorum peer thread which running the 
> follower that doing sync with leader work, and the listener of the qcm of the 
> same quorum peer that doing the receiving connection work. Basically to 
> finish sync with leader, the follower needs to synchronize on both QV_LOCK 
> and the qmc object it owns; while in the receiver thread to finish setup an 
> incoming connection the thread needs to synchronize on both the qcm object 
> the quorum peer owns, and the same QV_LOCK. It's easy to see the problem here 
> is the order of acquiring two locks are different, thus depends on timing / 
> actual execution order, two threads might end up acquiring one lock while 
> holding another.
> [1] 
> org.apache.zookeeper.server.quorum.ReconfigRecoveryTest.testCurrentServersAreObserversInNextConfig



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3190) Spell check on the Zookeeper server files

2018-11-17 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3190.

   Resolution: Fixed
Fix Version/s: 3.6.0

Issue resolved by pull request 702
[https://github.com/apache/zookeeper/pull/702]

> Spell check on the Zookeeper server files
> -
>
> Key: ZOOKEEPER-3190
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3190
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: documentation, other
>Reporter: Dinesh Appavoo
>Priority: Minor
>  Labels: newbie, pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This JIRA is to do spell check on the zookeeper server files [ 
> zookeeper/zookeeper-server/src/main/java/org/apache/zookeeper/server ]. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3177) Refactor request throttle logic in NIO and Netty to keep the same behavior and make the code easier to maintain

2018-11-17 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3177.

Resolution: Fixed

Issue resolved by pull request 673
[https://github.com/apache/zookeeper/pull/673]

> Refactor request throttle logic in NIO and Netty to keep the same behavior 
> and make the code easier to maintain
> ---
>
> Key: ZOOKEEPER-3177
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3177
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> There is shouldThrottle logic in zkServer; we should use it in NIO as well and 
> refactor the code to make it cleaner and easier to maintain in the future.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3188) Improve resilience to network

2018-11-13 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685929#comment-16685929
 ] 

Michael Han commented on ZOOKEEPER-3188:


A couple of comments on the high-level design:

* Did we consider the compatibility requirements here? Will the new 
configuration format be backward compatible? One concrete use case: a customer 
upgrades to a new version with this multiple-addresses-per-server capability 
but then wants to roll back without rewriting the config files for the older 
version.

* Did we evaluate the impact of this feature on the existing server-to-server 
mutual authentication and authorization features (e.g. ZOOKEEPER-1045 for 
Kerberos, ZOOKEEPER-236 for SSL), and also the impact on operations? For 
example, how would Kerberos principals and / or SSL certs be configured per 
host given multiple potential IP addresses and / or FQDNs per server?

* Could we provide more details on the expected level of support with regard to 
the dynamic reconfiguration feature? Examples would be great - for instance: 
whether we would support adding, removing, or updating the addresses that 
belong to a given server via dynamic reconfiguration, and what the expected 
behavior is in each case. For example, adding a new address to an existing 
ensemble member should not cause any disconnect / reconnect, but removing an 
in-use address of a server should cause a disconnect. The dynamic reconfig API 
/ CLI / docs will likely need to be updated because of this.

> Improve resilience to network
> -
>
> Key: ZOOKEEPER-3188
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3188
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Ted Dunning
>Priority: Major
>
> We propose to add network level resiliency to Zookeeper. The ideas that we 
> have on the topic have been discussed on the mailing list and via a 
> specification document that is located at 
> [https://docs.google.com/document/d/1iGVwxeHp57qogwfdodCh9b32P2_kOQaJZ2GDo7j36fI/edit?usp=sharing]
> That document is copied to this issue which is being created to report the 
> results of experimental implementations.
> h1. Zookeeper Network Resilience
> h2. Background
> Zookeeper is designed to help in building distributed systems. It provides a 
> variety of operations for doing this and all of these operations have rather 
> strict guarantees on semantics. Zookeeper itself is a distributed system made 
> up of cluster containing a leader and a number of followers. The leader is 
> designated in a process known as leader election in which a majority of all 
> nodes in the cluster must agree on a leader. All subsequent operations are 
> initiated by the leader and completed when a majority of nodes have confirmed 
> the operation. Whenever an operation cannot be confirmed by a majority or 
> whenever the leader goes missing for a time, a new leader election is 
> conducted and normal operations proceed once a new leader is confirmed.
>  
> The details of this are not important relative to this discussion. What is 
> important is that the semantics of the operations conducted by a Zookeeper 
> cluster and the semantics of how client processes communicate with the 
> cluster depend only on the basic fact that messages sent over TCP connections 
> will never appear out of order or missing. Central to the design of ZK is 
> that a server to server network connection is used as long as it works to use 
> it and a new connection is made when it appears that the old connection isn't 
> working.
>  
> As currently implemented, however, each member of a Zookeeper cluster can 
> have only a single address as viewed from some other process. This means, 
> absent network link bonding, that the loss of a single switch or a few 
> network connections could completely stop the operations of the Zookeeper 
> cluster. It is the goal of this work to address this issue by allowing each 
> server to listen on multiple network interfaces and to connect to other 
> servers at any of several addresses. The effect will be to allow servers to 
> communicate over redundant network paths to improve resiliency to network 
> failures without changing any core algorithms.
> h2. Proposed Change
> Interestingly, the correct operations of a Zookeeper cluster do not depend on 
> _how_ a TCP connection was made. There is no reason at all not to advertise 
> multiple addresses for members of a Zookeeper cluster. 
>  
> Connections between members of a Zookeeper cluster and between a client and a 
> cluster member are established by referencing a configuration file (for 
> cluster members) that specifies the address of all of the nodes in a cluster 
> or by using a connection string containing possible addresses of Zookeeper 
> cluster members. As soon as a connection is made, any desired authentication 
> or encryption 

[jira] [Commented] (ZOOKEEPER-1441) Some test cases are failing because Port bind issue.

2018-11-09 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682124#comment-16682124
 ] 

Michael Han commented on ZOOKEEPER-1441:


PortAssignment itself is fine, and if everyone is using it they should not get 
conflicts, because PortAssignment is the single source of truth for port 
allocation. However, the problem here is that not every process running on the 
test machine uses PortAssignment, even though most, if not all, ZK unit tests 
do use it. So if there are heavy workloads running on the test machine while ZK 
unit tests are running, port conflicts can occur.

>> I never actually got why PortAssigment tries to bind the port before returns

What PortAssignment implements is a "reserve and release" pattern for port 
allocation, which is better than a "choose a port but don't reserve it" 
approach, because it is very unlikely that the OS, regardless of how it 
allocates ports to processes, will yield the same port for two consecutive 
socket bind calls. Thus, by creating the socket via bind and then immediately 
closing it, we buy some time during which the OS will not reuse that port for a 
subsequent bind. This time varies, however, so there can be a race condition: 
by the time we actually go to bind the port again, it may already have been 
grabbed by another process. The ZK server requires an unbound port number to be 
passed to it (otherwise it can't bind the port), but due to the same race 
condition it's possible that when the server tries to bind, the port is already 
taken. The only way to guarantee atomicity in this case is to have the ZK 
server ask the OS for a port and bind it immediately.
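A rough Java sketch of the "reserve and release" idea described above (PortAssignment's real implementation differs): bind an ephemeral port, record its number, close the socket, and hand the number out, accepting that another process may grab it before the test rebinds it.

{code:java}
import java.io.IOException;
import java.net.ServerSocket;

public final class ReserveAndReleasePort {
    // Ask the OS for a free port by binding to port 0, then release it immediately.
    // The returned number is only *probably* free when the caller binds it again,
    // which is exactly the race condition discussed in the comment above.
    public static int reservePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            socket.setReuseAddress(true);
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        int port = reservePort();
        System.out.println("reserved (but not held) port: " + port);
    }
}
{code}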


> Some test cases are failing because Port bind issue.
> 
>
> Key: ZOOKEEPER-1441
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1441
> Project: ZooKeeper
>  Issue Type: Test
>  Components: server, tests
>Reporter: kavita sharma
>Assignee: Michael Han
>Priority: Major
>  Labels: flaky, flaky-test
>
> very frequently testcases are failing because of :
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind(Native Method)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:111)
>   at 
> org.apache.zookeeper.server.ServerCnxnFactory.createFactory(ServerCnxnFactory.java:112)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeer.(QuorumPeer.java:514)
>   at 
> org.apache.zookeeper.test.QuorumBase.startServers(QuorumBase.java:156)
>   at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:103)
>   at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:67)
> may be because of Port Assignment so please give me some suggestions if 
> someone is also facing same problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-1441) Some test cases are failing because Port bind issue.

2018-11-09 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682112#comment-16682112
 ] 

Michael Han commented on ZOOKEEPER-1441:


PortAssignment itself could also be more flaky under Java 11, because it can't 
guarantee atomicity between the time a port is allocated and the time the port 
is actually bound inside a ZK server. I remember [~lvfangmin] mentioned that at 
FB they improved PortAssignment by using random ports rather than sequential 
ports, which might help here. Alternatively, we could let the ZK server 
atomically allocate and bind a port itself and then return the bound port 
number to the caller, for testing purposes, rather than having to pass a port 
in, which would fix the root cause of the issue.
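By contrast, a sketch of the "atomic" alternative suggested above: the server binds port 0 itself and reports back whatever the OS actually assigned, so there is no window between allocation and bind. This is a hypothetical helper, not ZooKeeper's API.

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public final class AtomicBind {
    // Bind to an OS-chosen port and keep the socket open; the caller learns the
    // port from the bound channel instead of passing one in.
    public static ServerSocketChannel bindAnyPort() throws IOException {
        ServerSocketChannel channel = ServerSocketChannel.open();
        channel.bind(new InetSocketAddress(0));
        return channel;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel channel = bindAnyPort()) {
            InetSocketAddress addr = (InetSocketAddress) channel.getLocalAddress();
            System.out.println("serving on port " + addr.getPort());
        }
    }
}
{code}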

> Some test cases are failing because Port bind issue.
> 
>
> Key: ZOOKEEPER-1441
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1441
> Project: ZooKeeper
>  Issue Type: Test
>  Components: server, tests
>Reporter: kavita sharma
>Assignee: Michael Han
>Priority: Major
>  Labels: flaky, flaky-test
>
> very frequently testcases are failing because of :
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind(Native Method)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:111)
>   at 
> org.apache.zookeeper.server.ServerCnxnFactory.createFactory(ServerCnxnFactory.java:112)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeer.(QuorumPeer.java:514)
>   at 
> org.apache.zookeeper.test.QuorumBase.startServers(QuorumBase.java:156)
>   at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:103)
>   at org.apache.zookeeper.test.QuorumBase.setUp(QuorumBase.java:67)
> may be because of Port Assignment so please give me some suggestions if 
> someone is also facing same problem.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3173) Quorum TLS - support PEM trust/key stores

2018-10-25 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664684#comment-16664684
 ] 

Michael Han commented on ZOOKEEPER-3173:


[~ilyam] I just added your alias to the Contributor role group in JIRA. Feel 
free to assign issues to yourself. 

> Quorum TLS - support PEM trust/key stores
> -
>
> Key: ZOOKEEPER-3173
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3173
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.6.0, 3.5.5
>Reporter: Ilya Maykov
>Assignee: Ilya Maykov
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> ZOOKEEPER-236 is landed so there is some TLS support in Zookeeper now, but 
> only JKS trust stores are supported. JKS is not really used by non-Java 
> software, where PKCS12 and PEM are more standard. Let's add support for PEM 
> trust / key stores to make Quorum TLS easier to use.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3174) Quorum TLS - support reloading trust/key store

2018-10-25 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-3174:
--

Assignee: Ilya Maykov

> Quorum TLS - support reloading trust/key store
> --
>
> Key: ZOOKEEPER-3174
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3174
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.6.0, 3.5.5
>Reporter: Ilya Maykov
>Assignee: Ilya Maykov
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Quorum TLS feature recently added in ZOOKEEPER-236 doesn't support 
> reloading a trust/key store from disk when it changes. In an environment 
> where short-lived certificates are used and are refreshed by some background 
> daemon / cron job, this is a problem. Let's support reloading a trust/key 
> store from disk when the file on disk changes.
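As a rough illustration of how such a reload could be wired up with only the JDK, here is a minimal sketch using java.nio.file.WatchService to rebuild a KeyStore whenever the file changes. The class and method names are assumptions for this sketch, not the API that this ticket adds.

{code:java}
import java.io.InputStream;
import java.nio.file.*;
import java.security.KeyStore;

public class KeyStoreReloader {
    private volatile KeyStore current;  // consumers re-read this when building SSL contexts

    public void watch(Path storePath, char[] password) throws Exception {
        current = load(storePath, password);
        WatchService watcher = storePath.getFileSystem().newWatchService();
        storePath.getParent().register(watcher, StandardWatchEventKinds.ENTRY_MODIFY);
        while (true) {
            WatchKey key = watcher.take();                 // blocks until the directory changes
            for (WatchEvent<?> event : key.pollEvents()) {
                if (storePath.getFileName().equals(event.context())) {
                    current = load(storePath, password);   // swap in the refreshed store
                }
            }
            key.reset();
        }
    }

    private static KeyStore load(Path path, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(path)) {
            ks.load(in, password);
        }
        return ks;
    }
}
{code}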



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3172) Quorum TLS - fix port unification to allow rolling upgrades

2018-10-25 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-3172:
--

Assignee: Ilya Maykov

> Quorum TLS - fix port unification to allow rolling upgrades
> ---
>
> Key: ZOOKEEPER-3172
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3172
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: security, server
>Affects Versions: 3.6.0, 3.5.5
>Reporter: Ilya Maykov
>Assignee: Ilya Maykov
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> ZOOKEEPER-236 was committed with port unification support disabled, because 
> of various issues with the implementation. These issues should be fixed so 
> port unification can be enabled again. Port unification is necessary to 
> upgrade an ensemble from plaintext to TLS quorum connections without downtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3173) Quorum TLS - support PEM trust/key stores

2018-10-25 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-3173:
--

Assignee: Ilya Maykov

> Quorum TLS - support PEM trust/key stores
> -
>
> Key: ZOOKEEPER-3173
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3173
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.6.0, 3.5.5
>Reporter: Ilya Maykov
>Assignee: Ilya Maykov
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> ZOOKEEPER-236 is landed so there is some TLS support in Zookeeper now, but 
> only JKS trust stores are supported. JKS is not really used by non-Java 
> software, where PKCS12 and PEM are more standard. Let's add support for PEM 
> trust / key stores to make Quorum TLS easier to use.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3180) Add response cache to improve the throughput of read heavy traffic

2018-10-25 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664668#comment-16664668
 ] 

Michael Han commented on ZOOKEEPER-3180:


This will be a very useful feature for my prod env as well, where some of our 
read-heavy workloads require serializing a large payload from (an almost 
immutable part of) the data tree - in our case it's not the data stored but the 
getChildren call with tens of thousands of children under the znode. I'll be 
glad to review and test the patch in our prod env.

> Add response cache to improve the throughput of read heavy traffic 
> ---
>
> Key: ZOOKEEPER-3180
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3180
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Priority: Minor
> Fix For: 3.6.0
>
>
> On read-heavy use cases with large response data sizes, the serialization of 
> the response takes time and adds overhead to the GC.
> Adding a response cache helps improve the throughput we can support, and it 
> also reduces latency in general.
> This Jira is going to implement an LRU cache for the response, which shows 
> some performance gain on some of our production ensembles.
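A minimal sketch of the kind of LRU cache the description hints at. The key scheme (path plus the node's last-modified zxid) and all names here are assumptions for illustration, not the actual ResponseCache implementation.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Caches serialized response bytes for read requests. Keying on path + mzxid means
// an update to the node naturally makes the old entry unreachable.
public class ResponseCacheSketch {
    private static final int MAX_ENTRIES = 400;

    private final Map<String, byte[]> cache =
        new LinkedHashMap<String, byte[]>(16, 0.75f, true) {   // access order = LRU eviction
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > MAX_ENTRIES;
            }
        };

    private static String key(String path, long mzxid) {
        return path + '@' + mzxid;
    }

    public synchronized byte[] get(String path, long mzxid) {
        return cache.get(key(path, mzxid));
    }

    public synchronized void put(String path, long mzxid, byte[] serialized) {
        cache.put(key(path, mzxid), serialized);
    }
}
{code}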



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3163) Use session map to improve the performance when closing session in Netty

2018-10-23 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3163.

Resolution: Fixed

Issue resolved by pull request 665
[https://github.com/apache/zookeeper/pull/665]

> Use session map to improve the performance when closing session in Netty
> 
>
> Key: ZOOKEEPER-3163
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3163
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Previously, it needed to go through all the cnxns to find the session to 
> close, which is O(N), where N is the total number of connections we have.
> This affects the performance of closing or renewing a session if there are 
> lots of connections on the server; this JIRA is going to reuse the session 
> map code from the NIO implementation to improve the performance.
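A rough illustration (not the actual NettyServerCnxnFactory code) of why a session-id map turns close-session from a scan over every connection into a single lookup:

{code:java}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Tracks connections by session id so closing a session is one map operation
// instead of iterating all open connections.
public class SessionMapSketch<C> {
    private final ConcurrentMap<Long, C> sessionMap = new ConcurrentHashMap<>();

    public void register(long sessionId, C connection) {
        sessionMap.put(sessionId, connection);
    }

    public C remove(long sessionId) {
        return sessionMap.remove(sessionId);   // O(1) versus scanning every cnxn
    }
}
{code}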



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3180) Add response cache to improve the throughput of read heavy traffic

2018-10-23 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661452#comment-16661452
 ] 

Michael Han commented on ZOOKEEPER-3180:


What will we be caching here? Is it the byte buffers holding the (serialized) 
response body that are going to be written out to the socket?

> Add response cache to improve the throughput of read heavy traffic 
> ---
>
> Key: ZOOKEEPER-3180
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3180
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Priority: Minor
> Fix For: 3.6.0
>
>
> On read-heavy use cases with large response data sizes, the serialization of 
> the response takes time and adds overhead to the GC.
> Adding a response cache helps improve the throughput we can support, and it 
> also reduces latency in general.
> This Jira is going to implement an LRU cache for the response, which shows 
> some performance gain on some of our production ensembles.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3179) Add snapshot compression to reduce the disk IO

2018-10-23 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16661432#comment-16661432
 ] 

Michael Han commented on ZOOKEEPER-3179:


Good feature. 

We can also consider providing the option to offload compression / 
decompression to dedicated hardware - e.g. an FPGA. 

> Add snapshot compression to reduce the disk IO
> --
>
> Key: ZOOKEEPER-3179
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3179
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Fangmin Lv
>Priority: Major
> Fix For: 3.6.0
>
>
> When the snapshot becomes larger, the periodic snapshot taken after a certain 
> number of txns becomes more expensive, which will in turn affect the maximum 
> throughput we can support within SLA, because of the disk contention between 
> the snapshot and the txn log when they're on the same drive.
>  
> With compression like zstd/snappy/gzip, the actual snapshot size can be much 
> smaller; the compression ratio depends on the actual data. It might make the 
> recovery time (loading from disk) faster in some cases, but it will sometimes 
> take longer because of the extra time used to compress/decompress.
>  
> Based on production traffic, the performance also varies with the compression 
> method, which is why we provide different implementations, so we can select a 
> different compression method for different use cases.
>  
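A hedged sketch of how a pluggable compression wrapper could look, using only the JDK's GZIP stream; the system property name and the zstd/snappy branches are placeholders for whatever the feature actually ships with, not the committed design.

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

public final class SnapStreamSketch {
    // Wrap the raw snapshot file stream based on a config switch. "gz" uses the
    // JDK; zstd/snappy would need their own libraries and are omitted here.
    public static OutputStream wrap(OutputStream raw) throws IOException {
        String mode = System.getProperty("snapshot.compression.method", "");
        switch (mode) {
            case "gz":
                return new GZIPOutputStream(raw);
            case "":
                return raw;                   // no compression (default)
            default:
                throw new IOException("unsupported compression method: " + mode);
        }
    }
}
{code}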



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-2847) Cannot bind to client port when reconfig based on old static config

2018-09-30 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-2847.

Resolution: Fixed

Issue resolved by pull request 649
[https://github.com/apache/zookeeper/pull/649]

> Cannot bind to client port when reconfig based on old static config
> ---
>
> Key: ZOOKEEPER-2847
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2847
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Yisong Yue
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When started the ensemble with old static config that the server string 
> doesn't have client port, dynamically remove and add the same server from the 
> ensemble will cause that server cannot bind to client port, and the ZooKeeper 
> server cannot serve client requests anymore.
> From the code, we'll set the clientAddr to null when start up with old static 
> config, and dynamic config forces to have  part, which will 
> trigger the following rebind code in QuorumPeer#processReconfig, and cause 
> the address already in used issue.
> public boolean processReconfig(QuorumVerifier qv, Long suggestedLeaderId, 
> Long zxid, boolean restartLE) {
> ...
> if (myNewQS != null && myNewQS.clientAddr != null
> && !myNewQS.clientAddr.equals(oldClientAddr)) {
> cnxnFactory.reconfigure(myNewQS.clientAddr);
> updateThreadName();
> }
> ...
> }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-1177) Enabling a large number of watches for a large number of clients

2018-09-28 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-1177.

   Resolution: Fixed
Fix Version/s: (was: 3.5.5)

Issue resolved by pull request 590
[https://github.com/apache/zookeeper/pull/590]

> Enabling a large number of watches for a large number of clients
> 
>
> Key: ZOOKEEPER-1177
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1177
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.3.3
>Reporter: Vishal Kathuria
>Assignee: Fangmin Lv
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0
>
> Attachments: ZOOKEEPER-1177.patch, ZOOKEEPER-1177.patch, 
> ZooKeeper-with-fix-for-findbugs-warning.patch, ZooKeeper.patch, 
> Zookeeper-after-resolving-merge-conflicts.patch
>
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>
> In my ZooKeeper, I see watch manager consuming several GB of memory and I dug 
> a bit deeper.
> In the scenario I am testing, I have 10K clients connected to an observer. 
> There are about 20K znodes in ZooKeeper, each is about 1K - so about 20M data 
> in total.
> Each client fetches and puts watches on all the znodes. That is 200 million 
> watches.
> It seems a single watch takes about 100 bytes. I am currently at 14528037 
> watches and, according to the YourKit profiler, WatchManager has 1.2 GB 
> already. This is not going to work, as it might end up needing 20 GB of RAM 
> just for the watches.
> So we need a more compact way of storing watches. Here are the possible 
> solutions.
> 1. Use a bitmap instead of the current hashmap. In this approach, each znode 
> would get a unique id when its gets created. For every session, we can keep 
> track of a bitmap that indicates the set of znodes this session is watching. 
> A bitmap, assuming a 100K znodes, would be 12K. For 10K sessions, we can keep 
> track of watches using 120M instead of 20G.
> 2. This second idea is based on the observation that clients watch znodes in 
> sets (for example all znodes under a folder). Multiple clients watch the same 
> set and the total number of sets is a couple of orders of magnitude smaller 
> than the total number of znodes. In my scenario, there are about 100 sets. So 
> instead of keeping track of watches at the znode level, keep track of it at 
> the set level. It may mean that get may also need to be implemented at the 
> set level. With this, we can save the watches in 100M.
> Are there any other suggestions of solutions?
> Thanks
>  
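A small sketch of option 1 above: give each znode a dense integer id and keep one BitSet per session, so a watch costs roughly one bit instead of a hash-map entry. This is illustrative only, not the WatchManager implementation.

{code:java}
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class BitmapWatchSketch {
    private final Map<String, Integer> pathToId = new HashMap<>();    // id assigned at create time
    private final Map<Long, BitSet> watchesBySession = new HashMap<>();
    private int nextId = 0;

    public synchronized int idFor(String path) {
        return pathToId.computeIfAbsent(path, p -> nextId++);
    }

    public synchronized void addWatch(long sessionId, String path) {
        watchesBySession.computeIfAbsent(sessionId, s -> new BitSet()).set(idFor(path));
    }

    public synchronized boolean isWatching(long sessionId, String path) {
        BitSet bits = watchesBySession.get(sessionId);
        Integer id = pathToId.get(path);
        return bits != null && id != null && bits.get(id);
    }
}
{code}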



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3157) Improve FuzzySnapshotRelatedTest to avoid flaky due to issues like connection loss

2018-09-28 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632139#comment-16632139
 ] 

Michael Han commented on ZOOKEEPER-3157:


[~lvfangmin] thanks for making a fix on this.

For this specific flaky test, we could either do what I suggested there (by 
wrapping the getData with some retry logic), or apply junit.RetryRule for this 
specific test case only since we know the cause and the fix should be retry 
anyway. I suggest we should not add junit.RetryRule to all test cases / 
ZKTestCase for reasons I mentioned here 
https://github.com/apache/zookeeper/pull/605#issuecomment-425496416.
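For what it's worth, a sketch of the retry wrapper suggested above for the test (the helper name and backoff are illustrative): retry getData a few times when the only failure is ConnectionLoss.

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public final class RetryGetData {
    // Retries getData when the call fails with ConnectionLoss, which is the
    // transient failure that makes the fuzzy-snapshot test flaky.
    public static byte[] getDataWithRetry(ZooKeeper zk, String path, int maxRetries)
            throws KeeperException, InterruptedException {
        KeeperException last = null;
        for (int i = 0; i <= maxRetries; i++) {
            try {
                return zk.getData(path, false, null);
            } catch (KeeperException.ConnectionLossException e) {
                last = e;
                Thread.sleep(200L * (i + 1));   // simple backoff between attempts
            }
        }
        throw last;
    }
}
{code}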

> Improve FuzzySnapshotRelatedTest to avoid flaky due to issues like connection 
> loss
> --
>
> Key: ZOOKEEPER-3157
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3157
> Project: ZooKeeper
>  Issue Type: Test
>  Components: tests
>Affects Versions: 3.6.0
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Minor
> Fix For: 3.6.0
>
>
> [~hanm] noticed that the test might fail because of ConnectionLoss when 
> trying to getData ([here is an 
> example|https://builds.apache.org/job/ZooKeepertrunk/198/testReport/junit/org.apache.zookeeper.server.quorum/FuzzySnapshotRelatedTest/testPZxidUpdatedWhenLoadingSnapshot]);
>  we should catch this and retry to avoid flakiness.
> Internally, we 'fixed' flaky tests by adding junit.RetryRule in ZKTestCase, 
> which is the base class for most of the tests. I'm not sure whether this is 
> the right way to go, since it's actually 'hiding' the flaky tests, but it will 
> help reduce the flaky tests a lot if we're not going to tackle them in the 
> near term, and we can check the testing history to find out which tests are 
> flaky and deal with them separately. So let me know if this seems to provide 
> any benefit in the short term; if it does, I'll provide a patch to do that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2847) Cannot bind to client port when reconfig based on old static config

2018-09-28 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16632131#comment-16632131
 ] 

Michael Han commented on ZOOKEEPER-2847:


Not sure - this test passed in the pre-commit check build (2237):

[https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2237/testReport/org.apache.zookeeper.server.quorum/ReconfigLegacyTest/testReconfigRemoveClientFromStatic/]
 

though it fails deterministically on my local box (and now on Jenkins). 

> Cannot bind to client port when reconfig based on old static config
> ---
>
> Key: ZOOKEEPER-2847
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2847
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Yisong Yue
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When started the ensemble with old static config that the server string 
> doesn't have client port, dynamically remove and add the same server from the 
> ensemble will cause that server cannot bind to client port, and the ZooKeeper 
> server cannot serve client requests anymore.
> From the code, we'll set the clientAddr to null when start up with old static 
> config, and dynamic config forces to have  part, which will 
> trigger the following rebind code in QuorumPeer#processReconfig, and cause 
> the address already in used issue.
> public boolean processReconfig(QuorumVerifier qv, Long suggestedLeaderId, 
> Long zxid, boolean restartLE) {
> ...
> if (myNewQS != null && myNewQS.clientAddr != null
> && !myNewQS.clientAddr.equals(oldClientAddr)) {
> cnxnFactory.reconfigure(myNewQS.clientAddr);
> updateThreadName();
> }
> ...
> }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2847) Cannot bind to client port when reconfig based on old static config

2018-09-24 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626539#comment-16626539
 ] 

Michael Han commented on ZOOKEEPER-2847:


[~yisong-yue] Thanks for the quick response. 
bq. by that you mean open another issue, right

We can reuse this JIRA issue for the fix. But if you like, creating a new issue 
is also OK. Up to you :)

> Cannot bind to client port when reconfig based on old static config
> ---
>
> Key: ZOOKEEPER-2847
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2847
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Yisong Yue
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When started the ensemble with old static config that the server string 
> doesn't have client port, dynamically remove and add the same server from the 
> ensemble will cause that server cannot bind to client port, and the ZooKeeper 
> server cannot serve client requests anymore.
> From the code, we'll set the clientAddr to null when start up with old static 
> config, and dynamic config forces to have  part, which will 
> trigger the following rebind code in QuorumPeer#processReconfig, and cause 
> the address already in used issue.
> public boolean processReconfig(QuorumVerifier qv, Long suggestedLeaderId, 
> Long zxid, boolean restartLE) {
> ...
> if (myNewQS != null && myNewQS.clientAddr != null
> && !myNewQS.clientAddr.equals(oldClientAddr)) {
> cnxnFactory.reconfigure(myNewQS.clientAddr);
> updateThreadName();
> }
> ...
> }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2847) Cannot bind to client port when reconfig based on old static config

2018-09-24 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626486#comment-16626486
 ] 

Michael Han commented on ZOOKEEPER-2847:


[~yisong-yue] Do you want to put up another patch to fix the unit test failure 
of testReconfigRemoveClientFromStatic? 

By the way, I am not sure why a previous build passed 
https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2237/ - that 
is why I did not catch this until today, when I ran the build both locally and 
on Jenkins.

> Cannot bind to client port when reconfig based on old static config
> ---
>
> Key: ZOOKEEPER-2847
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2847
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Yisong Yue
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When started the ensemble with old static config that the server string 
> doesn't have client port, dynamically remove and add the same server from the 
> ensemble will cause that server cannot bind to client port, and the ZooKeeper 
> server cannot serve client requests anymore.
> From the code, we'll set the clientAddr to null when start up with old static 
> config, and dynamic config forces to have  part, which will 
> trigger the following rebind code in QuorumPeer#processReconfig, and cause 
> the address already in used issue.
> public boolean processReconfig(QuorumVerifier qv, Long suggestedLeaderId, 
> Long zxid, boolean restartLE) {
> ...
> if (myNewQS != null && myNewQS.clientAddr != null
> && !myNewQS.clientAddr.equals(oldClientAddr)) {
> cnxnFactory.reconfigure(myNewQS.clientAddr);
> updateThreadName();
> }
> ...
> }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (ZOOKEEPER-2847) Cannot bind to client port when reconfig based on old static config

2018-09-24 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reopened ZOOKEEPER-2847:


> Cannot bind to client port when reconfig based on old static config
> ---
>
> Key: ZOOKEEPER-2847
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2847
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Yisong Yue
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When started the ensemble with old static config that the server string 
> doesn't have client port, dynamically remove and add the same server from the 
> ensemble will cause that server cannot bind to client port, and the ZooKeeper 
> server cannot serve client requests anymore.
> From the code, we'll set the clientAddr to null when start up with old static 
> config, and dynamic config forces to have  part, which will 
> trigger the following rebind code in QuorumPeer#processReconfig, and cause 
> the address already in used issue.
> public boolean processReconfig(QuorumVerifier qv, Long suggestedLeaderId, 
> Long zxid, boolean restartLE) {
> ...
> if (myNewQS != null && myNewQS.clientAddr != null
> && !myNewQS.clientAddr.equals(oldClientAddr)) {
> cnxnFactory.reconfigure(myNewQS.clientAddr);
> updateThreadName();
> }
> ...
> }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2847) Cannot bind to client port when reconfig based on old static config

2018-09-24 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16626485#comment-16626485
 ] 

Michael Han commented on ZOOKEEPER-2847:


Looks like this patch breaks the recent trunk build. There is at least one 
reconfig test failing deterministically. To reproduce:

{code:java}
ant test -Dtestcase=ReconfigLegacyTest 
-Dtest.method=testReconfigRemoveClientFromStatic test-core-java
{code}

This test was broken because it expected that if "clientPort" was present in 
the static config file it should be kept there, for compatibility reasons 
(ZOOKEEPER-1992). The code checks whether that condition is met using 
QuorumServer.clientAddr, which is null iff there is no clientPort in the static 
config file. The fix in this patch broke this assumption, because now 
QuorumServer.clientAddr will always be assigned a value, and there is no way to 
differentiate whether the value was assigned by reading the server config 
portion of the static config file or by the code (if (qs != null && 
qs.clientAddr == null) qs.clientAddr = clientPortAddress;). As a result, 
needEraseClientInfoFromStaticConfig now returns true even if the clientPort 
configuration is present in the static config file, which leads to this test 
failing.

I think we can use a dedicated field to represent the "should erase" state for 
the static config file to fix this.
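A sketch of that suggestion, with hypothetical field and method names rather than the committed fix: record at parse time whether the client address actually came from the static file, and have the erase check consult that flag instead of clientAddr's nullness.

{code:java}
// Hypothetical names, sketching the suggested fix rather than the committed one.
public class QuorumServerSketch {
    java.net.InetSocketAddress clientAddr;
    boolean clientAddrFromStaticConfig;   // set only while parsing the static config file

    void setClientAddrFromStatic(java.net.InetSocketAddress addr) {
        this.clientAddr = addr;
        this.clientAddrFromStaticConfig = true;
    }

    void setClientAddrFromDefault(java.net.InetSocketAddress addr) {
        if (this.clientAddr == null) {
            this.clientAddr = addr;       // derived, not user-specified
        }
    }

    boolean needEraseClientInfoFromStaticConfig() {
        // Erase only when the port was never written in the static config by the user.
        return !clientAddrFromStaticConfig;
    }
}
{code}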


> Cannot bind to client port when reconfig based on old static config
> ---
>
> Key: ZOOKEEPER-2847
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2847
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Yisong Yue
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When started the ensemble with old static config that the server string 
> doesn't have client port, dynamically remove and add the same server from the 
> ensemble will cause that server cannot bind to client port, and the ZooKeeper 
> server cannot serve client requests anymore.
> From the code, we'll set the clientAddr to null when start up with old static 
> config, and dynamic config forces to have  part, which will 
> trigger the following rebind code in QuorumPeer#processReconfig, and cause 
> the address already in used issue.
> public boolean processReconfig(QuorumVerifier qv, Long suggestedLeaderId, 
> Long zxid, boolean restartLE) {
> ...
> if (myNewQS != null && myNewQS.clientAddr != null
> && !myNewQS.clientAddr.equals(oldClientAddr)) {
> cnxnFactory.reconfigure(myNewQS.clientAddr);
> updateThreadName();
> }
> ...
> }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3146) Limit the maximum client connections per IP in NettyServerCnxnFactory

2018-09-21 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3146.

Resolution: Fixed

Issue resolved by pull request 623
[https://github.com/apache/zookeeper/pull/623]

>  Limit the maximum client connections per IP in NettyServerCnxnFactory
> --
>
> Key: ZOOKEEPER-3146
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3146
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> There is a maximum-connections-per-IP limit in the NIOServerCnxnFactory 
> implementation, but it does not exist in Netty; such a limit is useful to 
> avoid the spamming that has happened on prod ensembles. 
> This Jira is going to add similar throttling logic to NettyServerCnxnFactory.
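A compact sketch of per-IP connection throttling of the kind described, not the NettyServerCnxnFactory code itself: count connections per remote address and refuse new ones past a limit.

{code:java}
import java.net.InetAddress;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

public class MaxClientCnxnsSketch {
    private final int maxClientCnxnsPerHost;
    private final ConcurrentMap<InetAddress, AtomicInteger> counts = new ConcurrentHashMap<>();

    public MaxClientCnxnsSketch(int maxClientCnxnsPerHost) {
        this.maxClientCnxnsPerHost = maxClientCnxnsPerHost;
    }

    // Returns true if the new connection is allowed; call release() when it closes.
    public boolean tryAccept(InetAddress client) {
        AtomicInteger count = counts.computeIfAbsent(client, c -> new AtomicInteger());
        if (count.incrementAndGet() > maxClientCnxnsPerHost) {
            count.decrementAndGet();
            return false;
        }
        return true;
    }

    public void release(InetAddress client) {
        AtomicInteger count = counts.get(client);
        if (count != null) {
            count.decrementAndGet();
        }
    }
}
{code}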



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-2847) Cannot bind to client port when reconfig based on old static config

2018-09-21 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-2847.

   Resolution: Fixed
Fix Version/s: 3.6.0

Issue resolved by pull request 620
[https://github.com/apache/zookeeper/pull/620]

> Cannot bind to client port when reconfig based on old static config
> ---
>
> Key: ZOOKEEPER-2847
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2847
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Yisong Yue
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When started the ensemble with old static config that the server string 
> doesn't have client port, dynamically remove and add the same server from the 
> ensemble will cause that server cannot bind to client port, and the ZooKeeper 
> server cannot serve client requests anymore.
> From the code, we'll set the clientAddr to null when start up with old static 
> config, and dynamic config forces to have  part, which will 
> trigger the following rebind code in QuorumPeer#processReconfig, and cause 
> the address already in used issue.
> public boolean processReconfig(QuorumVerifier qv, Long suggestedLeaderId, 
> Long zxid, boolean restartLE) {
> ...
> if (myNewQS != null && myNewQS.clientAddr != null
> && !myNewQS.clientAddr.equals(oldClientAddr)) {
> cnxnFactory.reconfigure(myNewQS.clientAddr);
> updateThreadName();
> }
> ...
> }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3151) Jenkins github integration is broken if retriggering the precommit job through Jenkins admin web page.

2018-09-19 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3151.

Resolution: Workaround

Provided work around in 
https://cwiki.apache.org/confluence/display/ZOOKEEPER/HowToContribute


> Jenkins github integration is broken if retriggering the precommit job 
> through Jenkins admin web page.
> --
>
> Key: ZOOKEEPER-3151
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3151
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: build-infrastructure
>Reporter: Michael Han
>Assignee: Michael Han
>Priority: Minor
>  Labels: pull-request-available
> Attachments: screen.png
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When triggering a precommit check Jenkins job directly through the [web 
> interface|https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/], 
> the result can't be relayed back to GitHub after the job finishes. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3151) Jenkins github integration is broken if retriggering the precommit job through Jenkins admin web page.

2018-09-19 Thread Michael Han (JIRA)
Michael Han created ZOOKEEPER-3151:
--

 Summary: Jenkins github integration is broken if retriggering the 
precommit job through Jenkins admin web page.
 Key: ZOOKEEPER-3151
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3151
 Project: ZooKeeper
  Issue Type: Bug
  Components: build-infrastructure
Reporter: Michael Han
Assignee: Michael Han
 Attachments: screen.png

When triggering a precommit check Jenkins job directly through the [web 
interface|https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/], 
the result can't be relayed back to GitHub after the job finishes. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3098) Add additional server metrics

2018-09-17 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3098.

   Resolution: Fixed
Fix Version/s: 3.6.0

Issue resolved by pull request 580
[https://github.com/apache/zookeeper/pull/580]

> Add additional server metrics
> -
>
> Key: ZOOKEEPER-3098
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3098
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.6.0
>Reporter: Joseph Blomstedt
>Assignee: Joseph Blomstedt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> This patch adds several new server-side metrics as well as makes it easier to 
> add new metrics in the future. This patch also includes a handful of other 
> minor metrics-related changes.
> Here's a high-level summary of the changes.
>  # This patch extends the request latency tracked in {{ServerStats}} to track 
> {{read}} and {{update}} latency separately. Updates are any request that must 
> be voted on and can change data, reads are all requests that can be handled 
> locally and don't change data.
>  # This patch adds the {{ServerMetrics}} logic and the related 
> {{AvgMinMaxCounter}} and {{SimpleCounter}} classes. This code is designed to 
> make it incredibly easy to add new metrics. To add a new metric you just add 
> one line to {{ServerMetrics}} and then directly reference that new metric 
> anywhere in the code base. The {{ServerMetrics}} logic handles creating the 
> metric, properly adding the metric to the JSON output of the {{/monitor}} 
> admin command, and properly resetting the metric when necessary. The 
> motivation behind {{ServerMetrics}} is to make things easy enough that it 
> encourages new metrics to be added liberally. Lack of in-depth 
> metrics/visibility is a long-standing ZooKeeper weakness. At Facebook, most 
> of our internal changes build on {{ServerMetrics}} and we have nearly 100 
> internal metrics at this time – all of which we'll be upstreaming in the 
> coming months as we publish more internal patches.
>  # This patch adds 20 new metrics, 14 which are handled by {{ServerMetrics}}.
>  # This patch replaces some uses of {{synchronized}} in {{ServerStats}} with 
> atomic operations.
> Here's a list of new metrics added in this patch:
>  - {{uptime}}: time that a peer has been in a stable 
> leading/following/observing state
>  - {{leader_uptime}}: uptime for peer in leading state
>  - {{global_sessions}}: count of global sessions
>  - {{local_sessions}}: count of local sessions
>  - {{quorum_size}}: configured ensemble size
>  - {{synced_observers}}: similar to existing `synced_followers` but for 
> observers
>  - {{fsynctime}}: time to fsync transaction log (avg/min/max)
>  - {{snapshottime}}: time to write a snapshot (avg/min/max)
>  - {{dbinittime}}: time to reload database – read snapshot + apply 
> transactions (avg/min/max)
>  - {{readlatency}}: read request latency (avg/min/max)
>  - {{updatelatency}}: update request latency (avg/min/max)
>  - {{propagation_latency}}: end-to-end latency for updates, from proposal on 
> leader to committed-to-datatree on a given host (avg/min/max)
>  - {{follower_sync_time}}: time for follower to sync with leader (avg/min/max)
>  - {{election_time}}: time between entering and leaving election (avg/min/max)
>  - {{looking_count}}: number of transitions into looking state
>  - {{diff_count}}: number of diff syncs performed
>  - {{snap_count}}: number of snap syncs performed
>  - {{commit_count}}: number of commits performed on leader
>  - {{connection_request_count}}: number of incoming client connection requests
>  - {{bytes_received_count}}: similar to existing `packets_received` but 
> tracks bytes
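A minimal sketch of what an avg/min/max counter along these lines might look like; the real AvgMinMaxCounter in the patch is presumably more careful about atomicity and reset semantics.

{code:java}
import java.util.concurrent.atomic.AtomicLong;

public class AvgMinMaxCounterSketch {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong total = new AtomicLong();
    private final AtomicLong min = new AtomicLong(Long.MAX_VALUE);
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    public void add(long value) {
        count.incrementAndGet();
        total.addAndGet(value);
        min.accumulateAndGet(value, Math::min);
        max.accumulateAndGet(value, Math::max);
    }

    public double avg() {
        long c = count.get();
        return c == 0 ? 0.0 : (double) total.get() / c;
    }

    public long min() { return count.get() == 0 ? 0 : min.get(); }
    public long max() { return count.get() == 0 ? 0 : max.get(); }

    public void reset() {
        count.set(0);
        total.set(0);
        min.set(Long.MAX_VALUE);
        max.set(Long.MIN_VALUE);
    }
}
{code}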



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3141) testLeaderElectionWithDisloyalVoter is flaky

2018-09-14 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615502#comment-16615502
 ] 

Michael Han commented on ZOOKEEPER-3141:


Thanks [~lvfangmin], appreciate your help on fixing this.

To identify a build where this test fails, you can start at the flaky test dashboard:

[https://builds.apache.org/job/zOOkeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html]
 

You will see this test is currently ranked as one of the top flaky tests. Then 
click the show/hide label of the right-most column; it will expand and list the 
builds. 

Currently, these builds can be used to triage

[179|https://builds.apache.org/job/ZooKeeper-trunk//179]   
[181|https://builds.apache.org/job/ZooKeeper-trunk//181]   
[189|https://builds.apache.org/job/ZooKeeper-trunk//189] 

> testLeaderElectionWithDisloyalVoter is flaky
> 
>
> Key: ZOOKEEPER-3141
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3141
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, server, tests
>Affects Versions: 3.6.0
>Reporter: Michael Han
>Priority: Major
>
> The unit test added in ZOOKEEPER-3109 turns out to be quite flaky.
> See 
> [https://builds.apache.org/job/zOOkeeper-Find-Flaky-Tests/511/artifact/report.html]
> Recent failure builds:
> [https://builds.apache.org/job/ZooKeeper-trunk//181] 
> [https://builds.apache.org/job/ZooKeeper-trunk//179] 
> [https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2123/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testLeaderElectionWithDisloyalVoter_stillHasMajority/]
>  
>  
> Snapshot of the failure:
> {code:java}
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter_stillHasMajority
> Error Message
> Server 0 should have joined quorum by now
> Stacktrace
> junit.framework.AssertionFailedError: Server 0 should have joined quorum by 
> now
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElection(QuorumPeerMainTest.java:1482)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter_stillHasMajority(QuorumPeerMainTest.java:1431)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ZOOKEEPER-3141) testLeaderElectionWithDisloyalVoter is flaky

2018-09-14 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16615502#comment-16615502
 ] 

Michael Han edited comment on ZOOKEEPER-3141 at 9/14/18 11:23 PM:
--

Thanks [~lvfangmin], appreciate your help on fixing this.

To identify a build where this test fails, you can start at the flaky test dashboard:

[https://builds.apache.org/job/zOOkeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html]
 

You will see this test is currently ranked as one of the top flaky tests. Then 
click the show/hide label of the right-most column; it will expand and list the 
builds. 

Currently, these builds can be used to triage

[179|https://builds.apache.org/job/ZooKeeper-trunk//179]   
[181|https://builds.apache.org/job/ZooKeeper-trunk//181]   
[189|https://builds.apache.org/job/ZooKeeper-trunk//189] 


was (Author: hanm):
Thanks [~lvfangmin], appreciate your help on fixing this.

To identify a build where this test fail, you can start at flaky test dashboard:

[https://builds.apache.org/job/zOOkeeper-Find-Flaky-Tests/lastSuccessfulBuild/artifact/report.html]
 

You will see this test is currently ranking on of the top flaky tests. Then 
click the show/hide label of the right most column it will expand and list the 
builds. 

Currently, these builds can be used to triage

[179|https://builds.apache.org/job/ZooKeeper-trunk//179]   
[181|https://builds.apache.org/job/ZooKeeper-trunk//181]   
[189|https://builds.apache.org/job/ZooKeeper-trunk//189] 

> testLeaderElectionWithDisloyalVoter is flaky
> 
>
> Key: ZOOKEEPER-3141
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3141
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, server, tests
>Affects Versions: 3.6.0
>Reporter: Michael Han
>Priority: Major
>
> The unit test added in ZOOKEEPER-3109 turns out to be quite flaky.
> See 
> [https://builds.apache.org/job/zOOkeeper-Find-Flaky-Tests/511/artifact/report.html]
> Recent failure builds:
> [https://builds.apache.org/job/ZooKeeper-trunk//181] 
> [https://builds.apache.org/job/ZooKeeper-trunk//179] 
> [https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2123/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testLeaderElectionWithDisloyalVoter_stillHasMajority/]
>  
>  
> Snapshot of the failure:
> {code:java}
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter_stillHasMajority
> Error Message
> Server 0 should have joined quorum by now
> Stacktrace
> junit.framework.AssertionFailedError: Server 0 should have joined quorum by 
> now
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElection(QuorumPeerMainTest.java:1482)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter_stillHasMajority(QuorumPeerMainTest.java:1431)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3140) Allow Followers to host Observers

2018-09-14 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-3140:
--

Assignee: Brian Nixon

> Allow Followers to host Observers
> -
>
> Key: ZOOKEEPER-3140
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3140
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: server
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Assignee: Brian Nixon
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Observers function simply as non-voting members of the ensemble, sharing the 
> Learner interface with Followers and holding only a slightly different 
> internal pipeline. Both maintain connections along the quorum port with the 
> Leader by which they learn of all new proposals on the ensemble. 
>  
>  There are benefits to allowing Observers to connect to the Followers to plug 
> into the commit stream in addition to connecting to the Leader. It shifts the 
> burden of supporting Observers off the Leader and allows it to focus on 
> coordinating the commit of writes. This means better performance when the 
> Leader is under high load, particularly high network load such as can happen 
> after a leader election when many Learners need to sync. It also reduces the 
> total network connections maintained on the Leader when there are a high 
> number of observers. On the other end, Observer availability is improved 
> since it takes less time for a high number of Observers to finish 
> syncing and start serving client traffic.
>  
>  The current implementation only supports scaling the number of Observers 
> into the hundreds before performance begins to degrade. By opening up 
> Followers to also host Observers, over a thousand observers can be hosted on 
> a typical ensemble without major negative impact under both normal operation 
> and during post-leader election sync.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3142) Extend SnapshotFormatter to dump data in json format

2018-09-14 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3142.

   Resolution: Fixed
Fix Version/s: 3.6.0

Issue resolved by pull request 619
[https://github.com/apache/zookeeper/pull/619]

> Extend SnapshotFormatter to dump data in json format
> 
>
> Key: ZOOKEEPER-3142
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3142
> Project: ZooKeeper
>  Issue Type: Improvement
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> JSON output can be piped into other tools such as ncdu. Extend the 
> SnapshotFormatter functionality to dump JSON.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-2122) Implement SSL support in the Zookeeper C client library

2018-09-14 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-2122:
--

Assignee: shuoshi  (was: Ashish Amarnath)

> Implement SSL support in the Zookeeper C client library
> 
>
> Key: ZOOKEEPER-2122
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2122
> Project: ZooKeeper
>  Issue Type: Sub-task
>  Components: c client
>Affects Versions: 3.5.0
>Reporter: Ashish Amarnath
>Assignee: shuoshi
>Priority: Trivial
>  Labels: build, pull-request-available, security
> Fix For: 3.6.0, 3.5.5
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Implement SSL support in the Zookeeper C client library to work with the 
> secure server.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3144) Potential ephemeral nodes inconsistent due to global session inconsistent with fuzzy snapshot

2018-09-14 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3144.

Resolution: Fixed

Issue resolved by pull request 621
[https://github.com/apache/zookeeper/pull/621]

> Potential ephemeral nodes inconsistent due to global session inconsistent 
> with fuzzy snapshot
> -
>
> Key: ZOOKEEPER-3144
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3144
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.4, 3.6.0, 3.4.13
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Found this issue recently while checking another prod issue. The problem is 
> that the current code updates lastProcessedZxid before it actually makes the 
> change for the global sessions in the DataTree.
>  
> If there is a snapshot being taken, and there is a small stall between 
> setting lastProcessedZxid and updating the session in the DataTree (due to a 
> thread context switch, GC, etc.), then it's possible that lastProcessedZxid 
> is set to a point in the future which doesn't include the global session 
> change (add or remove).
>  
> When reloading this snapshot and its txns, the server replays txns from 
> lastProcessedZxid + 1, so it won't recreate the global session, which can 
> cause data inconsistency.
>  
> When global sessions are inconsistent, ephemerals might be inconsistent as 
> well: the leader deletes all ephemerals locally if there is no global session 
> associated with them, and if another server does a snapshot sync with it, 
> that server will not have those ephemerals either, but the others will. The 
> problematic session will also have trouble renewing.
>  
> The same issue exists for the closeSession txn; we need to move this global 
> session update logic before processTxn, so that lastProcessedZxid does not 
> miss the global session.
>  
>  
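
For illustration, a minimal sketch of the ordering fix described above, using hypothetical names rather than the actual DataTree/SessionTracker classes: the global session change is applied before lastProcessedZxid advances, so a concurrent fuzzy snapshot stamped with that zxid already contains the session.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch only, not ZooKeeper's implementation. The point is the
// ordering: make the global-session change visible to a concurrent (fuzzy)
// snapshot BEFORE advancing lastProcessedZxid.
final class SessionTxnOrderingSketch {
    private final Map<Long, Integer> globalSessions = new ConcurrentHashMap<>();
    private volatile long lastProcessedZxid;

    void applyCreateSession(long zxid, long sessionId, int timeoutMs) {
        // 1) Apply the session change first, so a snapshot taken now includes it.
        globalSessions.put(sessionId, timeoutMs);
        // 2) Only then advance lastProcessedZxid: a snapshot stamped with this
        //    zxid is guaranteed to contain the session created at this zxid.
        lastProcessedZxid = zxid;
    }

    void applyCloseSession(long zxid, long sessionId) {
        globalSessions.remove(sessionId);  // same ordering for closeSession
        lastProcessedZxid = zxid;
    }

    long getLastProcessedZxid() {
        return lastProcessedZxid;
    }
}
{code}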



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3137) add a utility to truncate logs to a zxid

2018-09-14 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3137.

   Resolution: Fixed
Fix Version/s: 3.6.0

Issue resolved by pull request 615
[https://github.com/apache/zookeeper/pull/615]

> add a utility to truncate logs to a zxid
> 
>
> Key: ZOOKEEPER-3137
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3137
> Project: ZooKeeper
>  Issue Type: New Feature
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Add a utility that allows an admin to truncate a given transaction log to a 
> specified zxid. This can be similar to the existing LogFormatter. 
> Among the benefits, this allows an admin to put together a point-in-time view 
> of a data tree by manually mutating files from a saved backup.
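
A rough sketch of what such a truncation utility might look like. It assumes the FileTxnLog/TxnIterator APIs in org.apache.zookeeper.server.persistence behave as used here and that both directories already exist; the tool that actually lands may differ.

{code:java}
import java.io.File;
import java.io.IOException;

import org.apache.zookeeper.server.persistence.FileTxnLog;
import org.apache.zookeeper.server.persistence.TxnLog.TxnIterator;
import org.apache.zookeeper.txn.TxnHeader;

// Rough sketch only: copy every txn with zxid <= truncZxid from srcDir into a
// fresh dstDir, producing a log truncated at the requested point in time.
public class TxnLogTruncateSketch {
    public static void truncateTo(File srcDir, File dstDir, long truncZxid) throws IOException {
        FileTxnLog src = new FileTxnLog(srcDir);
        FileTxnLog dst = new FileTxnLog(dstDir);
        TxnIterator it = src.read(0);            // iterate from the first txn
        try {
            while (true) {
                TxnHeader hdr = it.getHeader();
                if (hdr == null || hdr.getZxid() > truncZxid) {
                    break;                       // reached the cut-off point
                }
                dst.append(hdr, it.getTxn());    // keep this txn
                if (!it.next()) {
                    break;
                }
            }
        } finally {
            it.close();
            dst.commit();                        // flush the truncated log
            dst.close();
            src.close();
        }
    }
}
{code}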



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3141) testLeaderElectionWithDisloyalVoter is flaky

2018-09-12 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613021#comment-16613021
 ] 

Michael Han commented on ZOOKEEPER-3141:


The failure does not reproduce in my stress test job, which just ran this single 
test (https://builds.apache.org/job/Zookeeper_UT_sTRESS/). It can be reproduced 
in the nightly build on trunk, so it is likely caused by interference from 
another test case. 

> testLeaderElectionWithDisloyalVoter is flaky
> 
>
> Key: ZOOKEEPER-3141
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3141
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, server, tests
>Affects Versions: 3.6.0
>Reporter: Michael Han
>Priority: Major
>
> The unit test added in ZOOKEEPER-3109 turns out to be quite flaky.
> See 
> [https://builds.apache.org/job/zOOkeeper-Find-Flaky-Tests/511/artifact/report.html]
> Recent failure builds:
> [https://builds.apache.org/job/ZooKeeper-trunk//181] 
> [https://builds.apache.org/job/ZooKeeper-trunk//179] 
> [https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2123/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testLeaderElectionWithDisloyalVoter_stillHasMajority/]
>  
>  
> Snapshot of the failure:
> {code:java}
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter_stillHasMajority
> Error Message
> Server 0 should have joined quorum by now
> Stacktrace
> junit.framework.AssertionFailedError: Server 0 should have joined quorum by 
> now
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElection(QuorumPeerMainTest.java:1482)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter_stillHasMajority(QuorumPeerMainTest.java:1431)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3147) Enable server tracking client information

2018-09-12 Thread Michael Han (JIRA)
Michael Han created ZOOKEEPER-3147:
--

 Summary: Enable server tracking client information
 Key: ZOOKEEPER-3147
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3147
 Project: ZooKeeper
  Issue Type: Improvement
  Components: java client, server
Affects Versions: 3.6.0
Reporter: Michael Han
Assignee: Michael Han


We should consider adding fine-grained tracking information for clients and 
sending this information to the server side, which will be useful for debugging 
and for future multi-tenancy support / enforced quotas.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3127) Fixing potential data inconsistency due to update last processed zxid with partial multi-op txn

2018-09-12 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16612780#comment-16612780
 ] 

Michael Han commented on ZOOKEEPER-3127:


If there is a conflict, it's expected that the author of the original PR creates 
a new PR targeting the new branch. If there is no conflict, then the committer 
can cherry-pick the patch to different branches. This is because the author has 
the best knowledge of how to deal with the conflict, regardless of whether it is 
trivial, plus a separate PR will test the patch again through precommit Jenkins. 
I think this is consistent with what committers did in the old days when 
contributions came in as patches; the original author was expected to upload a 
new patch to Jira if there was a conflict. 

 

 

> Fixing potential data inconsistency due to update last processed zxid with 
> partial multi-op txn
> ---
>
> Key: ZOOKEEPER-3127
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3127
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.4, 3.6.0, 3.4.13
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Found this issue while checking the code for another issue; this is a 
> relatively rare case which we haven't seen in prod so far.
> Currently, lastProcessedZxid is updated when applying the first txn of a 
> multi-op. If there is a snapshot in progress, it's possible that the zxid 
> associated with the snapshot only includes part of the multi-op.
> When loading the snapshot, the server only replays the txns after the zxid 
> associated with the snapshot file, which can cause data inconsistency due to 
> the missing sub-txns.
> To avoid this, we only update lastProcessedZxid once the whole multi-op txn 
> has been applied to the DataTree.
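
An illustrative sketch of that rule, with hypothetical names rather than the real DataTree code: the zxid advances exactly once, after every sub-txn of the multi-op has been applied, so a concurrent fuzzy snapshot can never be stamped with a zxid that covers only part of a multi-op.

{code:java}
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch only, not ZooKeeper's DataTree.processTxn.
final class MultiOpZxidSketch<T> {
    private volatile long lastProcessedZxid;

    void applyMultiOp(long zxid, List<T> subTxns, Consumer<T> applyOne) {
        for (T sub : subTxns) {
            applyOne.accept(sub);        // apply every sub-txn first
        }
        lastProcessedZxid = zxid;        // then advance the zxid exactly once
    }

    long getLastProcessedZxid() {
        return lastProcessedZxid;
    }
}
{code}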



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-2261) When only secureClientPort is configured connections, configuration, connection_stat_reset, and stats admin commands throw NullPointerException

2018-09-11 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han updated ZOOKEEPER-2261:
---
Fix Version/s: 3.5.5

> When only secureClientPort is configured connections, configuration, 
> connection_stat_reset, and stats admin commands throw NullPointerException
> ---
>
> Key: ZOOKEEPER-2261
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2261
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Mohammad Arshad
>Assignee: Andor Molnar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> When only secureClientPort is configured, the connections, configuration, 
> connection_stat_reset and stats admin commands throw NullPointerException. 
> Here is the stack trace of one of them, the connections command.
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.zookeeper.server.admin.Commands$ConsCommand.run(Commands.java:177)
>   at 
> org.apache.zookeeper.server.admin.Commands.runCommand(Commands.java:92)
>   at 
> org.apache.zookeeper.server.admin.JettyAdminServer$CommandServlet.doGet(JettyAdminServer.java:166)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2261) When only secureClientPort is configured connections, configuration, connection_stat_reset, and stats admin commands throw NullPointerException

2018-09-11 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16611321#comment-16611321
 ] 

Michael Han commented on ZOOKEEPER-2261:


I cherry-picked the commit to 3.5 just now 
([https://github.com/apache/zookeeper/commit/14d0aaab853d535be268a1d7a234c9c47bf4cd25]).
 

> When only secureClientPort is configured connections, configuration, 
> connection_stat_reset, and stats admin commands throw NullPointerException
> ---
>
> Key: ZOOKEEPER-2261
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2261
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.5.0
>Reporter: Mohammad Arshad
>Assignee: Andor Molnar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> When only secureClientPort is configured, the connections, configuration, 
> connection_stat_reset and stats admin commands throw NullPointerException. 
> Here is the stack trace of one of them, the connections command.
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.zookeeper.server.admin.Commands$ConsCommand.run(Commands.java:177)
>   at 
> org.apache.zookeeper.server.admin.Commands.runCommand(Commands.java:92)
>   at 
> org.apache.zookeeper.server.admin.JettyAdminServer$CommandServlet.doGet(JettyAdminServer.java:166)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3141) testLeaderElectionWithDisloyalVoter is flaky

2018-09-10 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610124#comment-16610124
 ] 

Michael Han commented on ZOOKEEPER-3141:


The address `fee.fii.foo.fum` was from another test case in the same file: 
[testBadPeerAddressInQuorum|
https://github.com/apache/zookeeper/blob/master/src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java#L597].
 One possibility is that Apache Jenkins was running multiple test cases and, 
for some reason, one test case (testBadPeerAddressInQuorum) interfered with the 
other (testLeaderElectionWithDisloyalVoter_stillHasMajority). I've seen some 
flaky tests caused by interference between test cases, but this one is new to 
me.

I set up a stress test on Apache Jenkins just to run 
testLeaderElectionWithDisloyalVoter_stillHasMajority alone; if the failure does 
not reproduce, then interference between test cases is likely the cause. 


> testLeaderElectionWithDisloyalVoter is flaky
> 
>
> Key: ZOOKEEPER-3141
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3141
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, server, tests
>Affects Versions: 3.6.0
>Reporter: Michael Han
>Priority: Major
>
> The unit test added in ZOOKEEPER-3109 turns out to be quite flaky.
> See 
> [https://builds.apache.org/job/zOOkeeper-Find-Flaky-Tests/511/artifact/report.html]
> Recent failure builds:
> [https://builds.apache.org/job/ZooKeeper-trunk//181] 
> [https://builds.apache.org/job/ZooKeeper-trunk//179] 
> [https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2123/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testLeaderElectionWithDisloyalVoter_stillHasMajority/]
>  
>  
> Snapshot of the failure:
> {code:java}
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter_stillHasMajority
> Error Message
> Server 0 should have joined quorum by now
> Stacktrace
> junit.framework.AssertionFailedError: Server 0 should have joined quorum by 
> now
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElection(QuorumPeerMainTest.java:1482)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter_stillHasMajority(QuorumPeerMainTest.java:1431)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3141) testLeaderElectionWithDisloyalVoter is flaky

2018-09-10 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609863#comment-16609863
 ] 

Michael Han commented on ZOOKEEPER-3141:


[~lvfangmin] Do you want to take a look at this flaky test introduced by your 
patch in ZOOKEEPER-3109?

> testLeaderElectionWithDisloyalVoter is flaky
> 
>
> Key: ZOOKEEPER-3141
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3141
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection, server, tests
>Affects Versions: 3.6.0
>Reporter: Michael Han
>Priority: Major
>
> The unit test added in ZOOKEEPER-3109 turns out to be quite flaky.
> See 
> [https://builds.apache.org/job/zOOkeeper-Find-Flaky-Tests/511/artifact/report.html]
> Recent failure builds:
> [https://builds.apache.org/job/ZooKeeper-trunk//181] 
> [https://builds.apache.org/job/ZooKeeper-trunk//179] 
> [https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2123/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testLeaderElectionWithDisloyalVoter_stillHasMajority/]
>  
>  
> Snapshot of the failure:
> {code:java}
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter_stillHasMajority
> Error Message
> Server 0 should have joined quorum by now
> Stacktrace
> junit.framework.AssertionFailedError: Server 0 should have joined quorum by 
> now
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElection(QuorumPeerMainTest.java:1482)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter_stillHasMajority(QuorumPeerMainTest.java:1431)
>   at 
> org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3141) testLeaderElectionWithDisloyalVoter is flaky

2018-09-10 Thread Michael Han (JIRA)
Michael Han created ZOOKEEPER-3141:
--

 Summary: testLeaderElectionWithDisloyalVoter is flaky
 Key: ZOOKEEPER-3141
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3141
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection, server, tests
Affects Versions: 3.6.0
Reporter: Michael Han


The unit test added in ZOOKEEPER-3109 turns out to be quite flaky.

See 
[https://builds.apache.org/job/zOOkeeper-Find-Flaky-Tests/511/artifact/report.html]

Recent failure builds:

[https://builds.apache.org/job/ZooKeeper-trunk//181] 

[https://builds.apache.org/job/ZooKeeper-trunk//179] 

[https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/2123/testReport/junit/org.apache.zookeeper.server.quorum/QuorumPeerMainTest/testLeaderElectionWithDisloyalVoter_stillHasMajority/]
 

 

Snapshot of the failure:
{code:java}
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter_stillHasMajority

Error Message
Server 0 should have joined quorum by now
Stacktrace
junit.framework.AssertionFailedError: Server 0 should have joined quorum by now
at 
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElection(QuorumPeerMainTest.java:1482)
at 
org.apache.zookeeper.server.quorum.QuorumPeerMainTest.testLeaderElectionWithDisloyalVoter_stillHasMajority(QuorumPeerMainTest.java:1431)
at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:79)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3129) Improve ZK Client resiliency by throwing a jute.maxbuffer size exception before sending a request to server

2018-09-06 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606608#comment-16606608
 ] 

Michael Han commented on ZOOKEEPER-3129:


I think it's a good idea. I am leaning towards using a new property for this. I 
think this will be a client-only property, correct? 

> Improve ZK Client resiliency by throwing a jute.maxbuffer size exception 
> before sending a request to server
> ---
>
> Key: ZOOKEEPER-3129
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3129
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Priority: Major
>
> Zookeeper is mostly operated in controlled environments and the client/server 
> properties are usually known. With this Jira, I would like to propose a new 
> client-side property that represents the max jute buffer size the server is 
> going to accept.
> On the ZKClient, in the case of a multi op, the request is serialized and 
> hence we know the size of the complete packet that will be sent. We can use 
> this new property to determine whether we are exceeding the limit and throw 
> some form of KeeperException. This would be a fail-fast mechanism, and the 
> application can potentially retry by chunking up the request or serializing 
> it differently.
> Since the same property is now present in two locations, two possibilities 
> can arise over time.
> -- The server's jute buffer limit is higher than what is specified on the 
> client side: the application might end up serializing it anyway, or the 
> zkclient can be made configurable to retry even when it gets this exception.
> -- The server's jute buffer limit is lower than what is specified on the 
> client side: that would have failed previously as well, so there is no change 
> in behavior.
> This would help avoid silent failures like HBASE-18549. 
> Thoughts? [~apurtell] [~xucang] [~anmolnar] [~hanm]
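
A sketch of the proposed fail-fast check; this is a proposal, not an existing ZooKeeper API, and the class name, fallback constant, and exception choice below are placeholders. It measures the serialized size of a jute Record on the client and rejects it before the packet is sent.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.jute.BinaryOutputArchive;
import org.apache.jute.Record;

// Sketch of the proposed client-side check; names and the exception type are
// placeholders, not part of ZooKeeper today.
public class ClientSideSizeCheck {
    // Fallback used only by this sketch; real deployments would mirror the
    // value the server was configured with.
    private static final int FALLBACK_LIMIT = 0xfffff;

    public static void checkSize(Record request) throws IOException {
        int limit = Integer.getInteger("jute.maxbuffer", FALLBACK_LIMIT);
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        request.serialize(BinaryOutputArchive.getArchive(bos), "request");
        if (bos.size() > limit) {
            // The Jira suggests "some form of KeeperException"; a plain
            // IOException keeps this sketch self-contained.
            throw new IOException("Serialized request is " + bos.size()
                    + " bytes, larger than jute.maxbuffer=" + limit);
        }
    }
}
{code}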



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3136) Reduce log in ClientBase in case of ConnectException

2018-09-06 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3136.

Resolution: Fixed

Issue resolved by pull request 614
[https://github.com/apache/zookeeper/pull/614]

> Reduce log in ClientBase in case of ConnectException
> 
>
> Key: ZOOKEEPER-3136
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3136
> Project: ZooKeeper
>  Issue Type: Task
>  Components: tests
>Reporter: Enrico Olivelli
>Assignee: Enrico Olivelli
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> While running tests you will always see spammy log lines like the ones below.
> As we are expecting the server to come up, it is not useful to log such 
> stack traces.
> The patch simply reduces the logging in this specific case, because it adds 
> no value and is very annoying.
>  
> {code:java}
>  [junit] 2018-08-31 23:31:49,173 [myid:] - INFO  [main:ClientBase@292] - 
> server 127.0.0.1:11222 not up
>     [junit] java.net.ConnectException: Connection refused (Connection refused)
>     [junit]     at java.net.PlainSocketImpl.socketConnect(Native Method)
>     [junit]     at 
> java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
>     [junit]     at 
> java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
>     [junit]     at 
> java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
>     [junit]     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>     [junit]     at java.net.Socket.connect(Socket.java:589)
>     [junit]     at 
> org.apache.zookeeper.client.FourLetterWordMain.send4LetterWord(FourLetterWordMain.java:101)
>     [junit]     at 
> org.apache.zookeeper.client.FourLetterWordMain.send4LetterWord(FourLetterWordMain.java:71)
>     [junit]     at 
> org.apache.zookeeper.test.ClientBase.waitForServerUp(ClientBase.java:285)
>     [junit]     at 
> org.apache.zookeeper.test.ClientBase.waitForServerUp(ClientBase.java:276)
> {code}
>  
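
For illustration, a small sketch of this kind of change (not the exact ClientBase diff): an expected ConnectException while polling for the server produces a single INFO line without a stack trace, while unexpected errors keep the full trace.

{code:java}
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative sketch only, not the actual ClientBase code.
public class QuietServerProbe {
    private static final Logger LOG = LoggerFactory.getLogger(QuietServerProbe.class);

    public static boolean isServerUp(String host, int port, int timeoutMs) {
        try (Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (ConnectException e) {
            // Expected while the server is still starting: log quietly.
            LOG.info("server {}:{} not up yet: {}", host, port, e.toString());
            return false;
        } catch (IOException e) {
            // Anything else is unexpected, so keep the full stack trace.
            LOG.warn("error while probing server {}:{}", host, port, e);
            return false;
        }
    }
}
{code}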



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3131) org.apache.zookeeper.server.WatchManager resource leak

2018-09-06 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3131.

   Resolution: Fixed
Fix Version/s: 3.6.0

Issue resolved by pull request 612
[https://github.com/apache/zookeeper/pull/612]

> org.apache.zookeeper.server.WatchManager resource leak
> --
>
> Key: ZOOKEEPER-3131
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3131
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3, 3.5.4, 3.6.0
> Environment: -Xmx512m 
>Reporter: ChaoWang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> In some cases, the map _watch2Paths_ in class _WatchManager_ does not have 
> its entry removed, even when the associated HashSet value is already empty. 
> The key type of map _watch2Paths_ is Watcher, an instance of 
> _NettyServerCnxn_. If the entry is not removed when its associated set of 
> paths becomes empty, memory grows little by little until an OutOfMemoryError 
> is eventually triggered. 
>  
> {color:#FF}*Possible Solution:*{color}
> In the following function, logic should be added to remove the entry:
> org.apache.zookeeper.server.WatchManager#removeWatcher(java.lang.String, 
> org.apache.zookeeper.Watcher)
> if (paths.isEmpty())
> { watch2Paths.remove(watcher); }
> The same applies to the following function:
> org.apache.zookeeper.server.WatchManager#triggerWatch(java.lang.String, 
> org.apache.zookeeper.Watcher.Event.EventType, 
> java.util.Set)
> Could you please confirm this issue?
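
A simplified sketch of the proposed fix, in plain Java rather than the actual WatchManager class: once a watcher's path set becomes empty, the map entry itself is dropped so the watcher can be garbage-collected.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Simplified sketch only; Object stands in for org.apache.zookeeper.Watcher.
public class WatchIndexSketch {
    private final Map<Object, Set<String>> watch2Paths = new HashMap<>();

    public synchronized void removeWatcher(String path, Object watcher) {
        Set<String> paths = watch2Paths.get(watcher);
        if (paths == null) {
            return;
        }
        paths.remove(path);
        if (paths.isEmpty()) {
            // Without this, empty sets keep dead connections reachable
            // and memory grows until an OutOfMemoryError.
            watch2Paths.remove(watcher);
        }
    }
}
{code}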



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3127) Fixing potential data inconsistency due to update last processed zxid with partial multi-op txn

2018-09-05 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3127.

   Resolution: Fixed
Fix Version/s: (was: 3.4.14)
   (was: 3.5.5)

Issue resolved by pull request 606
[https://github.com/apache/zookeeper/pull/606]

> Fixing potential data inconsistency due to update last processed zxid with 
> partial multi-op txn
> ---
>
> Key: ZOOKEEPER-3127
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3127
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.4, 3.6.0, 3.4.13
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Found this issue while checking the code for another issue; this is a 
> relatively rare case which we haven't seen in prod so far.
> Currently, lastProcessedZxid is updated when applying the first txn of a 
> multi-op. If there is a snapshot in progress, it's possible that the zxid 
> associated with the snapshot only includes part of the multi-op.
> When loading the snapshot, the server only replays the txns after the zxid 
> associated with the snapshot file, which can cause data inconsistency due to 
> the missing sub-txns.
> To avoid this, we only update lastProcessedZxid once the whole multi-op txn 
> has been applied to the DataTree.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3109) Avoid long unavailable time due to voter changed mind when activating the leader during election

2018-08-28 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3109.

Resolution: Fixed

Issue resolved by pull request 588
[https://github.com/apache/zookeeper/pull/588]

> Avoid long unavailable time due to voter changed mind when activating the 
> leader during election
> 
>
> Key: ZOOKEEPER-3109
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3109
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum, server
>Affects Versions: 3.6.0
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> Occasionally, we'll find it takes a long time to elect a leader, sometimes 
> longer than 1 minute, depending on how big initLimit and tickTime are set.
>  
>  This exposes an issue in the leader election protocol. During leader 
> election, before a voter goes to the LEADING/FOLLOWING state, it will wait 
> for a finalizeWait time before changing its state. Depending on the order of 
> notifications, some voter might change its mind just after voting for a 
> server. If the server it was previously voting for has a majority of votes 
> after counting this one, then that server will go to the LEADING state. In 
> some corner cases, the leader may end up timing out waiting for the epoch ACK 
> from a majority, because of the voter that changed its mind. This usually 
> happens when there is an even number of servers in the ensemble (either 
> because one of the servers is down, or it is being restarted and takes a long 
> time to restart). If there are 5 servers in the ensemble, then we'll find two 
> of them in LEADING/FOLLOWING state and another two in LOOKING state, but the 
> LOOKING servers cannot join the quorum since they're waiting for a majority 
> of servers to be FOLLOWING the current leader before changing to FOLLOWING as 
> well.
>  
>  As far as we know, a voter will change its mind if it receives a vote from 
> another host which just started and began voting for itself, or if there is a 
> server that takes a long time to shut down its previous ZK server and starts 
> voting for itself when beginning the leader election process.
>  
>  Also, the follower may abandon the leader if the leader is not ready to 
> accept learner connections when the follower tries to connect to it.
>  
>  To solve this issue, there are multiple options: 
> 1. increase the finalizeWait time
> 2. smartly detect this state on the leader and quit earlier
>  
>  The 1st option is straightforward and easier to change, but it will cause 
> longer leader election times in common cases.
>  
>  The 2nd option is more complex, but it can efficiently solve the problem 
> without sacrificing performance in common cases. The leader remembers the 
> first majority of servers voting for it and checks whether any of them 
> changed their mind while it's waiting for the epoch ACK. The leader will wait 
> for some time before quitting the LEADING state, since one changed voter may 
> not be a problem if a majority is still voting for it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3108) deprecated myid file and use a new property "server.id" in the zoo.cfg

2018-08-28 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595906#comment-16595906
 ] 

Michael Han commented on ZOOKEEPER-3108:


I think we can provide an option to move the unique identifier of the server 
from the myid file to zoo.cfg - thus avoiding creating the myid file - but I 
don't feel this approach is much more convenient compared to the current 
approach of putting the server id in the myid file: the unique id still has to 
be created for each server; it's just put in a different place. [~maoling], 
comments?

As others also mentioned, regardless of what additional options we are going to 
add, please keep the current myid approach. It would be a pain for those who 
operate ZK to upgrade if we just abandon the myid file completely. 

>  deprecated myid file and use a new property "server.id" in the zoo.cfg
> ---
>
> Key: ZOOKEEPER-3108
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3108
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.5.0
>Reporter: maoling
>Assignee: maoling
>Priority: Major
>
> When using ZK in distributed mode, we need to touch a myid file in dataDir 
> and then write a unique number to it. This is inconvenient and not 
> user-friendly. Look at an example from another distributed system such as 
> Kafka: it just uses broker.id=0 in server.properties to identify a unique 
> server node. This issue is going to abandon the myid file and use a new 
> property such as server.id=0 in zoo.cfg. This fix will be applied to the 
> master branch and branch-3.5+,
> keeping branch-3.4 unchanged.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3129) Improve ZK Client resiliency by throwing a jute.maxbuffer size exception before sending a request to server

2018-08-28 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594570#comment-16594570
 ] 

Michael Han commented on ZOOKEEPER-3129:


We already have the jute.maxbuffer property on the client side (and the server 
side). Would this property fit the needs here?

> Improve ZK Client resiliency by throwing a jute.maxbuffer size exception 
> before sending a request to server
> ---
>
> Key: ZOOKEEPER-3129
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3129
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Karan Mehta
>Priority: Major
>
> Zookeeper is mostly operated in controlled environments and the client/server 
> properties are usually known. With this Jira, I would like to propose a new 
> client-side property that represents the max jute buffer size the server is 
> going to accept.
> On the ZKClient, in the case of a multi op, the request is serialized and 
> hence we know the size of the complete packet that will be sent. We can use 
> this new property to determine whether we are exceeding the limit and throw 
> some form of KeeperException. This would be a fail-fast mechanism, and the 
> application can potentially retry by chunking up the request or serializing 
> it differently.
> Since the same property is now present in two locations, two possibilities 
> can arise over time.
> -- The server's jute buffer limit is higher than what is specified on the 
> client side: the application might end up serializing it anyway, or the 
> zkclient can be made configurable to retry even when it gets this exception.
> -- The server's jute buffer limit is lower than what is specified on the 
> client side: that would have failed previously as well, so there is no change 
> in behavior.
> This would help avoid silent failures like HBASE-18549. 
> Thoughts? [~apurtell] [~xucang] [~anmolnar] [~hanm]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ZOOKEEPER-3124) Remove special logic to handle cversion and pzxid in DataTree.processTxn

2018-08-22 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589477#comment-16589477
 ] 

Michael Han edited comment on ZOOKEEPER-3124 at 8/22/18 11:33 PM:
--

[~lvfangmin] That code was actually added in ZOOKEEPER-1046. I think it fixes 
the issue of an incorrect parent cversion caused by deleting some of its 
children after taking a snapshot (so the deleted nodes never made it into the 
snapshot, which caused problems later while replaying the txn logs), rather 
than by adding children after the snapshot is serialized. [~fournc] had a 
detailed analysis on this. Does that make sense to you?

 

The comment in the patch that finally landed sounds confusing to me as well.

ZOOKEEPER-1269 just moved the same code from one place to the other.


was (Author: hanm):
[~lvfangmin] That code was actually added ZOOKEEPER-1046. I think it fixes the 
issue of incorrect cversion of parent caused by deleting some of its children 
after taking snapshot (so the deleted nodes never made into the snapshot which 
caused problems later while replying tx logs); rather than adding children 
after snapshot is serialized. [~fournc] had a [detailed 
analysis|https://issues.apache.org/jira/browse/ZOOKEEPER-1046?focusedCommentId=13020441=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13020441]
 on this. Does that make sense for you?

 

The comment in the path that finally landed sounds confusing to me as well.

ZOOKEEPER-1269  just moved the same code from one place to the other.

> Remove special logic to handle cversion and pzxid in DataTree.processTxn
> 
>
> Key: ZOOKEEPER-3124
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3124
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Major
> Fix For: 3.6.0
>
>
> There is special logic in DataTree.processTxn to handle NODEEXISTS from 
> createNode, which is used to handle the cversion and pzxid not being updated 
> due to a fuzzy snapshot: 
> https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/DataTree.java#L962-L994.
>  
> But is this a real issue, or is it still an issue now?
> In the current code, when serializing a parent node, we'll lock on it and 
> take a snapshot of its children at that time. If a child is added after the 
> parent is serialized to disk, then it won't be written out, so we shouldn't 
> hit the case where the child is in the snapshot but the parent's cversion and 
> pzxid are not changed.
>  
>  
> I checked the JIRA ZOOKEEPER-1269 which added this code; it won't hit this 
> issue either. I'm not sure why we added this; am I missing anything? Can we 
> just get rid of it?
>  
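
For readers following the discussion, a paraphrase of what the special case is trying to do, written with plain fields rather than the real DataTree/StatPersisted types; it only illustrates the intent being questioned here.

{code:java}
// Paraphrased sketch only, not the actual DataTree code.
final class ParentStatSketch {
    long pzxid;      // zxid of the last change to this node's children
    int cversion;    // children version

    // Called when a replayed create txn hits NODEEXISTS: if the txn carries a
    // newer parent cversion than the fuzzy snapshot recorded, catch the
    // parent's stat up so replay converges with the original history.
    void fixupOnNodeExists(int txnParentCVersion, long txnZxid) {
        if (txnParentCVersion > cversion) {
            cversion = txnParentCVersion;
            pzxid = txnZxid;
        }
    }
}
{code}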



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ZOOKEEPER-3124) Remove special logic to handle cversion and pzxid in DataTree.processTxn

2018-08-22 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589477#comment-16589477
 ] 

Michael Han edited comment on ZOOKEEPER-3124 at 8/22/18 11:32 PM:
--

[~lvfangmin] That code was actually added in ZOOKEEPER-1046. I think it fixes 
the issue of an incorrect parent cversion caused by deleting some of its 
children after taking a snapshot (so the deleted nodes never made it into the 
snapshot, which caused problems later while replaying the txn logs), rather 
than by adding children after the snapshot is serialized. [~fournc] had a 
[detailed 
analysis|https://issues.apache.org/jira/browse/ZOOKEEPER-1046?focusedCommentId=13020441=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-13020441]
 on this. Does that make sense to you?

 

The comment in the patch that finally landed sounds confusing to me as well.

ZOOKEEPER-1269 just moved the same code from one place to the other.


was (Author: hanm):
[~lvfangmin] That code was actually added ZOOKEEPER-1046. I think it fixes the 
issue of incorrect cversion of parent caused by deleting some of its children 
after taking snapshot (so the deleted nodes never made into the snapshot which 
caused problems later while replying tx logs); rather than adding children 
after snapshot is serialized. [~fournc] had a detailed analysis on this. Does 
that make sense for you?

 

The comment in the path that finally landed sounds confusing to me as well.

ZOOKEEPER-1269  just moved the same code from one place to the other.

> Remove special logic to handle cversion and pzxid in DataTree.processTxn
> 
>
> Key: ZOOKEEPER-3124
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3124
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Major
> Fix For: 3.6.0
>
>
> There is special logic in DataTree.processTxn to handle NODEEXISTS from 
> createNode, which is used to handle the cversion and pzxid not being updated 
> due to a fuzzy snapshot: 
> https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/DataTree.java#L962-L994.
>  
> But is this a real issue, or is it still an issue now?
> In the current code, when serializing a parent node, we'll lock on it and 
> take a snapshot of its children at that time. If a child is added after the 
> parent is serialized to disk, then it won't be written out, so we shouldn't 
> hit the case where the child is in the snapshot but the parent's cversion and 
> pzxid are not changed.
>  
>  
> I checked the JIRA ZOOKEEPER-1269 which added this code; it won't hit this 
> issue either. I'm not sure why we added this; am I missing anything? Can we 
> just get rid of it?
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3124) Remove special logic to handle cversion and pzxid in DataTree.processTxn

2018-08-22 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16589477#comment-16589477
 ] 

Michael Han commented on ZOOKEEPER-3124:


[~lvfangmin] That code was actually added in ZOOKEEPER-1046. I think it fixes 
the issue of an incorrect parent cversion caused by deleting some of its 
children after taking a snapshot (so the deleted nodes never made it into the 
snapshot, which caused problems later while replaying the txn logs), rather 
than by adding children after the snapshot is serialized. [~fournc] had a 
detailed analysis on this. Does that make sense to you?

 

The comment in the patch that finally landed sounds confusing to me as well.

ZOOKEEPER-1269 just moved the same code from one place to the other.

> Remove special logic to handle cversion and pzxid in DataTree.processTxn
> 
>
> Key: ZOOKEEPER-3124
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3124
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Major
> Fix For: 3.6.0
>
>
> There is special logic in DataTree.processTxn to handle NODEEXISTS from 
> createNode, which is used to handle the cversion and pzxid not being updated 
> due to a fuzzy snapshot: 
> https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/DataTree.java#L962-L994.
>  
> But is this a real issue, or is it still an issue now?
> In the current code, when serializing a parent node, we'll lock on it and 
> take a snapshot of its children at that time. If a child is added after the 
> parent is serialized to disk, then it won't be written out, so we shouldn't 
> hit the case where the child is in the snapshot but the parent's cversion and 
> pzxid are not changed.
>  
>  
> I checked the JIRA ZOOKEEPER-1269 which added this code; it won't hit this 
> issue either. I'm not sure why we added this; am I missing anything? Can we 
> just get rid of it?
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3110) Improve the closeSession throughput in PrepRequestProcessor

2018-08-07 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3110.

Resolution: Fixed

> Improve the closeSession throughput in PrepRequestProcessor
> ---
>
> Key: ZOOKEEPER-3110
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3110
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> On the leader, every expired global session adds 3 lines of logs, which is 
> pretty heavy, and if the log file is more than a few GB, the logging for 
> closeSession in PrepRequestProcessor will slow down the whole ensemble's 
> throughput. 
> In some use cases, we found the prep request processor becomes a bottleneck 
> when there is a constantly high number of expired sessions or sessions being 
> closed explicitly.
> This Jira is going to remove one of the useless log lines when preparing 
> closeSession txns, which should give us higher throughput when processing a 
> large number of expired sessions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3110) Improve the closeSession throughput in PrepRequestProcessor

2018-08-07 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han updated ZOOKEEPER-3110:
---
Fix Version/s: 3.5.5

> Improve the closeSession throughput in PrepRequestProcessor
> ---
>
> Key: ZOOKEEPER-3110
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3110
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> On the leader, every expired global session adds 3 lines of logs, which is 
> pretty heavy, and if the log file is more than a few GB, the logging for 
> closeSession in PrepRequestProcessor will slow down the whole ensemble's 
> throughput. 
> In some use cases, we found the prep request processor becomes a bottleneck 
> when there is a constantly high number of expired sessions or sessions being 
> closed explicitly.
> This Jira is going to remove one of the useless log lines when preparing 
> closeSession txns, which should give us higher throughput when processing a 
> large number of expired sessions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-2926) Data inconsistency issue due to the flaw in the session management

2018-08-07 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-2926.

   Resolution: Fixed
Fix Version/s: 3.6.0

Issue resolved by pull request 447
[https://github.com/apache/zookeeper/pull/447]

> Data inconsistency issue due to the flaw in the session management
> --
>
> Key: ZOOKEEPER-2926
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2926
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3, 3.6.0
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The local session upgrading feature will upgrade the session locally before 
> receiving a quorum commit of the global session creation. It's possible that 
> the server shuts down before the create-session request is sent to the 
> leader; if we retained the ZKDatabase or a snapshot happened just before 
> shutdown, then only this server will have the global session. 
> If that server didn't become leader, then it will have more global sessions 
> than the others, and those global sessions won't expire because the leader 
> doesn't know of their existence. If the server became leader, it will accept 
> the client's session renew request and the client is allowed to create 
> ephemeral nodes, which means other servers only have the ephemeral nodes but 
> not that global session. If a follower then does a SNAP sync with it, that 
> follower will also have the global session. If a server without that global 
> session becomes the new leader, it will check and delete those dangling 
> ephemeral nodes before serving traffic. This could lead to ephemeral nodes 
> existing on some servers but not others. 
> There is a dangling global session issue even without the local session 
> feature, because the leader updates the ZKDatabase when processing a 
> ConnectionRequest and in the PrepRequestProcessor before it's quorum 
> committed, which also carries this risk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3036) Unexpected exception in zookeeper

2018-07-29 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han updated ZOOKEEPER-3036:
---
Component/s: (was: jmx)
 server
 quorum

> Unexpected exception in zookeeper
> -
>
> Key: ZOOKEEPER-3036
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3036
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.4.10
> Environment: 3 Zookeepers, 5 kafka servers
>Reporter: Oded
>Priority: Critical
>
> We got an issue with one of the zookeepers (the Leader), causing the entire 
> kafka cluster to fail:
> 2018-05-09 02:29:01,730 [myid:3] - ERROR 
> [LearnerHandler-/192.168.0.91:42490:LearnerHandler@648] - Unexpected 
> exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>     at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>     at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>     at 
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
> 2018-05-09 02:29:01,730 [myid:3] - WARN  
> [LearnerHandler-/192.168.0.91:42490:LearnerHandler@661] - *** GOODBYE 
> /192.168.0.91:42490 
>  
> We would expect that zookeeper will choose another Leader and the Kafka 
> cluster will continue to work as expected, but that was not the case.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3036) Unexpected exception in zookeeper

2018-07-29 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561417#comment-16561417
 ] 

Michael Han commented on ZOOKEEPER-3036:


What is the issue related to ZooKeeper in this case? When a learner thread dies, 
the leader should be able to start another learner thread once the follower / 
observer corresponding to the dead learner thread comes back. 

> Unexpected exception in zookeeper
> -
>
> Key: ZOOKEEPER-3036
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3036
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.4.10
> Environment: 3 Zookeepers, 5 kafka servers
>Reporter: Oded
>Priority: Critical
>
> We got an issue with one of the zookeepers (the Leader), causing the entire 
> kafka cluster to fail:
> 2018-05-09 02:29:01,730 [myid:3] - ERROR 
> [LearnerHandler-/192.168.0.91:42490:LearnerHandler@648] - Unexpected 
> exception causing shutdown while sock still open
> java.net.SocketTimeoutException: Read timed out
>     at java.net.SocketInputStream.socketRead0(Native Method)
>     at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>     at java.net.SocketInputStream.read(SocketInputStream.java:171)
>     at java.net.SocketInputStream.read(SocketInputStream.java:141)
>     at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>     at java.io.DataInputStream.readInt(DataInputStream.java:387)
>     at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>     at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>     at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>     at 
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:559)
> 2018-05-09 02:29:01,730 [myid:3] - WARN  
> [LearnerHandler-/192.168.0.91:42490:LearnerHandler@661] - *** GOODBYE 
> /192.168.0.91:42490 
>  
> We would expect that zookeeper will choose another Leader and the Kafka 
> cluster will continue to work as expected, but that was not the case.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3082) Fix server snapshot behavior when out of disk space

2018-07-29 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16561414#comment-16561414
 ] 

Michael Han commented on ZOOKEEPER-3082:


Committed to 3.6. There are merge conflicts with branch-3.5, so a separate pull 
request is needed to get this into 3.5.

> Fix server snapshot behavior when out of disk space
> ---
>
> Key: ZOOKEEPER-3082
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3082
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.0, 3.4.12, 3.5.5
>Reporter: Brian Nixon
>Assignee: Brian Nixon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When the ZK server tries to make a snapshot and the machine is out of disk 
> space, the snapshot creation fails and throws an IOException. An empty 
> snapshot file is created (probably because the server is able to create an 
> entry in the dir), but the server is not able to write to the file.
>  
> If snapshot creation fails, the server commits suicide. When it restarts, it 
> will do so from the last known good snapshot. However, when it tries to make 
> a snapshot again, the same thing happens. This results in lots of empty 
> snapshot files being created. If the DataDirCleanupManager eventually garbage 
> collects the good snapshot files, then only the empty files remain. At this 
> point, the server is well and truly screwed.
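
One defensive pattern for this failure mode, sketched with plain JDK file APIs and not necessarily the fix that was committed: the snapshot is written to a temporary file and only moved into place once the write fully succeeds, so a failed write never leaves a truncated or empty snapshot behind.

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch only: a production version would also fsync the file and directory
// before the rename.
public class AtomicSnapshotWriter {
    public static void writeSnapshot(Path finalSnap, byte[] serializedDataTree) throws IOException {
        Path tmp = Files.createTempFile(finalSnap.getParent(), "snapshot", ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            out.write(serializedDataTree);   // fails here if the disk is full
            out.flush();
        } catch (IOException e) {
            Files.deleteIfExists(tmp);       // never leave a partial file around
            throw e;
        }
        // Atomic rename: readers either see no new snapshot or a complete one.
        Files.move(tmp, finalSnap, StandardCopyOption.ATOMIC_MOVE);
    }
}
{code}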



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3082) Fix server snapshot behavior when out of disk space

2018-07-29 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3082.

   Resolution: Fixed
Fix Version/s: 3.6.0

> Fix server snapshot behavior when out of disk space
> ---
>
> Key: ZOOKEEPER-3082
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3082
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.0, 3.4.12, 3.5.5
>Reporter: Brian Nixon
>Assignee: Brian Nixon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> When the ZK server tries to make a snapshot and the machine is out of disk 
> space, the snapshot creation fails and throws an IOException. An empty 
> snapshot file is created (probably because the server is able to create an 
> entry in the dir), but the server is not able to write to the file.
>  
> If snapshot creation fails, the server commits suicide. When it restarts, it 
> will do so from the last known good snapshot. However, when it tries to make 
> a snapshot again, the same thing happens. This results in lots of empty 
> snapshot files being created. If the DataDirCleanupManager eventually garbage 
> collects the good snapshot files, then only the empty files remain. At this 
> point, the server is well and truly screwed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3094) Make BufferSizeTest reliable

2018-07-26 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han updated ZOOKEEPER-3094:
---
Fix Version/s: 3.4.14

> Make BufferSizeTest reliable
> 
>
> Key: ZOOKEEPER-3094
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3094
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: tests
>Affects Versions: 3.4.0
>Reporter: Mohamed Jeelani
>Assignee: Mohamed Jeelani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5, 3.4.14
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Improve reliability of BufferSizeTest. 
> Changes made to the testStartupFailure test to remember the old directory and 
> switch back to it after the test has completed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3094) Make BufferSizeTest reliable

2018-07-26 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3094.

   Resolution: Fixed
Fix Version/s: 3.5.5

Issue resolved by pull request 577
[https://github.com/apache/zookeeper/pull/577]

> Make BufferSizeTest reliable
> 
>
> Key: ZOOKEEPER-3094
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3094
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: tests
>Affects Versions: 3.4.0
>Reporter: Mohamed Jeelani
>Assignee: Mohamed Jeelani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Improve reliability of BufferSizeTest. 
> Changes made to the testStartupFailure test to remember the old directory and 
> switch back to it after the test has completed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3009) Potential NPE in NIOServerCnxnFactory

2018-07-26 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han updated ZOOKEEPER-3009:
---
Fix Version/s: 3.4.14

> Potential NPE in NIOServerCnxnFactory
> -
>
> Key: ZOOKEEPER-3009
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3009
> Project: ZooKeeper
>  Issue Type: Bug
>Affects Versions: 3.6.0, 3.4.12
>Reporter: lujie
>Assignee: lujie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.6.0, 3.5.5, 3.4.14
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Inspired by ZK-3006, I developed a simple static analysis tool to find other 
> potential NPEs like ZK-3006. This bug was found by that tool, and I have 
> carefully studied it. But I am a newbie here, so I may be wrong; I hope 
> someone could confirm it and help me improve this tool.
> h2. Bug description:
>  Class NIOServerCnxn has three methods, including getSocketAddress and 
> getRemoteSocketAddress, that can return null, e.g.:
> {code:java}
> // code placeholder
> if (sock.isOpen() == false) {
>   return null;
> }
> {code}
> Some of their callers perform a null check; some (the 3 listed below) do not. 
> {code:java}
> // ServerCnxn#getConnectionInfo
> Map info = new LinkedHashMap();
> info.put("remote_socket_address", getRemoteSocketAddress());// Map.put will 
> throw NPE if parameter is null
> //IPAuthenticationProvider#handleAuthentication
> String id = cnxn.getRemoteSocketAddress().getAddress().getHostAddress();
> cnxn.addAuthInfo(new Id(getScheme(), id));// finally call Set.add(it will 
> throw NPE if parameter is null )
> //NIOServerCnxnFactory#addCnxn
> InetAddress addr = cnxn.getSocketAddress();
> Set set = ipMap.get(addr);// Map.get will throw NPE if 
> parameter is null{code}
> I think we should add null checks in the above three callers.
>  
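
A minimal sketch of the kind of null guards being proposed; the class and field 
names below are simplified stand-ins, not the actual ZooKeeper sources:

{code:java}
// Hypothetical sketch; simplified stand-in types, not the real NIOServerCnxn classes.
import java.net.InetAddress;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class NullGuardSketch {
    private final Map<InetAddress, Set<Object>> ipMap = new ConcurrentHashMap<>();

    // NIOServerCnxnFactory#addCnxn-style guard: a closed socket has no address.
    void addCnxn(Object cnxn, InetAddress addr) {
        if (addr == null) {
            return; // connection already closed, nothing to register
        }
        ipMap.computeIfAbsent(addr, a -> ConcurrentHashMap.newKeySet()).add(cnxn);
    }

    // ServerCnxn#getConnectionInfo-style guard: never store a null address.
    void putRemoteAddress(Map<String, Object> info, Object remoteAddress) {
        info.put("remote_socket_address",
                remoteAddress != null ? remoteAddress : "unknown");
    }
}
{code}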



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3097) Use Runnable instead of Thread for working items in WorkerService to improve the throughput of CommitProcessor

2018-07-26 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han resolved ZOOKEEPER-3097.

   Resolution: Fixed
Fix Version/s: 3.5.5

Issue resolved by pull request 578
[https://github.com/apache/zookeeper/pull/578]

> Use Runnable instead of Thread for working items in WorkerService to improve 
> the throughput of CommitProcessor
> --
>
> Key: ZOOKEEPER-3097
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3097
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.6.0
>Reporter: Fangmin Lv
>Assignee: Fangmin Lv
>Priority: Minor
>  Labels: performance, pull-request-available
> Fix For: 3.6.0, 3.5.5
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> CommitProcessor uses WorkerService to submit read/write tasks. Each task is 
> currently initialized as a thread, which is heavy; changing it to a lighter 
> Runnable object avoids the overhead of initializing the thread and shows 
> promising improvement in the CommitProcessor.
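
A minimal sketch of the difference, assuming a plain ExecutorService as a 
stand-in for ZooKeeper's WorkerService:

{code:java}
// Sketch only: a plain ExecutorService stands in for WorkerService here.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class WorkSubmissionSketch {
    private final ExecutorService workerPool = Executors.newFixedThreadPool(4);

    // Heavier pattern: every work item allocates a full Thread object, even
    // though the pool only ever calls run() on it, never start().
    void scheduleAsThread(Runnable work) {
        workerPool.execute(new Thread(work));
    }

    // Lighter pattern: submit the Runnable directly and skip the per-task
    // Thread construction overhead.
    void scheduleAsRunnable(Runnable work) {
        workerPool.execute(work);
    }
}
{code}

The change described in the issue corresponds to the second pattern: work items 
become plain Runnable objects rather than Thread instances.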



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3098) Add additional server metrics

2018-07-20 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-3098:
--

Assignee: Joseph Blomstedt

> Add additional server metrics
> -
>
> Key: ZOOKEEPER-3098
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3098
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.6.0
>Reporter: Joseph Blomstedt
>Assignee: Joseph Blomstedt
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This patch adds several new server-side metrics as well as makes it easier to 
> add new metrics in the future. This patch also includes a handful of other 
> minor metrics-related changes.
> Here's a high-level summary of the changes.
>  # This patch extends the request latency tracked in {{ServerStats}} to track 
> {{read}} and {{update}} latency separately. Updates are any requests that must 
> be voted on and can change data; reads are all requests that can be handled 
> locally and don't change data.
>  # This patch adds the {{ServerMetrics}} logic and the related 
> {{AvgMinMaxCounter}} and {{SimpleCounter}} classes. This code is designed to 
> make it incredibly easy to add new metrics. To add a new metric you just add 
> one line to {{ServerMetrics}} and then directly reference that new metric 
> anywhere in the code base. The {{ServerMetrics}} logic handles creating the 
> metric, properly adding the metric to the JSON output of the {{/monitor}} 
> admin command, and properly resetting the metric when necessary. The 
> motivation behind {{ServerMetrics}} is to make things easy enough that it 
> encourages new metrics to be added liberally. Lack of in-depth 
> metrics/visibility is a long-standing ZooKeeper weakness. At Facebook, most 
> of our internal changes build on {{ServerMetrics}} and we have nearly 100 
> internal metrics at this time – all of which we'll be upstreaming in the 
> coming months as we publish more internal patches. (An illustrative counter 
> sketch follows this description.)
>  # This patch adds 20 new metrics, 14 of which are handled by {{ServerMetrics}}.
>  # This patch replaces some uses of {{synchronized}} in {{ServerStats}} with 
> atomic operations.
> Here's a list of new metrics added in this patch:
>  - {{uptime}}: time that a peer has been in a stable 
> leading/following/observing state
>  - {{leader_uptime}}: uptime for peer in leading state
>  - {{global_sessions}}: count of global sessions
>  - {{local_sessions}}: count of local sessions
>  - {{quorum_size}}: configured ensemble size
>  - {{synced_observers}}: similar to existing `synced_followers` but for 
> observers
>  - {{fsynctime}}: time to fsync transaction log (avg/min/max)
>  - {{snapshottime}}: time to write a snapshot (avg/min/max)
>  - {{dbinittime}}: time to reload database – read snapshot + apply 
> transactions (avg/min/max)
>  - {{readlatency}}: read request latency (avg/min/max)
>  - {{updatelatency}}: update request latency (avg/min/max)
>  - {{propagation_latency}}: end-to-end latency for updates, from proposal on 
> leader to committed-to-datatree on a given host (avg/min/max)
>  - {{follower_sync_time}}: time for follower to sync with leader (avg/min/max)
>  - {{election_time}}: time between entering and leaving election (avg/min/max)
>  - {{looking_count}}: number of transitions into looking state
>  - {{diff_count}}: number of diff syncs performed
>  - {{snap_count}}: number of snap syncs performed
>  - {{commit_count}}: number of commits performed on leader
>  - {{connection_request_count}}: number of incoming client connection requests
>  - {{bytes_received_count}}: similar to existing `packets_received` but 
> tracks bytes
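
An illustrative sketch of what an {{AvgMinMaxCounter}}-style metric, as 
described above, can look like; this is an assumption-laden sketch, not the 
committed ServerMetrics code:

{code:java}
// Illustrative sketch of an AvgMinMaxCounter-style metric; not the committed code.
import java.util.concurrent.atomic.AtomicLong;

class AvgMinMaxCounterSketch {
    private final AtomicLong count = new AtomicLong();
    private final AtomicLong total = new AtomicLong();
    private final AtomicLong min = new AtomicLong(Long.MAX_VALUE);
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    // Record one observation, e.g. a single fsync or snapshot duration.
    void add(long value) {
        count.incrementAndGet();
        total.addAndGet(value);
        min.accumulateAndGet(value, Math::min);
        max.accumulateAndGet(value, Math::max);
    }

    double avg() {
        long c = count.get();
        return c == 0 ? 0.0 : (double) total.get() / c;
    }

    // Reset between reporting intervals, as /monitor-style output typically does.
    void reset() {
        count.set(0);
        total.set(0);
        min.set(Long.MAX_VALUE);
        max.set(Long.MIN_VALUE);
    }
}
{code}

In the design described above, adding a metric is then a one-line registration 
in ServerMetrics that hands back such a counter for use anywhere in the code.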



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3098) Add additional server metrics

2018-07-20 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551364#comment-16551364
 ] 

Michael Han commented on ZOOKEEPER-3098:


[~eolivelli] My thought is that we still need a metrics interface to hook into 
external reporters, and we also need metric type definitions beyond the 
counter, which is the only type present in this patch. ZOOKEEPER-3092 is more 
about the general metrics framework infrastructure, while the work in this Jira 
is more about the actual instrumentation and metrics collection. 

> Add additional server metrics
> -
>
> Key: ZOOKEEPER-3098
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3098
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.6.0
>Reporter: Joseph Blomstedt
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This patch adds several new server-side metrics as well as makes it easier to 
> add new metrics in the future. This patch also includes a handful of other 
> minor metrics-related changes.
> Here's a high-level summary of the changes.
>  # This patch extends the request latency tracked in {{ServerStats}} to track 
> {{read}} and {{update}} latency separately. Updates are any requests that must 
> be voted on and can change data; reads are all requests that can be handled 
> locally and don't change data.
>  # This patch adds the {{ServerMetrics}} logic and the related 
> {{AvgMinMaxCounter}} and {{SimpleCounter}} classes. This code is designed to 
> make it incredibly easy to add new metrics. To add a new metric you just add 
> one line to {{ServerMetrics}} and then directly reference that new metric 
> anywhere in the code base. The {{ServerMetrics}} logic handles creating the 
> metric, properly adding the metric to the JSON output of the {{/monitor}} 
> admin command, and properly resetting the metric when necessary. The 
> motivation behind {{ServerMetrics}} is to make things easy enough that it 
> encourages new metrics to be added liberally. Lack of in-depth 
> metrics/visibility is a long-standing ZooKeeper weakness. At Facebook, most 
> of our internal changes build on {{ServerMetrics}} and we have nearly 100 
> internal metrics at this time – all of which we'll be upstreaming in the 
> coming months as we publish more internal patches.
>  # This patch adds 20 new metrics, 14 of which are handled by {{ServerMetrics}}.
>  # This patch replaces some uses of {{synchronized}} in {{ServerStats}} with 
> atomic operations.
> Here's a list of new metrics added in this patch:
>  - {{uptime}}: time that a peer has been in a stable 
> leading/following/observing state
>  - {{leader_uptime}}: uptime for peer in leading state
>  - {{global_sessions}}: count of global sessions
>  - {{local_sessions}}: count of local sessions
>  - {{quorum_size}}: configured ensemble size
>  - {{synced_observers}}: similar to existing `synced_followers` but for 
> observers
>  - {{fsynctime}}: time to fsync transaction log (avg/min/max)
>  - {{snapshottime}}: time to write a snapshot (avg/min/max)
>  - {{dbinittime}}: time to reload database – read snapshot + apply 
> transactions (avg/min/max)
>  - {{readlatency}}: read request latency (avg/min/max)
>  - {{updatelatency}}: update request latency (avg/min/max)
>  - {{propagation_latency}}: end-to-end latency for updates, from proposal on 
> leader to committed-to-datatree on a given host (avg/min/max)
>  - {{follower_sync_time}}: time for follower to sync with leader (avg/min/max)
>  - {{election_time}}: time between entering and leaving election (avg/min/max)
>  - {{looking_count}}: number of transitions into looking state
>  - {{diff_count}}: number of diff syncs performed
>  - {{snap_count}}: number of snap syncs performed
>  - {{commit_count}}: number of commits performed on leader
>  - {{connection_request_count}}: number of incoming client connection requests
>  - {{bytes_received_count}}: similar to existing `packets_received` but 
> tracks bytes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-2504) Enforce that server ids are unique in a cluster

2018-07-20 Thread Michael Han (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Han reassigned ZOOKEEPER-2504:
--

Assignee: Michael Han  (was: Dan Benediktson)

> Enforce that server ids are unique in a cluster
> ---
>
> Key: ZOOKEEPER-2504
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2504
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Dan Benediktson
>Assignee: Michael Han
>Priority: Major
> Attachments: ZOOKEEPER-2504.patch
>
>
> The leader will happily accept connections from learners that have the same 
> server id (e.g., due to misconfiguration). This can lead to various issues 
> including non-unique session_ids being generated by these servers.
> The leader can enforce that all learners come in with unique server IDs; if a 
> learner attempts to connect with an id that is already in use, it should be 
> denied.
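
A hypothetical sketch of the kind of check described above; the registry class 
and names are stand-ins, not the actual Leader/LearnerHandler code:

{code:java}
// Hypothetical sketch; names are stand-ins, not the actual Leader code.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LearnerRegistrySketch {
    private final Map<Long, Object> connectedLearners = new ConcurrentHashMap<>();

    // Called when a learner handshakes with the leader; reject duplicate sids.
    void register(long sid, Object learnerHandler) {
        Object previous = connectedLearners.putIfAbsent(sid, learnerHandler);
        if (previous != null) {
            throw new IllegalStateException(
                    "Learner with server id " + sid + " is already connected");
        }
    }

    // Called when a learner disconnects, so the sid can be reused.
    void unregister(long sid) {
        connectedLearners.remove(sid);
    }
}
{code}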



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2504) Enforce that server ids are unique in a cluster

2018-07-20 Thread Michael Han (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16551122#comment-16551122
 ] 

Michael Han commented on ZOOKEEPER-2504:


This patch still has value; I am taking it over here.

> Enforce that server ids are unique in a cluster
> ---
>
> Key: ZOOKEEPER-2504
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2504
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Dan Benediktson
>Assignee: Dan Benediktson
>Priority: Major
> Attachments: ZOOKEEPER-2504.patch
>
>
> The leader will happily accept connections from learners that have the same 
> server id (e.g., due to misconfiguration). This can lead to various issues 
> including non-unique session_ids being generated by these servers.
> The leader can enforce that all learners come in with unique server IDs; if a 
> learner attempts to connect with an id that is already in use, it should be 
> denied.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3096) Leader should not leak LearnerHandler threads

2018-07-20 Thread Michael Han (JIRA)
Michael Han created ZOOKEEPER-3096:
--

 Summary: Leader should not leak LearnerHandler threads
 Key: ZOOKEEPER-3096
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3096
 Project: ZooKeeper
  Issue Type: Bug
  Components: quorum, server
Affects Versions: 3.4.13, 3.5.4, 3.6.0
Reporter: Michael Han
Assignee: Michael Han


Currently we don't track LearnerHandler threads on the leader; we rely on the 
socket timeout to raise an exception and use that exception as a signal to let 
the LearnerHandler thread kill itself. If a learner restarts, and the time from 
the beginning to the end of the restart is less than the socket timeout value 
(currently hardcoded as initLimit * tickTime), then no exception is raised and 
the previous LearnerHandler thread corresponding to this learner will leak.

I have a test case and a proposed fix which I will submit later.
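
A hypothetical sketch of explicit per-sid tracking on the leader (names are 
stand-ins, not the actual Leader code): when a learner reconnects, the stale 
handler is shut down immediately instead of waiting for its socket read to 
time out.

{code:java}
// Hypothetical sketch of tracking and retiring LearnerHandler threads per sid.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LearnerHandlerTrackerSketch {
    interface Handler {
        void shutdown(); // close the socket and stop the handler thread
    }

    private final Map<Long, Handler> handlersBySid = new ConcurrentHashMap<>();

    // When a learner (re)connects, retire any previous handler for the same sid
    // instead of relying on the socket timeout to clean it up.
    void onLearnerConnected(long sid, Handler newHandler) {
        Handler stale = handlersBySid.put(sid, newHandler);
        if (stale != null) {
            stale.shutdown();
        }
    }

    // When a handler exits on its own, remove it only if it is still the
    // registered handler for that sid.
    void onHandlerExit(long sid, Handler handler) {
        handlersBySid.remove(sid, handler);
    }
}
{code}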



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

