[jira] [Created] (ZOOKEEPER-3459) Add admin command to display synced state of peer

2019-07-09 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3459:
--

 Summary: Add admin command to display synced state of peer
 Key: ZOOKEEPER-3459
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3459
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Brian Nixon
Assignee: Brian Nixon


Add another command to the admin server that will respond with the current 
phase of the Zab protocol that a given peer is running. This will help with 
understanding what is going on in an ensemble while it is settling after a 
leader election and with programmatically checking for a healthy "broadcast" 
state.
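
As a rough illustration only (not the final implementation), such a command might be
wired into the 3.5+ AdminServer command framework roughly like the sketch below; the
command name "zab_state" and the QuorumPeer#getZabState() accessor are assumptions here.

{code:java}
import java.util.Arrays;
import java.util.Map;

import org.apache.zookeeper.server.ZooKeeperServer;
import org.apache.zookeeper.server.admin.CommandBase;
import org.apache.zookeeper.server.admin.CommandResponse;
import org.apache.zookeeper.server.quorum.QuorumPeer;

// Sketch of a hypothetical "zab_state" admin command; all names are illustrative.
public class ZabStateCommand extends CommandBase {
    private final QuorumPeer peer;

    public ZabStateCommand(QuorumPeer peer) {
        super(Arrays.asList("zab_state", "zabs"));
        this.peer = peer;
    }

    @Override
    public CommandResponse run(ZooKeeperServer zkServer, Map<String, String> kwargs) {
        CommandResponse response = initializeResponse();
        // e.g. ELECTION, DISCOVERY, SYNCHRONIZATION or BROADCAST
        response.put("zab_state", peer.getZabState().toString());
        return response;
    }
}
{code}

A health check could then poll the admin port and treat anything other than BROADCAST
as "still settling".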


 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3421) Better insight into Observer connections

2019-06-10 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3421:
--

 Summary: Better insight into Observer connections
 Key: ZOOKEEPER-3421
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3421
 Project: ZooKeeper
  Issue Type: Wish
  Components: server
Affects Versions: 3.6.0
Reporter: Brian Nixon


With the introduction of the Learner Master feature in ZOOKEEPER-3140, tracking 
the state of the Observers synced with the voting quorum became more difficult 
from an operational perspective. Observers can now be synced with any voting 
member, not just the leader, and discovering where an observer is being hosted 
requires digging into the server logs or running complex JMX queries.

 

Add commands that externalize the state of observers from the point of view of 
the voting quorum.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3415) convert internal logic to use java 8 streams

2019-06-05 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3415:
--

 Summary: convert internal logic to use java 8 streams
 Key: ZOOKEEPER-3415
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3415
 Project: ZooKeeper
  Issue Type: Wish
Affects Versions: 3.6.0
Reporter: Brian Nixon


There are a number of places in the code where for loops are used to perform 
basic filtering and collection. The Java 8 stream APIs make these operations 
much more polished. Since the master branch has been at this language level for 
a while, I'd wish for a (series of) refactor(s) to convert more of these loops 
to streams.
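
For illustration, a hypothetical before/after of the kind of conversion this asks for 
(the Server type and method names below are made up for the example):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class StreamRefactorExample {
    // Minimal stand-in type so the sketch is self-contained.
    interface Server {
        boolean isActive();
        String getHostName();
    }

    // Before: explicit for loop doing filtering and collection.
    static List<String> activeHostsLoop(List<Server> servers) {
        List<String> result = new ArrayList<>();
        for (Server s : servers) {
            if (s.isActive()) {
                result.add(s.getHostName());
            }
        }
        return result;
    }

    // After: the same logic expressed with the Java 8 stream API.
    static List<String> activeHostsStream(List<Server> servers) {
        return servers.stream()
                .filter(Server::isActive)
                .map(Server::getHostName)
                .collect(Collectors.toList());
    }
}
{code}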



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-1523) Better logging during instance loading/syncing

2019-05-23 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16847110#comment-16847110
 ] 

Brian Nixon commented on ZOOKEEPER-1523:


[~Yohan123] , I have some code stashed by the side that might address the 4LTR 
word portion of this ticket. Give me a chance to clean it up and add it as a PR.

 

Maybe it can be used to bootstrap the request logging as well.
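
For the loading side, the kind of progress reporting being asked for could look 
roughly like this sketch (class, field, and interval names are all hypothetical):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class SnapshotLoadProgress {
    private static final Logger LOG = LoggerFactory.getLogger(SnapshotLoadProgress.class);
    private static final int REPORT_INTERVAL = 100_000; // nodes between log lines

    private final long expectedNodes;
    private final long startMs = System.currentTimeMillis();
    private long loadedNodes;

    SnapshotLoadProgress(long expectedNodes) {
        this.expectedNodes = expectedNodes;
    }

    // Call once per node deserialized from the snapshot.
    void onNodeLoaded() {
        loadedNodes++;
        if (loadedNodes % REPORT_INTERVAL == 0) {
            long elapsedMs = System.currentTimeMillis() - startMs;
            LOG.info("Loaded {} of ~{} nodes from snapshot in {} ms",
                    loadedNodes, expectedNodes, elapsedMs);
        }
    }
}
{code}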

> Better logging during instance loading/syncing
> --
>
> Key: ZOOKEEPER-1523
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1523
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum, server
>Affects Versions: 3.3.5
>Reporter: Jordan Zimmerman
>Priority: Critical
>
> When an instance is coming up and loading from snapshot, better logging is 
> needed so an operator knows how long until completion. Also, when syncing 
> with the leader, better logging is needed to know how long until success.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions

2019-05-22 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16846245#comment-16846245
 ] 

Brian Nixon commented on ZOOKEEPER-1147:


[~larsfrancke] - just created ZOOKEEPER-3400 to create some documentation.

> Add support for local sessions
> --
>
> Key: ZOOKEEPER-1147
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.3.3
>Reporter: Vishal Kathuria
>Assignee: Thawan Kooburat
>Priority: Major
>  Labels: api-change, scaling
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
> ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
> ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
> ZOOKEEPER-1147.patch
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> This improvement is in the bucket of making ZooKeeper work at a large scale. 
> We are planning on having about 1 million clients connect to a ZooKeeper 
> ensemble through a set of 50-100 observers. The majority of these clients are 
> read-only, i.e. they do not do any updates or create ephemeral nodes.
> In ZooKeeper today, the client creates a session and the session creation is 
> handled like any other update. In the above use case, the session create/drop 
> workload can easily overwhelm an ensemble. The following is a proposal for a 
> "local session", to support a larger number of connections.
> 1.   The idea is to introduce a new type of session - a "local" session. A 
> "local" session doesn't have the full functionality of a normal session.
> 2.   Local sessions cannot create ephemeral nodes.
> 3.   Once a local session is lost, you cannot re-establish it using the 
> session-id/password. The session and its watches are gone for good.
> 4.   When a local session connects, the session info is only maintained 
> on the zookeeper server (in this case, an observer) that it is connected to. 
> The leader is not aware of the creation of such a session and there is no 
> state written to disk.
> 5.   Pings and expiration are handled by the server that the session 
> is connected to.
> With the above changes, we can make ZooKeeper scale to a much larger number 
> of clients without making the core ensemble a bottleneck.
> In terms of API, there are two options being considered:
> 1. Let the client specify at connect time which kind of session they 
> want.
> 2. All sessions connect as local sessions and automatically get promoted to 
> global sessions when they do an operation that requires a global session 
> (e.g. creating an ephemeral node)
> Chubby took the approach of lazily promoting all sessions to global, but I 
> don't think that would work in our case, where we want to keep sessions which 
> never create ephemeral nodes as always local. Option 2 would make it more 
> broadly usable but option 1 would be easier to implement.
> We are thinking of implementing option 1 as the first cut. There would be a 
> client flag, IsLocalSession (much like the current readOnly flag) that would 
> be used to determine whether to create a local session or a global session.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3400) Add documentation on local sessions

2019-05-22 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3400:
--

 Summary: Add documentation on local sessions
 Key: ZOOKEEPER-3400
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3400
 Project: ZooKeeper
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.6.0, 3.5.6
Reporter: Brian Nixon


ZOOKEEPER-1147 added local sessions (client sessions not ratified by the 
leader) to ZooKeeper as a lightweight augmentation of the existing global 
sessions.

 

Add some outward facing documentation that describes this feature 
([https://zookeeper.apache.org/doc/r3.5.5/zookeeperProgrammers.html#ch_zkSessions]
 seems like a reasonable place).
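
To the best of my knowledge, the server-side switches such documentation would need 
to cover are the following zoo.cfg entries (worth double-checking against the code 
before publishing):

{code}
# handle client sessions entirely on the peer the client is connected to
localSessionsEnabled=true
# allow a local session to be upgraded to a global one when it needs global
# semantics (e.g. creating an ephemeral node)
localSessionsUpgradingEnabled=true
{code}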



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions

2019-05-20 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844382#comment-16844382
 ] 

Brian Nixon commented on ZOOKEEPER-1147:


Checking the usual places, I don't see any good documentation. The only good 
description of the feature is on this ticket.

 

Seems like an obvious oversight - a new ticket should be created 
([https://zookeeper.apache.org/doc/r3.5.5/zookeeperProgrammers.html#ch_zkSessions]
 seems like a reasonable place to land the feature description).

> Add support for local sessions
> --
>
> Key: ZOOKEEPER-1147
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.3.3
>Reporter: Vishal Kathuria
>Assignee: Thawan Kooburat
>Priority: Major
>  Labels: api-change, scaling
> Fix For: 3.5.0
>
> Attachments: ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
> ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
> ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, 
> ZOOKEEPER-1147.patch
>
>   Original Estimate: 840h
>  Remaining Estimate: 840h
>
> This improvement is in the bucket of making ZooKeeper work at a large scale. 
> We are planning on having about 1 million clients connect to a ZooKeeper 
> ensemble through a set of 50-100 observers. The majority of these clients are 
> read-only, i.e. they do not do any updates or create ephemeral nodes.
> In ZooKeeper today, the client creates a session and the session creation is 
> handled like any other update. In the above use case, the session create/drop 
> workload can easily overwhelm an ensemble. The following is a proposal for a 
> "local session", to support a larger number of connections.
> 1.   The idea is to introduce a new type of session - a "local" session. A 
> "local" session doesn't have the full functionality of a normal session.
> 2.   Local sessions cannot create ephemeral nodes.
> 3.   Once a local session is lost, you cannot re-establish it using the 
> session-id/password. The session and its watches are gone for good.
> 4.   When a local session connects, the session info is only maintained 
> on the zookeeper server (in this case, an observer) that it is connected to. 
> The leader is not aware of the creation of such a session and there is no 
> state written to disk.
> 5.   Pings and expiration are handled by the server that the session 
> is connected to.
> With the above changes, we can make ZooKeeper scale to a much larger number 
> of clients without making the core ensemble a bottleneck.
> In terms of API, there are two options being considered:
> 1. Let the client specify at connect time which kind of session they 
> want.
> 2. All sessions connect as local sessions and automatically get promoted to 
> global sessions when they do an operation that requires a global session 
> (e.g. creating an ephemeral node)
> Chubby took the approach of lazily promoting all sessions to global, but I 
> don't think that would work in our case, where we want to keep sessions which 
> never create ephemeral nodes as always local. Option 2 would make it more 
> broadly usable but option 1 would be easier to implement.
> We are thinking of implementing option 1 as the first cut. There would be a 
> client flag, IsLocalSession (much like the current readOnly flag) that would 
> be used to determine whether to create a local session or a global session.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (ZOOKEEPER-3349) QuorumCnxManager socketTimeout unused

2019-05-20 Thread Brian Nixon (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Nixon resolved ZOOKEEPER-3349.

   Resolution: Not A Problem
Fix Version/s: 3.6.0

> QuorumCnxManager socketTimeout unused
> -
>
> Key: ZOOKEEPER-3349
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3349
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: quorum
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> QuorumCnxManager member variable 'socketTimeout' is not used anywhere in the 
> class. It's clear from the context that it should either be removed entirely 
> or invoked in QuorumCnxManager::setSockOpts. Since the QuorumPeer syncLimit 
> can be changed by jmx, I'm thinking that the former is the better solution.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3349) QuorumCnxManager socketTimeout unused

2019-05-20 Thread Brian Nixon (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Nixon reassigned ZOOKEEPER-3349:
--

Assignee: Brian Nixon

> QuorumCnxManager socketTimeout unused
> -
>
> Key: ZOOKEEPER-3349
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3349
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: quorum
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Assignee: Brian Nixon
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> QuorumCnxManager member variable 'socketTimeout' is not used anywhere in the 
> class. It's clear from the context that it should either be removed entirely 
> or invoked in QuorumCnxManager::setSockOpts. Since the QuorumPeer syncLimit 
> can be changed by jmx, I'm thinking that the former is the better solution.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3349) QuorumCnxManager socketTimeout unused

2019-05-20 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844371#comment-16844371
 ] 

Brian Nixon commented on ZOOKEEPER-3349:


This parameter is being used again as of ZOOKEEPER-3378. Nothing to do here.

> QuorumCnxManager socketTimeout unused
> -
>
> Key: ZOOKEEPER-3349
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3349
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: quorum
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Priority: Trivial
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> QuorumCnxManager member variable 'socketTimeout' is not used anywhere in the 
> class. It's clear from the context that it should either be removed entirely 
> or invoked in QuorumCnxManager::setSockOpts. Since the QuorumPeer syncLimit 
> can be changed by jmx, I'm thinking that the former is the better solution.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-1000) Provide SSL in zookeeper to be able to run cross colos.

2019-05-20 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844370#comment-16844370
 ] 

Brian Nixon commented on ZOOKEEPER-1000:


That's my take as well. Not sure what else there would be to do here.

> Provide SSL in zookeeper to be able to run cross colos.
> ---
>
> Key: ZOOKEEPER-1000
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1000
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Mahadev konar
>Assignee: Mahadev konar
>Priority: Major
> Fix For: 3.6.0, 3.5.6
>
>
> This jira is to track SSL for zookeeper. The inter zookeeper server 
> communication and the client to server communication should be over ssl so 
> that zookeeper can be deployed over WAN's. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3311) Allow a delay to the transaction log flush

2019-05-20 Thread Brian Nixon (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Nixon reassigned ZOOKEEPER-3311:
--

Assignee: Brian Nixon

> Allow a delay to the transaction log flush 
> ---
>
> Key: ZOOKEEPER-3311
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3311
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: server
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Assignee: Brian Nixon
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The SyncRequestProcessor flushes writes to disk either when 1000 writes are 
> pending to be flushed or when the processor fails to retrieve another write 
> from its incoming queue. The "flush when queue empty" condition operates 
> poorly under many workloads as it can quickly degrade into flushing after 
> every write -- losing all benefits of batching and leading to a continuous 
> stream of flushes + fsyncs which overwhelm the underlying disk.
>  
> A configurable flush delay would ensure flushes do not happen more frequently 
> than once every X milliseconds. This can be used in-place of or jointly with 
> batch size triggered flushes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-3378) Set the quorum cnxn timeout independently from syncLimit

2019-05-20 Thread Brian Nixon (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Nixon reassigned ZOOKEEPER-3378:
--

Assignee: Brian Nixon

> Set the quorum cnxn timeout independently from syncLimit
> 
>
> Key: ZOOKEEPER-3378
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3378
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum
>Reporter: Brian Nixon
>Assignee: Brian Nixon
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If an ensemble requires a high sync limit to support a large data tree or 
> transaction rate, it can cause the QuorumCnxManager to hang over-long in 
> response to quorum events. Using the sync limit for this timeout is a 
> convenience in terms of keeping all failure detection mechanisms in sync but 
> it is not strictly required for correct behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3396) Flaky test in RestoreCommittedLogTest

2019-05-15 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3396:
--

 Summary: Flaky test in RestoreCommittedLogTest
 Key: ZOOKEEPER-3396
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3396
 Project: ZooKeeper
  Issue Type: Improvement
  Components: tests
Affects Versions: 3.6.0
Reporter: Brian Nixon


The patch for ZOOKEEPER-3244 ([https://github.com/apache/zookeeper/pull/770)] 
introduced a flaky test 
RestoreCommittedLogTest::testRestoreCommittedLogWithSnapSize.

 

Get it running consistently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3395) Document individual admin commands in markdown

2019-05-14 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3395:
--

 Summary: Document individual admin commands in markdown
 Key: ZOOKEEPER-3395
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3395
 Project: ZooKeeper
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.6.0, 3.5.6
Reporter: Brian Nixon


The "ZooKeeper Commands" section of the ZooKeeper Administrator's Guide takes 
time to document each four letter command individually but when it comes to the 
admin commands, it just directs the user to query a live peer in order to get 
the supported list (e.g. curl http://localhost:8080/commands). While such a 
query will provide the best source for the admin commands available on a given 
ZooKeeper version, it's not replacement for the role that the central guide 
provides.

Create an enumerated list of the supported admin commands in the section "The 
AdminServer" in the style that the four letter commands are documented in "The 
Four Letter Words".

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3394) Delay observer reconnect when all learner masters have been tried

2019-05-13 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3394:
--

 Summary: Delay observer reconnect when all learner masters have 
been tried
 Key: ZOOKEEPER-3394
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3394
 Project: ZooKeeper
  Issue Type: Improvement
  Components: quorum
Affects Versions: 3.6.0
Reporter: Brian Nixon


Observers will disconnect when the voting peers perform a leader election and 
reconnect after. The delay zookeeper.observer.reconnectDelayMs was added to 
insulate the voting peers from the observers returning. With a large number of 
peers and the observerMaster feature active, this delay is mostly detrimental 
as it means that the observer is more likely to get hung up on connecting to a 
bad (down/corrupt) peer and it would be better off switching to a new one 
quickly.

To retain the protective virtue of the delay, it makes sense to apply the delay 
only after all observer masters in the list have been tried, before iterating 
through the list again. In the case where observer masters are not active, 
this degenerates to a delay between connection attempts on the leader.
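
A minimal sketch of the proposed behavior (all names hypothetical): iterate the 
known learner masters back to back and only sleep once a full pass has failed.

{code:java}
import java.util.List;

class ObserverReconnectSketch {
    // Illustration only: no per-host delay, just a delay after every learner
    // master in the list has been tried and failed.
    static void connectLoop(List<String> learnerMasters, long delayAfterFullPassMs)
            throws InterruptedException {
        while (true) {
            for (String master : learnerMasters) {
                if (tryConnect(master)) {
                    return; // connected, done
                }
                // move straight on to the next candidate, no sleep here
            }
            // every candidate failed; back off before sweeping the list again
            Thread.sleep(delayAfterFullPassMs);
        }
    }

    private static boolean tryConnect(String hostPort) {
        // placeholder for the real connection attempt
        return false;
    }
}
{code}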



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3392) Add admin command to display last snapshot information

2019-05-13 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3392:
--

 Summary: Add admin command to display last snapshot information
 Key: ZOOKEEPER-3392
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3392
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Brian Nixon


Basic systems for backing up ZooKeeper data will maintain snapshot files of the data 
tree. In order to understand the health of these systems, they need a way to 
determine how up to date their files are relative to the current state of the 
ensemble.

Add an admin command that exposes the zxid and timestamp of the last 
saved/restored snapshot of the server. This will let such a backup system know 
when it can update and when it is stale.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3388) Allow client port to support plaintext and encrypted connections simultaneously

2019-05-12 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3388:
--

 Summary: Allow client port to support plaintext and encrypted 
connections simultaneously
 Key: ZOOKEEPER-3388
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3388
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Brian Nixon


ZOOKEEPER-2125 extended the ZooKeeper server-side to handle encrypted client 
connections by allowing the server to open a second client port (the secure 
client port) to manage this new style of traffic. A server is able to handle 
plaintext and encrypted clients simultaneously by managing each on their 
respective ports. 

When it comes time to get all clients connecting to your system to start using 
encryption, this approach requires that they make two changes simultaneously: 
altering their client properties to start using the secure settings, and altering 
the routing information they use to know where to connect to the ensemble. If 
either is misconfigured, the client is cut off from the ensemble. With a large 
deployment of clients owned by different teams and different tools, this makes 
activating the feature risky. Ideally, the two changes could be staggered so that 
the encryption feature is activated first and the routing information is changed 
in a subsequent phase.

Allow the server connection factory managing the regular client port to handle 
both plaintext and encrypted connections. This will be independent of the 
operation of the server connection factory managing the secure client port, but 
similar settings ought to apply to both (e.g. cipher suites) to keep the two 
compatible with each other.
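
If this lands as a configuration switch, usage could look roughly like the zoo.cfg 
fragment below; the property name client.portUnification is only a guess at this 
point, and the Netty connection factory is assumed since TLS requires it.

{code}
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
clientPort=2181
# hypothetical switch: accept both plaintext and TLS handshakes on clientPort
client.portUnification=true
{code}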



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3386) Add admin command to display voting view

2019-05-10 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3386:
--

 Summary: Add admin command to display voting view
 Key: ZOOKEEPER-3386
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3386
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Brian Nixon


Solid agreement on the set of voting servers is a necessity for ZooKeeper and 
it's useful to audit that agreement to validate it does not drift into some 
pathological condition.

 

Create an admin command that exposes the ensemble voting members from the point 
of view of the queried server.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3385) Add admin command to display leader

2019-05-10 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3385:
--

 Summary: Add admin command to display leader
 Key: ZOOKEEPER-3385
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3385
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Brian Nixon


Each QuorumPeer prints the identity of the server it believes is the leader in 
its logs but that is not easily turned into diagnostic information about the 
state of the ensemble. It can be useful in debugging various issues, both when 
a quorum is struggling to be established and when a minority of peers are 
failing to follow, to see at a glance which peers are following the leader 
elected by the majority and which peers are either not following or following a 
different server.

Create an admin command that exposes which server a peer believes is the 
current leader.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3378) et the quorum cnxn timeout independently from syncLimit

2019-05-07 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3378:
--

 Summary: et the quorum cnxn timeout independently from syncLimit
 Key: ZOOKEEPER-3378
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3378
 Project: ZooKeeper
  Issue Type: Improvement
  Components: quorum
Reporter: Brian Nixon






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3378) Set the quorum cnxn timeout independently from syncLimit

2019-05-07 Thread Brian Nixon (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Nixon updated ZOOKEEPER-3378:
---
Description: If an ensemble requires a high sync limit to support a large 
data tree or transaction rate, it can cause the QuorumCnxManager to hang 
over-long in response to quorum events. Using the sync limit for this timeout 
is a convenience in terms of keeping all failure detection mechanisms in sync 
but it is not strictly required for correct behavior.

> Set the quorum cnxn timeout independently from syncLimit
> 
>
> Key: ZOOKEEPER-3378
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3378
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum
>Reporter: Brian Nixon
>Priority: Minor
>
> If an ensemble requires a high sync limit to support a large data tree or 
> transaction rate, it can cause the QuorumCnxManager to hang over-long in 
> response to quorum events. Using the sync limit for this timeout is a 
> convenience in terms of keeping all failure detection mechanisms in sync but 
> it is not strictly required for correct behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3378) Set the quorum cnxn timeout independently from syncLimit

2019-05-07 Thread Brian Nixon (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Nixon updated ZOOKEEPER-3378:
---
Summary: Set the quorum cnxn timeout independently from syncLimit  (was: et 
the quorum cnxn timeout independently from syncLimit)

> Set the quorum cnxn timeout independently from syncLimit
> 
>
> Key: ZOOKEEPER-3378
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3378
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum
>Reporter: Brian Nixon
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-1651) Add support for compressed snapshot

2019-05-01 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16831159#comment-16831159
 ] 

Brian Nixon commented on ZOOKEEPER-1651:


This feature was accepted with ZOOKEEPER-3179. I'd suggest marking this ticket 
as resolved.
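
For anyone landing here later: as far as I recall, the knob that ZOOKEEPER-3179 
introduced is a server-side system property along these lines (check the current 
docs for the exact values):

{code}
# enable compressed snapshots on the server, e.g. gz or snappy
-Dzookeeper.snapshot.compression.method=snappy
{code}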

> Add support for compressed snapshot
> ---
>
> Key: ZOOKEEPER-1651
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1651
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Thawan Kooburat
>Assignee: Brian Nixon
>Priority: Major
>
> We want to keep many copies of snapshots on disk so that we can debug the 
> problem afterward. However, the snapshot can be large, so we added a feature 
> that allows the server to dump/load snapshots in a compressed format (snappy or 
> gzip). This also benefits db loading and snapshotting time. 
> The gain also depends on client workload. In one of our deployments where 
> clients don't compress their data, we found that snappy compression works best. 
> The snapshot size is reduced from 381MB to 65MB. Db loading and snapshotting 
> time is also reduced by 20%. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (ZOOKEEPER-1651) Add support for compressed snapshot

2019-05-01 Thread Brian Nixon (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Nixon reassigned ZOOKEEPER-1651:
--

Assignee: Brian Nixon  (was: Thawan Kooburat)

> Add support for compressed snapshot
> ---
>
> Key: ZOOKEEPER-1651
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1651
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Thawan Kooburat
>Assignee: Brian Nixon
>Priority: Major
>
> We want to keep many copies of snapshots on disk so that we can debug the 
> problem afterward. However, the snapshot can be large, so we added a feature 
> that allows the server to dump/load snapshots in a compressed format (snappy or 
> gzip). This also benefits db loading and snapshotting time. 
> The gain also depends on client workload. In one of our deployments where 
> clients don't compress their data, we found that snappy compression works best. 
> The snapshot size is reduced from 381MB to 65MB. Db loading and snapshotting 
> time is also reduced by 20%. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3359) Batch commits in the CommitProcessor

2019-04-10 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3359:
--

 Summary: Batch commits in the CommitProcessor
 Key: ZOOKEEPER-3359
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3359
 Project: ZooKeeper
  Issue Type: Improvement
  Components: quorum
Affects Versions: 3.6.0
Reporter: Brian Nixon


Draining a single commit every time the CommitProcessor switches to commit mode 
can add to the backlog of committed messages. Instead, add controls to batch 
and drain multiple commits and to limit the number of reads being served. This 
improves commit throughput and adds backpressure on reads.
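
A rough sketch of the draining side of the idea (class, field, and limit names are 
hypothetical, not the actual CommitProcessor code):

{code:java}
import java.util.concurrent.LinkedBlockingQueue;

class CommitBatchSketch {
    private final LinkedBlockingQueue<Object> committedRequests = new LinkedBlockingQueue<>();
    private final int maxCommitBatchSize = 500;

    // Drain up to maxCommitBatchSize committed requests in one go instead of a
    // single commit per switch into commit mode.
    void processCommitted() {
        int drained = 0;
        Object commit;
        while (drained < maxCommitBatchSize && (commit = committedRequests.poll()) != null) {
            apply(commit);
            drained++;
        }
        // After the batch the processor goes back to serving reads; capping the
        // number of reads served per cycle is what provides the backpressure.
    }

    private void apply(Object request) {
        // placeholder for handing the request to the next processor
    }
}
{code}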



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3352) Use LevelDB For Backend

2019-04-09 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813760#comment-16813760
 ] 

Brian Nixon commented on ZOOKEEPER-3352:


We've been curious whether wiring ZooKeeper on top of RocksDB could give 
similar performance benefits that Instagram saw with putting Apache Cassandra 
on top of RocksDB 
([https://instagram-engineering.com/open-sourcing-a-10x-reduction-in-apache-cassandra-tail-latency-d64f86b43589]
 for some details). Something like this ticket that involves abstracting out 
the data storage components would be useful for us.

> Use LevelDB For Backend
> ---
>
> Key: ZOOKEEPER-3352
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3352
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
> Fix For: 4.0.0
>
>
> Use LevelDB for managing data stored in ZK (transaction logs and snapshots).
> https://stackoverflow.com/questions/6779669/does-leveldb-support-java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3354) Improve efficiency of DeleteAllCommand

2019-04-08 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3354:
--

 Summary: Improve efficiency of DeleteAllCommand
 Key: ZOOKEEPER-3354
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3354
 Project: ZooKeeper
  Issue Type: Improvement
  Components: other
Affects Versions: 3.6.0
Reporter: Brian Nixon


The CLI DeleteAllCommand internally uses a synchronous, iterative approach. This 
can be improved with batching for quicker response times on large subtrees.
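
One possible shape of the improvement, sketched below: walk the subtree once and 
then issue multi() batches of delete ops instead of one synchronous delete per node 
(batch size and error handling are simplified for the example).

{code:java}
import java.util.ArrayList;
import java.util.List;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

class BatchedDeleteSketch {
    private static final int BATCH_SIZE = 100;

    // Delete path and everything under it, children before parents, grouping
    // deletes into multi() transactions.
    static void deleteSubtree(ZooKeeper zk, String path)
            throws KeeperException, InterruptedException {
        List<String> postOrder = new ArrayList<>();
        collect(zk, path, postOrder);

        List<Op> batch = new ArrayList<>();
        for (String node : postOrder) {
            batch.add(Op.delete(node, -1));
            if (batch.size() == BATCH_SIZE) {
                zk.multi(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            zk.multi(batch);
        }
    }

    // Post-order walk so children always appear before their parent.
    private static void collect(ZooKeeper zk, String path, List<String> out)
            throws KeeperException, InterruptedException {
        for (String child : zk.getChildren(path, false)) {
            collect(zk, "/".equals(path) ? "/" + child : path + "/" + child, out);
        }
        out.add(path);
    }
}
{code}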



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3353) Admin commands for showing initial settings

2019-04-08 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3353:
--

 Summary: Admin commands for showing initial settings
 Key: ZOOKEEPER-3353
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3353
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Brian Nixon


It can be useful as a sysadmin to know the settings that were initially used to 
configure a given ZooKeeper server. Some of these can be read from the process 
logs and others from the Java args in the process description, but if, for 
example, the zoo.cfg file used to start a process is overwritten without the 
process itself being restarted, it can be difficult to know exactly what is 
currently running on the JVM.

Produce admin commands (and four-letter commands) to answer these questions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3343) Add a new doc: zookeeperTools.md

2019-04-08 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16812869#comment-16812869
 ] 

Brian Nixon commented on ZOOKEEPER-3343:


This will be great, thanks [~maoling]!

> Add a new doc: zookeeperTools.md
> 
>
> Key: ZOOKEEPER-3343
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3343
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: documentation
>Affects Versions: 3.5.4
>Reporter: maoling
>Assignee: maoling
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> write zookeeper tools [3.7], which includes:
>    - a list of all usages of the shells under zookeeper/bin (e.g. 
> zkTxnLogToolkit.sh, zkCleanup.sh)
>    - benchmark tool
>    - backup tool
>    - test tools: jepsen



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3349) QuorumCnxManager socketTimeout unused

2019-04-04 Thread Brian Nixon (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Nixon updated ZOOKEEPER-3349:
---
Description: 
QuorumCnxManager member variable 'socketTimeout' is not used anywhere in the 
class. It's clear from the context that it should either be removed entirely or 
invoked in QuorumCnxManager::setSockOpts. Since the QuorumPeer syncLimit can be 
changed by jmx, I'm thinking that the former is the better solution.

 

  was:
QuorumCnxManager member variable 'socketTimeout' is not used anywhere in the 
class. It's clear from the context that it should either be removed entirely or 
invoked in QuorumCnxManager::setSockOpts.

 


> QuorumCnxManager socketTimeout unused
> -
>
> Key: ZOOKEEPER-3349
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3349
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: quorum
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Priority: Trivial
>
> QuorumCnxManager member variable 'socketTimeout' is not used anywhere in the 
> class. It's clear from the context that it should either be removed entirely 
> or invoked in QuorumCnxManager::setSockOpts. Since the QuorumPeer syncLimit 
> can be changed by jmx, I'm thinking that the former is the better solution.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3349) QuorumCnxManager socketTimeout unused

2019-04-04 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3349:
--

 Summary: QuorumCnxManager socketTimeout unused
 Key: ZOOKEEPER-3349
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3349
 Project: ZooKeeper
  Issue Type: New Feature
  Components: quorum
Affects Versions: 3.6.0
Reporter: Brian Nixon


QuorumCnxManager member variable 'socketTimeout' is not used anywhere in the 
class. It's clear from the context that it should either be removed entirely or 
invoked in QuorumCnxManager::setSockOpts.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3318) Add a complete backup mechanism for zookeeper internal

2019-03-28 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16804453#comment-16804453
 ] 

Brian Nixon commented on ZOOKEEPER-3318:


In the interest of a compact backup, I would like to see a way to combine a 
fuzzy snapshot and subsequent transaction logs into a single perfect snapshot 
of the data tree. One possible backup solution based on this: start up an 
observer process to pull the data tree live into a new directory, then run a 
subsequent operation to combine the resultant files into the perfect snapshot.
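
A rough sketch of that "combine" step, assuming the FileTxnSnapLog / ZKDatabase 
classes behave roughly as they do on master today (treat the exact signatures as 
approximations, not gospel):

{code:java}
import java.io.File;

import org.apache.zookeeper.server.ZKDatabase;
import org.apache.zookeeper.server.persistence.FileTxnSnapLog;

class PerfectSnapshotSketch {
    // Load a fuzzy snapshot plus the txn logs that follow it, then write the
    // fully replayed tree back out as a single "perfect" snapshot.
    static void compact(File srcDir, File outDir) throws Exception {
        FileTxnSnapLog in = new FileTxnSnapLog(srcDir, srcDir);
        ZKDatabase db = new ZKDatabase(in);
        long lastZxid = db.loadDataBase(); // replays txns on top of the snapshot

        FileTxnSnapLog out = new FileTxnSnapLog(outDir, outDir);
        out.save(db.getDataTree(), db.getSessionWithTimeOuts(), false);
        System.out.println("Wrote perfect snapshot up to zxid 0x" + Long.toHexString(lastZxid));
    }
}
{code}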

> Add a complete backup mechanism for zookeeper internal
> --
>
> Key: ZOOKEEPER-3318
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3318
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: other
>Reporter: maoling
>Assignee: maoling
>Priority: Major
>
> We already had some workaround ways for the backup, e.g
> *scenario 1:* just write a cron shell to copy the snapshots periodically. 
> *scenario 2:* use the observer as the role of backup, then write the 
> snapshots to file system. (e.g HDFS)
> this issue is aiming to implement a complete backup mechanism for zookeeper 
> internal:
> the initial proposal:
> *1*. write a new CLI:snapshot
>  *1.1* 
>  because this CLI may be time-consuming, a confirmation is needed, e.g.
>  [zk: 127.0.0.1:2180(CONNECTED) 0] snapshot backupDataDir
>  Are you sure to exec:snapshot [yes/no]
>  *1.2* 
>  if no parameter, the default backupDataDir is the dataDir. the format of the 
> backup-snapshot is just like: backup_snapshot.f9f82834 with the "backup_" 
> prefix,when recovering,rename backup_snapshot.f9f82834 to 
> snapshot.f9f82834 and move it to the dataDir, then restart the ensemble.
>  *1.3* 
>  don't worry about exposing the takeSnap() API to the client. Look at these two 
> references:
>  https://github.com/etcd-io/etcd/blob/master/clientv3/snapshot/v3_snapshot.go
>  
> https://github.com/xetorthio/jedis/blob/master/src/main/java/redis/clients/jedis/commands/BasicCommands.java#L68
> *2*. 
>  *2.1* 
>  write a new tool/shell: zkBackup.sh which is the reverse process of 
> zkCleanup.sh for non-realtime backup
>  *2.2* 
>  write a new tool/shell: zkBackup_v2.sh which calls the takeSnap() API 
> for realtime backup.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3332) TxnLogToolkit should print multi transactions readably

2019-03-25 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801274#comment-16801274
 ] 

Brian Nixon commented on ZOOKEEPER-3332:


This is a really useful change. We have a couple of flags in our LogFormatter 
that let us optionally inspect the elements of a MultiTxn as well as dump the 
data of each. We never ported them to the new TxnLogToolkit paradigm but we can 
put up a PR of our changes so you can compare. If I forget, ping me to remind 
me.

> TxnLogToolkit should print multi transactions readably
> --
>
> Key: ZOOKEEPER-3332
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3332
> Project: ZooKeeper
>  Issue Type: Improvement
>Reporter: Toshihiro Suzuki
>Assignee: maoling
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, LogFormatter shows multi transactions like the following and it's 
> not readable:
> {code:java}
> 3/23/19 7:35:21 AM UTC session 0x3699141c4080020 cxid 0x21 zxid 0x102d9 
> multi 
> v{s{1,#000292f726d73746f72652f5a4b524d5374617465526f6f742f524d5f5a4b5f46454e43494e475f4c4f434b00010001f0005776f726c640006616e796f6e651c},s{5,#000312f726d73746f72652f5a4b524d5374617465526f6f742f414d524d546f6b656e5365637265744d616e61676572526f6f7400012a108ffe0fffdff92fff15128fff5ff9a731174ffa8ff86ffb40009},s{2,#000292f726d73746f72652f5a4b524d5374617465526f6f742f524d5f5a4b5f46454e43494e475f4c4f434b}}
> {code}
> Like delete and setData as the following, LogFormatter should print multi 
> transactions readably:
> {code:java}
> 3/22/19 7:20:48 AM UTC session 0x2699141c3f70022 cxid 0x885 zxid 0x102cc 
> delete '/hbase-unsecure/region-in-transition/d6694b5f7ec2c45f6096fe373c8a34bc
> 3/22/19 7:20:50 AM UTC session 0x2699141c3f70024 cxid 0x47 zxid 0x102cd 
> setData 
> '/hbase-unsecure/region-in-transition/a9c6dac76ce74812196667ebc01dad51,#0001a726567696f6e7365727665723a313630323035617afffa42ff94ffe81f5042554684123f53595354454d2e434154414c4f472c2c313535333233313233393533352e61396336646163373663653734383132313936363637656263303164616435312e18ffe9ffa8ff98ffa2ff9a2d2228a1c633132362d6e6f6465342e7371756164726f6e2d6c6162732e636f6d10ff947d18ffcbff96ffa2ff9a2d,2
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3333) Detect if txnlogs and / or snapshots is deleted under a running ZK instance

2019-03-25 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801160#comment-16801160
 ] 

Brian Nixon commented on ZOOKEEPER-3333:


I'm assuming that the scenario in mind is a rogue process on your host that is 
deleting files. One thing to note is that until ZOOKEEPER-3318 is completed to 
a reasonable state, people may outsource their backups to an external process - 
in which case the transaction log files and the snapshot files may have their 
lifecycle controlled by something that is not ZooKeeper (and ZooKeeper should 
not die when files disappear).

 

Having a message logged when a .snap or .log file is unexpectedly changed seems 
reasonable. We could also enable a feature by which the deletion of transaction 
logs triggers a snapshot, to make sure the data tree would survive a sudden 
restart. I would not kill the server when a transaction log disappears, since 
that would remove your one known copy of the data tree (the in-memory one).

 

To implement this, you may be able to reuse the FileChangeWatcher that was 
added for the TLS work or at least copy from its approach.
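
Independent of FileChangeWatcher, here's a bare-bones sketch of watching a data 
directory for deletions with the JDK's WatchService (the directory path is just 
illustrative):

{code:java}
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

class DataDirWatcherSketch {
    public static void main(String[] args) throws Exception {
        Path dataDir = Paths.get("/var/lib/zookeeper/version-2"); // illustrative path
        WatchService watcher = FileSystems.getDefault().newWatchService();
        dataDir.register(watcher, StandardWatchEventKinds.ENTRY_DELETE);

        while (true) {
            WatchKey key = watcher.take(); // blocks until an event arrives
            for (WatchEvent<?> event : key.pollEvents()) {
                Path removed = (Path) event.context();
                // react here: log loudly, trigger a snapshot, etc.
                System.err.println("WARNING: " + removed + " was deleted under a running server");
            }
            key.reset();
        }
    }
}
{code}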

> Detect if txnlogs and / or snapshots is deleted under a running ZK instance
> ---
>
> Key: ZOOKEEPER-3333
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3333
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.6.0, 3.5.5, 3.4.14
>Reporter: Norbert Kalmar
>Priority: Major
>
> ZK does not notice if txnlogs are deleted from its dataDir, and it will just 
> keep running, writing txns in the buffer. Then, when ZK is restarted, it will 
> lose all data.
> To reproduce:
> I ran a 3 node ZK ensemble, deleted the dataDir for just one instance, then 
> wrote some data. It turns out, it will not write the transaction to disk. ZK 
> stores everything in memory, until it “feels like” it’s time to persist it on 
> disk. So it doesn’t even notice the file is deleted, and when it tries to 
> flush, I imagine it just fails and keeps it in the buffer. 
> So anyway, I restarted the instance, it got the snapshot + latest txn logs 
> from the other nodes, as expected it would. It also wrote them in dataDir, so 
> now every node had the dataDir.
> So deleting from one node is fine (again, as expected, they will sync after a 
> restart).
> Then, I deleted all 3 nodes' dataDirs under running instances. Until restart, 
> it worked fine (of course I was getting my buffer full, I did not test until 
> the point it got overflowed).
> But after restart, I got a fresh new ZK with all my znodes gone.
> For starters, I think ZK should detect if the file it is appending to is removed. 
> What should ZK do? At least give a warning log message. The question is: should 
> it try to create a new file? Or try to get it from other nodes? Or just fail 
> instantly? Restart itself and see if it can sync?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3331) Automatically add IP authorization for Netty connections

2019-03-22 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3331:
--

 Summary: Automatically add IP authorization for Netty connections
 Key: ZOOKEEPER-3331
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3331
 Project: ZooKeeper
  Issue Type: New Feature
  Components: server
Affects Versions: 3.6.0
Reporter: Brian Nixon


NIOServerCnxn automatically adds the client's address as an auth token under 
the "ip" scheme. Extend that functionality to the NettyServerCnxn as well to 
bring parity to the two approaches.
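
For reference, the parity being asked for amounts to doing something like the 
following when a Netty channel becomes active (sketch only; ServerCnxn#addAuthInfo 
is the hook assumed here, and details are simplified):

{code:java}
import java.net.InetSocketAddress;

import org.apache.zookeeper.data.Id;
import org.apache.zookeeper.server.ServerCnxn;

class IpAuthSketch {
    // Record the client's address under the "ip" auth scheme, the way the NIO
    // connection path already does when a connection is accepted.
    static void recordIpAuth(ServerCnxn cnxn, InetSocketAddress remote) {
        cnxn.addAuthInfo(new Id("ip", remote.getAddress().getHostAddress()));
    }
}
{code}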



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3320) Leader election port stop listen when hostname unresolvable for some time

2019-03-20 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16797657#comment-16797657
 ] 

Brian Nixon commented on ZOOKEEPER-3320:


A configurable retry seems like a good idea to me. Either something like 
"election port bind time" or "dns unavailable time" if we want to be more 
general. Do you want to contribute a short diff?

This may also be related to ZOOKEEPER-2982 (or may not, making a note to check 
later).

> Leader election port stop listen when hostname unresolvable for some time 
> --
>
> Key: ZOOKEEPER-3320
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3320
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.4.10, 3.5.4
>Reporter: Igor Skokov
>Priority: Major
>
> When trying to run a Zookeeper 3.5.4 cluster on Kubernetes, I found out that in 
> some circumstances a Zookeeper node stops listening on the leader election port. 
> This causes unavailability of the ZK cluster. 
> Zookeeper is deployed as a StatefulSet in Kubernetes and has the following 
> dynamic configuration:
> {code:java}
> zookeeper-0.zookeeper:2182:2183:participant;2181
> zookeeper-1.zookeeper:2182:2183:participant;2181
> zookeeper-2.zookeeper:2182:2183:participant;2181
> {code}
> The bind address contains a DNS name which is generated by Kubernetes for each 
> StatefulSet pod.
> These DNS names become resolvable after container start, but with some 
> delay. That delay causes the leader election port listener in the 
> QuorumCnxManager.Listener class to stop.
> The error happens in the QuorumCnxManager.Listener "run" method: it tries to bind 
> the leader election port to a hostname which is not resolvable at that moment. The 
> retry count is hard-coded and equals 3 (with a backoff of 1 sec). 
> Zookeeper server log contains following errors:
> {code:java}
> 2019-03-17 07:56:04,844 [myid:1] - WARN  
> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1230] - 
> Unexpected exception
> java.net.SocketException: Unresolved address
>   at java.base/java.net.ServerSocket.bind(ServerSocket.java:374)
>   at java.base/java.net.ServerSocket.bind(ServerSocket.java:335)
>   at org.apache.zookeeper.server.quorum.Leader.(Leader.java:241)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:1023)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1226)
> 2019-03-17 07:56:04,844 [myid:1] - WARN  
> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1261] - 
> PeerState set to LOOKING
> 2019-03-17 07:56:04,845 [myid:1] - INFO  
> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1136] - 
> LOOKING
> 2019-03-17 07:56:04,845 [myid:1] - INFO  
> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):FastLeaderElection@893]
>  - New election. My id =  1, proposed zxid=0x0
> 2019-03-17 07:56:04,846 [myid:1] - INFO  
> [WorkerReceiver[myid=1]:FastLeaderElection@687] - Notification: 2 (message 
> format version), 1 (n.leader), 0x0 (n.zxid), 0xf (n.round), LOOKING 
> (n.state), 1 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config 
> version)
> 2019-03-17 07:56:04,979 [myid:1] - INFO  
> [zookeeper-0.zookeeper:2183:QuorumCnxManager$Listener@892] - Leaving listener
> 2019-03-17 07:56:04,979 [myid:1] - ERROR 
> [zookeeper-0.zookeeper:2183:QuorumCnxManager$Listener@894] - As I'm leaving 
> the listener thread, I won't be able to participate in leader election any 
> longer: zookeeper-0.zookeeper:2183
> {code}
> This error happens on most nodes on cluster start and Zookeeper is unable to 
> form a quorum. This will leave the cluster in an unusable state.
> As far as I can see, the error is present on branches 3.4 and 3.5. 
> I think this error can be fixed by a configurable number of retries (instead of 
> the hard-coded value of 3). 
> Another way to fix this is to remove the max retries entirely. Currently, the ZK 
> server only stops the leader election listener and continues to serve on other ports. 
> Maybe, if leader election halts, we should abort the process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3320) Leader election port stop listen when hostname unresolvable for some time

2019-03-19 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796341#comment-16796341
 ] 

Brian Nixon commented on ZOOKEEPER-3320:


This is an interesting error case!

I would expect an issue in QuorumCnxManager to bring the peer down if it cannot 
create the socket but it seems this only occurs with a BindException and not a 
generic SocketException. At the least, I think we ought to fix that.

Looking at this from the opposite direction, can you add the desired delay in 
the startup sequence of your Kubernetes container? My concern is that the 
pattern of "DNS is currently unreliable but will be reliable soon" seems 
specific to the container management and may result in strange behavior when 
applied to other environments.

> Leader election port stop listen when hostname unresolvable for some time 
> --
>
> Key: ZOOKEEPER-3320
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3320
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.4.10, 3.5.4
>Reporter: Igor Skokov
>Priority: Major
>
> When trying to run a Zookeeper 3.5.4 cluster on Kubernetes, I found out that in 
> some circumstances a Zookeeper node stops listening on the leader election port. 
> This causes unavailability of the ZK cluster. 
> Zookeeper is deployed as a StatefulSet in Kubernetes and has the following 
> dynamic configuration:
> {code:java}
> zookeeper-0.zookeeper:2182:2183:participant;2181
> zookeeper-1.zookeeper:2182:2183:participant;2181
> zookeeper-2.zookeeper:2182:2183:participant;2181
> {code}
> The bind address contains a DNS name which is generated by Kubernetes for each 
> StatefulSet pod.
> These DNS names become resolvable after container start, but with some 
> delay. That delay causes the leader election port listener in the 
> QuorumCnxManager.Listener class to stop.
> The error happens in the QuorumCnxManager.Listener "run" method: it tries to bind 
> the leader election port to a hostname which is not resolvable at that moment. The 
> retry count is hard-coded and equals 3 (with a backoff of 1 sec). 
> Zookeeper server log contains following errors:
> {code:java}
> 2019-03-17 07:56:04,844 [myid:1] - WARN  
> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1230] - 
> Unexpected exception
> java.net.SocketException: Unresolved address
>   at java.base/java.net.ServerSocket.bind(ServerSocket.java:374)
>   at java.base/java.net.ServerSocket.bind(ServerSocket.java:335)
>   at org.apache.zookeeper.server.quorum.Leader.(Leader.java:241)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:1023)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1226)
> 2019-03-17 07:56:04,844 [myid:1] - WARN  
> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1261] - 
> PeerState set to LOOKING
> 2019-03-17 07:56:04,845 [myid:1] - INFO  
> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1136] - 
> LOOKING
> 2019-03-17 07:56:04,845 [myid:1] - INFO  
> [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):FastLeaderElection@893]
>  - New election. My id =  1, proposed zxid=0x0
> 2019-03-17 07:56:04,846 [myid:1] - INFO  
> [WorkerReceiver[myid=1]:FastLeaderElection@687] - Notification: 2 (message 
> format version), 1 (n.leader), 0x0 (n.zxid), 0xf (n.round), LOOKING 
> (n.state), 1 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config 
> version)
> 2019-03-17 07:56:04,979 [myid:1] - INFO  
> [zookeeper-0.zookeeper:2183:QuorumCnxManager$Listener@892] - Leaving listener
> 2019-03-17 07:56:04,979 [myid:1] - ERROR 
> [zookeeper-0.zookeeper:2183:QuorumCnxManager$Listener@894] - As I'm leaving 
> the listener thread, I won't be able to participate in leader election any 
> longer: zookeeper-0.zookeeper:2183
> {code}
> This error happens on most nodes on cluster start and Zookeeper is unable to 
> form a quorum. This will leave the cluster in an unusable state.
> As far as I can see, the error is present on branches 3.4 and 3.5. 
> I think this error can be fixed by a configurable number of retries (instead of 
> the hard-coded value of 3). 
> Another way to fix this is to remove the max retries entirely. Currently, the ZK 
> server only stops the leader election listener and continues to serve on other ports. 
> Maybe, if leader election halts, we should abort the process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3318) Add a complete backup mechanism for zookeeper internal

2019-03-18 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16795253#comment-16795253
 ] 

Brian Nixon commented on ZOOKEEPER-3318:


This would be great!

> Add a complete backup mechanism for zookeeper internal
> --
>
> Key: ZOOKEEPER-3318
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3318
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: other
>Reporter: maoling
>Assignee: maoling
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3311) Allow a delay to the transaction log flush

2019-03-13 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3311:
--

 Summary: Allow a delay to the transaction log flush 
 Key: ZOOKEEPER-3311
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3311
 Project: ZooKeeper
  Issue Type: New Feature
  Components: server
Affects Versions: 3.6.0
Reporter: Brian Nixon


The SyncRequestProcessor flushes writes to disk either when 1000 writes are 
pending to be flushed or when the processor fails to retrieve another write 
from its incoming queue. The "flush when queue empty" condition operates poorly 
under many workloads as it can quickly degrade into flushing after every write 
-- losing all benefits of batching and leading to a continuous stream of 
flushes + fsyncs which overwhelm the underlying disk.
 
A configurable flush delay would ensure flushes do not happen more frequently 
than once every X milliseconds. This can be used in place of or jointly with 
batch-size-triggered flushes.
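
A rough sketch of the proposed condition (class and field names here are illustrative, not the actual SyncRequestProcessor code):
{code:java}
// Hypothetical sketch: flush when enough requests are queued, or when the
// incoming queue is empty AND the configured delay has elapsed since the
// oldest unflushed request was queued.
class FlushPolicy {
    private final int maxBatchSize;       // existing count-based trigger, e.g. 1000
    private final long flushDelayMs;      // proposed knob: minimum interval between flushes
    private long oldestPendingMs = -1;    // time the first unflushed request was queued

    FlushPolicy(int maxBatchSize, long flushDelayMs) {
        this.maxBatchSize = maxBatchSize;
        this.flushDelayMs = flushDelayMs;
    }

    void onRequestQueued(long nowMs, int pendingCount) {
        if (pendingCount == 1) {
            oldestPendingMs = nowMs;
        }
    }

    boolean shouldFlush(long nowMs, int pendingCount, boolean queueEmpty) {
        if (pendingCount == 0) {
            return false;
        }
        if (pendingCount >= maxBatchSize) {
            return true;
        }
        // Instead of flushing whenever the incoming queue is empty, wait until
        // the configured delay has elapsed so small writes can still batch up.
        return queueEmpty && (nowMs - oldestPendingMs) >= flushDelayMs;
    }
}
{code}
With flushDelayMs set to 0 this degenerates to the current flush-on-empty-queue behavior.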



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3264) Add a benchmark tool for zookeeper

2019-03-04 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783824#comment-16783824
 ] 

Brian Nixon commented on ZOOKEEPER-3264:


I know that [~breed] at one point was thinking of using the _Java 
Microbenchmark Harness_; this may also be worth exploring.
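
For illustration, a minimal JMH harness for a synchronous read could look like the sketch below (the connect string, session timeout, and znode path are placeholders):
{code:java}
import org.apache.zookeeper.ZooKeeper;
import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class GetDataBenchmark {
    ZooKeeper zk;

    @Setup
    public void connect() throws Exception {
        // placeholder connect string and session timeout
        zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> { });
    }

    @Benchmark
    public byte[] getData() throws Exception {
        // measures a single synchronous read of a pre-created znode
        return zk.getData("/bench", false, null);
    }

    @TearDown
    public void close() throws Exception {
        zk.close();
    }
}
{code}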

> Add a benchmark tool for zookeeper
> --
>
> Key: ZOOKEEPER-3264
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3264
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: other
>Reporter: maoling
>Assignee: maoling
>Priority: Major
>
> Reference:
> https://github.com/etcd-io/etcd/blob/master/tools/benchmark/cmd/range.go
> https://github.com/antirez/redis/blob/unstable/src/redis-benchmark.c
> https://github.com/phunt/zk-smoketest/blob/master/zk-latencies.py
> https://github.com/brownsys/zookeeper-benchmark/blob/master/src/main/java/edu/brown/cs/zkbenchmark/ZooKeeperBenchmark.java



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3287) admin command to dump currently known ACLs

2019-02-21 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3287:
--

 Summary: admin command to dump currently known ACLs
 Key: ZOOKEEPER-3287
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3287
 Project: ZooKeeper
  Issue Type: New Feature
  Components: server
Affects Versions: 3.6.0
Reporter: Brian Nixon


Add a new command to dump the set of ACLs currently applied on the data tree. 

 

Used by an admin to check what controls are being set for an ensemble. A flat 
list with no connection to the data will suffice - we will have to think about 
whether any details ought to be emitted as a cryptographic hash to preserve secrecy.
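
A small sketch of the secrecy idea (this is not an existing admin command; the helper below just shows how an ACL id could be fingerprinted before being emitted):
{code:java}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class AclDigest {
    // Hash an ACL id (e.g. "digest:user:hash") so the dump reveals no secrets.
    static String fingerprint(String aclId) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digest = md.digest(aclId.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
{code}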



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3257) Merge count and byte update of Stat

2019-01-25 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3257:
--

 Summary: Merge count and byte update of Stat
 Key: ZOOKEEPER-3257
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3257
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Brian Nixon


There is duplication of effort when updating the stats. Merge the count update 
and the byte update into one call and simplify the logic.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3180) Add response cache to improve the throughput of read heavy traffic

2019-01-18 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746687#comment-16746687
 ] 

Brian Nixon commented on ZOOKEEPER-3180:


Creating ZOOKEEPER-3252 as a follow up.

> Add response cache to improve the throughput of read heavy traffic 
> ---
>
> Key: ZOOKEEPER-3180
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3180
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Reporter: Fangmin Lv
>Assignee: Brian Nixon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> On read-heavy use cases with large response data sizes, the serialization of 
> responses takes time and adds overhead to the GC.
> Adding a response cache helps improve the throughput we can support, and also 
> reduces latency in general.
> This Jira is going to implement an LRU cache for the response, which shows 
> some performance gain on some of our production ensembles.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3252) Extend the options for the response cache

2019-01-18 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3252:
--

 Summary: Extend the options for the response cache
 Key: ZOOKEEPER-3252
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3252
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Reporter: Brian Nixon


The response cache added in ZOOKEEPER-3180 is fairly bare bones. It does its 
job but there is room for experimentation and improvement. From the issue pull 
request ([https://github.com/apache/zookeeper/pull/684]):
{quote}"the alternate eviction policies you outline and that LinkedHashMap 
allows. I see three reasonable paths here,
 * Merge this pr as it is (perhaps rename LRUCache to just Cache) and open 
a new JIRA to explore future paths.
 * I add another property that lets one toggle between insertion order and 
access order with the current implementation as the default.
 * Drop LinkedHashMap entirely and go with something like a guava 
Cache."{quote}

It was merged with path 1 chosen but I remain interested in the optimizations 
that were suggested.
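
For reference, the second path is essentially a one-flag toggle on LinkedHashMap; a minimal sketch (the class name is made up, and the real cache keys on whatever the current implementation uses):
{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

public class ResponseCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    // accessOrder = true gives LRU eviction; false keeps insertion-order eviction.
    public ResponseCacheSketch(int maxEntries, boolean accessOrder) {
        super(16, 0.75f, accessOrder);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // evict the eldest entry once the configured capacity is exceeded
        return size() > maxEntries;
    }
}
{code}
The accessOrder flag is the only behavioral difference between the two eviction policies discussed above.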



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3240) Close socket on Learner shutdown to avoid dangling socket

2019-01-11 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740916#comment-16740916
 ] 

Brian Nixon commented on ZOOKEEPER-3240:


[~hanm] could it be that the unclosed/unreaped Socket on the Learner side is 
still maintaining its end of the TCP connection correctly via the protocol, so 
the Leader is unable to sense the change in Learner status through the status 
of the network connection? I confess that I'm not as knowledgeable about the 
workings of Socket as I need to be to confirm this theory.

> Close socket on Learner shutdown to avoid dangling socket
> -
>
> Key: ZOOKEEPER-3240
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3240
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> There was a Learner that had two connections to the Leader after that Learner 
> hit an unexpected exception while flushing a txn to disk, which shuts down the 
> previous follower instance and restarts a new one.
>  
> {quote}2018-10-26 02:31:35,568 ERROR 
> [SyncThread:3:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from 
> thread : SyncThread:3
> java.io.IOException: Input/output error
>     at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>     at 
> java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:72)
>     at 
> java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:395)
>     at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:457)
>     at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:548)
>     at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:769)
>     at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246)
>     at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:172)
> 2018-10-26 02:31:35,568 INFO  [SyncThread:3:ZooKeeperServerListenerImpl@42] - 
> Thread SyncThread:3 exits, error code 1
> 2018-10-26 02:31:35,568 INFO [SyncThread:3:SyncRequestProcessor@234] - 
> SyncRequestProcessor exited!{quote}
>  
> It is supposed to close the previous socket, but this doesn't seem to be done 
> anywhere in the code. This leaves the socket open with no one reading from 
> it, which fills the queue and blocks the sender.
>  
> Since the LearnerHandler didn't shut down gracefully, the learner queue size 
> keeps growing, the JVM heap size on the leader keeps growing and adds pressure 
> to the GC, causing high GC time and latency in the quorum.
>  
> The simple fix is to gracefully shut down the socket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3244) Add option to snapshot based on log size

2019-01-11 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3244:
--

 Summary: Add option to snapshot based on log size
 Key: ZOOKEEPER-3244
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3244
 Project: ZooKeeper
  Issue Type: New Feature
  Components: server
Reporter: Brian Nixon


Currently, ZooKeeper only takes snapshots based on the snap count. If the 
workload on an ensemble includes large txns then we'll end up with a large 
amount of data kept on disk, and might run into low disk space issues. 

Add a maximum limit on the total size of the log files between each snapshot. 
This will change the snap frequency, which means that with the same snap 
retention number a server will use less disk.
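
A rough sketch of how the size trigger could sit next to the existing count trigger (names and fields are illustrative, not the actual SyncRequestProcessor logic):
{code:java}
// Hypothetical trigger: snapshot when either the txn count or the total
// bytes logged since the last snapshot crosses its limit.
class SnapshotTrigger {
    private final int snapCount;       // existing knob
    private final long snapSizeBytes;  // proposed knob; <= 0 disables the size check
    private int txnsSinceSnapshot;
    private long bytesSinceSnapshot;

    SnapshotTrigger(int snapCount, long snapSizeBytes) {
        this.snapCount = snapCount;
        this.snapSizeBytes = snapSizeBytes;
    }

    // Record one appended txn; return true when it is time to snapshot.
    boolean record(long txnBytes) {
        txnsSinceSnapshot++;
        bytesSinceSnapshot += txnBytes;
        boolean bySize = snapSizeBytes > 0 && bytesSinceSnapshot >= snapSizeBytes;
        return txnsSinceSnapshot >= snapCount || bySize;
    }

    void reset() {
        txnsSinceSnapshot = 0;
        bytesSinceSnapshot = 0;
    }
}
{code}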

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2669) follower failed to reconnect to leader after a network error

2019-01-10 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739751#comment-16739751
 ] 

Brian Nixon commented on ZOOKEEPER-2669:


Is this related to ZOOKEEPER-3240 ?

> follower failed to  reconnect to leader after a network error
> -
>
> Key: ZOOKEEPER-2669
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2669
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum, server
>Affects Versions: 3.4.9
> Environment: CentOS7
>Reporter: Zhenghua Chen
>Priority: Major
>
> We have a zookeeper cluster with 3 nodes named s1, s2, s3.
> By mistake, we shut down the ethernet interface of s2, and the zk follower shut 
> down (the zk process remained).
> Later, after the ethernet came up again, s2 failed to reconnect to leader s3 as a 
> follower.
> Follower s2 keeps printing logs like this:
> {quote}
> 2017-01-19 16:40:58,956 WARN  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:7181] 
> o.a.z.s.q.Learner - Got zxid 0x320001019f expected 0x1
> 2017-01-19 16:40:58,956 ERROR [SyncThread:1] o.a.z.s.ZooKeeperCriticalThread 
> - Severe unrecoverable error, from thread : SyncThread:1
> java.nio.channels.ClosedChannelException: null
>   at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99)
>   at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:250)
>   at 
> org.apache.zookeeper.server.persistence.Util.padLogFile(Util.java:215)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.padFile(FileTxnLog.java:241)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:219)
>   at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:314)
>   at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:470)
>   at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:140)
> 2017-01-19 16:40:58,956 INFO  [SyncThread:1] 
> o.a.z.s.ZooKeeperServerListenerImpl - Thread SyncThread:1 exits, error code 1
> 2017-01-19 16:40:58,956 INFO  [SyncThread:1] o.a.z.s.SyncRequestProcessor - 
> SyncRequestProcessor exited!
> 2017-01-19 16:40:58,957 INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:7181] 
> o.a.z.s.q.Learner - shutdown called
> java.lang.Exception: shutdown Follower
>   at 
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:164)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:850)
> {quote}
> And, leader s3 keeps printing log like this:
> {quote}
> 2017-01-19 16:30:50,452 INFO  [LearnerHandler-/192.168.40.51:35949] 
> o.a.z.s.q.LearnerHandler - Follower sid: 1 : info : 
> org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@95258f0
> 2017-01-19 16:30:50,452 INFO  [LearnerHandler-/192.168.40.51:35949] 
> o.a.z.s.q.LearnerHandler - Synchronizing with Follower sid: 1 
> maxCommittedLog=0x320001019e minCommittedLog=0x32ffaa 
> peerLastZxid=0x23
> 2017-01-19 16:30:50,453 WARN  [LearnerHandler-/192.168.40.51:35949] 
> o.a.z.s.q.LearnerHandler - Unhandled proposal scenario
> 2017-01-19 16:30:50,453 INFO  [LearnerHandler-/192.168.40.51:35949] 
> o.a.z.s.q.LearnerHandler - Sending SNAP
> 2017-01-19 16:30:50,453 INFO  [LearnerHandler-/192.168.40.51:35949] 
> o.a.z.s.q.LearnerHandler - Sending snapshot last zxid of peer is 0x23 
>  zxid of leader is 0x320001019esent zxid of db as 0x320001019e
> 2017-01-19 16:30:50,461 INFO  [LearnerHandler-/192.168.40.51:35949] 
> o.a.z.s.q.LearnerHandler - Received NEWLEADER-ACK message from 1
> 2017-01-19 16:30:51,738 ERROR [LearnerHandler-/192.168.40.51:35934] 
> o.a.z.s.q.LearnerHandler - Unexpected exception causing shutdown while sock 
> still open
> java.net.SocketTimeoutException: Read timed out
>   at java.net.SocketInputStream.socketRead0(Native Method)
>   at java.net.SocketInputStream.read(SocketInputStream.java:152)
>   at java.net.SocketInputStream.read(SocketInputStream.java:122)
>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>   at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>   at java.io.DataInputStream.readInt(DataInputStream.java:387)
>   at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>   at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>   at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99)
>   at 
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:542)
> {quote}
> we executed netstat and found lots of CLOSE_WAIT sockets on s2, never closed.
> {quote}
> tcp6   10865  0 192.168.40.51:47181 192.168.40.57:7288  
> CLOSE_WAIT  2217/java   
> tcp62576  0 192.168.40.51:57181 

[jira] [Commented] (ZOOKEEPER-3240) Close socket on Learner shutdown to avoid dangling socket

2019-01-10 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739750#comment-16739750
 ] 

Brian Nixon commented on ZOOKEEPER-3240:


This may be the same issue as detected in ZOOKEEPER-2669. The two share certain 
similarities but I haven't looked into it.

> Close socket on Learner shutdown to avoid dangling socket
> -
>
> Key: ZOOKEEPER-3240
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3240
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Priority: Minor
>
> There was a Learner that had two connections to the Leader after that Learner 
> hit an unexpected exception while flushing a txn to disk, which shuts down the 
> previous follower instance and restarts a new one.
>  
> {quote}2018-10-26 02:31:35,568 ERROR 
> [SyncThread:3:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from 
> thread : SyncThread:3
> java.io.IOException: Input/output error
>     at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method)
>     at 
> java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:72)
>     at 
> java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:395)
>     at 
> org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:457)
>     at 
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:548)
>     at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:769)
>     at 
> org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246)
>     at 
> org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:172)
> 2018-10-26 02:31:35,568 INFO  [SyncThread:3:ZooKeeperServerListenerImpl@42] - 
> Thread SyncThread:3 exits, error code 1
> 2018-10-26 02:31:35,568 INFO [SyncThread:3:SyncRequestProcessor@234] - 
> SyncRequestProcessor exited!{quote}
>  
> It is supposed to close the previous socket, but this doesn't seem to be done 
> anywhere in the code. This leaves the socket open with no one reading from 
> it, which fills the queue and blocks the sender.
>  
> Since the LearnerHandler didn't shut down gracefully, the learner queue size 
> keeps growing, the JVM heap size on the leader keeps growing and adds pressure 
> to the GC, causing high GC time and latency in the quorum.
>  
> The simple fix is to gracefully shut down the socket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3240) Close socket on Learner shutdown to avoid dangling socket

2019-01-10 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3240:
--

 Summary: Close socket on Learner shutdown to avoid dangling socket
 Key: ZOOKEEPER-3240
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3240
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Brian Nixon


There was a Learner that had two connections to the Leader after that Learner 
hit an unexpected exception while flushing a txn to disk, which shuts down the 
previous follower instance and restarts a new one.
 
{quote}2018-10-26 02:31:35,568 ERROR [SyncThread:3:ZooKeeperCriticalThread@48] 
- Severe unrecoverable error, from thread : SyncThread:3
java.io.IOException: Input/output error
    at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method)
    at 
java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:72)
    at java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:395)
    at 
org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:457)
    at 
org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:548)
    at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:769)
    at 
org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246)
    at 
org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:172)
2018-10-26 02:31:35,568 INFO  [SyncThread:3:ZooKeeperServerListenerImpl@42] - 
Thread SyncThread:3 exits, error code 1
2018-10-26 02:31:35,568 INFO [SyncThread:3:SyncRequestProcessor@234] - 
SyncRequestProcessor exited!{quote}
 
It is supposed to close the previous socket, but this doesn't seem to be done 
anywhere in the code. This leaves the socket open with no one reading from it, 
which fills the queue and blocks the sender.
 
Since the LearnerHandler didn't shut down gracefully, the learner queue size 
keeps growing, the JVM heap size on the leader keeps growing and adds pressure to 
the GC, causing high GC time and latency in the quorum.
 
The simple fix is to gracefully shut down the socket.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3237) Allow IPv6 wildcard address in peer config

2019-01-08 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3237:
--

 Summary: Allow IPv6 wildcard address in peer config
 Key: ZOOKEEPER-3237
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3237
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Brian Nixon


ZooKeeper allows a special exception for the IPv4 wildcard, 0.0.0.0, along with 
the loopback addresses. Extend the same treatment to IPv6's wildcard, [::]. 
Otherwise, reconfig will reject commands with the form [::]:.
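
A minimal sketch of the relaxed check (using InetAddress rather than string comparison; this is illustrative, not the actual config-parsing code):
{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

public class WildcardCheck {
    // Treat both 0.0.0.0 and :: (and the loopback addresses) as bindable-anywhere.
    static boolean isWildcardOrLoopback(String host) {
        try {
            InetAddress addr = InetAddress.getByName(host);
            return addr.isAnyLocalAddress() || addr.isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }
}
{code}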



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3231) Purge task may lost data when we have many invalid snapshots.

2019-01-07 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736548#comment-16736548
 ] 

Brian Nixon commented on ZOOKEEPER-3231:


It might also make sense to more aggressively delete invalid snapshots (in the 
mode of ZOOKEEPER-3082). If it's straightforward to identify and purge such 
files then we won't have to worry about deleting valid snapshots in order to 
preserve invalid snapshots.

>  Purge task may lost data when we have many invalid snapshots.
> --
>
> Key: ZOOKEEPER-3231
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3231
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.4, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Major
>
> I read the ZooKeeper source code, and I find the purge task use 
> FileTxnSnapLog#findNRecentSnapshots to find snapshots, but the method does 
> not check whether the snapshots are valid.
> Consider a worse case: a ZooKeeper server may have many invalid snapshots, 
> and when a purge task begins, it will use the zxid in the last snapshot's 
> name to purge old snapshots and transaction logs, so we may lose data. 
> I think we should use FileSnap#findNValidSnapshots(int) instead of 
> FileSnap#findNRecentSnapshots in FileTxnSnapLog#findNRecentSnapshots, but I 
> am not sure.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3232) make the log of notification about LE more readable

2019-01-03 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733820#comment-16733820
 ] 

Brian Nixon commented on ZOOKEEPER-3232:


seems reasonable to me

> make the log of notification about LE more readable
> ---
>
> Key: ZOOKEEPER-3232
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3232
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: leaderElection
>Reporter: maoling
>Assignee: maoling
>Priority: Minor
>
> the notification log for LE is very important to help us see the 
> progress of LE, e.g.
> {code:java}
> 2019-01-01 16:29:27,494 [myid:2] - INFO 
> [WorkerReceiver[myid=2]:FastLeaderElection@595] - Notification: 1 (message 
> format version), 3 (n.leader), 0x60b3dc215 (n.zxid), 0x3 (n.round), FOLLOWING 
> (n.state), 1 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state){code}
> the current log has some problems:
> 1. it doesn't use placeholders and isn't written in a k:v style
> 2. the properties in the log are not grouped or ordered, so it is not easy 
> to read.
> 3. the version value is hex but doesn't have the 0x prefix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3218) zk server reopened,the interval for observer connect to the new leader is too long,then session expired

2018-12-26 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729186#comment-16729186
 ] 

Brian Nixon commented on ZOOKEEPER-3218:


We had similar issues which we addressed by making the polling interval 
configurable. Attaching our patch to this issue (it adds 
"zookeeper.fastleader.minNotificationInterval").

 

> zk server reopened,the interval for observer connect to the new leader is too 
> long,then session expired
> ---
>
> Key: ZOOKEEPER-3218
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3218
> Project: ZooKeeper
>  Issue Type: Bug
> Environment: win7 32bits
> zookeeper 3.4.6、3.4.13
>Reporter: yangoofy
>Priority: Major
>
> two participants, one observer; the zkclient connects to the observer.
> Then, close the two participants; the zookeeper server is closed.
> Ten seconds later, reopen the two participants, and a leader is selected.
> 
> But the observer can't connect to the new leader immediately. In 
> lookForLeader, the observer uses a blocking queue (recvqueue) to offer/poll 
> notifications; when the recvqueue is empty, polling from recvqueue will 
> block, and the timeout is 200ms, 400ms, 800ms ... 60s.
> For example, at 09:59:59 the observer polls for a notification, the recvqueue is empty and 
> the timeout is 60s; at 10:00:00 the two participants are reopened and a leader is reselected; at 10:00:59 
> the observer polls the notification and connects to the new leader.
> But maxSessionTimeout defaults to 40s. The session expired.
> -
> Please improve it: the observer should connect to the new leader as soon as 
> possible



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2872) Interrupted snapshot sync causes data loss

2018-12-26 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729136#comment-16729136
 ] 

Brian Nixon commented on ZOOKEEPER-2872:


Now that the patch is merged, was there any further work here?

> Interrupted snapshot sync causes data loss
> --
>
> Key: ZOOKEEPER-2872
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2872
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.10, 3.5.3, 3.6.0
>Reporter: Brian Nixon
>Priority: Major
>
> There is a way for observers to permanently lose data from their local data 
> tree while remaining members of good standing with the ensemble and 
> continuing to serve client traffic when the following chain of events occurs.
> 1. The observer dies in epoch N from machine failure.
> 2. The observer comes back up in epoch N+1 and requests a snapshot sync to 
> catch up.
> 3. The machine powers off before the snapshot is synced to disc and after 
> some txn's have been logged (depending on the OS, this can happen!).
> 4. The observer comes back a second time and replays its most recent snapshot 
> (epoch <= N) as well as the txn logs (epoch N+1). 
> 5. A diff sync is requested from the leader and the observer broadcasts 
> availability.
> In this scenario, any commits from epoch N that the observer did not receive 
> before it died the first time will never be exposed to the observer and no 
> part of the ensemble will complain. 
> This situation is not unique to observers and can happen to any learner. As a 
> simple fix, fsync-ing the snapshots received from the leader will avoid the 
> case of missing snapshots causing data loss.
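
As a sketch of the fsync fix described above (assuming the snapshot arrives as bytes that are written to a local file; this is not the actual Learner sync code):
{code:java}
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class SnapshotSync {
    // Write the snapshot bytes and force them to stable storage before
    // acknowledging, so a power loss cannot silently drop the synced snapshot.
    static void writeAndSync(File snapFile, byte[] snapshotBytes) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(snapFile);
             BufferedOutputStream bos = new BufferedOutputStream(fos)) {
            bos.write(snapshotBytes);
            bos.flush();
            fos.getFD().sync();   // the fsync that closes the data-loss window
        }
    }
}
{code}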



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3197) Improve documentation in ZooKeeperServer.superSecret

2018-12-26 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729138#comment-16729138
 ] 

Brian Nixon commented on ZOOKEEPER-3197:


Password is probably the wrong term for this variable (though it does suggest 
some potential future work). It's more of a checksum that's used in 
reconnection, carries no security weight, and is treated internally as if it 
carries no security weight.

 

[~breed] might be the only one left who knows the full story (it's telling that 
the secret decodes to "Ben is Cool").

 

> Improve documentation in ZooKeeperServer.superSecret
> 
>
> Key: ZOOKEEPER-3197
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3197
> Project: ZooKeeper
>  Issue Type: Task
>Reporter: Colm O hEigeartaigh
>Priority: Trivial
>
> A security scan flagged the use of a hard-coded secret 
> (ZooKeeperServer.superSecret) in conjunction with a java Random instance to 
> generate a password:
> byte[] generatePasswd(long id) {
>     Random r = new Random(id ^ superSecret);
>     byte p[] = new byte[16];
>     r.nextBytes(p);
>     return p;
> }
> superSecret has the following javadoc:
>  /**
>     * This is the secret that we use to generate passwords, for the moment it
>     * is more of a sanity check.
>     */
> It is unclear from this comment and looking at the code why it is not a 
> security risk. It would be good to update the javadoc along the lines of 
> "Using a hard-coded secret with Random to generate a password is not a 
> security risk because the resulting passwords are used for X, Y, Z and not 
> for authentication etc" or something would be very helpful for anyone else 
> looking at the code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.

2018-12-26 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729134#comment-16729134
 ] 

Brian Nixon commented on ZOOKEEPER-3220:


I believe ZOOKEEPER-2872 addressed the fsyncing part of this issue and 
ZOOKEEPER-3082 added some nice cleanup around 0-size snapshot files. Neither of 
these changes was backported to 3.4, so that suggests one potential path 
forward. Note that backporting ZOOKEEPER-2872 also requires backporting 
ZOOKEEPER-2870.

> The snapshot is not saved to disk and may cause data inconsistency.
> ---
>
> Key: ZOOKEEPER-3220
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3220
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.12, 3.4.13
>Reporter: Jiafu Jiang
>Priority: Critical
>
> We know that the ZooKeeper server will call fsync to make sure that log data has 
> been successfully saved to disk. But the ZooKeeper server does not call fsync to 
> make sure that a snapshot has been successfully saved, which may cause 
> potential problems, since closing a file descriptor does not make sure 
> that data is written to disk; see 
> [http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details.
>  
> If the snapshot is not successfully  saved to disk, it may lead to data 
> inconsistency. Here is my example, which is also a real problem I have ever 
> met.
> 1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3, zk2 was the 
> leader.
> 2. Both zk1 and zk2 had the log records from log1~logX, X was the zxid.
> 3. The machine of zk1 restarted, and during the reboot,  log(X+1) ~ log Y are 
> saved to log files of both zk2(leader) and zk3(follower).
> 4. After zk1 restarted successfully, it found itself to be a follower, and it 
> began to synchronize data with the leader. The leader sent a snapshot(records 
> from log 1 ~ log Y) to zk1, zk1 then saved the snapshot to local disk by 
> calling the method ZooKeeperServer.takeSnapshot. But unfortunately, when the 
> method returned, the snapshot data was not saved to disk yet. In fact the 
> snapshot file was created, but the size was 0.
> 5. zk1 finished the synchronization and began to accept new requests from the 
> leader. Say log records from log(Y + 1) ~ log Z were accepted by zk1 and  
> saved to log file. With fsync zk1 could make sure log data was not lost.
> 6. zk1 restarted again. Since the snapshot's size was 0, it would not be 
> used, therefore zk1 recovered using the log files. But the records from 
> log(X+1) ~ logY were lost ! 
>  
> Sorry for my poor English.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3140) Allow Followers to host Observers

2018-10-12 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648379#comment-16648379
 ] 

Brian Nixon commented on ZOOKEEPER-3140:


A note on future work - it would be cool to see the serialized format of the 
QuorumVerifier (used for the dynamic config files and the like) updated to a 
more extensible form so we can track more topology and port information through 
it. This would give us a lot more flexibility in setting and propagating the 
observer master port, in particular in letting each server publish its own port 
instead of requiring a single static port for the whole ensemble. Would also be 
useful for purposes such as ZOOKEEPER-3166.

I thought there was an existing Jira on this requested change but didn't see 
one in a cursory search. I will create one specifically eventually, if I get 
the time. :)

> Allow Followers to host Observers
> -
>
> Key: ZOOKEEPER-3140
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3140
> Project: ZooKeeper
>  Issue Type: New Feature
>  Components: server
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Assignee: Brian Nixon
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Observers function simply as non-voting members of the ensemble, sharing the 
> Learner interface with Followers and holding only a slightly different 
> internal pipeline. Both maintain connections along the quorum port with the 
> Leader by which they learn of all new proposals on the ensemble. 
>  
>  There are benefits to allowing Observers to connect to the Followers to plug 
> into the commit stream in addition to connecting to the Leader. It shifts the 
> burden of supporting Observers off the Leader and allows it to focus on 
> coordinating the commit of writes. This means better performance when the 
> Leader is under high load, particularly high network load such as can happen 
> after a leader election when many Learners need to sync. It also reduces the 
> total network connections maintained on the Leader when there are a high 
> number of observers. On the other end, Observer availability is improved 
> since it will take a shorter time for a high number of Observers to finish 
> syncing and start serving client traffic.
>  
>  The current implementation only supports scaling the number of Observers 
> into the hundreds before performance begins to degrade. By opening up 
> Followers to also host Observers, over a thousand observers can be hosted on 
> a typical ensemble without major negative impact under both normal operation 
> and during post-leader election sync.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3166) Support changing secure port with reconfig

2018-10-12 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648369#comment-16648369
 ] 

Brian Nixon commented on ZOOKEEPER-3166:


It would be cool to see the serialized format of the QuorumVerifier (used for 
the dynamic config files and the like) updated to a more extensible form so we 
can track more topology and port information through it. A key-value json blob 
is one such solution.

> Support changing secure port with reconfig
> --
>
> Key: ZOOKEEPER-3166
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3166
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Priority: Minor
>
> The reconfig operation supports changing the plaintext client port and client 
> address but, because the secure client port is not encoded in the 
> QuorumVerifier serialization, the secure client port cannot be changed by 
> similar means. Instead, this information can only be changed in the static 
> configuration files and only viewed there.
> Flagging as a place where there's not feature parity between secure client 
> ports and plaintext client ports.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (ZOOKEEPER-3166) Support changing secure port with reconfig

2018-10-12 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648369#comment-16648369
 ] 

Brian Nixon edited comment on ZOOKEEPER-3166 at 10/12/18 7:45 PM:
--

It would be cool to see the serialized format of the QuorumVerifier (used for 
the dynamic config files and the like) updated to a more extensible form so we 
can track more topology and port information through it. A key-value json blob 
is one such solution and would also be useful for ZOOKEEPER-3140.


was (Author: nixon):
It would be cool to see the serialized format of the QuorumVerifier (used for 
the dynamic config files and the like) updated to a more extensible form so we 
can track more topology and port information through it. A key-value json blob 
is one such solution.

> Support changing secure port with reconfig
> --
>
> Key: ZOOKEEPER-3166
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3166
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: quorum
>Affects Versions: 3.6.0
>Reporter: Brian Nixon
>Priority: Minor
>
> The reconfig operation supports changing the plaintext client port and client 
> address but, because the secure client port is not encoded in the 
> QuorumVerifier serialization, the secure client port cannot be changed by 
> similar means. Instead, this information can only be changed in the static 
> configuration files and only viewed there.
> Flagging as a place where there's not feature parity between secure client 
> ports and plaintext client ports.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3166) Support changing secure port with reconfig

2018-10-12 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3166:
--

 Summary: Support changing secure port with reconfig
 Key: ZOOKEEPER-3166
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3166
 Project: ZooKeeper
  Issue Type: Improvement
  Components: quorum
Affects Versions: 3.6.0
Reporter: Brian Nixon


The reconfig operation supports changing the plaintext client port and client 
address but, because the secure client port is not encoded in the 
QuorumVerifier serialization, the secure client port cannot be changed by 
similar means. Instead, this information can only be changed in the static 
configuration files and only viewed there.

Flagging as a place where there's not feature parity between secure client 
ports and plaintext client ports.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3142) Extend SnapshotFormatter to dump data in json format

2018-09-10 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3142:
--

 Summary: Extend SnapshotFormatter to dump data in json format
 Key: ZOOKEEPER-3142
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3142
 Project: ZooKeeper
  Issue Type: Improvement
Affects Versions: 3.6.0
Reporter: Brian Nixon


JSON output can be chained into other tools such as ncdu. Extend the 
SnapshotFormatter functionality to dump JSON.
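
A minimal sketch of the shape such a dump could take (the Node class below is a stand-in for the real DataTree/DataNode types, and string escaping is omitted):
{code:java}
import java.util.ArrayList;
import java.util.List;

public class JsonDumpSketch {
    // Hypothetical stand-in for a znode: name, payload size, children.
    static class Node {
        String name;
        int dataBytes;
        List<Node> children = new ArrayList<>();
        Node(String name, int dataBytes) { this.name = name; this.dataBytes = dataBytes; }
    }

    // Emit {"name":...,"dataBytes":...,"children":[...]} recursively.
    static String toJson(Node n) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"name\":\"").append(n.name)
          .append("\",\"dataBytes\":").append(n.dataBytes)
          .append(",\"children\":[");
        for (int i = 0; i < n.children.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(toJson(n.children.get(i)));
        }
        return sb.append("]}").toString();
    }
}
{code}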

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3140) Allow Followers to host Observers

2018-09-07 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3140:
--

 Summary: Allow Followers to host Observers
 Key: ZOOKEEPER-3140
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3140
 Project: ZooKeeper
  Issue Type: New Feature
  Components: server
Affects Versions: 3.6.0
Reporter: Brian Nixon


Observers function simply as non-voting members of the ensemble, sharing the 
Learner interface with Followers and holding only a slightly different 
internal pipeline. Both maintain connections along the quorum port with the 
Leader by which they learn of all new proposals on the ensemble. 
 
 There are benefits to allowing Observers to connect to the Followers to plug 
into the commit stream in addition to connecting to the Leader. It shifts the 
burden of supporting Observers off the Leader and allows it to focus on 
coordinating the commit of writes. This means better performance when the 
Leader is under high load, particularly high network load such as can happen 
after a leader election when many Learners need to sync. It also reduces the 
total network connections maintained on the Leader when there are a high number 
of observers. On the other end, Observer availability is improved since it 
will take a shorter time for a high number of Observers to finish syncing and 
start serving client traffic.
 
 The current implementation only supports scaling the number of Observers into 
the hundreds before performance begins to degrade. By opening up Followers to 
also host Observers, over a thousand observers can be hosted on a typical 
ensemble without major negative impact under both normal operation and during 
post-leader election sync.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3137) add a utility to truncate logs to a zxid

2018-08-31 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3137:
--

 Summary: add a utility to truncate logs to a zxid
 Key: ZOOKEEPER-3137
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3137
 Project: ZooKeeper
  Issue Type: New Feature
Affects Versions: 3.6.0
Reporter: Brian Nixon


Add a utility that allows an admin to truncate a given transaction log to a 
specified zxid. This can be similar to the existing LogFormatter. 

Among the benefits, this allows an admin to put together a point-in-time view 
of a data tree by manually mutating files from a saved backup.
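
A rough sketch of such a utility on top of the existing FileTxnLog/TxnIterator APIs (argument handling and iterator edge cases are glossed over; this is an illustration, not the finished tool):
{code:java}
import java.io.File;
import org.apache.zookeeper.server.persistence.FileTxnLog;
import org.apache.zookeeper.server.persistence.TxnLog.TxnIterator;

public class TruncateToZxid {
    // Copy transactions from srcDir into dstDir, stopping after targetZxid.
    public static void main(String[] args) throws Exception {
        File srcDir = new File(args[0]);
        File dstDir = new File(args[1]);
        long targetZxid = Long.decode(args[2]);

        FileTxnLog src = new FileTxnLog(srcDir);
        FileTxnLog dst = new FileTxnLog(dstDir);
        TxnIterator it = src.read(0);
        try {
            while (it.getHeader() != null && it.getHeader().getZxid() <= targetZxid) {
                dst.append(it.getHeader(), it.getTxn());
                if (!it.next()) {
                    break;
                }
            }
        } finally {
            it.close();
            dst.commit();
            dst.close();
        }
    }
}
{code}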



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3131) org.apache.zookeeper.server.WatchManager resource leak

2018-08-28 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595645#comment-16595645
 ] 

Brian Nixon commented on ZOOKEEPER-3131:


I'm not sure I follow what you're proposing. Do you want to put up a pull 
request with your suggested changes and we can discuss there?

> org.apache.zookeeper.server.WatchManager resource leak
> --
>
> Key: ZOOKEEPER-3131
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3131
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3, 3.5.4
> Environment: -Xmx512m 
>Reporter: ChaoWang
>Priority: Major
>
> In some cases, the variable _watch2Paths_ in _Class WatchManager_ does not 
> remove the entry, even if the associated value "HashSet" is empty already. 
> The type of key in Map _watch2Paths_ is Watcher, instance of 
> _NettyServerCnxn._ If it is not removed when the associated set of paths is 
> empty, it will cause the memory to increase little by little, and an 
> OutOfMemoryError is eventually triggered. 
>  
> {color:#FF}*Possible Solution:*{color}
> In the following function, the logic should be added to remove the entry.
> org.apache.zookeeper.server.WatchManager#removeWatcher(java.lang.String, 
> org.apache.zookeeper.Watcher)
> if (paths.isEmpty())
> { watch2Paths.remove(watcher); }
> For the following function as well:
> org.apache.zookeeper.server.WatchManager#triggerWatch(java.lang.String, 
> org.apache.zookeeper.Watcher.Event.EventType, 
> java.util.Set)
>  
> Please confirm this issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-706) large numbers of watches can cause session re-establishment to fail

2018-08-28 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595642#comment-16595642
 ] 

Brian Nixon commented on ZOOKEEPER-706:
---

I agree that the C client is still vulnerable to this - please do put up a 
patch if you have the time.

> large numbers of watches can cause session re-establishment to fail
> ---
>
> Key: ZOOKEEPER-706
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-706
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: c client, java client
>Affects Versions: 3.1.2, 3.2.2, 3.3.0
>Reporter: Patrick Hunt
>Assignee: Chris Thunes
>Priority: Critical
> Fix For: 3.4.7, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-706-branch-34.patch, 
> ZOOKEEPER-706-branch-34.patch, ZOOKEEPER-706.patch, ZOOKEEPER-706.patch, 
> ZOOKEEPER-706.patch
>
>
> If a client sets a large number of watches the "set watches" operation during 
> session re-establishment can fail.
> for example:
>  WARN  [NIOServerCxn.Factory:22801:NIOServerCnxn@417] - Exception causing 
> close of session 0xe727001201a4ee7c due to java.io.IOException: Len error 
> 4348380
> in this case the client was a web monitoring app and had set both data and 
> child watches on > 32k znodes.
> there are two issues I see here we need to fix:
> 1) handle this case properly (split up the set watches into multiple calls I 
> guess...)
> 2) the session should have expired after the "timeout". however we seem to 
> consider any message from the client as re-setting the expiration on the 
> server side. Probably we should only consider messages from the client that 
> are sent during an established session, otherwise we can see this situation 
> where the session is not established however the session is not expired 
> either. Perhaps we should create another JIRA for this particular issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3115) Delete snapshot file on error

2018-08-08 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3115:
--

 Summary: Delete snapshot file on error
 Key: ZOOKEEPER-3115
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3115
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Brian Nixon


ZOOKEEPER-3082 guards against one particular failure mode that can cause a 
corrupt snapshot, when an empty file is created with a valid snapshot file name. 
All other instances of IOException when writing the snapshot are simply allowed 
to propagate up the stack.

One idea that came up during review 
([https://github.com/apache/zookeeper/pull/560]) was whether we would ever want 
to leave a snapshot file on disk when an IOException is thrown. Clearly 
something has gone wrong at this point and rather than leave a potentially 
corrupt file, we can delete it and trust the transaction log when restoring the 
necessary transactions.

It would be great to modify FileTxnSnapLog::save to delete snapshot files more 
often on exceptions - provided that there's a way to identify when the file in 
that case is needed or corrupt.
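
A minimal sketch of the delete-on-error behavior (SnapshotWriter here is a stand-in for the body of FileTxnSnapLog::save, not an existing interface):
{code:java}
import java.io.File;
import java.io.IOException;

public class SaveWithCleanup {
    interface SnapshotWriter {
        void write(File target) throws IOException;
    }

    // If writing the snapshot throws, remove the partial file so recovery
    // falls back to the last good snapshot plus the transaction log.
    static void saveSnapshot(File snapFile, SnapshotWriter writer) throws IOException {
        try {
            writer.write(snapFile);
        } catch (IOException e) {
            if (snapFile.exists() && !snapFile.delete()) {
                // deletion failed; still surface the original error below
            }
            throw e;
        }
    }
}
{code}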



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3082) Fix server snapshot behavior when out of disk space

2018-08-01 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565860#comment-16565860
 ] 

Brian Nixon commented on ZOOKEEPER-3082:


[~andorm] my (possibly incorrect) read on ZOOKEEPER-1621 is that the issue is 
related to this one but not strictly a subset. Here we've removed the 
possibility of the snapshot side of recovery being lost during a disk-full 
event. There, the issue seems to be in ensuring the transaction log side of 
recovery is not corrupted by writing empty/incomplete log files. That issue 
will continue to be present even with the patch from this file applied.

> Fix server snapshot behavior when out of disk space
> ---
>
> Key: ZOOKEEPER-3082
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3082
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.6.0, 3.4.12, 3.5.5
>Reporter: Brian Nixon
>Assignee: Brian Nixon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.6.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> When the ZK server tries to make a snapshot and the machine is out of disk 
> space, the snapshot creation fails and throws an IOException. An empty 
> snapshot file is created, (probably because the server is able to create an 
> entry in the dir) but is not able to write to the file.
>  
> If snapshot creation fails, the server commits suicide. When it restarts, it 
> will do so from the last known good snapshot. However, when it tries to make 
> a snapshot again, the same thing happens. This results in lots of empty 
> snapshot files being created. If eventually the DataDirCleanupManager garbage 
> collects the good snapshot files then only the empty files remain. At this 
> point, the server is well and truly screwed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3108) deprecated myid file and use a new property "server.id" in the zoo.cfg

2018-08-01 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565814#comment-16565814
 ] 

Brian Nixon commented on ZOOKEEPER-3108:


This seems like a good idea to me (provided myid files are still supported) to 
give admins a bit more flexibility.

One reason I can think of to keep using a separate myid file is that the server 
id is the one property guaranteed to be unique for a given peer across the 
ensemble. All other properties and jvm flags may be identical across every 
instance. This makes reasoning about configuration files very easy - one simply 
propagates the same file everywhere and no custom logic is needed when 
comparing them.

Here's a link to an old discussion around myid -> 
http://zookeeper-user.578899.n2.nabble.com/The-idea-behind-myid-td3711269.html

>  deprecated myid file and use a new property "server.id" in the zoo.cfg
> ---
>
> Key: ZOOKEEPER-3108
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3108
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.5.0
>Reporter: maoling
>Assignee: maoling
>Priority: Major
>
> When using zk in distributed mode, we need to touch a myid file in the 
> dataDir, then write a unique number to it. This is inconvenient and not 
> user-friendly. Look at an example from another distributed system such as 
> kafka: it just uses broker.id=0 in server.properties to identify a unique 
> server node. This issue is going to abandon the myid file and use a new 
> property such as server.id=0 in the zoo.cfg. This fix will be applied to the 
> master branch and branch-3.5+,
> keeping branch-3.4 unchanged.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3083) Remove some redundant and noisy log lines

2018-07-05 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3083:
--

 Summary: Remove some redundant and noisy log lines
 Key: ZOOKEEPER-3083
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3083
 Project: ZooKeeper
  Issue Type: Improvement
  Components: server
Affects Versions: 3.6.0
Reporter: Brian Nixon


Under high client turnover, some log lines around client activity generate an 
outsized amount of noise in the log files. Reducing a few to debug level won't 
cause a big hit on admin understanding as there are redundant elements.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3082) Fix server snapshot behavior when out of disk space

2018-07-05 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3082:
--

 Summary: Fix server snapshot behavior when out of disk space
 Key: ZOOKEEPER-3082
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3082
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.4.12, 3.6.0, 3.5.5
Reporter: Brian Nixon


When the ZK server tries to make a snapshot and the machine is out of disk 
space, the snapshot creation fails and throws an IOException. An empty snapshot 
file is created, (probably because the server is able to create an entry in the 
dir) but is not able to write to the file.
 
If snapshot creation fails, the server commits suicide. When it restarts, it 
will do so from the last known good snapshot. However, when it tries to make a 
snapshot again, the same thing happens. This results in lots of empty snapshot 
files being created. If eventually the DataDirCleanupManager garbage collects 
the good snapshot files then only the empty files remain. At this point, the 
server is well and truly screwed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-3068) Improve C client logging of IPv6 hosts

2018-06-22 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-3068:
--

 Summary: Improve C client logging of IPv6 hosts
 Key: ZOOKEEPER-3068
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3068
 Project: ZooKeeper
  Issue Type: Improvement
  Components: c client
Affects Versions: 3.6.0
Reporter: Brian Nixon
Assignee: Brian Nixon


The C client formats host-port pairings as [host:port] when logging. This is 
visually confusing when the host is an IPv6 address (see below). In that 
case, it would be preferable to cleanly separate the IPv6 address from the port. 
{code:java}
ZOO_INFO@check_events@2736: initiated connection to server 
[2401:db00:1020:40bf:face:0:5:0:2181]{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3056) Fails to load database with missing snapshot file but valid transaction log file

2018-06-21 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519839#comment-16519839
 ] 

Brian Nixon commented on ZOOKEEPER-3056:


As a work-around for anyone currently blocked on this issue, I've uploaded an 
empty snapshot file here.

To perform an upgrade (3.4 -> 3.5):
 * download the "snapshot.0" file attached
 * copy it to the versioned directory (e.g. "version-2") within your data 
directory (parameter "dataDir" in your config - this is the directory 
containing the "myid" file for a peer)
 * restart the peer
 * upgrade the peer (this can be combined with the above step if you like)

 

> Fails to load database with missing snapshot file but valid transaction log 
> file
> 
>
> Key: ZOOKEEPER-3056
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3056
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3, 3.5.4
>Reporter: Michael Han
>Priority: Critical
> Attachments: snapshot.0
>
>
> [An 
> issue|https://lists.apache.org/thread.html/cc17af6ef05d42318f74148f1a704f16934d1253f14721a93b4b@%3Cdev.zookeeper.apache.org%3E]
>  was reported when a user failed to upgrade from 3.4.10 to 3.5.4 with missing 
> snapshot file.
> The code that complains about the missing snapshot file is 
> [here|https://github.com/apache/zookeeper/blob/release-3.5.4/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L206]
>  which was introduced as part of ZOOKEEPER-2325.
> With this check, ZK will not load the db without a snapshot file, even the 
> transaction log files are present and valid. This could be a problem for 
> restoring a ZK instance which does not have a snapshot file but has a sound 
> state (e.g. it crashes before being able to take the first snapshot with a 
> large snapCount parameter configured).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ZOOKEEPER-3056) Fails to load database with missing snapshot file but valid transaction log file

2018-06-21 Thread Brian Nixon (JIRA)


 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brian Nixon updated ZOOKEEPER-3056:
---
Attachment: snapshot.0

> Fails to load database with missing snapshot file but valid transaction log 
> file
> 
>
> Key: ZOOKEEPER-3056
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3056
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3, 3.5.4
>Reporter: Michael Han
>Priority: Critical
> Attachments: snapshot.0
>
>
> [An 
> issue|https://lists.apache.org/thread.html/cc17af6ef05d42318f74148f1a704f16934d1253f14721a93b4b@%3Cdev.zookeeper.apache.org%3E]
>  was reported when a user failed to upgrade from 3.4.10 to 3.5.4 with missing 
> snapshot file.
> The code that complains about the missing snapshot file is 
> [here|https://github.com/apache/zookeeper/blob/release-3.5.4/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L206]
>  which was introduced as part of ZOOKEEPER-2325.
> With this check, ZK will not load the db without a snapshot file, even the 
> transaction log files are present and valid. This could be a problem for 
> restoring a ZK instance which does not have a snapshot file but has a sound 
> state (e.g. it crashes before being able to take the first snapshot with a 
> large snapCount parameter configured).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2873) print error and/or abort on invalid server definition

2018-06-18 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516514#comment-16516514
 ] 

Brian Nixon commented on ZOOKEEPER-2873:


With no one commenting in almost a year, this issue strikes me as fair game for 
anyone to patch.

> print error and/or abort on invalid server definition
> -
>
> Key: ZOOKEEPER-2873
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2873
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: server
>Affects Versions: 3.4.10
>Reporter: Christopher Smith
>Assignee: Mark Fenes
>Priority: Minor
>
> While bringing up a new cluster, I managed to fat-finger a sed script and put 
> some lines like this into my config file:
> {code}
> server.1=zookeeper1:2888:2888
> {code}
> This led to a predictable spew of error messages when the client and election 
> components fought over the single port. Since such a configuration is *always* 
> an error, I suggest that it would be sensible to abort the server startup if 
> an entry is found with the same port for both client and election. (Logging 
> the error explicitly without shutting down is less helpful because of how 
> fast the logs pile up.)
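
A minimal, self-contained sketch of the suggested startup check, assuming it 
runs while a server.N line is being parsed; the class and method names are 
illustrative, not the actual QuorumPeerConfig code.

{code:java}
// Illustrative sketch only: reject a server line whose two ports collide.
// Names and parsing are simplified assumptions for this example.
public class ServerLineCheckSketch {
    static void validateServerLine(String line) {
        // expected form: server.N=host:port1:port2
        String[] hostAndPorts = line.split("=")[1].split(":");
        int firstPort = Integer.parseInt(hostAndPorts[1]);
        int secondPort = Integer.parseInt(hostAndPorts[2]);
        if (firstPort == secondPort) {
            throw new IllegalArgumentException(
                    "ports in a server entry must differ: " + line);
        }
    }

    public static void main(String[] args) {
        validateServerLine("server.1=zookeeper1:2888:3888"); // accepted
        try {
            validateServerLine("server.1=zookeeper1:2888:2888");
        } catch (IllegalArgumentException e) {
            // With a check like this, startup could abort here instead of
            // spewing errors later when the two components fight over one port.
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
{code}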



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2421) testSessionReuse is commented out

2018-06-10 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507670#comment-16507670
 ] 

Brian Nixon commented on ZOOKEEPER-2421:


[~prasanthm] This is an ancient test, so someone who was involved in the project 
pre-2008 may have to provide some context around it and its purpose. That's 
not me, but I can say a bit after looking at the SessionTest file.

Session id reuse as such is not allowed in current ZooKeeper. There are two 
ways that this test could now go. One is to make sure that the second client 
can *not* use that session id, i.e. that there is no state retained server-side 
that allows reuse after close (a rough sketch follows below). The second is to 
change it to a session-moved style test, but I think this scenario is already 
covered in testSession and testSessionMove.

If you don't see a useful way of reintroducing the test after a bit of poking, 
I'd say to put up a pull request removing it entirely and see if it gets 
accepted - it simply may no longer be meaningful.
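
A rough, standalone sketch of the first option (not the original testSessionReuse 
code); the connect string and timeout are placeholder assumptions, and the 
expected exception reflects the behavior described above.

{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SessionReuseSketch {
    public static void main(String[] args) throws Exception {
        // First client establishes a session, then closes it.
        ZooKeeper first = new ZooKeeper("127.0.0.1:2181", 30000, event -> { });
        long id = first.getSessionId();
        byte[] pwd = first.getSessionPasswd();
        first.close();

        // Second client presents the closed session's credentials.
        ZooKeeper second = new ZooKeeper("127.0.0.1:2181", 30000, event -> { }, id, pwd);
        try {
            second.create("/reuse-check", new byte[0],
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("unexpected: the closed session was reusable");
        } catch (KeeperException.SessionExpiredException expected) {
            System.out.println("closed session correctly rejected");
        } finally {
            second.close();
        }
    }
}
{code}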

> testSessionReuse is commented out
> -
>
> Key: ZOOKEEPER-2421
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2421
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Flavio Junqueira
>Assignee: Prasanth Mathialagan
>Priority: Major
>
> This test case in SessionTest:
> {noformat}
>testSessionReuse
> {noformat}
> is commented out.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3056) Fails to load database with missing snapshot file but valid transaction log file

2018-06-08 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506687#comment-16506687
 ] 

Brian Nixon commented on ZOOKEEPER-3056:


[~mmerli] That's a very reasonable concern and I'd ideally have all upgrades be 
seamless in exactly the way you describe. Property gating the validation is 
only undesirable from a config-proliferation point of view.

[~hanm] I think the signal file is a very workable approach and pretty 
straightforward to implement. The first intervention that I scoped out (create 
a snapshot.0) was inspired by yours, as it simplifies the path of "signal file" 
to "database load with trust in the transaction log" to "create snapshot, 
delete signal file". It's a trade-off between admin time and server-side 
code complexity for sure.

In order of decreasing seamlessness/admin time:
 * property flag snapshot validation (default off)
 * property flag snapshot validation (default on)
 * signal file
 * admin script to create a snapshot.0 file in the snapshot directory
 * upgrade notes to create a snapshot.0 file in the snapshot directory

For the use cases that we maintain, it's far more likely that being unable to 
load a snapshot indicates corruption or machine malfeasance than a legitimate 
database, so I'd like to check that impression against more information from 
the community. Is a snapshot-less db expected/unremarkable under some 
reasonable workloads, or is it something worth (politely) discouraging? I do 
believe ZOOKEEPER-2325 is a good feature and it would be a shame to turn it 
off by default.

> Fails to load database with missing snapshot file but valid transaction log 
> file
> 
>
> Key: ZOOKEEPER-3056
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3056
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3, 3.5.4
>Reporter: Michael Han
>Priority: Critical
>
> [An 
> issue|https://lists.apache.org/thread.html/cc17af6ef05d42318f74148f1a704f16934d1253f14721a93b4b@%3Cdev.zookeeper.apache.org%3E]
>  was reported when a user failed to upgrade from 3.4.10 to 3.5.4 with a missing 
> snapshot file.
> The code that complains about the missing snapshot file is 
> [here|https://github.com/apache/zookeeper/blob/release-3.5.4/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L206],
>  which was introduced as part of ZOOKEEPER-2325.
> With this check, ZK will not load the db without a snapshot file, even if the 
> transaction log files are present and valid. This can be a problem when 
> restoring a ZK instance that does not have a snapshot file but has a sound 
> state (e.g. it crashed before being able to take its first snapshot with a 
> large snapCount parameter configured).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-3056) Fails to load database with missing snapshot file but valid transaction log file

2018-06-08 Thread Brian Nixon (JIRA)


[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506419#comment-16506419
 ] 

Brian Nixon commented on ZOOKEEPER-3056:


We have not run an ensemble without some form of ZOOKEEPER-2325 in years; as 
such, we always have snapshots available, and ZooKeeper being unable to load a 
valid snapshot is a sign that something is very wrong.

From my read of the mail thread, there are two questions that we're trying to 
answer:
- how to update ensembles without snapshots from a pre-2325 to a post-2325 state
- what constitutes a stable db (and what role a snapshot plays in that)

The second ought to take more thought, so I'll follow up on that after 
considering it.

Two possible interventions on the first:
- the base snapshot is very small and simple, so one could copy/create a 
snapshot.0 file in the appropriate directory before upgrade
- property gate the entire "-1L == deserializeResult" conditional block in 3.4, 
3.5, and master to allow a snapshot-less db (see the sketch below). To the 
extent that we agree that snapshot-less is a degenerate mode, we could also add 
a four-letter-word or admin command to create a snapshot on demand (allowing 
the admin to quickly move out of this state post-upgrade)
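
A minimal sketch of what the property gate could look like, assuming a 
hypothetical property name; it is not the FileTxnSnapLog implementation, just 
the shape of the check.

{code:java}
import java.io.IOException;

public class SnapshotGateSketch {
    // Hypothetical property letting an admin opt in to a snapshot-less load.
    static final String TRUST_EMPTY_SNAPSHOT = "zookeeper.snapshot.trustEmpty";

    /**
     * @param deserializeResult -1L means no valid snapshot was found on disk
     * @return true if the server may proceed to replay the txn log anyway
     */
    static boolean mayLoadWithoutSnapshot(long deserializeResult) throws IOException {
        if (deserializeResult != -1L) {
            return true; // a snapshot was loaded, nothing to gate
        }
        if (Boolean.getBoolean(TRUST_EMPTY_SNAPSHOT)) {
            return true; // admin explicitly chose to trust the txn log alone
        }
        throw new IOException("No snapshot found, but transaction log entries exist");
    }

    public static void main(String[] args) throws IOException {
        System.setProperty(TRUST_EMPTY_SNAPSHOT, "true");
        System.out.println(mayLoadWithoutSnapshot(-1L)); // true: gated check skipped
    }
}
{code}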

> Fails to load database with missing snapshot file but valid transaction log 
> file
> 
>
> Key: ZOOKEEPER-3056
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3056
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.3, 3.5.4
>Reporter: Michael Han
>Priority: Critical
>
> [An 
> issue|https://lists.apache.org/thread.html/cc17af6ef05d42318f74148f1a704f16934d1253f14721a93b4b@%3Cdev.zookeeper.apache.org%3E]
>  was reported when a user failed to upgrade from 3.4.10 to 3.5.4 with a missing 
> snapshot file.
> The code that complains about the missing snapshot file is 
> [here|https://github.com/apache/zookeeper/blob/release-3.5.4/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L206],
>  which was introduced as part of ZOOKEEPER-2325.
> With this check, ZK will not load the db without a snapshot file, even if the 
> transaction log files are present and valid. This can be a problem when 
> restoring a ZK instance that does not have a snapshot file but has a sound 
> state (e.g. it crashed before being able to take its first snapshot with a 
> large snapCount parameter configured).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2988) NPE triggered if server receives a vote for a server id not in their voting view

2018-04-30 Thread Brian Nixon (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458844#comment-16458844
 ] 

Brian Nixon commented on ZOOKEEPER-2988:


[~hanm] Thanks for merging it to 3.5 too, I'll shut down that second pr. This 
bug is applicable to 3.4 as well - imo it's a worse danger on that branch since 
it's easy for configuration files to be stale. I'll fix up pr 478 for the 3.4 
branch to reflect the comments I got on pr 476 so it can be ready for review.

> NPE triggered if server receives a vote for a server id not in their voting 
> view
> 
>
> Key: ZOOKEEPER-2988
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2988
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: leaderElection
>Affects Versions: 3.5.3, 3.4.11
>Reporter: Brian Nixon
>Assignee: Brian Nixon
>Priority: Minor
> Fix For: 3.5.4, 3.6.0
>
>
> We've observed the following behavior in elections when a node is lagging 
> behind the quorum in its view of the ensemble topology.
> - Node A is operating with node B in its voting view, but without view of 
> node C.
> - B votes for C.
> - A then switches its vote to C, but throws an NPE when attempting to connect.
> This causes the QuorumPeer to spin up a Follower only to immediately have it 
> shut down by the exception.
> Ideally, A would not advertise a vote for a server that it will not follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZOOKEEPER-2988) NPE triggered if server receives a vote for a server id not in their voting view

2018-03-01 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-2988:
--

 Summary: NPE triggered if server receives a vote for a server id 
not in their voting view
 Key: ZOOKEEPER-2988
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2988
 Project: ZooKeeper
  Issue Type: Bug
  Components: leaderElection
Affects Versions: 3.4.11, 3.5.3
Reporter: Brian Nixon


We've observed the following behavior in elections when a node is lagging 
behind the quorum in its view of the ensemble topology.

- Node A is operating with node B in its voting view, but without view of node 
C.

- B votes for C.

- A then switches its vote to C, but throws an NPE when attempting to connect.

This causes the QuorumPeer to spin up a Follower only to immediately have it 
shut down by the exception.

Ideally, A would not advertise a vote for a server that it will not follow.
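
A rough illustration of the guard this implies, using simplified types; it is 
not the actual FastLeaderElection code, just the check that would prevent 
switching a vote to an unknown server id.

{code:java}
import java.util.HashMap;
import java.util.Map;

public class VoteGuardSketch {
    // Before adopting a vote, confirm the proposed leader is in our voting view.
    static boolean canFollow(long proposedLeader, Map<Long, String> votingView) {
        // Looking up an unknown sid and dereferencing the (null) result is the
        // kind of path that produces the NPE described in this ticket.
        return votingView.containsKey(proposedLeader);
    }

    public static void main(String[] args) {
        Map<Long, String> view = new HashMap<>();
        view.put(1L, "nodeA:2888:3888"); // A knows about itself and B only
        view.put(2L, "nodeB:2888:3888");
        System.out.println(canFollow(3L, view)); // false: don't advertise a vote for C
    }
}
{code}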



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (ZOOKEEPER-2357) Unhandled errors propagating through cluster

2017-11-05 Thread Brian Nixon (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239715#comment-16239715
 ] 

Brian Nixon commented on ZOOKEEPER-2357:


Not an expert, but a couple of things pop out to me. One, the WARN messages are 
what you expect when a follower loses contact with the leader. Two, 50 seconds 
to sync the txn log is a long time.

I don't know what the SyncThread of the FileTxnLog is blocking on, but it could 
be the case that the data load is impacting the server-server communication.


> Unhandled errors propagating through cluster
> 
>
> Key: ZOOKEEPER-2357
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2357
> Project: ZooKeeper
>  Issue Type: Task
>  Components: leaderElection, quorum, server
>Affects Versions: 3.4.6
>Reporter: Gareth Humphries
>Priority: Minor
>
> Hi,
> I need some help understanding a recurring problem we're seeing with our 
> zookeeper cluster.  It's a five node cluster that ordinarily runs fine.  
> Occasionally we see an error from which the cluster recovers, but it causes a 
> lot of grief and I'm sure is representative of an unhealthy situation.
> To my eye it looks like an invalid bit of data getting into the system and 
> not being handled gracefully; I'm the first to say my eye is not expert 
> though, so I humbly submit an annotated log exert in the hope some who knows 
> more than me can provide some illumination.
> The cluster seems to be ticking along fine, until we get errors on 2 of the 5 
> nodes like so:
> 2016-01-19 13:12:49,698 - WARN  [QuorumPeer[myid=1]/0.0.0.0:2181:Follower@89] 
> - Exception when following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2016-01-19 13:12:49,698 - INFO  
> [QuorumPeer[myid=1]/0.0.0.0:2181:Follower@166] - shutdown called
> java.lang.Exception: shutdown Follower
> at 
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
> at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:790)
> This is immediately followed by 380 occurences of:
> 2016-01-19 13:12:49,699 - INFO  
> [QuorumPeer[myid=1]/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket 
> connection for client /X.Y.Z.56:59028 which had sessionid 0x151b01ee8330234
> and a:
> 2016-01-19 13:12:49,766 - INFO  
> [QuorumPeer[myid=1]/0.0.0.0:2181:FollowerZooKeeperServer@139] - Shutting down
> 2016-01-19 13:12:49,766 - INFO  
> [QuorumPeer[myid=1]/0.0.0.0:2181:ZooKeeperServer@441] - shutting down
> 2016-01-19 13:12:49,766 - INFO  
> [QuorumPeer[myid=1]/0.0.0.0:2181:FollowerRequestProcessor@105] - Shutting down
> 2016-01-19 13:12:49,766 - INFO  
> [QuorumPeer[myid=1]/0.0.0.0:2181:CommitProcessor@181] - Shutting down
> 2016-01-19 13:12:49,766 - INFO  
> [QuorumPeer[myid=1]/0.0.0.0:2181:FinalRequestProcessor@415] - shutdown of 
> request processor complete
> 2016-01-19 13:12:49,767 - INFO  
> [QuorumPeer[myid=1]/0.0.0.0:2181:SyncRequestProcessor@209] - Shutting down
> 2016-01-19 13:12:49,767 - INFO  [CommitProcessor:1:CommitProcessor@150] - 
> CommitProcessor exited loop!
> 2016-01-19 13:12:49,767 - INFO  
> [FollowerRequestProcessor:1:FollowerRequestProcessor@95] - 
> FollowerRequestProcessor exited loop!
> 2016-01-19 13:13:09,418 - WARN  [SyncThread:1:FileTxnLog@334] - fsync-ing the 
> write ahead log in SyncThread:1 took 30334ms which will adversely effect 
> operation latency. See the ZooKeeper troubleshooting guide
> 2016-01-19 13:13:09,427 - WARN  [SyncThread:1:SendAckRequestProcessor@64] - 
> Closing connection to leader, exception during packet send
> java.net.SocketException: Socket closed
> at 
> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:121)
> at java.net.SocketOutputStream.write(SocketOutputStream.java:159)
> at 
> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
> at 
> org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139)
> at 
> org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:62)
> at 
> 

[jira] [Commented] (ZOOKEEPER-2773) zookeeper-service

2017-11-05 Thread Brian Nixon (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239709#comment-16239709
 ] 

Brian Nixon commented on ZOOKEEPER-2773:


There's not enough information here to go on. Did you set ZOO_LOG_DIR, and can 
you provide the zookeeper logs?

> zookeeper-service
> -
>
> Key: ZOOKEEPER-2773
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2773
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: quorum
>Affects Versions: 3.4.10
> Environment: Linux
>Reporter: Ashwath
>  Labels: beginner
> Fix For: 3.4.10
>
>
> Hi
> I run zookeeper on 3 Linux machines. 
> 1. I downloaded the zookeeper-3.4.10.jar file and extracted it.
> 2. I copied zoo_sample to zoo.cfg, edited dataDir, and added the 3 IP addresses.
> 3. I created a new file called myid and inserted the numbers into it.
> Now I am running the zookeeper cluster successfully, but
> when I try to run it as a service I get the following error:
> zookeeper.service - Apache ZooKeeper
>   Loaded: loaded (/lib/systemd/system/zookeeper.service; disabled; vendor 
> preset: enabled)
>   Active: activating (auto-restart) (Result: exit-code) since Wed 2017-05-03 
> 09:56:28 IST; 1s ago
>  Process: 678 ExecStart=/home/melon/software/ZooKeeper/zk/bin/zkServer.sh 
> start-foreground (code=exited
> Main PID: 678 (code=exited, status=127)
> May 03 09:56:28 deds14 systemd[1]: zookeeper.service: Unit entered failed 
> state.
> May 03 09:56:28 deds14 systemd[1]: zookeeper.service: Failed with result 
> 'exit-code'.
> Here is the unit file I added:
> [Unit]
> Description=Apache ZooKeeper
> After=network.target
> ConditionPathExists=/home/melon/software/ZooKeeper/zookeeper-3.4.10-beta/conf/zoo.cfg
> ConditionPathExists=/home/melon/software/ZooKeeper/zookeeper-3.4.10-beta/conf/log4j.properties
> [Service]
> Environment="ZOOCFGDIR=/home/melon/software/ZooKeeper/zookeeper-3.4.10-beta/conf"
> SyslogIdentifier=zookeeper
> WorkingDirectory=/home/melon/software/ZooKeeper
> ExecStart=/home/melon/software/ZooKeeper/zookeeper-3.4.10-beta/bin/zkServer.sh
>  start-foreground
> Restart=on-failure
> RestartSec=20
> User=root
> Group=root
> Thank you



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ZOOKEEPER-2872) Interrupted snapshot sync causes data loss

2017-08-10 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-2872:
--

 Summary: Interrupted snapshot sync causes data loss
 Key: ZOOKEEPER-2872
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2872
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.5.3, 3.4.10, 3.6.0
Reporter: Brian Nixon


There is a way for observers to permanently lose data from their local data 
tree while remaining members of good standing with the ensemble and continuing 
to serve client traffic when the following chain of events occurs.

1. The observer dies in epoch N from machine failure.
2. The observer comes back up in epoch N+1 and requests a snapshot sync to 
catch up.
> 3. The machine powers off before the snapshot is synced to disk and after some 
> txn's have been logged (depending on the OS, this can happen!).
4. The observer comes back a second time and replays its most recent snapshot 
(epoch <= N) as well as the txn logs (epoch N+1). 
5. A diff sync is requested from the leader and the observer broadcasts 
availability.

In this scenario, any commits from epoch N that the observer did not receive 
before it died the first time will never be exposed to the observer and no part 
of the ensemble will complain. 

> This situation is not unique to observers and can happen to any learner. As a 
> simple fix, fsync-ing the snapshots received from the leader (sketched below) 
> will avoid the case of a missing snapshot causing data loss.
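
A small, self-contained sketch of the proposed fix, forcing a freshly written 
snapshot to stable storage with standard java.nio calls; the file name and 
method are assumptions, not the actual Learner code.

{code:java}
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

public class SnapshotFsyncSketch {
    static void writeAndSyncSnapshot(File snapFile, byte[] snapshotBytes) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(snapFile)) {
            fos.write(snapshotBytes);
            fos.flush();                  // flush user-space buffers
            fos.getChannel().force(true); // fsync: push data and metadata to disk
        }
        // Only after this point is it safe to ack the sync and serve traffic,
        // since a power loss can no longer roll the snapshot back to epoch <= N.
    }

    public static void main(String[] args) throws IOException {
        writeAndSyncSnapshot(new File("snapshot.sketch"), new byte[] {0});
    }
}
{code}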



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ZOOKEEPER-2723) ConnectStringParser does not parse correctly if quorum string has znode path

2017-04-18 Thread Brian Nixon (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973457#comment-15973457
 ] 

Brian Nixon commented on ZOOKEEPER-2723:


I'm not seeing a reference to ConnectStringParser in the attached stack trace - 
it looks like a DNS resolution problem. Can you add more detail?

> ConnectStringParser does not parse correctly if quorum string has znode path
> 
>
> Key: ZOOKEEPER-2723
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2723
> Project: ZooKeeper
>  Issue Type: Bug
>Reporter: Vishal Khandelwal
>Assignee: Vishal Khandelwal
>
> 2017-03-14 07:10:26,247 INFO [main] zookeeper.ZooKeeper - Initiating client 
> connection, 
> connectString=x1-1-was.ops.sfdc.net:2181,x2-1-was.ops.sfdc.net:2181,x3-1-was.ops.sfdc.net:2181,x4-1-was.ops.sfdc.net:2181,x5-1-was.ops.sfdc.net:2181:/hbase
>  sessionTimeout=6 
> watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@6e16b8b5 2017-03-14 
> 07:10:26,250 ERROR [main] client.StaticHostProvider - Unable to connect to 
> server: x5-1-was.ops.sfdc.net:2181:2181 java.net.UnknownHostException: 
> x5-1-was.ops.sfdc.net:2181: Name or service not known at 
> java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) at 
> java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928) at 
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323) at 
> java.net.InetAddress.getAllByName0(InetAddress.java:1276) at 
> java.net.InetAddress.getAllByName(InetAddress.java:1192) at 
> java.net.InetAddress.getAllByName(InetAddress.java:1126) at 
> org.apache.zookeeper.client.StaticHostProvider.(StaticHostProvider.java:60)
>  at org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:446) at 
> org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:380) at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:141)
>  at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.(RecoverableZooKeeper.java:128)
>  at org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:135) at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.(ZooKeeperWatcher.java:173)
>  at 
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.(ZooKeeperWatcher.java:147)
>  at 
> org.apache.hadoop.hbase.client.ZooKeeperKeepAliveConnection.(ZooKeeperKeepAliveConnection.java:43)
>  at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveZooKeeperWatcher(HConnectionManager.java:1875)
>  at 
> org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:82)
>  at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.retrieveClusterId(HConnectionManager.java:929)
>  at 
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.(HConnectionManager.java:714)
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at 
> org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:466)
>  at 
> org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:445)
>  at 
> org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:326)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-04-13 Thread Brian Nixon (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968437#comment-15968437
 ] 

Brian Nixon commented on ZOOKEEPER-2325:


When a server starts up, it should always capture the state of the loaded 
database with a fresh snapshot. I don't believe it is a valid state to have a 
log file without a snapshot file.


> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Priority: Critical
> Fix For: 3.5.4, 3.6.0
>
> Attachments: zk.patch, ZOOKEEPER-2325.001.patch, 
> ZOOKEEPER-2325-test.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the 
> result of FileSnap.deserialize, which is -1L if no valid snapshots are found. 
> Recovery proceeds with dt.lastProcessed == 0, its initial value.
> The result is that Zookeeper will process the transaction logs and then begin 
> serving requests with a different state than the rest of the ensemble.
> To reproduce:
> In a healthy zookeeper cluster of size >= 3, shut down one node.
> Either delete all snapshots for this node or change all to be empty files.
> Restart the node.
> We believe this can happen organically if a node runs out of disk space.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ZOOKEEPER-2725) Upgrading to a global session fails with a multiop

2017-03-15 Thread Brian Nixon (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927211#comment-15927211
 ] 

Brian Nixon commented on ZOOKEEPER-2725:


I messed up the name of the pull request - meant to link 
https://github.com/apache/zookeeper/pull/195 to this issue.

> Upgrading to a global session fails with a multiop
> --
>
> Key: ZOOKEEPER-2725
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2725
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.5.2
>Reporter: Brian Nixon
>
> On an ensemble with local sessions enabled, when a client with a local 
> session requests the creation of an ephemeral node within a multi-op, the 
> client gets a session expired message.  The same multi-op works if the 
> session is already global. This breaks the client's expectation of seamless 
> promotion from local session to global session server-side. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ZOOKEEPER-2725) Upgrading to a global session fails with a multiop

2017-03-15 Thread Brian Nixon (JIRA)
Brian Nixon created ZOOKEEPER-2725:
--

 Summary: Upgrading to a global session fails with a multiop
 Key: ZOOKEEPER-2725
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2725
 Project: ZooKeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.5.2
Reporter: Brian Nixon


On an ensemble with local sessions enabled, when a client with a local session 
requests the creation of an ephemeral node within a multi-op, the client gets a 
session expired message.  The same multi-op works if the session is already 
global. This breaks the client's expectation of seamless promotion from local 
session to global session server-side. 
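
For reference, a standalone sketch of the client-side scenario being reported 
(connect string and path are placeholders): an ephemeral create wrapped in a 
multi, issued on a session that starts out local.

{code:java}
import java.util.Arrays;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class MultiEphemeralSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, event -> { });
        // A plain create() of an ephemeral node upgrades a local session to a
        // global one; the same create inside multi() is what this ticket
        // reports as failing with a session expired error.
        zk.multi(Arrays.asList(
                Op.create("/multi-ephemeral", new byte[0],
                        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL)));
        zk.close();
    }
}
{code}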



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-01-08 Thread Brian Nixon (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15810600#comment-15810600
 ] 

Brian Nixon commented on ZOOKEEPER-2325:


Thanks [~hanm]!

Glad we kept the changes for this task and 261 separate. I'll make sure that 
https://github.com/apache/zookeeper/pull/120 still commits cleanly and update 
that PR as necessary.


> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the 
> result of FileSnap.deserialize, which is -1L if no valid snapshots are found. 
> Recovery proceeds with dt.lastProcessed == 0, its initial value.
> The result is that Zookeeper will process the transaction logs and then begin 
> serving requests with a different state than the rest of the ensemble.
> To reproduce:
> In a healthy zookeeper cluster of size >= 3, shut down one node.
> Either delete all snapshots for this node or change all to be empty files.
> Restart the node.
> We believe this can happen organically if a node runs out of disk space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

2017-01-06 Thread Brian Nixon (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805547#comment-15805547
 ] 

Brian Nixon commented on ZOOKEEPER-2325:


Any word on committing this patch? I'd love to unblock ZOOKEEPER-261.

[~fpj] ?


> Data inconsistency if all snapshots empty or missing
> 
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
>  Issue Type: Bug
>  Components: server
>Affects Versions: 3.4.6
>Reporter: Andrew Grasso
>Assignee: Andrew Grasso
>Priority: Critical
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, 
> zk.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the 
> result of FileSnap.deserialize, which is -1L if no valid snapshots are found. 
> Recovery proceeds with dt.lastProcessed == 0, its initial value.
> The result is that Zookeeper will process the transaction logs and then begin 
> serving requests with a different state than the rest of the ensemble.
> To reproduce:
> In a healthy zookeeper cluster of size >= 3, shut down one node.
> Either delete all snapshots for this node or change all to be empty files.
> Restart the node.
> We believe this can happen organically if a node runs out of disk space.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election

2016-12-06 Thread Brian Nixon (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727016#comment-15727016
 ] 

Brian Nixon commented on ZOOKEEPER-261:
---

Ben and I discussed this offline. When starting up without any local data, the 
safest thing to do is view this lack with extreme suspicion and not participate 
in voting until you can pull down the data tree from the rest of the ensemble. 
Such a server is not qualified to confirm which servers are up to date and 
could inadvertently elect a server that is missing some data. The one exception 
is the creation of a fresh ensemble, when there is no data to repopulate the 
local data tree. It's not clear that an ensemble can detect on its own that it 
is in this state, since in the worst case every server will be subject to the 
same data-losing fault (in which case you should recover from backups instead 
of coming online as an empty database). This extra information needs to come 
from the admin.

With the changes from ZOOKEEPER-2325, a server with no local data tree starts 
with a zxid of 0. I'll submit a pull request that changes that initial zxid to 
-1 unless a special 'initialize' file is present in the data directory, and 
removes voting privileges from members reporting -1 (a rough sketch follows 
below). The idea is that creating the 'initialize' file alongside 'myid' will 
be a standard part of ensemble creation - the extra information from the admin. 
The 'initialize' file will be automatically cleaned up by the server, and 
subsequent restarts can treat a missing data directory as a sign that the 
server is legitimately missing context (e.g. being added to an existing 
ensemble).
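
A rough sketch of the bootstrapping rule described above; the 'initialize' file 
name matches the proposal, while the method shape and return values are 
illustrative only.

{code:java}
import java.io.File;
import java.io.IOException;

public class EmptyDataDirSketch {
    static long initialZxidForEmptyDataDir(File dataDir) throws IOException {
        File marker = new File(dataDir, "initialize");
        if (marker.exists()) {
            // The admin asserted this is a brand-new ensemble: start at zxid 0
            // and consume the marker so later restarts are treated strictly.
            if (!marker.delete()) {
                throw new IOException("could not remove " + marker);
            }
            return 0L;
        }
        // No data and no marker: report -1 and stay out of leader election
        // until the data tree has been synced from the rest of the ensemble.
        return -1L;
    }

    public static void main(String[] args) throws IOException {
        File dataDir = new File("datadir-sketch");
        dataDir.mkdirs();
        new File(dataDir, "initialize").createNewFile();
        System.out.println(initialZxidForEmptyDataDir(dataDir)); // 0: fresh ensemble
        System.out.println(initialZxidForEmptyDataDir(dataDir)); // -1: marker consumed
    }
}
{code}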


> Reinitialized servers should not participate in leader election
> ---
>
> Key: ZOOKEEPER-261
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-261
> Project: ZooKeeper
>  Issue Type: Improvement
>  Components: leaderElection, quorum
>Reporter: Benjamin Reed
>
> A server that has lost its data should not participate in leader election 
> until it has resynced with a leader. Our leader election algorithm and 
> NEW_LEADER commit assume that the followers voting on a leader have not lost 
> any of their data. We should have a flag in the data directory saying whether 
> or not the data is preserved, so that the flag will be cleared if the data 
> is ever cleared.
> Here is the problematic scenario: you have an ensemble of machines A, B, 
> and C. C is down. The last transaction seen by C is z. A transaction, z+1, is 
> committed on A and B. Now there is a power outage. B's data gets 
> reinitialized. When power comes back up, B and C come up, but A does not. C 
> will be elected leader and transaction z+1 is lost. (Note, this can happen 
> even if all three machines are up and C just responds quickly; in that case C 
> would tell A to truncate z+1 from its log.) In theory we haven't violated our 
> 2f+1 guarantee, since A has failed and B still hasn't recovered from failure, 
> but it would be nice if, when we don't have a quorum, the system stopped 
> working rather than working incorrectly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)