[jira] [Created] (ZOOKEEPER-3459) Add admin command to display synced state of peer
Brian Nixon created ZOOKEEPER-3459: -- Summary: Add admin command to display synced state of peer Key: ZOOKEEPER-3459 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3459 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.6.0 Reporter: Brian Nixon Assignee: Brian Nixon Add another command to the admin server that will respond with the current phase of the Zab protocol that a given peer is running. This will help with understanding what is going on in an ensemble while it is settling after a leader election and with programmatically checking for a healthy "broadcast" state. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3421) Better insight into Observer connections
Brian Nixon created ZOOKEEPER-3421: -- Summary: Better insight into Observer connections Key: ZOOKEEPER-3421 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3421 Project: ZooKeeper Issue Type: Wish Components: server Affects Versions: 3.6.0 Reporter: Brian Nixon With the introduction of the Learner Master feature in ZOOKEEPER-3140, tracking the state of the Observers synced with the voting quorum became more difficult from an operational perspective. Observers could now be synced with any voting member, not just the leader, and discovering where an observer was being hosted required digging into the server logs or complex JMX queries. Add commands that externalize the state of observers from the point of view of the voting quorum.
[jira] [Created] (ZOOKEEPER-3415) convert internal logic to use java 8 streams
Brian Nixon created ZOOKEEPER-3415: -- Summary: convert internal logic to use java 8 streams Key: ZOOKEEPER-3415 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3415 Project: ZooKeeper Issue Type: Wish Affects Versions: 3.6.0 Reporter: Brian Nixon There are a number of places in the code where for loops are used to perform basic filtering and collection. The Java 8 stream APIs make these operations much more polished. Since the master branch has been at this language level for a while, I'd wish for a (series of) refactor(s) to convert more of these loops to streams.
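As an illustration of the kind of refactor being wished for, here is a hypothetical filter-and-collect loop (the class and data are invented, not taken from the ZooKeeper code base) alongside its Java 8 stream equivalent:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamRefactor {
    // The pre-Java-8 pattern: an explicit loop that filters and collects.
    static List<String> activeServersLoop(List<String> servers) {
        List<String> result = new ArrayList<>();
        for (String s : servers) {
            if (s.startsWith("active:")) {
                result.add(s.substring("active:".length()));
            }
        }
        return result;
    }

    // The same logic expressed with the Java 8 stream API.
    static List<String> activeServersStream(List<String> servers) {
        return servers.stream()
                .filter(s -> s.startsWith("active:"))
                .map(s -> s.substring("active:".length()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("active:zk1", "down:zk2", "active:zk3");
        System.out.println(activeServersLoop(servers));   // [zk1, zk3]
        System.out.println(activeServersStream(servers)); // [zk1, zk3]
    }
}
```

Both forms produce the same result; the stream version makes the filter/map/collect intent explicit.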
[jira] [Commented] (ZOOKEEPER-1523) Better logging during instance loading/syncing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847110#comment-16847110 ] Brian Nixon commented on ZOOKEEPER-1523: [~Yohan123], I have some code stashed by the side that might address the 4LTR word portion of this ticket. Give me a chance to clean it up and add it as a PR. Maybe it can be used to bootstrap the requested logging as well. > Better logging during instance loading/syncing > -- > > Key: ZOOKEEPER-1523 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1523 > Project: ZooKeeper > Issue Type: Improvement > Components: quorum, server >Affects Versions: 3.3.5 >Reporter: Jordan Zimmerman >Priority: Critical > > When an instance is coming up and loading from snapshot, better logging is > needed so an operator knows how long until completion. Also, when syncing > with the leader, better logging is needed to know how long until success.
[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16846245#comment-16846245 ] Brian Nixon commented on ZOOKEEPER-1147: [~larsfrancke] - just created ZOOKEEPER-3400 to create some documentation. > Add support for local sessions > -- > > Key: ZOOKEEPER-1147 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.3.3 >Reporter: Vishal Kathuria >Assignee: Thawan Kooburat >Priority: Major > Labels: api-change, scaling > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, > ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, > ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, > ZOOKEEPER-1147.patch > > Original Estimate: 840h > Remaining Estimate: 840h > > This improvement is in the bucket of making ZooKeeper work at a large scale. > We are planning on having about a 1 million clients connect to a ZooKeeper > ensemble through a set of 50-100 observers. Majority of these clients are > read only - ie they do not do any updates or create ephemeral nodes. > In ZooKeeper today, the client creates a session and the session creation is > handled like any other update. In the above use case, the session create/drop > workload can easily overwhelm an ensemble. The following is a proposal for a > "local session", to support a larger number of connections. > 1. The idea is to introduce a new type of session - "local" session. A > "local" session doesn't have a full functionality of a normal session. > 2. Local sessions cannot create ephemeral nodes. > 3. Once a local session is lost, you cannot re-establish it using the > session-id/password. The session and its watches are gone for good. > 4. When a local session connects, the session info is only maintained > on the zookeeper server (in this case, an observer) that it is connected to. 
> The leader is not aware of the creation of such a session and there is no > state written to disk. > 5. The pings and expiration is handled by the server that the session > is connected to. > With the above changes, we can make ZooKeeper scale to a much larger number > of clients without making the core ensemble a bottleneck. > In terms of API, there are two options that are being considered > 1. Let the client specify at the connect time which kind of session do they > want. > 2. All sessions connect as local sessions and automatically get promoted to > global sessions when they do an operation that requires a global session > (e.g. creating an ephemeral node) > Chubby took the approach of lazily promoting all sessions to global, but I > don't think that would work in our case, where we want to keep sessions which > never create ephemeral nodes as always local. Option 2 would make it more > broadly usable but option 1 would be easier to implement. > We are thinking of implementing option 1 as the first cut. There would be a > client flag, IsLocalSession (much like the current readOnly flag) that would > be used to determine whether to create a local session or a global session. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3400) Add documentation on local sessions
Brian Nixon created ZOOKEEPER-3400: -- Summary: Add documentation on local sessions Key: ZOOKEEPER-3400 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3400 Project: ZooKeeper Issue Type: Improvement Components: documentation Affects Versions: 3.6.0, 3.5.6 Reporter: Brian Nixon ZOOKEEPER-1147 added local sessions (client sessions not ratified by the leader) to ZooKeeper as a lightweight augmentation of the existing global sessions. Add some outward-facing documentation that describes this feature ([https://zookeeper.apache.org/doc/r3.5.5/zookeeperProgrammers.html#ch_zkSessions] seems like a reasonable place).
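For reference, the local-session behavior from ZOOKEEPER-1147 is controlled by server configuration properties along these lines in the 3.5 line and later (property names worth double-checking against the shipped admin guide):

```
# Enable local sessions on this server; such sessions are handled
# entirely by the server the client is connected to and are not
# ratified by the leader.
localSessionsEnabled=true

# Allow a local session to be upgraded to a global session the first
# time it performs an operation that requires one (e.g. creating an
# ephemeral node).
localSessionsUpgradingEnabled=true
```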
[jira] [Commented] (ZOOKEEPER-1147) Add support for local sessions
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844382#comment-16844382 ] Brian Nixon commented on ZOOKEEPER-1147: Checking the usual places, I don't see any good documentation. The only good description of the feature is on this ticket. Seems like an obvious oversight - a new ticket should be created ([https://zookeeper.apache.org/doc/r3.5.5/zookeeperProgrammers.html#ch_zkSessions] seems like a reasonable place to land the feature description). > Add support for local sessions > -- > > Key: ZOOKEEPER-1147 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1147 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.3.3 >Reporter: Vishal Kathuria >Assignee: Thawan Kooburat >Priority: Major > Labels: api-change, scaling > Fix For: 3.5.0 > > Attachments: ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, > ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, > ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, ZOOKEEPER-1147.patch, > ZOOKEEPER-1147.patch > > Original Estimate: 840h > Remaining Estimate: 840h > > This improvement is in the bucket of making ZooKeeper work at a large scale. > We are planning on having about a 1 million clients connect to a ZooKeeper > ensemble through a set of 50-100 observers. Majority of these clients are > read only - ie they do not do any updates or create ephemeral nodes. > In ZooKeeper today, the client creates a session and the session creation is > handled like any other update. In the above use case, the session create/drop > workload can easily overwhelm an ensemble. The following is a proposal for a > "local session", to support a larger number of connections. > 1. The idea is to introduce a new type of session - "local" session. A > "local" session doesn't have a full functionality of a normal session. > 2. Local sessions cannot create ephemeral nodes. > 3. 
Once a local session is lost, you cannot re-establish it using the > session-id/password. The session and its watches are gone for good. > 4. When a local session connects, the session info is only maintained > on the zookeeper server (in this case, an observer) that it is connected to. > The leader is not aware of the creation of such a session and there is no > state written to disk. > 5. The pings and expiration is handled by the server that the session > is connected to. > With the above changes, we can make ZooKeeper scale to a much larger number > of clients without making the core ensemble a bottleneck. > In terms of API, there are two options that are being considered > 1. Let the client specify at the connect time which kind of session do they > want. > 2. All sessions connect as local sessions and automatically get promoted to > global sessions when they do an operation that requires a global session > (e.g. creating an ephemeral node) > Chubby took the approach of lazily promoting all sessions to global, but I > don't think that would work in our case, where we want to keep sessions which > never create ephemeral nodes as always local. Option 2 would make it more > broadly usable but option 1 would be easier to implement. > We are thinking of implementing option 1 as the first cut. There would be a > client flag, IsLocalSession (much like the current readOnly flag) that would > be used to determine whether to create a local session or a global session. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ZOOKEEPER-3349) QuorumCnxManager socketTimeout unused
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Nixon resolved ZOOKEEPER-3349. Resolution: Not A Problem Fix Version/s: 3.6.0 > QuorumCnxManager socketTimeout unused > - > > Key: ZOOKEEPER-3349 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3349 > Project: ZooKeeper > Issue Type: New Feature > Components: quorum >Affects Versions: 3.6.0 >Reporter: Brian Nixon >Priority: Trivial > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > QuorumCnxManager member variable 'socketTimeout' is not used anywhere in the > class. It's clear from the context that it should either be removed entirely > or invoked in QuorumCnxManager::setSockOpts. Since the QuorumPeer syncLimit > can be changed by jmx, I'm thinking that the former is the better solution. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ZOOKEEPER-3349) QuorumCnxManager socketTimeout unused
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Nixon reassigned ZOOKEEPER-3349: -- Assignee: Brian Nixon > QuorumCnxManager socketTimeout unused > - > > Key: ZOOKEEPER-3349 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3349 > Project: ZooKeeper > Issue Type: New Feature > Components: quorum >Affects Versions: 3.6.0 >Reporter: Brian Nixon >Assignee: Brian Nixon >Priority: Trivial > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > QuorumCnxManager member variable 'socketTimeout' is not used anywhere in the > class. It's clear from the context that it should either be removed entirely > or invoked in QuorumCnxManager::setSockOpts. Since the QuorumPeer syncLimit > can be changed by jmx, I'm thinking that the former is the better solution. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3349) QuorumCnxManager socketTimeout unused
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844371#comment-16844371 ] Brian Nixon commented on ZOOKEEPER-3349: This parameter is being used again as of ZOOKEEPER-3378. Nothing to do here. > QuorumCnxManager socketTimeout unused > - > > Key: ZOOKEEPER-3349 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3349 > Project: ZooKeeper > Issue Type: New Feature > Components: quorum >Affects Versions: 3.6.0 >Reporter: Brian Nixon >Priority: Trivial > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > > QuorumCnxManager member variable 'socketTimeout' is not used anywhere in the > class. It's clear from the context that it should either be removed entirely > or invoked in QuorumCnxManager::setSockOpts. Since the QuorumPeer syncLimit > can be changed by jmx, I'm thinking that the former is the better solution. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-1000) Provide SSL in zookeeper to be able to run cross colos.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844370#comment-16844370 ] Brian Nixon commented on ZOOKEEPER-1000: That's my take as well. Not sure what else there would be to do here. > Provide SSL in zookeeper to be able to run cross colos. > --- > > Key: ZOOKEEPER-1000 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1000 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Mahadev konar >Assignee: Mahadev konar >Priority: Major > Fix For: 3.6.0, 3.5.6 > > > This jira is to track SSL for zookeeper. The inter zookeeper server > communication and the client to server communication should be over ssl so > that zookeeper can be deployed over WAN's. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ZOOKEEPER-3311) Allow a delay to the transaction log flush
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Nixon reassigned ZOOKEEPER-3311: -- Assignee: Brian Nixon > Allow a delay to the transaction log flush > --- > > Key: ZOOKEEPER-3311 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3311 > Project: ZooKeeper > Issue Type: New Feature > Components: server >Affects Versions: 3.6.0 >Reporter: Brian Nixon >Assignee: Brian Nixon >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > The SyncRequestProcessor flushes writes to disk either when 1000 writes are > pending to be flushed or when the processor fails to retrieve another write > from its incoming queue. The "flush when queue empty" condition operates > poorly under many workloads as it can quickly degrade into flushing after > every write -- losing all benefits of batching and leading to a continuous > stream of flushes + fsyncs which overwhelm the underlying disk. > > A configurable flush delay would ensure flushes do not happen more frequently > than once every X milliseconds. This can be used in-place of or jointly with > batch size triggered flushes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
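A minimal sketch of the flush trigger described above, with invented names (the actual patch and its configuration knobs may differ): flush when the batch limit is reached, or when the incoming queue is empty and the configured delay has elapsed since the last flush.

```java
// Hypothetical sketch of a time-bounded flush trigger for a
// SyncRequestProcessor-like loop. Not the actual ZooKeeper code.
public class FlushPolicy {
    private final int maxBatchSize; // e.g. the existing 1000-write threshold
    private final long maxDelayMs;  // the proposed configurable flush delay
    private long lastFlushTime;
    private int pending;

    public FlushPolicy(int maxBatchSize, long maxDelayMs, long nowMs) {
        this.maxBatchSize = maxBatchSize;
        this.maxDelayMs = maxDelayMs;
        this.lastFlushTime = nowMs;
    }

    // Called for each write taken off the incoming queue.
    public void onWrite() {
        pending++;
    }

    // With a flush delay in place, an empty incoming queue alone no longer
    // forces a flush; the delay must also have elapsed. A full batch still
    // flushes immediately.
    public boolean shouldFlush(boolean queueEmpty, long nowMs) {
        if (pending == 0) {
            return false;
        }
        if (pending >= maxBatchSize) {
            return true;
        }
        return queueEmpty && (nowMs - lastFlushTime >= maxDelayMs);
    }

    public void flushed(long nowMs) {
        pending = 0;
        lastFlushTime = nowMs;
    }
}
```

Under a write-heavy workload where the queue often empties, this keeps flushes no more frequent than once per maxDelayMs instead of once per write.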
[jira] [Assigned] (ZOOKEEPER-3378) Set the quorum cnxn timeout independently from syncLimit
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Nixon reassigned ZOOKEEPER-3378: -- Assignee: Brian Nixon > Set the quorum cnxn timeout independently from syncLimit > > > Key: ZOOKEEPER-3378 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3378 > Project: ZooKeeper > Issue Type: Improvement > Components: quorum >Reporter: Brian Nixon >Assignee: Brian Nixon >Priority: Minor > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > If an ensemble requires a high sync limit to support a large data tree or > transaction rate, it can cause the QuorumCxnManager to hang over-long in > response to quorum events. Using the sync limit for this timeout is a > convenience in terms of keeping all failure detection mechanisms in sync but > it is not strictly required for correct behavior. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3396) Flaky test in RestoreCommittedLogTest
Brian Nixon created ZOOKEEPER-3396: -- Summary: Flaky test in RestoreCommittedLogTest Key: ZOOKEEPER-3396 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3396 Project: ZooKeeper Issue Type: Improvement Components: tests Affects Versions: 3.6.0 Reporter: Brian Nixon The patch for ZOOKEEPER-3244 ([https://github.com/apache/zookeeper/pull/770]) introduced a flaky test RestoreCommittedLogTest::testRestoreCommittedLogWithSnapSize. Get it running consistently.
[jira] [Created] (ZOOKEEPER-3395) Document individual admin commands in markdown
Brian Nixon created ZOOKEEPER-3395: -- Summary: Document individual admin commands in markdown Key: ZOOKEEPER-3395 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3395 Project: ZooKeeper Issue Type: Improvement Components: documentation Affects Versions: 3.6.0, 3.5.6 Reporter: Brian Nixon The "ZooKeeper Commands" section of the ZooKeeper Administrator's Guide takes the time to document each four letter command individually, but when it comes to the admin commands, it just directs the user to query a live peer in order to get the supported list (e.g. curl http://localhost:8080/commands). While such a query will provide the best source for the admin commands available on a given ZooKeeper version, it's no replacement for the role that the central guide provides. Create an enumerated list of the supported admin commands in the section "The AdminServer" in the style in which the four letter commands are documented in "The Four Letter Words".
[jira] [Created] (ZOOKEEPER-3394) Delay observer reconnect when all learner masters have been tried
Brian Nixon created ZOOKEEPER-3394: -- Summary: Delay observer reconnect when all learner masters have been tried Key: ZOOKEEPER-3394 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3394 Project: ZooKeeper Issue Type: Improvement Components: quorum Affects Versions: 3.6.0 Reporter: Brian Nixon Observers will disconnect when the voting peers perform a leader election and reconnect after. The delay zookeeper.observer.reconnectDelayMs was added to insulate the voting peers from the observers returning. With a large number of peers and the observerMaster feature active, this delay is mostly detrimental, as it means that the observer is more likely to get hung up connecting to a bad (down/corrupt) peer when it would be better off switching to a new one quickly. To retain the protective virtue of the delay, it makes sense to apply the delay only once all observer masters in the list have been tried, before iterating through the list again. In the case where observer masters are not active, this degenerates to a delay between connection attempts on the leader.
[jira] [Created] (ZOOKEEPER-3392) Add admin command to display last snapshot information
Brian Nixon created ZOOKEEPER-3392: -- Summary: Add admin command to display last snapshot information Key: ZOOKEEPER-3392 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3392 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.6.0 Reporter: Brian Nixon Basic systems that back up ZooKeeper data will maintain snapshot files of the data tree. In order to understand the health of these systems, they need a way to determine how far out of date their files are relative to the current state of the ensemble. Add an admin command that exposes the zxid and timestamp of the last saved/restored snapshot of the server. This will let such a backup system know when it can update and when it is stale.
[jira] [Created] (ZOOKEEPER-3388) Allow client port to support plaintext and encrypted connections simultaneously
Brian Nixon created ZOOKEEPER-3388: -- Summary: Allow client port to support plaintext and encrypted connections simultaneously Key: ZOOKEEPER-3388 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3388 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.6.0 Reporter: Brian Nixon ZOOKEEPER-2125 extended the ZooKeeper server-side to handle encrypted client connections by allowing the server to open a second client port (the secure client port) to manage this new style of traffic. A server is able to handle plaintext and encrypted clients simultaneously by managing each on their respective ports. When it comes time to get all clients connecting to your system to start using encryption, this approach requires that they make two changes simultaneously: altering their client properties to start using the secure settings and altering the routing information that they use to connect to the ensemble. If either is misconfigured, the client is cut off from the ensemble. With a large deployment of clients owned by different teams and different tools, this presents a danger in activating the feature. Ideally, the two changes could be staggered so that first the encryption feature is activated and then the routing information is changed in a subsequent phase. Allow the server connection factory managing the regular client port to handle both plaintext and encrypted connections. This will be independent of the operation of the server connection factory managing the secure client port, but similar settings ought to apply to both (e.g. cipher suites) to maintain intercompatibility.
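The staged migration described here corresponds to what eventually shipped as port unification: a server setting along the following lines lets the regular client port accept both plaintext and TLS handshakes (the exact property name is worth verifying against the admin guide for the release in use):

```
# Accept both plaintext and TLS connections on the regular clientPort,
# so clients can switch to TLS before their routing is repointed at
# the secureClientPort.
client.portUnification=true
```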
[jira] [Created] (ZOOKEEPER-3386) Add admin command to display voting view
Brian Nixon created ZOOKEEPER-3386: -- Summary: Add admin command to display voting view Key: ZOOKEEPER-3386 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3386 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.6.0 Reporter: Brian Nixon Solid agreement on the set of voting servers is a necessity for ZooKeeper and it's useful to audit that agreement to validate it does not drift into some pathological condition. Create an admin command that exposes the ensemble voting members from the point of view of the queried server. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3385) Add admin command to display leader
Brian Nixon created ZOOKEEPER-3385: -- Summary: Add admin command to display leader Key: ZOOKEEPER-3385 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3385 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.6.0 Reporter: Brian Nixon Each QuorumPeer prints the identity of the server it believes is the leader in its logs but that is not easily turned into diagnostic information about the state of the ensemble. It can be useful in debugging various issues, both when a quorum is struggling to be established and when a minority of peers are failing to follow, to see at a glance which peers are following the leader elected by the majority and which peers are either not following or following a different server. Create an admin command that exposes which server a peer believes is the current leader. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3378) et the quorum cnxn timeout independently from syncLimit
Brian Nixon created ZOOKEEPER-3378: -- Summary: et the quorum cnxn timeout independently from syncLimit Key: ZOOKEEPER-3378 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3378 Project: ZooKeeper Issue Type: Improvement Components: quorum Reporter: Brian Nixon -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ZOOKEEPER-3378) Set the quorum cnxn timeout independently from syncLimit
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Nixon updated ZOOKEEPER-3378: --- Description: If an ensemble requires a high sync limit to support a large data tree or transaction rate, it can cause the QuorumCxnManager to hang over-long in response to quorum events. Using the sync limit for this timeout is a convenience in terms of keeping all failure detection mechanisms in sync but it is not strictly required for correct behavior. > Set the quorum cnxn timeout independently from syncLimit > > > Key: ZOOKEEPER-3378 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3378 > Project: ZooKeeper > Issue Type: Improvement > Components: quorum >Reporter: Brian Nixon >Priority: Minor > > If an ensemble requires a high sync limit to support a large data tree or > transaction rate, it can cause the QuorumCxnManager to hang over-long in > response to quorum events. Using the sync limit for this timeout is a > convenience in terms of keeping all failure detection mechanisms in sync but > it is not strictly required for correct behavior. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ZOOKEEPER-3378) Set the quorum cnxn timeout independently from syncLimit
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Nixon updated ZOOKEEPER-3378: --- Summary: Set the quorum cnxn timeout independently from syncLimit (was: et the quorum cnxn timeout independently from syncLimit) > Set the quorum cnxn timeout independently from syncLimit > > > Key: ZOOKEEPER-3378 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3378 > Project: ZooKeeper > Issue Type: Improvement > Components: quorum >Reporter: Brian Nixon >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-1651) Add support for compressed snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831159#comment-16831159 ] Brian Nixon commented on ZOOKEEPER-1651: This feature was accepted with ZOOKEEPER-3179. I'd suggest marking this ticket as resolved. > Add support for compressed snapshot > --- > > Key: ZOOKEEPER-1651 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1651 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Reporter: Thawan Kooburat >Assignee: Brian Nixon >Priority: Major > > We want to keep many copies of snapshots on disk so that we can debug > problems afterward. However, the snapshot can be large, so we added a feature > that allows the server to dump/load snapshots in a compressed format (snappy or > gzip). This also benefits db loading and snapshotting time. > This also depends on client workload. In one of our deployments, where > clients don't compress their data, we found that snappy compression works best. > The snapshot size is reduced from 381M to 65MB. Db loading and snapshotting > time is also reduced by 20%.
[jira] [Assigned] (ZOOKEEPER-1651) Add support for compressed snapshot
[ https://issues.apache.org/jira/browse/ZOOKEEPER-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Nixon reassigned ZOOKEEPER-1651: -- Assignee: Brian Nixon (was: Thawan Kooburat) > Add support for compressed snapshot > --- > > Key: ZOOKEEPER-1651 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1651 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Reporter: Thawan Kooburat >Assignee: Brian Nixon >Priority: Major > > We want to keep many copies of snapshots on disk so that we can debug > problems afterward. However, the snapshot can be large, so we added a feature > that allows the server to dump/load snapshots in a compressed format (snappy or > gzip). This also benefits db loading and snapshotting time. > This also depends on client workload. In one of our deployments, where > clients don't compress their data, we found that snappy compression works best. > The snapshot size is reduced from 381M to 65MB. Db loading and snapshotting > time is also reduced by 20%.
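The accepted feature (ZOOKEEPER-3179) is driven by a JVM system property in the 3.6 line; a setting along these lines selects snappy compression for new snapshots (the exact property name and accepted values are worth verifying against the release documentation):

```
# Compress snapshots with snappy; "gz" selects gzip, and an empty
# value keeps uncompressed snapshots.
-Dzookeeper.snapshot.compression.method=snappy
```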
[jira] [Created] (ZOOKEEPER-3359) Batch commits in the CommitProcessor
Brian Nixon created ZOOKEEPER-3359: -- Summary: Batch commits in the CommitProcessor Key: ZOOKEEPER-3359 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3359 Project: ZooKeeper Issue Type: Improvement Components: quorum Affects Versions: 3.6.0 Reporter: Brian Nixon Draining a single commit every time the CommitProcessor switches to commit mode can add to the backlog of committed messages. Instead, add controls to batch and drain multiple commits and to limit the number of reads being served. This improves commit throughput and adds backpressure on reads.
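The batching described above can be sketched as a bounded drain of the committed-request queue; the class and method names here are invented for illustration, not the actual CommitProcessor code:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class CommitBatcher {
    // Instead of draining a single commit each time the processor switches
    // to commit mode, drain up to maxBatchSize commits in one pass.
    static List<String> drainCommits(Queue<String> committed, int maxBatchSize) {
        List<String> batch = new ArrayList<>();
        String op;
        while (batch.size() < maxBatchSize && (op = committed.poll()) != null) {
            batch.add(op);
        }
        return batch;
    }

    public static void main(String[] args) {
        Queue<String> committed = new ArrayDeque<>();
        for (int i = 1; i <= 5; i++) {
            committed.add("txn-" + i);
        }
        System.out.println(drainCommits(committed, 3)); // first three commits
        System.out.println(drainCommits(committed, 3)); // the remaining two
    }
}
```

Capping the batch size is what bounds the time spent in commit mode, which in turn is what holds back (applies backpressure to) pending reads.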
[jira] [Commented] (ZOOKEEPER-3352) Use LevelDB For Backend
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813760#comment-16813760 ] Brian Nixon commented on ZOOKEEPER-3352: We've been curious whether wiring ZooKeeper on top of RocksDB could give similar performance benefits that Instagram saw with putting Apache Cassandra on top of RocksDB ([https://instagram-engineering.com/open-sourcing-a-10x-reduction-in-apache-cassandra-tail-latency-d64f86b43589] for some details). Something like this ticket that involves abstracting out the data storage components would be useful for us. > Use LevelDB For Backend > --- > > Key: ZOOKEEPER-3352 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3352 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Reporter: David Mollitor >Assignee: David Mollitor >Priority: Major > Fix For: 4.0.0 > > > Use LevelDB for managing data stored in ZK (transaction logs and snapshots). > https://stackoverflow.com/questions/6779669/does-leveldb-support-java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3354) Improve efficiency of DeleteAllCommand
Brian Nixon created ZOOKEEPER-3354: -- Summary: Improve efficiency of DeleteAllCommand Key: ZOOKEEPER-3354 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3354 Project: ZooKeeper Issue Type: Improvement Components: other Affects Versions: 3.6.0 Reporter: Brian Nixon The CLI DeleteAllCommand internally uses a synchronous, iterative approach. This can be improved with batching for quicker response times on large subtrees.
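One possible shape for such a batched delete, sketched here without the ZooKeeper client API (all names invented for illustration): order the subtree's paths deepest-first, since znodes must be deleted before their parents, then group them into fixed-size batches that could each be submitted as a single multi() transaction instead of one synchronous delete per node.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class DeleteAllBatching {
    // Znodes must be deleted children-first; sorting paths by depth
    // (deepest first) yields a safe deletion order.
    static List<String> deletionOrder(Collection<String> paths) {
        return paths.stream()
                .sorted(Comparator.comparingInt((String p) -> p.split("/").length).reversed())
                .collect(Collectors.toList());
    }

    // Group the ordered deletes into fixed-size batches; each batch could
    // be sent as one multi() transaction rather than N round trips.
    static List<List<String>> batches(List<String> ordered, int batchSize) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < ordered.size(); i += batchSize) {
            out.add(ordered.subList(i, Math.min(i + batchSize, ordered.size())));
        }
        return out;
    }
}
```

A real implementation would still need to handle nodes created concurrently during the traversal, e.g. by retrying failed batches.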
[jira] [Created] (ZOOKEEPER-3353) Admin commands for showing initial settings
Brian Nixon created ZOOKEEPER-3353: -- Summary: Admin commands for showing initial settings Key: ZOOKEEPER-3353 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3353 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.6.0 Reporter: Brian Nixon It can be useful as a sysadmin to know the settings that were initially used to configure a given ZooKeeper server. Some of these can be read from the process logs and others from the Java args in the process description, but if, for example, the zoo.cfg file used when starting a process is overwritten without the process itself being restarted, it can be difficult to know exactly what is currently running on the JVM. Produce admin commands (and four-letter commands) to answer these questions.
[jira] [Commented] (ZOOKEEPER-3343) Add a new doc: zookeeperTools.md
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16812869#comment-16812869 ] Brian Nixon commented on ZOOKEEPER-3343: This will be great, thanks [~maoling]! > Add a new doc: zookeeperTools.md > > > Key: ZOOKEEPER-3343 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3343 > Project: ZooKeeper > Issue Type: New Feature > Components: documentation >Affects Versions: 3.5.4 >Reporter: maoling >Assignee: maoling >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > write zookeeper tools[3.7], which includes the: > - list all usages of the shells under the zookeeper/bin. (e.g > zkTxnLogToolkit.sh,zkCleanup.sh) > - benchmark tool > - backup tool > - test tools:jepsen -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ZOOKEEPER-3349) QuorumCnxManager socketTimeout unused
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Nixon updated ZOOKEEPER-3349: --- Description: QuorumCnxManager member variable 'socketTimeout' is not used anywhere in the class. It's clear from the context that it should either be removed entirely or invoked in QuorumCnxManager::setSockOpts. Since the QuorumPeer syncLimit can be changed by jmx, I'm thinking that the former is the better solution. was: QuorumCnxManager member variable 'socketTimeout' is not used anywhere in the class. It's clear from the context that it should either be removed entirely or invoked in QuorumCnxManager::setSockOpts. > QuorumCnxManager socketTimeout unused > - > > Key: ZOOKEEPER-3349 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3349 > Project: ZooKeeper > Issue Type: New Feature > Components: quorum >Affects Versions: 3.6.0 >Reporter: Brian Nixon >Priority: Trivial > > QuorumCnxManager member variable 'socketTimeout' is not used anywhere in the > class. It's clear from the context that it should either be removed entirely > or invoked in QuorumCnxManager::setSockOpts. Since the QuorumPeer syncLimit > can be changed by jmx, I'm thinking that the former is the better solution. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3349) QuorumCnxManager socketTimeout unused
Brian Nixon created ZOOKEEPER-3349: -- Summary: QuorumCnxManager socketTimeout unused Key: ZOOKEEPER-3349 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3349 Project: ZooKeeper Issue Type: New Feature Components: quorum Affects Versions: 3.6.0 Reporter: Brian Nixon QuorumCnxManager member variable 'socketTimeout' is not used anywhere in the class. It's clear from the context that it should either be removed entirely or invoked in QuorumCnxManager::setSockOpts. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3318) Add a complete backup mechanism for zookeeper internal
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804453#comment-16804453 ] Brian Nixon commented on ZOOKEEPER-3318: In the interest of a compact backup, I would like to see a way to combine a fuzzy snapshot and subsequent transaction logs into a single perfect snapshot of the data tree. One possible backup solution based on this - start up an observer process to pull the data tree live to a new directory, then a subsequent operation to combine the resultant files into the perfect snapshot. > Add a complete backup mechanism for zookeeper internal > -- > > Key: ZOOKEEPER-3318 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3318 > Project: ZooKeeper > Issue Type: New Feature > Components: other >Reporter: maoling >Assignee: maoling >Priority: Major > > We already had some workaround ways for the backup, e.g > *scenario 1:* just write a cron shell to copy the snapshots periodically. > *scenario 2:* use the observer as the role of backup, then write the > snapshots to file system. (e.g HDFS) > this issue is aiming to implement a complete backup mechanism for zookeeper > internal: > the init propose: > *1*. write a new CLI:snapshot > *1.1* > because this CLI may be time-consuming. A confirmation is needed. e.g. > [zk: 127.0.0.1:2180(CONNECTED) 0] snapshot backupDataDir > Are you sure to exec:snapshot [yes/no] > *1.2* > if no parameter, the default backupDataDir is the dataDir. the format of the > backup-snapshot is just like: backup_snapshot.f9f82834 with the "backup_" > prefix; when recovering, rename backup_snapshot.f9f82834 to > snapshot.f9f82834 and move it to the dataDir, then restart the ensemble. 
> *1.3* > don't worry about exposing the takeSnap() api to the client. Look at these two > references: > https://github.com/etcd-io/etcd/blob/master/clientv3/snapshot/v3_snapshot.go > > https://github.com/xetorthio/jedis/blob/master/src/main/java/redis/clients/jedis/commands/BasicCommands.java#L68 > *2*. > *2.1* > write a new tool/shell: zkBackup.sh which is the reverse process of the > zkCleanup.sh for no-realtime backup > *2.2* > write a new tool/shell: zkBackup_v2.sh which calls the api of the takeSnap() > for realtime backup. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
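The "fuzzy snapshot plus subsequent transaction logs" combination from the comment above can be modeled on a toy key/value tree. The record and method names below are illustrative only, not ZooKeeper's actual DataTree/FileTxnSnapLog API:

```java
import java.util.*;

// Toy model of turning a fuzzy snapshot into a "perfect" one: start from the
// snapshot's state, then re-apply every logged txn with a zxid newer than the
// snapshot's zxid. A null value stands in for a delete txn.
public class SnapshotReplay {
    record Txn(long zxid, String path, String value) {}

    static Map<String, String> perfectSnapshot(Map<String, String> fuzzy,
                                               long snapZxid, List<Txn> log) {
        Map<String, String> tree = new HashMap<>(fuzzy);
        for (Txn t : log) {
            if (t.zxid() <= snapZxid) continue;       // already reflected in the snapshot
            if (t.value() == null) tree.remove(t.path());
            else tree.put(t.path(), t.value());
        }
        return tree;
    }
}
```

The real operation would replay jute-serialized txn records onto a DataTree and serialize the result, but the idempotent-replay shape is the same.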
[jira] [Commented] (ZOOKEEPER-3332) TxnLogToolkit should print multi transactions readably
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801274#comment-16801274 ] Brian Nixon commented on ZOOKEEPER-3332: This is a really useful change. We have a couple of flags in our LogFormatter that lets us optionally inspect the elements of a MultiTxn as well as dump the data of each. We never ported them to the new TxnLogToolkit paradigm but we can put up a PR of our changes so you can compare. If I forget, ping me to remind me. > TxnLogToolkit should print multi transactions readably > -- > > Key: ZOOKEEPER-3332 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3332 > Project: ZooKeeper > Issue Type: Improvement >Reporter: Toshihiro Suzuki >Assignee: maoling >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently, LogFormatter shows multi transactions like the following and it's > not readable: > {code:java} > 3/23/19 7:35:21 AM UTC session 0x3699141c4080020 cxid 0x21 zxid 0x102d9 > multi > v{s{1,#000292f726d73746f72652f5a4b524d5374617465526f6f742f524d5f5a4b5f46454e43494e475f4c4f434b00010001f0005776f726c640006616e796f6e651c},s{5,#000312f726d73746f72652f5a4b524d5374617465526f6f742f414d524d546f6b656e5365637265744d616e61676572526f6f7400012a108ffe0fffdff92fff15128fff5ff9a731174ffa8ff86ffb40009},s{2,#000292f726d73746f72652f5a4b524d5374617465526f6f742f524d5f5a4b5f46454e43494e475f4c4f434b}} > {code} > Like delete and setData as the following, LogFormatter should print multi > transactions readably: > {code:java} > 3/22/19 7:20:48 AM UTC session 0x2699141c3f70022 cxid 0x885 zxid 0x102cc > delete '/hbase-unsecure/region-in-transition/d6694b5f7ec2c45f6096fe373c8a34bc > 3/22/19 7:20:50 AM UTC session 0x2699141c3f70024 cxid 0x47 zxid 0x102cd > setData > 
'/hbase-unsecure/region-in-transition/a9c6dac76ce74812196667ebc01dad51,#0001a726567696f6e7365727665723a313630323035617afffa42ff94ffe81f5042554684123f53595354454d2e434154414c4f472c2c313535333233313233393533352e61396336646163373663653734383132313936363637656263303164616435312e18ffe9ffa8ff98ffa2ff9a2d2228a1c633132362d6e6f6465342e7371756164726f6e2d6c6162732e636f6d10ff947d18ffcbff96ffa2ff9a2d,2 > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3333) Detect if txnlogs and / or snapshots is deleted under a running ZK instance
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801160#comment-16801160 ] Brian Nixon commented on ZOOKEEPER-3333: I'm assuming that the scenario in mind is a rogue process on your host that is deleting files. One thing to note is that until ZOOKEEPER-3318 is completed to a reasonable state, people may outsource their backups to an external process - in which case the transaction log files and the snapshot files may have their lifecycle controlled by something that is not ZooKeeper (and ZooKeeper should not die when files disappear). Having a message logged when a .snap or .log file is unexpectedly changed seems reasonable. Could also enable a feature by which the deletion of transaction logs triggers a snapshot to make sure the data tree would survive a sudden restart. I would not kill the server when a transaction log disappears since that would remove your one known copy of the data tree (in memory). To implement this, you may be able to reuse the FileChangeWatcher that was added for the TLS work or at least copy from its approach. > Detect if txnlogs and / or snapshots is deleted under a running ZK instance > --- > > Key: ZOOKEEPER-3333 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3333 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.6.0, 3.5.5, 3.4.14 >Reporter: Norbert Kalmar >Priority: Major > > ZK does not notice if txnlogs are deleted from its dataDir, and it will just > keep running, writing txns in the buffer. Then, when ZK is restarted, it will > lose all data. > To reproduce: > I run a 3 node ZK ensemble, and deleted dataDir for just one instance, then > wrote some data. It turns out, it will not write the transaction to disk. ZK > stores everything in memory, until it “feels like” it’s time to persist it on > disk. 
So it doesn’t even notice the file is deleted, and when it tried to > flush, I imagine it just fails and keeps it in the buffer. > So anyway, I restarted the instance, it got the snapshot + latest txn logs > from the other nodes, as expected it would. It also wrote them in dataDir, so > now every node had the dataDir. > So deleting from one node is fine (again, as expected, they will sync after a > restart). > Then, I deleted all 3 nodes dataDir under running instances. Until restart, > it worked fine (of course I was getting my buffer full, I did not test until > the point it got overflowed). > But after restart, I got a fresh new ZK with all my znodes gone. > For starters, I think ZK should detect if the file it is appending to is removed. > What should ZK do? At least give a warning log message. The question is: should > it try to create a new file? Or try to get it from other nodes? Or just fail > instantly? Restart itself, see if it can sync? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
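The FileChangeWatcher approach suggested in the comment can be sketched with plain java.nio. This is not ZooKeeper's actual class; a real server would log a warning (or trigger a snapshot) rather than return the file name:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.concurrent.TimeUnit;

// Minimal sketch of watching a data directory for deletions. Registers the
// directory for ENTRY_DELETE events and waits up to timeoutSec for one.
public class DataDirWatcher {
    static String awaitDeletion(Path dataDir, long timeoutSec)
            throws IOException, InterruptedException {
        try (WatchService ws = dataDir.getFileSystem().newWatchService()) {
            dataDir.register(ws, StandardWatchEventKinds.ENTRY_DELETE);
            WatchKey key = ws.poll(timeoutSec, TimeUnit.SECONDS);
            if (key == null) return null;   // nothing deleted within the timeout
            for (WatchEvent<?> ev : key.pollEvents()) {
                if (ev.kind() == StandardWatchEventKinds.ENTRY_DELETE) {
                    return ev.context().toString();   // name of the deleted file
                }
            }
            return null;
        }
    }
}
```

A production version would run the poll loop on its own thread for the life of the process, which is roughly what the TLS-related FileChangeWatcher does for certificate files.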
[jira] [Created] (ZOOKEEPER-3331) Automatically add IP authorization for Netty connections
Brian Nixon created ZOOKEEPER-3331: -- Summary: Automatically add IP authorization for Netty connections Key: ZOOKEEPER-3331 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3331 Project: ZooKeeper Issue Type: New Feature Components: server Affects Versions: 3.6.0 Reporter: Brian Nixon NIOServerCnxn automatically adds the client's address as an auth token under the "ip" scheme. Extend that functionality to the NettyServerCnxn as well to bring parity to the two approaches. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3320) Leader election port stop listen when hostname unresolvable for some time
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797657#comment-16797657 ] Brian Nixon commented on ZOOKEEPER-3320: A configurable retry seems like a good idea to me. Either something like "election port bind time" or "dns unavailable time" if we want to be more general. Do you want to contribute a short diff? This may also be related to ZOOKEEPER-2982 (or may not, making a note to check later). > Leader election port stop listen when hostname unresolvable for some time > -- > > Key: ZOOKEEPER-3320 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3320 > Project: ZooKeeper > Issue Type: Bug > Components: leaderElection >Affects Versions: 3.4.10, 3.5.4 >Reporter: Igor Skokov >Priority: Major > > When trying to run Zookeeper 3.5.4 cluster on Kubernetes, I found out that in > some circumstances Zookeeper node stop listening on leader election port. > This cause unavailability of ZK cluster. > Zookeeper deployed as StatefulSet in term of Kubernetes and has following > dynamic configuration: > {code:java} > zookeeper-0.zookeeper:2182:2183:participant;2181 > zookeeper-1.zookeeper:2182:2183:participant;2181 > zookeeper-2.zookeeper:2182:2183:participant;2181 > {code} > Bind address contains DNS name which generated by Kubernetes for each > StatefulSet pod. > These DNS names will become resolvable after container start, but with some > delay. That delay cause stopping of leader election port listener in > QuorumCnxManager.Listener class. > Error happens in QuorumCnxManager.Listener "run" method, it tries to bind > leader election port to hostname which not resolvable at this moment. Retry > count is hard-coded and equals to 3(with backoff of 1 sec). 
> Zookeeper server log contains following errors: > {code:java} > 2019-03-17 07:56:04,844 [myid:1] - WARN > [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1230] - > Unexpected exception > java.net.SocketException: Unresolved address > at java.base/java.net.ServerSocket.bind(ServerSocket.java:374) > at java.base/java.net.ServerSocket.bind(ServerSocket.java:335) > at org.apache.zookeeper.server.quorum.Leader.(Leader.java:241) > at > org.apache.zookeeper.server.quorum.QuorumPeer.makeLeader(QuorumPeer.java:1023) > at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:1226) > 2019-03-17 07:56:04,844 [myid:1] - WARN > [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1261] - > PeerState set to LOOKING > 2019-03-17 07:56:04,845 [myid:1] - INFO > [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):QuorumPeer@1136] - > LOOKING > 2019-03-17 07:56:04,845 [myid:1] - INFO > [QuorumPeer[myid=1](plain=/0.0.0.0:2181)(secure=disabled):FastLeaderElection@893] > - New election. My id = 1, proposed zxid=0x0 > 2019-03-17 07:56:04,846 [myid:1] - INFO > [WorkerReceiver[myid=1]:FastLeaderElection@687] - Notification: 2 (message > format version), 1 (n.leader), 0x0 (n.zxid), 0xf (n.round), LOOKING > (n.state), 1 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)0 (n.config > version) > 2019-03-17 07:56:04,979 [myid:1] - INFO > [zookeeper-0.zookeeper:2183:QuorumCnxManager$Listener@892] - Leaving listener > 2019-03-17 07:56:04,979 [myid:1] - ERROR > [zookeeper-0.zookeeper:2183:QuorumCnxManager$Listener@894] - As I'm leaving > the listener thread, I won't be able to participate in leader election any > longer: zookeeper-0.zookeeper:2183 > {code} > This error happens on most nodes on cluster start and Zookeeper is unable to > form quorum. This will leave cluster in unusable state. > As I can see, error present on branches 3.4 and 3.5. 
> I think, this error can be fixed by configurable number of retries(instead of > hard-coded value of 3). > Other way to fix this is removing of max retries at all. Currently, ZK server > only stop leader election listener and continue to serve on other ports. > Maybe, if leader election halts, we should abort process. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
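The configurable-retry fix proposed in the issue might look roughly like this. The names are illustrative, not the actual QuorumCnxManager.Listener code, which currently hard-codes 3 attempts:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Sketch: bind the election port with a configurable number of attempts and a
// fixed backoff, instead of a hard-coded limit of 3. An unresolved DNS name at
// container startup surfaces as an IOException from bind() and is retried.
public class ElectionPortBinder {
    static ServerSocket bindWithRetries(InetSocketAddress addr, int maxRetries,
                                        long backoffMillis)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            ServerSocket ss = new ServerSocket();
            try {
                ss.bind(addr);
                return ss;                 // bound successfully
            } catch (IOException e) {
                ss.close();                // don't leak the unbound socket
                last = e;
                Thread.sleep(backoffMillis);
            }
        }
        throw last;                        // give the caller the final failure
    }
}
```

With maxRetries exposed as a config property, a Kubernetes deployment could set it high enough to ride out the DNS-registration delay the reporter describes.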
[jira] [Commented] (ZOOKEEPER-3320) Leader election port stop listen when hostname unresolvable for some time
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796341#comment-16796341 ] Brian Nixon commented on ZOOKEEPER-3320: This is an interesting error case! I would expect an issue in QuorumCnxManager to bring the peer down if it cannot create the socket but it seems this only occurs with a BindException and not a generic SocketException. At the least, I think we ought to fix that. Looking at this from the opposite direction, can you add the desired delay in the startup sequence of your Kubernetes container? My concern is that the pattern of "DNS is currently unreliable but will be reliable soon" seems specific to the container management and may result in strange behavior when applied to other environments. > Leader election port stop listen when hostname unresolvable for some time > -- > > Key: ZOOKEEPER-3320 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3320 > Project: ZooKeeper > Issue Type: Bug > Components: leaderElection >Affects Versions: 3.4.10, 3.5.4 >Reporter: Igor Skokov >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3318) Add a complete backup mechanism for zookeeper internal
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795253#comment-16795253 ] Brian Nixon commented on ZOOKEEPER-3318: This would be great! > Add a complete backup mechanism for zookeeper internal > -- > > Key: ZOOKEEPER-3318 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3318 > Project: ZooKeeper > Issue Type: New Feature > Components: other >Reporter: maoling >Assignee: maoling >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3311) Allow a delay to the transaction log flush
Brian Nixon created ZOOKEEPER-3311: -- Summary: Allow a delay to the transaction log flush Key: ZOOKEEPER-3311 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3311 Project: ZooKeeper Issue Type: New Feature Components: server Affects Versions: 3.6.0 Reporter: Brian Nixon The SyncRequestProcessor flushes writes to disk either when 1000 writes are pending to be flushed or when the processor fails to retrieve another write from its incoming queue. The "flush when queue empty" condition operates poorly under many workloads as it can quickly degrade into flushing after every write -- losing all benefits of batching and leading to a continuous stream of flushes + fsyncs which overwhelm the underlying disk. A configurable flush delay would ensure flushes do not happen more frequently than once every X milliseconds. This can be used in place of or jointly with batch-size-triggered flushes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
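The proposed policy can be sketched as a small decision helper (illustrative, not the actual SyncRequestProcessor): flush when the batch is full, or when the queue has drained and at least the configured delay has passed since the last flush, so an idle moment no longer forces a flush after every single write:

```java
// Sketch of the proposed flush policy. Time is passed in explicitly to keep
// the decision logic deterministic and testable.
public class FlushPolicy {
    final int maxBatch;            // e.g. the existing 1000-write threshold
    final long flushDelayMillis;   // the new minimum gap between flushes
    long lastFlush;

    FlushPolicy(int maxBatch, long flushDelayMillis, long now) {
        this.maxBatch = maxBatch;
        this.flushDelayMillis = flushDelayMillis;
        this.lastFlush = now;
    }

    boolean shouldFlush(int pending, boolean queueEmpty, long now) {
        if (pending == 0) return false;
        if (pending >= maxBatch) return true;                       // size-triggered
        return queueEmpty && now - lastFlush >= flushDelayMillis;   // time-gated
    }

    void onFlush(long now) { lastFlush = now; }
}
```

Setting flushDelayMillis to 0 recovers today's behavior, which is why the two triggers compose cleanly.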
[jira] [Commented] (ZOOKEEPER-3264) Add a benchmark tool for zookeeper
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16783824#comment-16783824 ] Brian Nixon commented on ZOOKEEPER-3264: I know that [~breed] at one point was thinking of using the _Java Microbenchmark Harness_, this may also be worth exploring. > Add a benchmark tool for zookeeper > -- > > Key: ZOOKEEPER-3264 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3264 > Project: ZooKeeper > Issue Type: New Feature > Components: other >Reporter: maoling >Assignee: maoling >Priority: Major > > Reference: > https://github.com/etcd-io/etcd/blob/master/tools/benchmark/cmd/range.go > https://github.com/antirez/redis/blob/unstable/src/redis-benchmark.c > https://github.com/phunt/zk-smoketest/blob/master/zk-latencies.py > https://github.com/brownsys/zookeeper-benchmark/blob/master/src/main/java/edu/brown/cs/zkbenchmark/ZooKeeperBenchmark.java -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3287) admin command to dump currently known ACLs
Brian Nixon created ZOOKEEPER-3287: -- Summary: admin command to dump currently known ACLs Key: ZOOKEEPER-3287 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3287 Project: ZooKeeper Issue Type: New Feature Components: server Affects Versions: 3.6.0 Reporter: Brian Nixon Add a new command to dump the set of ACLs currently applied on the data tree, to be used by an admin to check what controls are set for an ensemble. A flat list with no connection to the data will suffice; we will have to think about whether any sensitive details ought to be emitted as a cryptographic hash to preserve secrecy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
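One way to realize the "emit as a cryptographic hash" idea mentioned in the ticket: report each ACL's scheme and permission bits in the clear, but replace the id (which may embed a digest or password) with a SHA-256 fingerprint. This is purely illustrative, not a committed design:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical formatter for one ACL entry in the dump output.
public class AclDump {
    static String fingerprint(String scheme, String id, int perms) {
        try {
            byte[] h = MessageDigest.getInstance("SHA-256")
                    .digest(id.getBytes(StandardCharsets.UTF_8));
            // A short digest prefix is enough to compare entries across servers
            // without revealing the underlying credential.
            return scheme + ":" + HexFormat.of().formatHex(h, 0, 8) + ":perms=" + perms;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);  // SHA-256 is always available
        }
    }
}
```

Two ensembles configured with the same ACLs would then produce identical dumps, while the dump itself leaks nothing usable for authentication.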
[jira] [Created] (ZOOKEEPER-3257) Merge count and byte update of Stat
Brian Nixon created ZOOKEEPER-3257: -- Summary: Merge count and byte update of Stat Key: ZOOKEEPER-3257 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3257 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.6.0 Reporter: Brian Nixon There is duplication of effort when updating the stats. Merge the count update and the byte update into one call and simplify the logic. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3180) Add response cache to improve the throughput of read heavy traffic
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746687#comment-16746687 ] Brian Nixon commented on ZOOKEEPER-3180: Creating ZOOKEEPER-3252 as a follow up. > Add response cache to improve the throughput of read heavy traffic > --- > > Key: ZOOKEEPER-3180 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3180 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Reporter: Fangmin Lv >Assignee: Brian Nixon >Priority: Minor > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > On read heavy use case with large response data size, the serialization of > response takes time and added overhead to the GC. > Add response cache helps improving the throughput we can support, which also > reduces the latency in general. > This Jira is going to implement a LRU cache for the response, which shows > some performance gain on some of our production ensembles. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3252) Extend the options for the response cache
Brian Nixon created ZOOKEEPER-3252: -- Summary: Extend the options for the response cache Key: ZOOKEEPER-3252 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3252 Project: ZooKeeper Issue Type: Improvement Components: server Reporter: Brian Nixon The response cache added in ZOOKEEPER-3180 is fairly bare bones. It does its job but there is room for experimentation and improvement. From the issue pull request ([https://github.com/apache/zookeeper/pull/684]): {quote}"the alternate eviction policies you outline and that LinkedHashMap allows. I see three reasonable paths here, {quote} * {quote}Merge this pr as it is (perhaps rename LRUCache to just Cache) and open a new JIRA to explore future paths.{quote} * {quote}I add another property that lets one toggle between insertion order and access order with the current implementation as the default.{quote} * {quote}Drop LinkedHashMap entirely and go with something like a guava Cache."{quote} It was merged with path 1 chosen but I remain interested in the optimizations that were suggested. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
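Path 2 from the quoted discussion, a toggle between insertion order and access order, falls out of LinkedHashMap's constructor directly. A sketch, not the actual ResponseCache:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// One LinkedHashMap-backed cache whose eviction order is a constructor toggle:
// accessOrder=true gives LRU eviction, accessOrder=false gives FIFO-style
// (insertion-order) eviction.
public class ToggleCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    ToggleCache(int maxSize, boolean accessOrder) {
        super(16, 0.75f, accessOrder);   // third argument selects the ordering
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;         // evict the eldest once over capacity
    }
}
```

Exposing the boolean as a server property would let operators experiment with both policies against their read mix before anyone commits to dropping LinkedHashMap for a heavier cache library.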
[jira] [Commented] (ZOOKEEPER-3240) Close socket on Learner shutdown to avoid dangling socket
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740916#comment-16740916 ] Brian Nixon commented on ZOOKEEPER-3240: [~hanm] could it be that the unclosed/unreaped Socket on the Learner side is still maintaining its end of the tcp connection correctly via the protocol so the Leader is unable to sense the change in Learner status through the status of the network connection? I confess that I'm not as knowledgeable about the workings of Socket as I need to be to confirm this theory. > Close socket on Learner shutdown to avoid dangling socket > - > > Key: ZOOKEEPER-3240 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3240 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.6.0 >Reporter: Brian Nixon >Priority: Minor > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > There was a Learner that had two connections to the Leader after that Learner > hit an unexpected exception during flush txn to disk, which will shutdown > previous follower instance and restart a new one. 
> > {quote}2018-10-26 02:31:35,568 ERROR > [SyncThread:3:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from > thread : SyncThread:3 > java.io.IOException: Input/output error > at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at > java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:72) > at > java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:395) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:457) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:548) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:769) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:172) > 2018-10-26 02:31:35,568 INFO [SyncThread:3:ZooKeeperServerListenerImpl@42] - > Thread SyncThread:3 exits, error code 1 > 2018-10-26 02:31:35,568 INFO [SyncThread:3:SyncRequestProcessor@234] - > SyncRequestProcessor exited!{quote} > > It is supposed to close the previous socket, but it doesn't seem to be done > anywhere in the code. This leaves the socket open with no one reading from > it, and caused the queue full and blocked on sender. > > Since the LearnerHandler didn't shutdown gracefully, the learner queue size > keeps growing, the JVM heap size on leader keeps growing and added pressure > to the GC, and cause high GC time and latency in the quorum. > > The simple fix is to gracefully shutdown the socket. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3244) Add option to snapshot based on log size
Brian Nixon created ZOOKEEPER-3244: -- Summary: Add option to snapshot based on log size Key: ZOOKEEPER-3244 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3244 Project: ZooKeeper Issue Type: New Feature Components: server Reporter: Brian Nixon Currently, ZooKeeper only takes snapshots based on the snap count. If the workload on an ensemble includes large txns, then we'll end up with a large amount of data kept on disk, and might hit a low disk space issue. Add a maximum limit on the total size of the log files between each snapshot. This will change the snap frequency, which means with the same snap retention number a server will eat up less disk space. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
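The combined trigger could be sketched as follows (illustrative names, not the actual server code): snapshot when either the txn count or the total log bytes since the last snapshot crosses its threshold:

```java
// Sketch of a dual snapshot trigger: the existing count-based condition
// (snapCount) plus the proposed size-based condition (snapSizeBytes).
public class SnapshotTrigger {
    final int snapCount;
    final long snapSizeBytes;
    int txns;
    long bytes;

    SnapshotTrigger(int snapCount, long snapSizeBytes) {
        this.snapCount = snapCount;
        this.snapSizeBytes = snapSizeBytes;
    }

    /** Record one appended txn; returns true when a snapshot should be taken. */
    boolean onTxn(long txnBytes) {
        txns++;
        bytes += txnBytes;
        if (txns < snapCount && bytes < snapSizeBytes) return false;
        txns = 0;
        bytes = 0;   // reset both counters at the snapshot boundary
        return true;
    }
}
```

With a byte threshold in play, a workload of few-but-huge txns snapshots early enough that the retained log segments between snapshots stay bounded in size, not just in count.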
[jira] [Commented] (ZOOKEEPER-2669) follower failed to reconnect to leader after a network error
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739751#comment-16739751 ] Brian Nixon commented on ZOOKEEPER-2669: Is this related to ZOOKEEPER-3240 ? > follower failed to reconnect to leader after a network error > - > > Key: ZOOKEEPER-2669 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2669 > Project: ZooKeeper > Issue Type: Bug > Components: quorum, server >Affects Versions: 3.4.9 > Environment: CentOS7 >Reporter: Zhenghua Chen >Priority: Major > > We have a zookeeper cluster with 3 nodes named s1, s2, s3 > By mistake, we shut down the ethernet interface of s2, and zk follower shut > down(zk process remains there) > Later, after ethernet up again, s2 failed to reconnect to leader s3 to be a > follower > follower s2 keeps printing log like this: > {quote} > 2017-01-19 16:40:58,956 WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:7181] > o.a.z.s.q.Learner - Got zxid 0x320001019f expected 0x1 > 2017-01-19 16:40:58,956 ERROR [SyncThread:1] o.a.z.s.ZooKeeperCriticalThread > - Severe unrecoverable error, from thread : SyncThread:1 > java.nio.channels.ClosedChannelException: null > at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:99) > at sun.nio.ch.FileChannelImpl.position(FileChannelImpl.java:250) > at > org.apache.zookeeper.server.persistence.Util.padLogFile(Util.java:215) > at > org.apache.zookeeper.server.persistence.FileTxnLog.padFile(FileTxnLog.java:241) > at > org.apache.zookeeper.server.persistence.FileTxnLog.append(FileTxnLog.java:219) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.append(FileTxnSnapLog.java:314) > at org.apache.zookeeper.server.ZKDatabase.append(ZKDatabase.java:470) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:140) > 2017-01-19 16:40:58,956 INFO [SyncThread:1] > o.a.z.s.ZooKeeperServerListenerImpl - Thread SyncThread:1 exits, error code 1 > 2017-01-19 16:40:58,956 INFO [SyncThread:1] 
o.a.z.s.SyncRequestProcessor - > SyncRequestProcessor exited! > 2017-01-19 16:40:58,957 INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:7181] > o.a.z.s.q.Learner - shutdown called > java.lang.Exception: shutdown Follower > at > org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:164) > at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:850) > {quote} > And, leader s3 keeps printing log like this: > {quote} > 2017-01-19 16:30:50,452 INFO [LearnerHandler-/192.168.40.51:35949] > o.a.z.s.q.LearnerHandler - Follower sid: 1 : info : > org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@95258f0 > 2017-01-19 16:30:50,452 INFO [LearnerHandler-/192.168.40.51:35949] > o.a.z.s.q.LearnerHandler - Synchronizing with Follower sid: 1 > maxCommittedLog=0x320001019e minCommittedLog=0x32ffaa > peerLastZxid=0x23 > 2017-01-19 16:30:50,453 WARN [LearnerHandler-/192.168.40.51:35949] > o.a.z.s.q.LearnerHandler - Unhandled proposal scenario > 2017-01-19 16:30:50,453 INFO [LearnerHandler-/192.168.40.51:35949] > o.a.z.s.q.LearnerHandler - Sending SNAP > 2017-01-19 16:30:50,453 INFO [LearnerHandler-/192.168.40.51:35949] > o.a.z.s.q.LearnerHandler - Sending snapshot last zxid of peer is 0x23 > zxid of leader is 0x320001019esent zxid of db as 0x320001019e > 2017-01-19 16:30:50,461 INFO [LearnerHandler-/192.168.40.51:35949] > o.a.z.s.q.LearnerHandler - Received NEWLEADER-ACK message from 1 > 2017-01-19 16:30:51,738 ERROR [LearnerHandler-/192.168.40.51:35934] > o.a.z.s.q.LearnerHandler - Unexpected exception causing shutdown while sock > still open > java.net.SocketTimeoutException: Read timed out > at java.net.SocketInputStream.socketRead0(Native Method) > at java.net.SocketInputStream.read(SocketInputStream.java:152) > at java.net.SocketInputStream.read(SocketInputStream.java:122) > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235) > at java.io.BufferedInputStream.read(BufferedInputStream.java:254) > at 
java.io.DataInputStream.readInt(DataInputStream.java:387) > at > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > at > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83) > at > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:99) > at > org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:542) > {quote} > we execute netstat, found lots of close wait socket in s2, and never closed. > {quote} > tcp6 10865 0 192.168.40.51:47181 192.168.40.57:7288 > CLOSE_WAIT 2217/java > tcp62576 0 192.168.40.51:57181
[jira] [Commented] (ZOOKEEPER-3240) Close socket on Learner shutdown to avoid dangling socket
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739750#comment-16739750 ] Brian Nixon commented on ZOOKEEPER-3240: This may be the same issue as detected in ZOOKEEPER-2669. The two share certain similarities but I haven't looked into it. > Close socket on Learner shutdown to avoid dangling socket > - > > Key: ZOOKEEPER-3240 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3240 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.6.0 >Reporter: Brian Nixon >Priority: Minor > > There was a Learner that had two connections to the Leader after that Learner > hit an unexpected exception during flush txn to disk, which will shutdown > previous follower instance and restart a new one. > > {quote}2018-10-26 02:31:35,568 ERROR > [SyncThread:3:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from > thread : SyncThread:3 > java.io.IOException: Input/output error > at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method) > at > java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:72) > at > java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:395) > at > org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:457) > at > org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:548) > at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:769) > at > org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246) > at > org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:172) > 2018-10-26 02:31:35,568 INFO [SyncThread:3:ZooKeeperServerListenerImpl@42] - > Thread SyncThread:3 exits, error code 1 > 2018-10-26 02:31:35,568 INFO [SyncThread:3:SyncRequestProcessor@234] - > SyncRequestProcessor exited!{quote} > > It is supposed to close the previous socket, but it doesn't seem to be done > anywhere in the code. 
This leaves the socket open with no one reading from > it, so the send queue fills up and blocks the sender. > > Since the LearnerHandler didn't shut down gracefully, the learner queue > keeps growing, the JVM heap on the leader keeps growing, adding pressure > to the GC and causing high GC times and latency in the quorum. > > The simple fix is to gracefully shut down the socket. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
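The graceful shutdown the comment asks for amounts to closing the leader-facing socket when the follower instance is torn down. A minimal sketch, assuming a hypothetical `sock` field and `closeSocket()` helper; the real Learner code differs:

```java
import java.io.IOException;
import java.net.Socket;

class LearnerShutdownSketch {
    Socket sock; // connection to the leader; field name is illustrative

    // Close the leader connection during shutdown so the leader-side
    // LearnerHandler sees EOF and exits, instead of queueing packets
    // forever to a peer that is no longer reading.
    void closeSocket() {
        if (sock != null) {
            try {
                sock.close();
            } catch (IOException e) {
                // best effort: continue with the rest of shutdown
            }
            sock = null;
        }
    }
}
```

Nulling the field makes the call idempotent, so a restarted follower instance cannot double-close a stale reference.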
[jira] [Created] (ZOOKEEPER-3240) Close socket on Learner shutdown to avoid dangling socket
Brian Nixon created ZOOKEEPER-3240: -- Summary: Close socket on Learner shutdown to avoid dangling socket Key: ZOOKEEPER-3240 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3240 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.6.0 Reporter: Brian Nixon There was a Learner that had two connections to the Leader after that Learner hit an unexpected exception during flush txn to disk, which will shutdown previous follower instance and restart a new one. {quote}2018-10-26 02:31:35,568 ERROR [SyncThread:3:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from thread : SyncThread:3 java.io.IOException: Input/output error at java.base/sun.nio.ch.FileDispatcherImpl.force0(Native Method) at java.base/sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:72) at java.base/sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:395) at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:457) at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:548) at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:769) at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:246) at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:172) 2018-10-26 02:31:35,568 INFO [SyncThread:3:ZooKeeperServerListenerImpl@42] - Thread SyncThread:3 exits, error code 1 2018-10-26 02:31:35,568 INFO [SyncThread:3:SyncRequestProcessor@234] - SyncRequestProcessor exited!{quote} It is supposed to close the previous socket, but it doesn't seem to be done anywhere in the code. This leaves the socket open with no one reading from it, and caused the queue full and blocked on sender. Since the LearnerHandler didn't shutdown gracefully, the learner queue size keeps growing, the JVM heap size on leader keeps growing and added pressure to the GC, and cause high GC time and latency in the quorum. The simple fix is to gracefully shutdown the socket. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3237) Allow IPv6 wildcard address in peer config
Brian Nixon created ZOOKEEPER-3237: -- Summary: Allow IPv6 wildcard address in peer config Key: ZOOKEEPER-3237 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3237 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.6.0 Reporter: Brian Nixon ZooKeeper allows a special exception for the IPv4 wildcard, 0.0.0.0, along with the loopback addresses. Extend the same treatment to IPv6's wildcard, [::]. Otherwise, reconfig will reject commands with the form [::]:. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
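A sketch of how the extended check could look. `InetAddress.isAnyLocalAddress()` already returns true for both the IPv4 and IPv6 wildcards, so one test covers the `0.0.0.0` and `[::]` forms; class and method names here are illustrative, not the actual ZooKeeper code:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class WildcardCheckSketch {
    // True for the IPv4 wildcard (0.0.0.0) and the IPv6 wildcard (:: or
    // [::]), which is what the peer config validation needs to accept.
    static boolean isWildcard(String host) {
        try {
            return InetAddress.getByName(host).isAnyLocalAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }
}
```

Loopback addresses would get the analogous treatment via `isLoopbackAddress()`.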
[jira] [Commented] (ZOOKEEPER-3231) Purge task may lost data when we have many invalid snapshots.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736548#comment-16736548 ] Brian Nixon commented on ZOOKEEPER-3231: It might also make sense to more aggressively delete invalid snapshots (in the mode of ZOOKEEPER-3082). If it's straightforward to identify and purge such files then we won't have to worry about deleting valid snapshots in order to preserve invalid snapshots. > Purge task may lost data when we have many invalid snapshots. > -- > > Key: ZOOKEEPER-3231 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3231 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.5.4, 3.4.13 >Reporter: Jiafu Jiang >Priority: Major > > I read the ZooKeeper source code, and I found that the purge task uses > FileTxnSnapLog#findNRecentSnapshots to find snapshots, but that method does > not check whether the snapshots are valid. > Consider a worst case: a ZooKeeper server may have many invalid snapshots, > and when a purge task begins, it will use the zxid in the last snapshot's > name to purge old snapshots and transaction logs, so we may lose data. > I think we should use FileSnap#findNValidSnapshots(int) instead of > FileSnap#findNRecentSnapshots in FileTxnSnapLog#findNRecentSnapshots, but I > am not sure. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
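The proposed switch to `FileSnap#findNValidSnapshots` boils down to never letting an invalid (for instance, zero-length) snapshot supply the purge boundary zxid. A toy model of that rule, with snapshots reduced to parallel (zxid, file length) arrays for illustration:

```java
class SnapshotPurgeSketch {
    // zxids[i] and lengths[i] describe one snapshot file; a zero length
    // marks a snapshot whose write never completed. The purge boundary
    // must come from the newest *valid* snapshot, never an invalid one.
    static long purgeBoundaryZxid(long[] zxids, long[] lengths) {
        long best = -1;
        for (int i = 0; i < zxids.length; i++) {
            if (lengths[i] > 0 && zxids[i] > best) {
                best = zxids[i];
            }
        }
        return best;
    }
}
```

Real validation does more than a length check (it also reads the snapshot header), but the length test alone already excludes the empty files this report worries about.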
[jira] [Commented] (ZOOKEEPER-3232) make the log of notification about LE more readable
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733820#comment-16733820 ] Brian Nixon commented on ZOOKEEPER-3232: seems reasonable to me > make the log of notification about LE more readable > --- > > Key: ZOOKEEPER-3232 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3232 > Project: ZooKeeper > Issue Type: Improvement > Components: leaderElection >Reporter: maoling >Assignee: maoling >Priority: Minor > > The notification log is very important for following the progress of leader > election, e.g. > {code:java} > 2019-01-01 16:29:27,494 [myid:2] - INFO > [WorkerReceiver[myid=2]:FastLeaderElection@595] - Notification: 1 (message > format version), 3 (n.leader), 0x60b3dc215 (n.zxid), 0x3 (n.round), FOLLOWING > (n.state), 1 (n.sid), 0x7 (n.peerEpoch) LOOKING (my state){code} > The current log has some problems: > 1. it doesn't use placeholders and isn't written in a key:value style. > 2. the properties in the log are neither grouped nor ordered, so it is not > easy to read. > 3. the version value is hex but doesn't have the 0x prefix. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3218) zk server reopened,the interval for observer connect to the new leader is too long,then session expired
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729186#comment-16729186 ] Brian Nixon commented on ZOOKEEPER-3218: We had similar issues which we addressed by making the polling interval configurable. Attaching our patch to this issue (it adds "zookeeper.fastleader.minNotificationInterval"). > zk server reopened,the interval for observer connect to the new leader is too > long,then session expired > --- > > Key: ZOOKEEPER-3218 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3218 > Project: ZooKeeper > Issue Type: Bug > Environment: win7 32bits > zookeeper 3.4.6, 3.4.13 >Reporter: yangoofy >Priority: Major > > Two participants, one observer; the zkclient connects to the observer. > Then the two participants are closed and the ZooKeeper service shuts down. > Ten seconds later, the two participants are restarted and a leader is elected. > > But the observer can't connect to the new leader immediately. In > lookForLeader, the observer uses a blocking queue (recvqueue) to offer/poll > notifications; when recvqueue is empty, the poll blocks with a timeout of > 200ms, 400ms, 800ms ... 60s. > For example: at 09:59:59 the observer polls, recvqueue is empty, and the > timeout is 60s; at 10:00:00 the two participants are restarted and a leader is > re-elected; at 10:00:59 the observer polls the notification and connects to > the new leader. > But maxSessionTimeout defaults to 40s, so the session expires. > - > Please improve this: the observer should connect to the new leader as soon as > possible -- This message was sent by Atlassian JIRA (v7.6.3#76005)
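The 200ms-doubling-to-60s behavior the reporter describes, plus the configurable floor the attached patch introduces, can be sketched as a single step function (method name and wiring are illustrative, not the patch itself):

```java
class NotificationBackoffSketch {
    // Each empty poll doubles the timeout, starting from a configurable
    // floor (the patch's zookeeper.fastleader.minNotificationInterval)
    // and clamped at the existing ceiling (60s in the report).
    static long nextTimeout(long currentMs, long minMs, long maxMs) {
        long doubled = Math.max(currentMs, minMs) * 2;
        return Math.min(doubled, maxMs);
    }
}
```

Making the floor (and, by extension, the ceiling) configurable lets operators keep the worst-case poll interval well under maxSessionTimeout, which is exactly the mismatch that expired the session here.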
[jira] [Commented] (ZOOKEEPER-2872) Interrupted snapshot sync causes data loss
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729136#comment-16729136 ] Brian Nixon commented on ZOOKEEPER-2872: Now that the patch is merged, was there any further work here? > Interrupted snapshot sync causes data loss > -- > > Key: ZOOKEEPER-2872 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2872 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.10, 3.5.3, 3.6.0 >Reporter: Brian Nixon >Priority: Major > > There is a way for observers to permanently lose data from their local data > tree while remaining members of good standing with the ensemble and > continuing to serve client traffic when the following chain of events occurs. > 1. The observer dies in epoch N from machine failure. > 2. The observer comes back up in epoch N+1 and requests a snapshot sync to > catch up. > 3. The machine powers off before the snapshot is synced to disc and after > some txn's have been logged (depending on the OS, this can happen!). > 4. The observer comes back a second time and replays its most recent snapshot > (epoch <= N) as well as the txn logs (epoch N+1). > 5. A diff sync is requested from the leader and the observer broadcasts > availability. > In this scenario, any commits from epoch N that the observer did not receive > before it died the first time will never be exposed to the observer and no > part of the ensemble will complain. > This situation is not unique to observers and can happen to any learner. As a > simple fix, fsync-ing the snapshots received from the leader will avoid the > case of missing snapshots causing data loss. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3197) Improve documentation in ZooKeeperServer.superSecret
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729138#comment-16729138 ] Brian Nixon commented on ZOOKEEPER-3197: Password is probably the wrong term for this variable (though it does suggest some potential future work). It's more of a checksum that's used in reconnection, carries no security weight, and is treated internally as if it carries no security weight. [~breed] might be the only one left who knows the full story (it's telling that the secret decodes to "Ben is Cool"). > Improve documentation in ZooKeeperServer.superSecret > > > Key: ZOOKEEPER-3197 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3197 > Project: ZooKeeper > Issue Type: Task >Reporter: Colm O hEigeartaigh >Priority: Trivial > > A security scan flagged the use of a hard-coded secret > (ZooKeeperServer.superSecret) in conjunction with a java Random instance to > generate a password: > byte[] generatePasswd(long id) > { Random r = new Random(id ^ superSecret); byte p[] = > new byte[16]; r.nextBytes(p); return p; } > superSecret has the following javadoc: > /** > * This is the secret that we use to generate passwords, for the moment it > * is more of a sanity check. > */ > It is unclear from this comment and looking at the code why it is not a > security risk. It would be good to update the javadoc along the lines of > "Using a hard-coded secret with Random to generate a password is not a > security risk because the resulting passwords are used for X, Y, Z and not > for authentication etc" or something would be very helpful for anyone else > looking at the code. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
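The checksum argument in the comment follows directly from the quoted code: the bytes are a pure function of the session id, so every server independently regenerates the same 16 bytes and can verify a reconnecting client without storing anything. A standalone rendering of the quoted method (the constant is reproduced here from the ZooKeeperServer source as I recall it; the determinism argument holds for any fixed value):

```java
import java.util.Random;

class SessionPasswdSketch {
    // Hard-coded constant as in ZooKeeperServer.superSecret (assumed
    // value, shown for illustration).
    static final long SUPER_SECRET = 0xB3415C00L;

    // Deterministic: the same session id always yields the same 16
    // bytes on every server, which is why this works as a reconnection
    // sanity check and carries no security weight.
    static byte[] generatePasswd(long id) {
        Random r = new Random(id ^ SUPER_SECRET);
        byte[] p = new byte[16];
        r.nextBytes(p);
        return p;
    }
}
```

java.util.Random guarantees the same seed produces the same sequence across implementations, which is what makes this usable as a cross-server checksum at all.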
[jira] [Commented] (ZOOKEEPER-3220) The snapshot is not saved to disk and may cause data inconsistency.
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729134#comment-16729134 ] Brian Nixon commented on ZOOKEEPER-3220: I believe ZOOKEEPER-2872 addressed the fsyncing part of this issue and ZOOKEEPER-3082 added some nice cleanup around 0 size snapshot file. Neither of these changes were backported to 3.4 so that suggests one potential path forward. Note that backporting ZOOKEEPER-2872 also requires backporting ZOOKEEPER-2870. > The snapshot is not saved to disk and may cause data inconsistency. > --- > > Key: ZOOKEEPER-3220 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3220 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.12, 3.4.13 >Reporter: Jiafu Jiang >Priority: Critical > > We known that ZooKeeper server will call fsync to make sure that log data has > been successfully saved to disk. But ZooKeeper server does not call fsync to > make sure that a snapshot has been successfully saved, which may cause > potential problems. Since a close to a file description does not make sure > that data is written to disk, see > [http://man7.org/linux/man-pages/man2/close.2.html#notes] for more details. > > If the snapshot is not successfully saved to disk, it may lead to data > inconsistency. Here is my example, which is also a real problem I have ever > met. > 1. I deployed a 3-node ZooKeeper cluster: zk1, zk2, and zk3, zk2 was the > leader. > 2. Both zk1 and zk2 had the log records from log1~logX, X was the zxid. > 3. The machine of zk1 restarted, and during the reboot, log(X+1) ~ log Y are > saved to log files of both zk2(leader) and zk3(follower). > 4. After zk1 restarted successfully, it found itself to be a follower, and it > began to synchronize data with the leader. The leader sent a snapshot(records > from log 1 ~ log Y) to zk1, zk1 then saved the snapshot to local disk by > calling the method ZooKeeperServer.takeSnapshot. 
But unfortunately, when the > method returned, the snapshot data was not saved to disk yet. In fact the > snapshot file was created, but the size was 0. > 5. zk1 finished the synchronization and began to accept new requests from the > leader. Say log records from log(Y + 1) ~ log Z were accepted by zk1 and > saved to log file. With fsync zk1 could make sure log data was not lost. > 6. zk1 restarted again. Since the snapshot's size was 0, it would not be > used, therefore zk1 recovered using the log files. But the records from > log(X+1) ~ logY were lost ! > > Sorry for my poor English. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
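The missing step the reporter describes is an fsync between writing the snapshot and treating the sync as complete. A minimal sketch of a write-then-sync helper (names are illustrative; per the comment above, the actual fix went in via ZOOKEEPER-2872):

```java
import java.io.FileOutputStream;
import java.io.IOException;

class SnapshotFsyncSketch {
    // Write snapshot bytes and force them to the device before
    // reporting success, so a power loss right after the sync phase
    // cannot leave behind a zero-length snapshot file (step 4 above).
    static boolean writeAndSync(String path, byte[] data) {
        try (FileOutputStream fos = new FileOutputStream(path)) {
            fos.write(data);
            fos.getFD().sync(); // the fsync the reporter says is missing
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

`FileDescriptor.sync()` is the JVM-level fsync: close() alone gives no durability guarantee, exactly as the linked close(2) man page notes.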
[jira] [Commented] (ZOOKEEPER-3140) Allow Followers to host Observers
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648379#comment-16648379 ] Brian Nixon commented on ZOOKEEPER-3140: A note on future work - it would be cool to see the serialized format of the QuorumVerifier (used for the dynamic config files and the like) updated to a more extensible form so we can track more topology and port information through it. This would give us a lot more flexibility in setting and propagating the observer master port, in particular in letting each server publish its own port instead of requiring a single static port for the whole ensemble. It would also be useful for purposes such as ZOOKEEPER-3166. I thought there was an existing Jira on this requested change but didn't see one in a cursory search. I will create one specifically eventually, if I get the time. :) > Allow Followers to host Observers > - > > Key: ZOOKEEPER-3140 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3140 > Project: ZooKeeper > Issue Type: New Feature > Components: server >Affects Versions: 3.6.0 >Reporter: Brian Nixon >Assignee: Brian Nixon >Priority: Minor > Labels: pull-request-available > Time Spent: 5h > Remaining Estimate: 0h > > Observers function simply as non-voting members of the ensemble, sharing the > Learner interface with Followers and holding only a slightly different > internal pipeline. Both maintain connections along the quorum port with the > Leader by which they learn of all new proposals on the ensemble. > > There are benefits to allowing Observers to connect to the Followers to plug > into the commit stream in addition to connecting to the Leader. It shifts the > burden of supporting Observers off the Leader, allowing it to focus on > coordinating the commit of writes. This means better performance when the > Leader is under high load, particularly high network load such as can happen > after a leader election when many Learners need to sync. 
It also reduces the > total network connections maintained on the Leader when there are a high > number of observers. On the other end, Observer availability is improved > since it takes less time for a high number of Observers to finish > syncing and start serving client traffic. > > The current implementation only supports scaling the number of Observers > into the hundreds before performance begins to degrade. By opening up > Followers to also host Observers, over a thousand observers can be hosted on > a typical ensemble without major negative impact under both normal operation > and during post-leader election sync. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3166) Support changing secure port with reconfig
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648369#comment-16648369 ] Brian Nixon commented on ZOOKEEPER-3166: It would be cool to see the serialized format of the QuorumVerifier (used for the dynamic config files and the like) updated to a more extensible form so we can track more topology and port information through it. A key-value json blob is one such solution. > Support changing secure port with reconfig > -- > > Key: ZOOKEEPER-3166 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3166 > Project: ZooKeeper > Issue Type: Improvement > Components: quorum >Affects Versions: 3.6.0 >Reporter: Brian Nixon >Priority: Minor > > The reconfig operation supports changing the plaintext client port and client > address but, because the secure client port is not encoded in the > QuorumVerifier serialization, the secure client port cannot be changed by > similar means. Instead, this information can only be changed in the static > configuration files and only viewed there. > Flagging as a place where there's not feature parity between secure client > ports and plaintext client ports. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ZOOKEEPER-3166) Support changing secure port with reconfig
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648369#comment-16648369 ] Brian Nixon edited comment on ZOOKEEPER-3166 at 10/12/18 7:45 PM: -- It would be cool to see the serialized format of the QuorumVerifier (used for the dynamic config files and the like) updated to a more extensible form so we can track more topology and port information through it. A key-value json blob is one such solution and would also be useful for ZOOKEEPER-3140. was (Author: nixon): It would be cool to see the serialized format of the QuorumVerifier (used for the dynamic config files and the like) updated to a more extensible form so we can track more topology and port information through it. A key-value json blob is one such solution. > Support changing secure port with reconfig > -- > > Key: ZOOKEEPER-3166 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3166 > Project: ZooKeeper > Issue Type: Improvement > Components: quorum >Affects Versions: 3.6.0 >Reporter: Brian Nixon >Priority: Minor > > The reconfig operation supports changing the plaintext client port and client > address but, because the secure client port is not encoded in the > QuorumVerifier serialization, the secure client port cannot be changed by > similar means. Instead, this information can only be changed in the static > configuration files and only viewed there. > Flagging as a place where there's not feature parity between secure client > ports and plaintext client ports. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3166) Support changing secure port with reconfig
Brian Nixon created ZOOKEEPER-3166: -- Summary: Support changing secure port with reconfig Key: ZOOKEEPER-3166 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3166 Project: ZooKeeper Issue Type: Improvement Components: quorum Affects Versions: 3.6.0 Reporter: Brian Nixon The reconfig operation supports changing the plaintext client port and client address but, because the secure client port is not encoded in the QuorumVerifier serialization, the secure client port cannot be changed by similar means. Instead, this information can only be changed in the static configuration files and only viewed there. Flagging as a place where there's no feature parity between secure client ports and plaintext client ports. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3142) Extend SnapshotFormatter to dump data in json format
Brian Nixon created ZOOKEEPER-3142: -- Summary: Extend SnapshotFormatter to dump data in json format Key: ZOOKEEPER-3142 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3142 Project: ZooKeeper Issue Type: Improvement Affects Versions: 3.6.0 Reporter: Brian Nixon Json format can be chained into other tools such as ncdu. Extend the SnapshotFormatter functionality to dump json. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
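As a sketch of the output side, dumping path-to-data-size pairs as json is enough for downstream tools (ncdu-style disk browsers, jq pipelines) to aggregate; the field layout below is invented for illustration and is not the format the issue settled on:

```java
import java.util.Map;
import java.util.TreeMap;

class JsonDumpSketch {
    // Emit {"path":size,...} with paths sorted for stable output.
    // Real paths would need json string escaping; the znode paths used
    // in the test below contain no characters that require it.
    static String toJson(Map<String, Integer> pathToSize) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Integer> e : new TreeMap<>(pathToSize).entrySet()) {
            if (!first) {
                sb.append(",");
            }
            sb.append("\"").append(e.getKey()).append("\":").append(e.getValue());
            first = false;
        }
        return sb.append("}").toString();
    }
}
```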
[jira] [Created] (ZOOKEEPER-3140) Allow Followers to host Observers
Brian Nixon created ZOOKEEPER-3140: -- Summary: Allow Followers to host Observers Key: ZOOKEEPER-3140 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3140 Project: ZooKeeper Issue Type: New Feature Components: server Affects Versions: 3.6.0 Reporter: Brian Nixon Observers function simple as non-voting members of the ensemble, sharing the Learner interface with Followers and holding only a slightly difference internal pipeline. Both maintain connections along the quorum port with the Leader by which they learn of all new proposals on the ensemble. There are benefits to allowing Observers to connect to the Followers to plug into the commit stream in addition to connecting to the Leader. It shifts the burden of supporting Observers off the Leader and allow it to focus on coordinating the commit of writes. This means better performance when the Leader is under high load, particularly high network load such as can happen after a leader election when many Learners need to sync. It also reduces the total network connections maintained on the Leader when there are a high number of observers. One the other end, Observer availability is improved since it will take shorter time for a high number of Observers to finish syncing and start serving client traffic. The current implementation only supports scaling the number of Observers into the hundreds before performance begins to degrade. By opening up Followers to also host Observers, over a thousand observers can be hosted on a typical ensemble without major negative impact under both normal operation and during post-leader election sync. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3137) add a utility to truncate logs to a zxid
Brian Nixon created ZOOKEEPER-3137: -- Summary: add a utility to truncate logs to a zxid Key: ZOOKEEPER-3137 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3137 Project: ZooKeeper Issue Type: New Feature Affects Versions: 3.6.0 Reporter: Brian Nixon Add a utility that allows an admin to truncate a given transaction log to a specified zxid. This can be similar to the existing LogFormatter. Among the benefits, this allows an admin to put together a point-in-time view of a data tree by manually mutating files from a saved backup. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
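The core of such a utility is deciding, for an ordered transaction log, how many leading entries survive truncation to a target zxid; the rest is file I/O. A toy version of that decision (names invented for illustration):

```java
import java.util.List;

class TruncateSketch {
    // Given the ordered zxids in one log file, return how many leading
    // entries to keep so that no txn with zxid > targetZxid survives.
    static int entriesToKeep(List<Long> zxids, long targetZxid) {
        int keep = 0;
        for (long z : zxids) {
            if (z > targetZxid) {
                break;
            }
            keep++;
        }
        return keep;
    }
}
```

Pairing each zxid with its byte offset in the log file (as a LogFormatter-style scan can produce) turns this count directly into a truncation position.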
[jira] [Commented] (ZOOKEEPER-3131) org.apache.zookeeper.server.WatchManager resource leak
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595645#comment-16595645 ] Brian Nixon commented on ZOOKEEPER-3131: I'm not sure I follow what you're proposing. Do you want to put up a pull request with your suggested changes and we can discuss there? > org.apache.zookeeper.server.WatchManager resource leak > -- > > Key: ZOOKEEPER-3131 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3131 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.5.3, 3.5.4 > Environment: -Xmx512m >Reporter: ChaoWang >Priority: Major > > In some cases, the variable _watch2Paths_ in _Class WatchManager_ does not > remove the entry, even if the associated value "HashSet" is empty already. > The type of key in Map _watch2Paths_ is Watcher, instance of > _NettyServerCnxn._ If it is not removed when the associated set of paths is > empty, it will cause the memory increases little by little, and > OutOfMemoryError triggered finally. > > {color:#FF}*Possible Solution:*{color} > In the following function, the logic should be added to remove the entry. > org.apache.zookeeper.server.WatchManager#removeWatcher(java.lang.String, > org.apache.zookeeper.Watcher) > if (paths.isEmpty()) > { watch2Paths.remove(watcher); } > For the following function as well: > org.apache.zookeeper.server.WatchManager#triggerWatch(java.lang.String, > org.apache.zookeeper.Watcher.Event.EventType, > java.util.Set) > > Please confirm this issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
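The reporter's proposed fix, restated as a runnable sketch (field and method shapes are simplified from the real WatchManager; plain Objects stand in for Watcher instances):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class WatchManagerSketch {
    // Keys stand in for NettyServerCnxn watcher instances.
    final Map<Object, Set<String>> watch2Paths = new HashMap<>();

    void addWatch(String path, Object watcher) {
        watch2Paths.computeIfAbsent(watcher, w -> new HashSet<>()).add(path);
    }

    // The proposed fix: once a watcher's path set drains, drop the map
    // entry itself so the connection object can be garbage collected.
    void removeWatcher(String path, Object watcher) {
        Set<String> paths = watch2Paths.get(watcher);
        if (paths == null) {
            return;
        }
        paths.remove(path);
        if (paths.isEmpty()) {
            watch2Paths.remove(watcher); // prevents the slow leak
        }
    }
}
```

Without the `isEmpty()` branch, every disconnected connection would leave an entry keyed by a dead watcher holding an empty set, which is the slow growth the reporter observed under -Xmx512m.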
[jira] [Commented] (ZOOKEEPER-706) large numbers of watches can cause session re-establishment to fail
[ https://issues.apache.org/jira/browse/ZOOKEEPER-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595642#comment-16595642 ] Brian Nixon commented on ZOOKEEPER-706: --- I agree that the C client is still vulnerable to this - please do put up a patch if you have the time. > large numbers of watches can cause session re-establishment to fail > --- > > Key: ZOOKEEPER-706 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-706 > Project: ZooKeeper > Issue Type: Bug > Components: c client, java client >Affects Versions: 3.1.2, 3.2.2, 3.3.0 >Reporter: Patrick Hunt >Assignee: Chris Thunes >Priority: Critical > Fix For: 3.4.7, 3.5.2, 3.6.0 > > Attachments: ZOOKEEPER-706-branch-34.patch, > ZOOKEEPER-706-branch-34.patch, ZOOKEEPER-706.patch, ZOOKEEPER-706.patch, > ZOOKEEPER-706.patch > > > If a client sets a large number of watches the "set watches" operation during > session re-establishment can fail. > for example: > WARN [NIOServerCxn.Factory:22801:NIOServerCnxn@417] - Exception causing > close of session 0xe727001201a4ee7c due to java.io.IOException: Len error > 4348380 > in this case the client was a web monitoring app and had set both data and > child watches on > 32k znodes. > there are two issues I see here we need to fix: > 1) handle this case properly (split up the set watches into multiple calls I > guess...) > 2) the session should have expired after the "timeout". however we seem to > consider any message from the client as re-setting the expiration on the > server side. Probably we should only consider messages from the client that > are sent during an established session, otherwise we can see this situation > where the session is not established however the session is not expired > either. Perhaps we should create another JIRA for this particular issue. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
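Fix (1) in the issue, splitting the set-watches call, is essentially chunking the watch path list so no single request exceeds the server's length limit (the "Len error" in the log). A sketch of the chunking step, with names invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class SetWatchesBatchSketch {
    // Split a large list of watched paths into batches; each batch
    // would be sent as its own set-watches request on reconnect so no
    // single packet trips the server's jute.maxbuffer-style limit.
    static List<List<String>> batches(List<String> paths, int maxPerBatch) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < paths.size(); i += maxPerBatch) {
            int end = Math.min(i + maxPerBatch, paths.size());
            out.add(new ArrayList<>(paths.subList(i, end)));
        }
        return out;
    }
}
```

A byte-budget variant (summing path lengths per batch instead of counting paths) would track the actual packet limit more closely.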
[jira] [Created] (ZOOKEEPER-3115) Delete snapshot file on error
Brian Nixon created ZOOKEEPER-3115: -- Summary: Delete snapshot file on error Key: ZOOKEEPER-3115 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3115 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.6.0 Reporter: Brian Nixon ZOOKEEPER-3082 guards against one particular failure mode that can cause a corrupt snapshot, when an empty file is created with a valid snapshot file name. All other instances of IOException when writing the snapshot are simply allowed to propagate up the stack. One idea that came up during review ([https://github.com/apache/zookeeper/pull/560]) was whether we would ever want to leave a snapshot file on disk when an IOException is thrown. Clearly something has gone wrong at this point and rather than leave a potentially corrupt file, we can delete it and trust the transaction log when restoring the necessary transactions. It would be great to modify FileTxnSnapLog::save to delete snapshot files more often on exceptions - provided that there's a way to identify whether the file in that case is needed or corrupt. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
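The cleanup being proposed can be sketched as a save wrapper that deletes the file whenever the write path throws, with the review's open question (when might the file still be needed despite the exception?) left out; names are illustrative, not the FileTxnSnapLog::save signature:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

class SaveSnapshotSketch {
    // Write and fsync the snapshot; on any IOException delete the
    // partial file so recovery falls back to the transaction log rather
    // than trusting a possibly corrupt snapshot.
    static boolean save(File snap, byte[] data) {
        try (FileOutputStream fos = new FileOutputStream(snap)) {
            fos.write(data);
            fos.getFD().sync();
            return true;
        } catch (IOException e) {
            snap.delete(); // best-effort cleanup of the partial file
            return false;
        }
    }
}
```

Writing to a temporary name and renaming into place only after a successful sync is the other common shape of this fix; rename is atomic on POSIX filesystems, so no partially written file ever carries a valid snapshot name.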
[jira] [Commented] (ZOOKEEPER-3082) Fix server snapshot behavior when out of disk space
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565860#comment-16565860 ] Brian Nixon commented on ZOOKEEPER-3082: [~andorm] my (possibly incorrect) read on ZOOKEEPER-1621 is that the issue is related to this one but not strictly a subset. Here we've removed the possibility of the snapshot side of recovery being lost during a disk-full event. There, the issue seems to be in ensuring the transaction log side of recovery is not corrupted by writing empty/incomplete log files. That issue will continue to be present even with the patch from this file applied. > Fix server snapshot behavior when out of disk space > --- > > Key: ZOOKEEPER-3082 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3082 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.6.0, 3.4.12, 3.5.5 >Reporter: Brian Nixon >Assignee: Brian Nixon >Priority: Minor > Labels: pull-request-available > Fix For: 3.6.0 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > When the ZK server tries to make a snapshot and the machine is out of disk > space, the snapshot creation fails and throws an IOException. An empty > snapshot file is created, (probably because the server is able to create an > entry in the dir) but is not able to write to the file. > > If snapshot creation fails, the server commits suicide. When it restarts, it > will do so from the last known good snapshot. However, when it tries to make > a snapshot again, the same thing happens. This results in lots of empty > snapshot files being created. If eventually the DataDirCleanupManager garbage > collects the good snapshot files then only the empty files remain. At this > point, the server is well and truly screwed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3108) deprecated myid file and use a new property "server.id" in the zoo.cfg
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16565814#comment-16565814 ] Brian Nixon commented on ZOOKEEPER-3108: This seems like a good idea to me (provided myid files are still supported) to give admins a bit more flexibility. One reason I can think of to keep using a separate myid file is that the server id is the one property guaranteed to be unique for a given peer across the ensemble. All other properties and jvm flags may be identical across every instance. This makes reasoning about configuration files very easy - one simply propagates the same file everywhere and no custom logic is needed when comparing them. Here's a link to an old discussion around myid -> http://zookeeper-user.578899.n2.nabble.com/The-idea-behind-myid-td3711269.html > deprecated myid file and use a new property "server.id" in the zoo.cfg > --- > > Key: ZOOKEEPER-3108 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3108 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.5.0 >Reporter: maoling >Assignee: maoling >Priority: Major > > When using ZK in distributed mode, we need to touch a myid file in dataDir, then write a unique number to it. This is inconvenient and not user-friendly. Look at an example from another distributed system such as Kafka: it just uses broker.id=0 in server.properties to identify a unique server node. This issue is going to abandon the myid file and use a new property such as server.id=0 in the zoo.cfg. This fix will be applied to the master branch and branch-3.5+, keeping branch-3.4 unchanged. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
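To make the comparison concrete, here is an illustrative sketch of the two schemes (hostnames and ids are hypothetical, and `server.id` is the property proposed in this issue, not an existing ZooKeeper setting):

```properties
# Today: zoo.cfg is identical on every peer...
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
# ...and each peer's identity lives in a one-line dataDir/myid file,
# e.g. the file on peer 1 contains only:
#   1

# Proposed: the identity moves into zoo.cfg itself, so each peer's
# config differs by exactly one line:
server.id=1
```

The trade-off Nixon raises is visible here: with myid, the zoo.cfg files can be byte-identical across the ensemble; with server.id, every peer's config is unique.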
[jira] [Created] (ZOOKEEPER-3083) Remove some redundant and noisy log lines
Brian Nixon created ZOOKEEPER-3083: -- Summary: Remove some redundant and noisy log lines Key: ZOOKEEPER-3083 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3083 Project: ZooKeeper Issue Type: Improvement Components: server Affects Versions: 3.6.0 Reporter: Brian Nixon Under high client turnover, some log lines around client activity generate an outsized amount of noise in the log files. Reducing a few to debug level won't cause a big hit on admin understanding as there are redundant elements. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-3082) Fix server snapshot behavior when out of disk space
Brian Nixon created ZOOKEEPER-3082: -- Summary: Fix server snapshot behavior when out of disk space Key: ZOOKEEPER-3082 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3082 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.4.12, 3.6.0, 3.5.5 Reporter: Brian Nixon When the ZK server tries to make a snapshot and the machine is out of disk space, the snapshot creation fails and throws an IOException. An empty snapshot file is created (probably because the server is able to create an entry in the dir) but the server is not able to write to the file. If snapshot creation fails, the server commits suicide. When it restarts, it will do so from the last known good snapshot. However, when it tries to make a snapshot again, the same thing happens. This results in lots of empty snapshot files being created. If eventually the DataDirCleanupManager garbage collects the good snapshot files then only the empty files remain. At this point, the server is well and truly screwed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
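One way to avoid the empty-file failure mode described above is to write the snapshot under a temporary name and rename it into place only after a successful fsync. This is an illustrative sketch under that assumption, not the patch that actually landed for this issue:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Sketch: an out-of-disk failure leaves at most a *.tmp file behind;
// no half-written file ever appears under a valid snapshot name.
public class AtomicSnapshotWrite {
    public static void write(File target, byte[] data) throws IOException {
        File tmp = new File(target.getParentFile(), target.getName() + ".tmp");
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(data);
            out.getFD().sync(); // force bytes to disk before the rename
        } catch (IOException e) {
            tmp.delete(); // drop the partial file; older snapshots stay valid
            throw e;
        }
        // Atomic rename: readers see either no file or the complete snapshot.
        Files.move(tmp.toPath(), target.toPath(), StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With this pattern, the DataDirCleanupManager scenario above cannot leave only empty files, because an incomplete write never acquires a snapshot name.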
[jira] [Created] (ZOOKEEPER-3068) Improve C client logging of IPv6 hosts
Brian Nixon created ZOOKEEPER-3068: -- Summary: Improve C client logging of IPv6 hosts Key: ZOOKEEPER-3068 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3068 Project: ZooKeeper Issue Type: Improvement Components: c client Affects Versions: 3.6.0 Reporter: Brian Nixon Assignee: Brian Nixon The C client formats host-port pairings as [host:port] when logging. This is visually confusing when the host is an IPv6 address (see below). In that case, it would be preferable to cleanly separate the IPv6 address from the port. {code:java} ZOO_INFO@check_events@2736: initiated connection to server [2401:db00:1020:40bf:face:0:5:0:2181]{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
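The suggested formatting can be sketched in a few lines (shown in Java for illustration; the actual fix would live in the C client's logging code, and the class name here is hypothetical):

```java
// Bracket only the host part, so the port stays visually separate
// even when the host is a colon-laden IPv6 literal.
public class HostPortFormat {
    public static String format(String host, int port) {
        if (host.indexOf(':') >= 0) {
            // IPv6 literals contain colons; wrap just the address in brackets.
            return "[" + host + "]:" + port;
        }
        return host + ":" + port;
    }
}
```

Applied to the log line above, this would print `[2401:db00:1020:40bf:face:0:5:0]:2181` instead of `[2401:db00:1020:40bf:face:0:5:0:2181]`.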
[jira] [Commented] (ZOOKEEPER-3056) Fails to load database with missing snapshot file but valid transaction log file
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16519839#comment-16519839 ] Brian Nixon commented on ZOOKEEPER-3056: As a work-around for anyone currently blocked on this issue, I've uploaded an empty snapshot file here. To perform an upgrade (3.4 -> 3.5): * download the "snapshot.0" file attached * copy it to the versioned directory (e.g. "version-2") within your data directory (parameter "dataDir" in your config - this is the directory containing the "myid" file for a peer) * restart the peer * upgrade the peer (this can be combined with the above step if you like) > Fails to load database with missing snapshot file but valid transaction log > file > > > Key: ZOOKEEPER-3056 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3056 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.5.3, 3.5.4 >Reporter: Michael Han >Priority: Critical > Attachments: snapshot.0 > > > [An > issue|https://lists.apache.org/thread.html/cc17af6ef05d42318f74148f1a704f16934d1253f14721a93b4b@%3Cdev.zookeeper.apache.org%3E] > was reported when a user failed to upgrade from 3.4.10 to 3.5.4 with missing > snapshot file. > The code complains about missing snapshot file is > [here|https://github.com/apache/zookeeper/blob/release-3.5.4/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L206] > which is introduced as part of ZOOKEEPER-2325. > With this check, ZK will not load the db without a snapshot file, even the > transaction log files are present and valid. This could be a problem for > restoring a ZK instance which does not have a snapshot file but have a sound > state (e.g. it crashes before being able to take the first snap shot with a > large snapCount parameter configured). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ZOOKEEPER-3056) Fails to load database with missing snapshot file but valid transaction log file
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brian Nixon updated ZOOKEEPER-3056: --- Attachment: snapshot.0 > Fails to load database with missing snapshot file but valid transaction log > file > > > Key: ZOOKEEPER-3056 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3056 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.5.3, 3.5.4 >Reporter: Michael Han >Priority: Critical > Attachments: snapshot.0 > > > [An > issue|https://lists.apache.org/thread.html/cc17af6ef05d42318f74148f1a704f16934d1253f14721a93b4b@%3Cdev.zookeeper.apache.org%3E] > was reported when a user failed to upgrade from 3.4.10 to 3.5.4 with missing > snapshot file. > The code complains about missing snapshot file is > [here|https://github.com/apache/zookeeper/blob/release-3.5.4/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L206] > which is introduced as part of ZOOKEEPER-2325. > With this check, ZK will not load the db without a snapshot file, even the > transaction log files are present and valid. This could be a problem for > restoring a ZK instance which does not have a snapshot file but have a sound > state (e.g. it crashes before being able to take the first snap shot with a > large snapCount parameter configured). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-2873) print error and/or abort on invalid server definition
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516514#comment-16516514 ] Brian Nixon commented on ZOOKEEPER-2873: With no one commenting in almost a year, this issue strikes me as fair game for anyone to patch. > print error and/or abort on invalid server definition > - > > Key: ZOOKEEPER-2873 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2873 > Project: ZooKeeper > Issue Type: Improvement > Components: server >Affects Versions: 3.4.10 >Reporter: Christopher Smith >Assignee: Mark Fenes >Priority: Minor > > While bringing up a new cluster, I managed to fat-finger a sed script and put > some lines like this into my config file: > {code} > server.1=zookeeper1:2888:2888 > {code} > This led to a predictable spew of error messages when the client and election > components fought over the single port. Since a configuration of this case is > *always* an error, I suggest that it would be sensible to abort the server > startup if an entry is found with the same port for both client and election. > (Logging the error explicitly without shutting down is less helpful because > of how fast the logs pile up.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-2421) testSessionReuse is commented out
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16507670#comment-16507670 ] Brian Nixon commented on ZOOKEEPER-2421: [~prasanthm] This is an ancient test so someone who was involved in the project pre-2008 may have to provide some context around it and its purpose. That's not me but I can say a bit after looking at the SessionTest file. Session id reuse as such is not allowed in current ZooKeeper. There are two ways that this test could now go. One is to make sure that the second client can *not* use that session id, that there is no state retained server-side that allows a reuse after close. The second is to change it to a session moved style test but I think this scenario is already covered in testSession and testSessionMove. If you don't see a useful way of reintroducing the test after a bit of poking, I'd say to put up a pull request removing it entirely and see if it gets accepted - it simply may no longer be meaningful. > testSessionReuse is commented out > - > > Key: ZOOKEEPER-2421 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2421 > Project: ZooKeeper > Issue Type: Bug >Reporter: Flavio Junqueira >Assignee: Prasanth Mathialagan >Priority: Major > > This test case in SessionTest: > {noformat} >testSessionReuse > {noformat} > is commented out. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3056) Fails to load database with missing snapshot file but valid transaction log file
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506687#comment-16506687 ] Brian Nixon commented on ZOOKEEPER-3056: [~mmerli] That's a very reasonable concern and I'd ideally have all upgrades be seamless in exactly the way you describe. Property gating the validation is only undesirable from a proliferation-of-config point of view. [~hanm] I think the signal file is a very workable approach and pretty straightforward to implement. The first intervention that I scoped out (create a snapshot.0) was inspired by yours as it simplifies the path of "signal file" to "database load with trust in the transaction log" to "create snapshot, delete signal file". -- It's a trade-off between admin time and server side code complexity for sure. In order of decreasing seamlessness/admin time: * property flag snapshot validation (default off) * property flag snapshot validation (default on) * signal file * admin script to create a snapshot.0 file in the snapshot directory * upgrade notes to create a snapshot.0 file in the snapshot directory For the use cases that we maintain, it's far more likely that being unable to load a snapshot indicates corruption or machine malfeasance than a legitimate database, so I'd like to check that impression against more information from the community. Is a snapshot-less db expected/unremarkable under some reasonable workloads or is it something worth (politely) discouraging? I do believe ZOOKEEPER-2325 is a good feature and it would be a shame to turn it off by default. 
> Fails to load database with missing snapshot file but valid transaction log > file > > > Key: ZOOKEEPER-3056 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3056 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.5.3, 3.5.4 >Reporter: Michael Han >Priority: Critical > > [An > issue|https://lists.apache.org/thread.html/cc17af6ef05d42318f74148f1a704f16934d1253f14721a93b4b@%3Cdev.zookeeper.apache.org%3E] > was reported when a user failed to upgrade from 3.4.10 to 3.5.4 with missing > snapshot file. > The code complains about missing snapshot file is > [here|https://github.com/apache/zookeeper/blob/release-3.5.4/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L206] > which is introduced as part of ZOOKEEPER-2325. > With this check, ZK will not load the db without a snapshot file, even the > transaction log files are present and valid. This could be a problem for > restoring a ZK instance which does not have a snapshot file but have a sound > state (e.g. it crashes before being able to take the first snap shot with a > large snapCount parameter configured). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-3056) Fails to load database with missing snapshot file but valid transaction log file
[ https://issues.apache.org/jira/browse/ZOOKEEPER-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16506419#comment-16506419 ] Brian Nixon commented on ZOOKEEPER-3056: We have not run an ensemble without some form of ZOOKEEPER-2325 in years; as such, we always have snapshots available, and ZooKeeper being unable to load a valid snapshot is a sign that something is very wrong. From my read on the mail thread, there are two questions that we're trying to answer: - how to update ensembles without snapshots from a pre-2325 to a post-2325 state - what constitutes a stable db (and what role a snapshot plays in that) The second ought to take more thought so I'll follow up on that after considering it. Two possible interventions on the first: - the base snapshot is very small and simple, one could copy/create a snapshot.0 file to the appropriate directory before upgrade - property gate the entire "-1L == deserializeResult" conditional block in 3.4, 3.5, and master to allow a snapshot-less db. To the extent that we agree that snapshot-less is a degenerate mode, we also add a four-letter or admin command to create a snapshot on demand (allowing the admin to quickly move out of this state post-upgrade) > Fails to load database with missing snapshot file but valid transaction log > file > > > Key: ZOOKEEPER-3056 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3056 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.5.3, 3.5.4 >Reporter: Michael Han >Priority: Critical > > [An > issue|https://lists.apache.org/thread.html/cc17af6ef05d42318f74148f1a704f16934d1253f14721a93b4b@%3Cdev.zookeeper.apache.org%3E] > was reported when a user failed to upgrade from 3.4.10 to 3.5.4 with missing > snapshot file. 
> The code complains about missing snapshot file is > [here|https://github.com/apache/zookeeper/blob/release-3.5.4/src/java/main/org/apache/zookeeper/server/persistence/FileTxnSnapLog.java#L206] > which is introduced as part of ZOOKEEPER-2325. > With this check, ZK will not load the db without a snapshot file, even the > transaction log files are present and valid. This could be a problem for > restoring a ZK instance which does not have a snapshot file but have a sound > state (e.g. it crashes before being able to take the first snap shot with a > large snapCount parameter configured). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ZOOKEEPER-2988) NPE triggered if server receives a vote for a server id not in their voting view
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16458844#comment-16458844 ] Brian Nixon commented on ZOOKEEPER-2988: [~hanm] thanks for merging it to 3.5 too, I'll shut down that second pr. This bug is applicable to 3.4 as well - imo it's a worse danger on that branch since it's easy for configuration files to be stale. I'll fix up pr 478 for the 3.4 branch to reflect the comments I got on pr 476 so it can be ready for review. > NPE triggered if server receives a vote for a server id not in their voting > view > > > Key: ZOOKEEPER-2988 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2988 > Project: ZooKeeper > Issue Type: Bug > Components: leaderElection >Affects Versions: 3.5.3, 3.4.11 >Reporter: Brian Nixon >Assignee: Brian Nixon >Priority: Minor > Fix For: 3.5.4, 3.6.0 > > > We've observed the following behavior in elections when a node is lagging > behind the quorum in its view of the ensemble topology. > - Node A is operating with node B in its voting view, but without view of > node C. > - B votes for C. > - A then switches its vote to C, but throws a NPE when attempting to connect. > This causes the QuorumPeer to spin up a Follower only to immediately have it > shut down by the exception. > Ideally, A would not advertise a vote for a server that it will not follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ZOOKEEPER-2988) NPE triggered if server receives a vote for a server id not in their voting view
Brian Nixon created ZOOKEEPER-2988: -- Summary: NPE triggered if server receives a vote for a server id not in their voting view Key: ZOOKEEPER-2988 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2988 Project: ZooKeeper Issue Type: Bug Components: leaderElection Affects Versions: 3.4.11, 3.5.3 Reporter: Brian Nixon We've observed the following behavior in elections when a node is lagging behind the quorum in its view of the ensemble topology. - Node A is operating with node B in its voting view, but without view of node C. - B votes for C. - A then switches its vote to C, but throws a NPE when attempting to connect. This causes the QuorumPeer to spin up a Follower only to immediately have it shut down by the exception. Ideally, A would not advertise a vote for a server that it will not follow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
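The fix direction in the last sentence ("A would not advertise a vote for a server that it will not follow") can be sketched as a simple guard. VoteGuard and its map-based voting view are hypothetical stand-ins for the real leader-election logic, not ZooKeeper's actual code:

```java
import java.util.Map;

// Sketch: before switching its vote to a proposed candidate, a peer checks
// that the candidate's id is present in its own voting view. If the id is
// unknown (e.g. the peer's config is stale), it keeps its current vote
// rather than trying to follow a server it cannot connect to.
public class VoteGuard {
    public static long chooseVote(long proposedId, long currentVote,
                                  Map<Long, String> votingView) {
        // votingView maps server id -> address; a stand-in for the peer's config.
        return votingView.containsKey(proposedId) ? proposedId : currentVote;
    }
}
```

This avoids the NPE at its source: the lookup that previously returned null for an unknown id now gates the vote switch instead.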
[jira] [Commented] (ZOOKEEPER-2357) Unhandled errors propagating through cluster
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239715#comment-16239715 ] Brian Nixon commented on ZOOKEEPER-2357: Not an expert but a couple things pop out to me. One, the WARN messages are what you expect when a follower loses contact with the leader. Two, 50 seconds to sync the txn log is a long time. I don't know what the SyncThread of the FileTxnLog is blocking but it could be the case that the data load is impacting the server-server communication. > Unhandled errors propagating through cluster > > > Key: ZOOKEEPER-2357 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2357 > Project: ZooKeeper > Issue Type: Task > Components: leaderElection, quorum, server >Affects Versions: 3.4.6 >Reporter: Gareth Humphries >Priority: Minor > > Hi, > I need some help understanding a recurring problem we're seeing with our > zookeeper cluster. It's a five node cluster that ordinarily runs fine. > Occasionally we see an error from which the cluster recovers, but it causes a > lot of grief and I'm sure is representative of an unhealthy situation. > To my eye it looks like an invalid bit of data getting into the system and > not being handled gracefully; I'm the first to say my eye is not expert > though, so I humbly submit an annotated log exert in the hope some who knows > more than me can provide some illumination. 
> The cluster seems to be ticking along fine, until we get errors on 2 of the 5 > nodes like so: > 2016-01-19 13:12:49,698 - WARN [QuorumPeer[myid=1]/0.0.0.0:2181:Follower@89] > - Exception when following the leader > java.io.EOFException > at java.io.DataInputStream.readInt(DataInputStream.java:392) > at > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > at > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83) > at > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103) > at > org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153) > at > org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85) > at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786) > 2016-01-19 13:12:49,698 - INFO > [QuorumPeer[myid=1]/0.0.0.0:2181:Follower@166] - shutdown called > java.lang.Exception: shutdown Follower > at > org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166) > at > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:790) > This is immediately followed by 380 occurences of: > 2016-01-19 13:12:49,699 - INFO > [QuorumPeer[myid=1]/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket > connection for client /X.Y.Z.56:59028 which had sessionid 0x151b01ee8330234 > and a: > 2016-01-19 13:12:49,766 - INFO > [QuorumPeer[myid=1]/0.0.0.0:2181:FollowerZooKeeperServer@139] - Shutting down > 2016-01-19 13:12:49,766 - INFO > [QuorumPeer[myid=1]/0.0.0.0:2181:ZooKeeperServer@441] - shutting down > 2016-01-19 13:12:49,766 - INFO > [QuorumPeer[myid=1]/0.0.0.0:2181:FollowerRequestProcessor@105] - Shutting down > 2016-01-19 13:12:49,766 - INFO > [QuorumPeer[myid=1]/0.0.0.0:2181:CommitProcessor@181] - Shutting down > 2016-01-19 13:12:49,766 - INFO > [QuorumPeer[myid=1]/0.0.0.0:2181:FinalRequestProcessor@415] - shutdown of > request processor complete > 2016-01-19 13:12:49,767 - INFO > 
[QuorumPeer[myid=1]/0.0.0.0:2181:SyncRequestProcessor@209] - Shutting down > 2016-01-19 13:12:49,767 - INFO [CommitProcessor:1:CommitProcessor@150] - > CommitProcessor exited loop! > 2016-01-19 13:12:49,767 - INFO > [FollowerRequestProcessor:1:FollowerRequestProcessor@95] - > FollowerRequestProcessor exited loop! > 2016-01-19 13:13:09,418 - WARN [SyncThread:1:FileTxnLog@334] - fsync-ing the > write ahead log in SyncThread:1 took 30334ms which will adversely effect > operation latency. See the ZooKeeper troubleshooting guide > 2016-01-19 13:13:09,427 - WARN [SyncThread:1:SendAckRequestProcessor@64] - > Closing connection to leader, exception during packet send > java.net.SocketException: Socket closed > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:121) > at java.net.SocketOutputStream.write(SocketOutputStream.java:159) > at > java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) > at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) > at > org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:139) > at > org.apache.zookeeper.server.quorum.SendAckRequestProcessor.flush(SendAckRequestProcessor.java:62) > at >
[jira] [Commented] (ZOOKEEPER-2773) zookeeper-service
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239709#comment-16239709 ] Brian Nixon commented on ZOOKEEPER-2773: there's not enough information here to go on. Did you set ZOO_LOG_DIR and can you provide the zookeeper logs? > zookeeper-service > - > > Key: ZOOKEEPER-2773 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2773 > Project: ZooKeeper > Issue Type: Bug > Components: quorum >Affects Versions: 3.4.10 > Environment: Linux >Reporter: Ashwath > Labels: beginner > Fix For: 3.4.10 > > > Hi > I run zookeeper in 3 Linux Machines. > 1.I downloaded zookeeper-3.4.10.jar file and extracted that. > 2.I copy zoo_sample to zoo.cfg and edited datadir and added 3 ip address. > 3.I created a new file called myid and insert numbers into that. > Now I am running zookeeper cluster successfully..but > When I am trying to run it as a service I am getting following error > zookeeper.service - Apache ZooKeeper > Loaded: loaded (/lib/systemd/system/zookeeper.service; disabled; vendor > preset: enabled) > Active: activating (auto-restart) (Result: exit-code) since Wed 2017-05-03 > 09:56:28 IST; 1s ago > Process: 678 ExecStart=/home/melon/software/ZooKeeper/zk/bin/zkServer.sh > start-foreground (code=exited > Main PID: 678 (code=exited, status=127) > May 03 09:56:28 deds14 systemd[1]: zookeeper.service: Unit entered failed > state. > May 03 09:56:28 deds14 systemd[1]: zookeeper.service: Failed with result > 'exit-code'. 
> Here is the code I added: > [Unit] > Description=Apache ZooKeeper > After=network.target > ConditionPathExists=/home/melon/software/ZooKeeper/zookeeper-3.4.10-beta/conf/zoo.cfg > ConditionPathExists=/home/melon/software/ZooKeeper/zookeeper-3.4.10-beta/conf/log4j.properties > [Service] > Environment="ZOOCFGDIR=/home/melon/software/ZooKeeper/zookeeper-3.4.10-beta/conf" > SyslogIdentifier=zookeeper > WorkingDirectory=/home/melon/software/ZooKeeper > ExecStart=/home/melon/software/ZooKeeper/zookeeper-3.4.10-beta/bin/zkServer.sh > start-foreground > Restart=on-failure > RestartSec=20 > User=root > Group=root > Thank you -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ZOOKEEPER-2872) Interrupted snapshot sync causes data loss
Brian Nixon created ZOOKEEPER-2872: -- Summary: Interrupted snapshot sync causes data loss Key: ZOOKEEPER-2872 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2872 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.3, 3.4.10, 3.6.0 Reporter: Brian Nixon There is a way for observers to permanently lose data from their local data tree while remaining members of good standing with the ensemble and continuing to serve client traffic when the following chain of events occurs. 1. The observer dies in epoch N from machine failure. 2. The observer comes back up in epoch N+1 and requests a snapshot sync to catch up. 3. The machine powers off before the snapshot is synced to disk and after some txns have been logged (depending on the OS, this can happen!). 4. The observer comes back a second time and replays its most recent snapshot (epoch <= N) as well as the txn logs (epoch N+1). 5. A diff sync is requested from the leader and the observer broadcasts availability. In this scenario, any commits from epoch N that the observer did not receive before it died the first time will never be exposed to the observer and no part of the ensemble will complain. This situation is not unique to observers and can happen to any learner. As a simple fix, fsync-ing the snapshots received from the leader will avoid the case of missing snapshots causing data loss. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
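The "simple fix" proposed above, fsync-ing snapshots received from the leader before treating the sync as complete, can be sketched like this (an illustration under that assumption, not the actual Learner code; the class and method names are hypothetical):

```java
import java.io.FileOutputStream;
import java.io.IOException;

// Sketch: force the leader-supplied snapshot to disk before acknowledging.
// Without the sync, a power loss can leave the learner with txn logs from
// epoch N+1 on disk but no snapshot covering epoch N, producing the silent
// data loss described in steps 3-5 above.
public class LearnerSnapshotSync {
    public static void writeAndSync(java.io.File snapFile, byte[] snapshotBytes)
            throws IOException {
        try (FileOutputStream out = new FileOutputStream(snapFile)) {
            out.write(snapshotBytes);
            out.getFD().sync(); // fsync: the snapshot is durable before we proceed
        }
    }
}
```

The cost is one fsync per snapshot sync, which is rare (only on learner catch-up), so the durability gain comes essentially for free.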
[jira] [Commented] (ZOOKEEPER-2723) ConnectStringParser does not parse correctly if quorum string has znode path
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973457#comment-15973457 ] Brian Nixon commented on ZOOKEEPER-2723: I'm not seeing a reference to ConnectStringParser in the attached stack trace - it looks like a DNS resolution problem. Can you add more detail? > ConnectStringParser does not parse correctly if quorum string has znode path > > > Key: ZOOKEEPER-2723 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2723 > Project: ZooKeeper > Issue Type: Bug >Reporter: Vishal Khandelwal >Assignee: Vishal Khandelwal > > f2017-03-14 07:10:26,247 INFO [main] zookeeper.ZooKeeper - Initiating client > connection, > connectString=x1-1-was.ops.sfdc.net:2181,x2-1-was.ops.sfdc.net:2181,x3-1-was.ops.sfdc.net:2181,x4-1-was.ops.sfdc.net:2181,x5-1-was.ops.sfdc.net:2181:/hbase > sessionTimeout=6 > watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@6e16b8b5 2017-03-14 > 07:10:26,250 ERROR [main] client.StaticHostProvider - Unable to connect to > server: x5-1-was.ops.sfdc.net:2181:2181 java.net.UnknownHostException: > x5-1-was.ops.sfdc.net:2181: Name or service not known at > java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) at > java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928) at > java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323) at > java.net.InetAddress.getAllByName0(InetAddress.java:1276) at > java.net.InetAddress.getAllByName(InetAddress.java:1192) at > java.net.InetAddress.getAllByName(InetAddress.java:1126) at > org.apache.zookeeper.client.StaticHostProvider.(StaticHostProvider.java:60) > at org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:446) at > org.apache.zookeeper.ZooKeeper.(ZooKeeper.java:380) at > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.checkZk(RecoverableZooKeeper.java:141) > at > org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.(RecoverableZooKeeper.java:128) > at 
org.apache.hadoop.hbase.zookeeper.ZKUtil.connect(ZKUtil.java:135) at > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.(ZooKeeperWatcher.java:173) > at > org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.(ZooKeeperWatcher.java:147) > at > org.apache.hadoop.hbase.client.ZooKeeperKeepAliveConnection.(ZooKeeperKeepAliveConnection.java:43) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveZooKeeperWatcher(HConnectionManager.java:1875) > at > org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:82) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.retrieveClusterId(HConnectionManager.java:929) > at > org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.(HConnectionManager.java:714) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at > org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:466) > at > org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:445) > at > org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:326) -- This message was sent by Atlassian JIRA (v6.3.15#6346)
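For context, ZooKeeper connect strings put an optional chroot after the host:port list, e.g. `host1:2181,host2:2181/hbase`. The string in the report above (`...:2181:/hbase`) has a stray colon before the chroot, which is why the last host resolved as `x5-1-was.ops.sfdc.net:2181`. A minimal sketch of how the chroot is split off before hostname resolution (hypothetical names, not the real ConnectStringParser):

```java
// Sketch: everything from the first '/' onward is the chroot path and must
// be removed before the comma-separated host:port list is resolved via DNS.
public class ConnectStringSplit {
    public static String[] split(String connectString) {
        int idx = connectString.indexOf('/');
        String hosts = idx < 0 ? connectString : connectString.substring(0, idx);
        String chroot = idx < 0 ? null : connectString.substring(idx);
        return new String[] { hosts, chroot };
    }
}
```

With the extra ':' in the reported string, the host list ends in `:2181:`, so the failure surfaces later as an UnknownHostException rather than a parse error.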
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15968437#comment-15968437 ] Brian Nixon commented on ZOOKEEPER-2325: When a server starts up, it should always capture the state of the loaded database with a fresh snapshot. I don't believe it is a valid state to have a log file without a snapshot file. > Data inconsistency if all snapshots empty or missing > > > Key: ZOOKEEPER-2325 > URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325 > Project: ZooKeeper > Issue Type: Bug > Components: server >Affects Versions: 3.4.6 >Reporter: Andrew Grasso >Priority: Critical > Fix For: 3.5.4, 3.6.0 > > Attachments: zk.patch, ZOOKEEPER-2325.001.patch, > ZOOKEEPER-2325-test.patch > > Original Estimate: 1h > Remaining Estimate: 1h > > When loading state from snapshots on startup, FileTxnSnapLog.java ignores the > result of FileSnap.deserialize, which is -1L if no valid snapshots are found. > Recovery proceeds with dt.lastProcessed == 0, its initial value. > The result is that Zookeeper will process the transaction logs and then begin > serving requests with a different state than the rest of the ensemble. > To reproduce: > In a healthy zookeeper cluster of size >= 3, shut down one node. > Either delete all snapshots for this node or change all to be empty files. > Restart the node. > We believe this can happen organically if a node runs out of disk space. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
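The bug described above boils down to ignoring the sentinel returned by the snapshot loader. A minimal sketch of the missing check (the names here are stand-ins for FileSnap.deserialize / FileTxnSnapLog, not the actual patch):

```java
import java.io.IOException;

// Sketch: FileSnap.deserialize returns -1L when no valid snapshot is found.
// If that result is ignored, replay proceeds from zxid 0 and the node
// silently diverges from the ensemble; checking it turns the condition
// into a hard failure at startup instead.
public class SnapLoadCheck {
    public static final long NO_SNAPSHOT = -1L;

    public static long restore(long deserializeResult) throws IOException {
        if (deserializeResult == NO_SNAPSHOT) {
            throw new IOException(
                "No snapshot found, but there are log entries - refusing to load");
        }
        return deserializeResult; // zxid to replay the txn log from
    }
}
```

Failing fast here is exactly what produces the upgrade problem discussed in ZOOKEEPER-3056 above: a legitimately snapshot-less 3.4 data directory now refuses to load.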
[jira] [Commented] (ZOOKEEPER-2725) Upgrading to a global session fails with a multiop
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927211#comment-15927211 ]

Brian Nixon commented on ZOOKEEPER-2725:

I messed up the name of the pull request - meant to link https://github.com/apache/zookeeper/pull/195 to this issue.

> Upgrading to a global session fails with a multiop
> --------------------------------------------------
>
> Key: ZOOKEEPER-2725
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2725
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.5.2
> Reporter: Brian Nixon
>
> On an ensemble with local sessions enabled, when a client with a local
> session requests the creation of an ephemeral node within a multi-op, the
> client gets a session expired message. The same multi-op works if the
> session is already global. This breaks the client's expectation of seamless
> promotion from local session to global session server-side.

-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (ZOOKEEPER-2725) Upgrading to a global session fails with a multiop
Brian Nixon created ZOOKEEPER-2725: -- Summary: Upgrading to a global session fails with a multiop Key: ZOOKEEPER-2725 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2725 Project: ZooKeeper Issue Type: Bug Components: server Affects Versions: 3.5.2 Reporter: Brian Nixon On an ensemble with local sessions enabled, when a client with a local session requests the creation of an ephemeral node within a multi-op, the client gets a session expired message. The same multi-op works if the session is already global. This breaks the client's expectation of seamless promotion from local session to global session server-side. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
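The expected server-side behavior can be modeled in miniature: a multi request containing an ephemeral create should force promotion of a local session to a global one, just as a standalone ephemeral create does, rather than expiring the session. The types below are hypothetical stand-ins for illustration, not ZooKeeper's actual request-processing code.

```java
import java.util.List;

public class SessionUpgradeSketch {
    // Simplified operation kinds that might appear inside a multi request.
    enum OpType { CREATE_PERSISTENT, CREATE_EPHEMERAL, SET_DATA, DELETE, CHECK }

    // Ephemeral nodes are tied to session lifetime tracked by the quorum,
    // so any multi containing an ephemeral create requires a global
    // session; a local session must be upgraded first, not expired.
    static boolean needsGlobalSession(List<OpType> multiOps) {
        return multiOps.stream().anyMatch(op -> op == OpType.CREATE_EPHEMERAL);
    }

    public static void main(String[] args) {
        System.out.println(needsGlobalSession(
                List.of(OpType.SET_DATA, OpType.CREATE_EPHEMERAL))); // true
        System.out.println(needsGlobalSession(
                List.of(OpType.CREATE_PERSISTENT, OpType.DELETE)));  // false
    }
}
```

The bug was that this check happened for a plain create but not for creates nested inside a multi, so the local session hit the ephemeral-create path without ever being promoted.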
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15810600#comment-15810600 ]

Brian Nixon commented on ZOOKEEPER-2325:

Thanks [~hanm]! Glad we kept the changes for this task and 261 separate. I'll make sure that https://github.com/apache/zookeeper/pull/120 still commits cleanly and update that PR as necessary.

> Data inconsistency if all snapshots empty or missing
> ----------------------------------------------------
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.6
> Reporter: Andrew Grasso
> Priority: Critical
> Fix For: 3.5.3, 3.6.0
>
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, zk.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the
> result of FileSnap.deserialize, which is -1L if no valid snapshots are found.
> Recovery proceeds with dt.lastProcessed == 0, its initial value.
> The result is that ZooKeeper will process the transaction logs and then begin
> serving requests with a different state than the rest of the ensemble.
> To reproduce:
> In a healthy ZooKeeper cluster of size >= 3, shut down one node.
> Either delete all snapshots for this node or change all to be empty files.
> Restart the node.
> We believe this can happen organically if a node runs out of disk space.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805547#comment-15805547 ]

Brian Nixon commented on ZOOKEEPER-2325:

Any word on committing this patch? I'd love to unblock ZOOKEEPER-261. [~fpj] ?

> Data inconsistency if all snapshots empty or missing
> ----------------------------------------------------
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.6
> Reporter: Andrew Grasso
> Assignee: Andrew Grasso
> Priority: Critical
> Attachments: ZOOKEEPER-2325-test.patch, ZOOKEEPER-2325.001.patch, zk.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the
> result of FileSnap.deserialize, which is -1L if no valid snapshots are found.
> Recovery proceeds with dt.lastProcessed == 0, its initial value.
> The result is that ZooKeeper will process the transaction logs and then begin
> serving requests with a different state than the rest of the ensemble.
> To reproduce:
> In a healthy ZooKeeper cluster of size >= 3, shut down one node.
> Either delete all snapshots for this node or change all to be empty files.
> Restart the node.
> We believe this can happen organically if a node runs out of disk space.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (ZOOKEEPER-261) Reinitialized servers should not participate in leader election
[ https://issues.apache.org/jira/browse/ZOOKEEPER-261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15727016#comment-15727016 ]

Brian Nixon commented on ZOOKEEPER-261:

Ben and I discussed this offline. When a server starts up without any local data, the safest thing to do is treat that absence with extreme suspicion and not participate in voting until it can pull down the data tree from the rest of the ensemble. Such a server is not qualified to confirm which servers are up to date and could inadvertently elect a server that is missing some data.

The one exception is the creation of a fresh ensemble, when there is no data to repopulate the local data tree. It's not clear that an ensemble can detect this state on its own, since in the worst case every server is subject to the same data-losing fault (in which case you should recover from backups instead of coming online as an empty database). This extra information needs to come from the admin.

With the changes from ZOOKEEPER-2325, a server with no local data tree starts with a zxid of 0. I'll submit a pull request that changes that initial zxid to -1 unless a special 'initialize' file is present in the data directory, and that removes voting privileges from members reporting -1. The idea is that creating the 'initialize' file alongside 'myid' will be a standard part of ensemble creation - the extra information from the admin. The 'initialize' file will be automatically cleaned up by the server, so on subsequent restarts a missing data directory can be treated as a sign that the server is legitimately missing context (e.g. when being added to an existing ensemble).
> Reinitialized servers should not participate in leader election
> ---------------------------------------------------------------
>
> Key: ZOOKEEPER-261
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-261
> Project: ZooKeeper
> Issue Type: Improvement
> Components: leaderElection, quorum
> Reporter: Benjamin Reed
>
> A server that has lost its data should not participate in leader election
> until it has resynced with a leader. Our leader election algorithm and
> NEW_LEADER commit assume that the followers voting on a leader have not lost
> any of their data. We should have a flag in the data directory saying whether
> or not the data is preserved, so that the flag will be cleared if the data
> is ever cleared.
> Here is the problematic scenario: you have an ensemble of machines A, B,
> and C. C is down. The last transaction seen by C is z. A transaction, z+1, is
> committed on A and B. Now there is a power outage. B's data gets
> reinitialized. When power comes back up, B and C come up, but A does not. C
> will be elected leader and transaction z+1 is lost. (Note, this can happen
> even if all three machines are up and C just responds quickly; in that case C
> would tell A to truncate z+1 from its log.) In theory we haven't violated our
> 2f+1 guarantee, since A is failed and B still hasn't recovered from failure,
> but it would be nice if, when we lose quorum, the system stops working rather
> than works incorrectly.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
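The 'initialize' marker scheme from the comment above can be sketched in a few lines. This is an illustrative model under assumed names (InitMarkerSketch, startingZxid, mayVote are not ZooKeeper identifiers): an empty data directory yields zxid -1 and no vote, unless the admin created an 'initialize' file next to 'myid', which the server consumes on first start.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class InitMarkerSketch {
    static final long UNINITIALIZED = -1L;

    // Decide the starting zxid for a server whose data directory holds no
    // data tree. The 'initialize' marker is the admin's explicit statement
    // that this is a fresh ensemble, not a server that lost its data.
    static long startingZxid(Path dataDir) throws IOException {
        Path marker = dataDir.resolve("initialize");
        if (Files.exists(marker)) {
            Files.delete(marker);  // marker is consumed on first start
            return 0L;             // fresh ensemble: participate normally
        }
        return UNINITIALIZED;      // no data, no marker: abstain from voting
    }

    // A peer reporting -1 has nothing to vouch for and must not vote.
    static boolean mayVote(long zxid) {
        return zxid != UNINITIALIZED;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("zkdata");
        System.out.println(mayVote(startingZxid(dir)));  // false: no marker
        Files.createFile(dir.resolve("initialize"));
        System.out.println(mayVote(startingZxid(dir)));  // true: marker consumed
        System.out.println(Files.exists(dir.resolve("initialize")));  // false
    }
}
```

Because the marker is deleted after the first start, a later restart with a missing data directory is again treated as suspicious, which is exactly the behavior the comment calls for.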