[jira] Commented: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921528#action_12921528 ]

Dave Wright commented on ZOOKEEPER-885:
---------------------------------------

Has it been verified that ZK is doing no disk activity at all during that time? What about log file writes? What about sessions being established/torn down (which would cause syncs)?

> Zookeeper drops connections under moderate IO load
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-885
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.2.2, 3.3.1
>         Environment: Debian (Lenny)
>                      1Gb RAM
>                      swap disabled
>                      100Mb heap for zookeeper
>            Reporter: Alexandre Hardy
>            Priority: Critical
>         Attachments: benchmark.csv, tracezklogs.tar.gz, tracezklogs.tar.gz, WatcherTest.java, zklogs.tar.gz
>
> A zookeeper server under minimal load, with a number of clients watching exactly one node, will fail to maintain the connection when the machine is subjected to moderate IO load.
> In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command:
> {noformat}
> dd if=/dev/urandom of=/dev/mapper/nimbula-test
> {noformat}
> The {{dd}} command transferred data at a rate of about 4Mb/s.
> The same thing happens with
> {noformat}
> dd if=/dev/zero of=/dev/mapper/nimbula-test
> {noformat}
> It seems strange that such a moderate load should cause instability in the connection.
> Very few other processes were running; the machines were set up to test the connection instability we have experienced. Clients performed no other read or mutation operations.
> Although the documentation states that minimal competing IO load should be present on the zookeeper server, it seems reasonable that moderate IO should not cause problems in this case.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921459#action_12921459 ]

Dave Wright commented on ZOOKEEPER-885:
---------------------------------------

I don't think the cause of this is much of a mystery; we experienced similar problems when we had the zookeeper files on the same filesystem as an IO-heavy application that was doing buffered IO. Quite simply, when zookeeper does a sync on its own files, it causes the entire filesystem to sync, flushing any buffered data from the IO-heavy application and freezing the ZK server process for long enough for heartbeats to time out.

When you say "moderate IO load" I'm curious what the bottleneck is. The dd command will copy data as fast as possible; if you're only getting 4MB/sec, the underlying device must be pretty slow, which would further explain why a sync() request would take a while to complete.

The only fix we've seen is to put the ZK files on their own device, although you may be able to fix it with a different partition on the same device.
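The stall mechanism described in the comment above can be illustrated with a small timing harness: append a record to a log file and force it to disk, the way a transaction log write does. This is an illustrative sketch, not ZooKeeper's actual code; on filesystems where fsync flushes unrelated dirty pages (e.g. ext3 in data=ordered mode), the `force()` call below is where competing buffered IO eats the heartbeat budget.

```java
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class SyncStall {
    // Append a small record to the log and fsync it, returning elapsed ms.
    // Under heavy competing buffered IO on a shared filesystem, this call
    // can block long enough for session heartbeats to time out.
    static long appendAndSync(File log, byte[] record) throws Exception {
        try (RandomAccessFile raf = new RandomAccessFile(log, "rw")) {
            FileChannel ch = raf.getChannel();
            ch.position(ch.size());
            ch.write(ByteBuffer.wrap(record));
            long start = System.nanoTime();
            ch.force(false);                      // the fsync that can stall
            return (System.nanoTime() - start) / 1_000_000;
        }
    }

    public static void main(String[] args) throws Exception {
        File log = File.createTempFile("zk-txnlog", ".bin");
        log.deleteOnExit();
        long ms = appendAndSync(log, new byte[512]);
        System.out.println("fsync took " + ms + " ms");
    }
}
```

Running this in a loop while a `dd` writes to the same filesystem (versus a separate device) makes the comment's "own device" recommendation measurable.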
[jira] Created: (ZOOKEEPER-782) Incorrect C API documentation for Watches
Incorrect C API documentation for Watches
-----------------------------------------

                 Key: ZOOKEEPER-782
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-782
             Project: Zookeeper
          Issue Type: Bug
          Components: c client
    Affects Versions: 3.3.1
            Reporter: Dave Wright
            Priority: Trivial

The C API Doxygen documentation states:
"If the client is ever disconnected from the service, even if the disconnection is temporary, the watches of the client will be removed from the service, so a client must treat a disconnect notification as an implicit trigger of all outstanding watches."
This is incorrect as of version 3: watches are only lost and need to be re-registered when a session times out. When a normal disconnection occurs, watches are reset automatically on reconnection. The documentation in zookeeper.h needs to be updated to correct this explanation.
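The corrected semantics can be summarized in a small decision sketch, independent of the actual client library (the names here are illustrative, not ZooKeeper API): only session expiry loses watches; a plain disconnect does not.

```java
// Which session event forces the client to re-register its watches?
enum SessionEvent { DISCONNECTED, SYNC_CONNECTED, EXPIRED }

class WatchSemantics {
    // Since version 3, the client library re-sets outstanding watches on
    // reconnection, so only session expiry actually loses them.
    static boolean mustReRegisterWatches(SessionEvent e) {
        return e == SessionEvent.EXPIRED;
    }
}
```

A watcher callback written against this rule treats DISCONNECTED as transient and only rebuilds its watches after EXPIRED, which is the behavior the updated documentation should describe.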
[jira] Commented: (ZOOKEEPER-704) GSoC 2010: Read-Only Mode
[ https://issues.apache.org/jira/browse/ZOOKEEPER-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12865465#action_12865465 ]

Dave Wright commented on ZOOKEEPER-704:
---------------------------------------

This is a great idea, but I'm afraid there is a somewhat fundamental problem with the concept. What you want is that if enough nodes go down that a quorum can't be formed at all, the remaining nodes go into read-only mode. The problem is that if a partition occurs (say, a single server loses contact with the rest of the cluster) but a quorum still exists, we want clients who were connected to the partitioned server to reconnect to a server in the majority. The current design allows for this by denying connections to minority nodes, forcing clients to hunt for the majority. If we allow servers in the minority to keep or accept connections, then clients will end up in read-only mode when they could simply have reconnected to the majority.

It may be possible to accomplish the desired outcome with some client-side and connection-protocol changes. Specifically, a flag on the connection request from the client that says "allow read-only connections": if false, the server will close the connection, allowing the client to hunt for a server in the majority. Once a client has gone through all the servers in the list (and found that none are in the majority), it could flip the flag to true and connect to any running server in read-only mode. There is still the question of how to get back out of read-only mode (e.g. should we keep hunting in the background for a majority, or just wait until the server we are connected to re-forms a quorum?).
> GSoC 2010: Read-Only Mode
> -------------------------
>
>                 Key: ZOOKEEPER-704
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-704
>             Project: Zookeeper
>          Issue Type: Wish
>            Reporter: Henry Robinson
>
> Read-only mode
> Possible Mentor: Henry Robinson (henry at apache dot org)
> Requirements: Java and TCP/IP networking
> Description:
> When a ZooKeeper server loses contact with over half of the other servers in an ensemble ('loses a quorum'), it stops responding to client requests because it cannot guarantee that writes will get processed correctly. For some applications, it would be beneficial if a server still responded to read requests when the quorum is lost, but caused an error condition when a write request was attempted.
> This project would implement a 'read-only' mode for ZooKeeper servers (maybe only for Observers) that allowed read requests to be served as long as the client can contact a server.
> This is a great project for getting really hands-on with the internals of ZooKeeper; you must be comfortable with Java and networking, otherwise you'll have a hard time coming up to speed.
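The two-pass connection strategy proposed in the comment can be sketched in plain Java. The `Server` interface below is a stand-in for the real connection protocol (nothing here is shipped ZooKeeper API): `acceptsConnection(false)` models a quorum member accepting a normal connection, while a partitioned server only accepts once the client sends the proposed "allow read-only" flag.

```java
import java.util.List;

public class ReadOnlyHunt {
    // Stand-in for a server endpoint; readOnlyOk mirrors the proposed
    // flag on the client's connection request.
    interface Server {
        boolean acceptsConnection(boolean readOnlyOk);
    }

    // Returns the index of the server connected to, or -1 if none accept.
    static int connect(List<Server> servers) {
        // Pass 1: insist on a quorum member; minority servers refuse,
        // which keeps clients hunting for the majority as in the
        // current design.
        for (int i = 0; i < servers.size(); i++)
            if (servers.get(i).acceptsConnection(false)) return i;
        // Pass 2: the whole list was exhausted, so flip the flag and
        // settle for a read-only connection to any running server.
        for (int i = 0; i < servers.size(); i++)
            if (servers.get(i).acceptsConnection(true)) return i;
        return -1;
    }
}
```

This preserves the existing failover behavior when a majority exists (pass 1 finds it) and only degrades to read-only mode after the client has proven to itself that no quorum member is reachable.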
[jira] Created: (ZOOKEEPER-762) Allow dynamic addition/removal of server nodes in the client API
Allow dynamic addition/removal of server nodes in the client API
----------------------------------------------------------------

                 Key: ZOOKEEPER-762
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-762
             Project: Zookeeper
          Issue Type: Sub-task
          Components: c client, java client
            Reporter: Dave Wright
            Priority: Minor

Currently the list of zookeeper servers must be provided to the client APIs at construction time, and cannot be changed without a complete shutdown/restart of the client API. However, there are scenarios that require the server list to be updated, such as removal or addition of a ZK cluster node, and it would be nice if the list could be updated via a simple API call. The general approach (in the Java client) would be to add removeServer()/addServer() functions to ZooKeeper that call down to ClientCnxn, where the servers are simply maintained in a list. Of course, if the server being removed is the one currently connected to, we'd need to disconnect, but a simple call to disconnect() seems like it would resolve that and trigger the automatic reconnection logic. An equivalent change could be made in the C code. This change would also make the dynamic cluster membership of ZOOKEEPER-107 easier to implement.
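The proposed approach can be sketched as follows. The method names (addServer/removeServer/disconnect) follow the proposal and are hypothetical, not the shipped ZooKeeper client API; the point is that the server list lives in a thread-safe collection and removing the currently-connected server just triggers the existing reconnect path.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class DynamicServerList {
    // Thread-safe list so the connection thread can iterate while the
    // application thread adds or removes entries.
    private final List<String> servers = new CopyOnWriteArrayList<>();
    private volatile String current;   // host:port currently connected to

    public void addServer(String hostPort) {
        if (!servers.contains(hostPort)) servers.add(hostPort);
    }

    public boolean removeServer(String hostPort) {
        boolean removed = servers.remove(hostPort);
        // If we just removed the server we're connected to, drop the
        // connection; the normal reconnection logic then picks another
        // server from the (updated) list.
        if (removed && hostPort.equals(current)) disconnect();
        return removed;
    }

    void disconnect() { current = null; }          // reconnect logic runs here

    void setCurrent(String hostPort) { current = hostPort; }
    String current() { return current; }
    List<String> servers() { return servers; }
}
```

The same shape carries over to the C client: guard the server array with the handle's existing lock, and close the socket when the active server is removed.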