[jira] Updated: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexandre Hardy updated ZOOKEEPER-885:
--------------------------------------

    Attachment: tracezklogs.tar.gz

I accidentally missed the configuration for one of the nodes (switching from 3.2.2 to 3.3.1). Thanks for spotting that. Here are updated log files.

I have updated the test program to terminate as soon as any single connection is dropped (since my goal is to have zero connection failures for this simple test). The client and server clocks are a bit out of sync (about 4 minutes, with the servers logging in UTC and the client logging in UTC+2), but the test has been set up so that only one instability event is recorded.

Zookeeper drops connections under moderate IO load
--------------------------------------------------

Key: ZOOKEEPER-885
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885
Project: Zookeeper
Issue Type: Bug
Components: server
Affects Versions: 3.2.2
Environment: Debian (Lenny), 1Gb RAM, swap disabled, 100Mb heap for zookeeper
Reporter: Alexandre Hardy
Priority: Critical
Attachments: tracezklogs.tar.gz, WatcherTest.java, zklogs.tar.gz

A zookeeper server under minimum load, with a number of clients watching exactly one node, will fail to maintain the connection when the machine is subjected to moderate IO load.

In a specific test example we had three zookeeper servers running on dedicated machines with 45 clients connected, watching exactly one node. The clients would disconnect after moderate load was added to each of the zookeeper servers with the command:

{noformat}
dd if=/dev/urandom of=/dev/mapper/nimbula-test
{noformat}

The {{dd}} command transferred data at a rate of about 4Mb/s. The same thing happens with

{noformat}
dd if=/dev/zero of=/dev/mapper/nimbula-test
{noformat}

It seems strange that such a moderate load should cause instability in the connection. Very few other processes were running; the machines were set up to test the connection instability we have experienced. Clients performed no other read or mutation operations.

Although the documents state that minimal competing IO load should be present on the zookeeper server, it seems reasonable that moderate IO should not cause problems in this case.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
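
Because the comment above notes that the client logs are written in UTC+2, the server logs in UTC, and the clocks differ by about 4 minutes, matching a client-side disconnect to the corresponding server-side entry requires normalizing both timestamps first. The following is a minimal sketch of that normalization, not part of the attached test program. The class name, the log4j-style timestamp pattern, and the direction of the skew (which machine runs ahead) are all assumptions and must be checked against the actual logs.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for lining up client and server log timestamps. */
final class LogClockAligner {
    // Assumed log4j-style timestamp layout; adjust to the real log pattern.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    private LogClockAligner() {}

    /**
     * Converts a wall-clock log timestamp to an Instant on a common reference clock.
     *
     * @param timestamp time exactly as printed in the log
     * @param zone      UTC offset the log was written in
     * @param skew      how far this machine's clock runs ahead of the reference clock
     */
    static Instant toReferenceInstant(String timestamp, ZoneOffset zone, Duration skew) {
        LocalDateTime local = LocalDateTime.parse(timestamp, FORMAT);
        // First remove the zone offset, then remove the clock skew.
        return local.toInstant(zone).minus(skew);
    }
}
```

With made-up values, a client entry stamped `2010-10-08 14:04:30,000` at UTC+2 on a clock assumed to run 4 minutes ahead maps to the same instant as a server entry stamped `2010-10-08 12:00:30,000` in UTC with zero skew.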
[jira] Updated: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load
[ https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexandre Hardy updated ZOOKEEPER-885:
--------------------------------------

    Attachment: tracezklogs.tar.gz

I failed to set up the debug logs correctly in the previous attachment. This attachment has full trace logs.

Zookeeper drops connections under moderate IO load
--------------------------------------------------

Key: ZOOKEEPER-885
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885
Project: Zookeeper
Issue Type: Bug
Components: server
Affects Versions: 3.2.2
Environment: Debian (Lenny), 1Gb RAM, swap disabled, 100Mb heap for zookeeper
Reporter: Alexandre Hardy
Priority: Critical
Attachments: tracezklogs.tar.gz, tracezklogs.tar.gz, WatcherTest.java, zklogs.tar.gz
[jira] Created: (ZOOKEEPER-892) Remote replication of Zookeeper data
Remote replication of Zookeeper data
------------------------------------

Key: ZOOKEEPER-892
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-892
Project: Zookeeper
Issue Type: New Feature
Components: quorum
Affects Versions: 3.4.0
Environment:
[r...@llf531123 Zookeeper]# uname -a
Linux llf531123.crawl.yahoo.net 2.6.9-67.0.22.ELsmp #1 SMP Fri Jul 11 10:37:57 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[r...@llf531123 Zookeeper]# java -version
java version 1.6.0_03
Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_03-b05, mixed mode)
[r...@llf531123 Zookeeper]#
Reporter: Anirban Roy
Assignee: Anirban Roy
Fix For: 3.4.0

ZooKeeper is a highly available and scalable system for distributed synchrony and is used for cluster management. In its current incarnation it has issues with cross-colo communication and data replication. Presently, the only way to distribute ZooKeeper across multiple data centers is to maintain a cross-colo Quorum using Observer members, leading to huge bandwidth consumption and performance degradation.

The idea behind the ZooKeeper replication feature is to provide replication of ZooKeeper data from one site to others using a new type of ZooKeeper member called a Replicator. The Replicator will be asynchronous, non-intrusive, and can be applied to a subset of the data. It will be a part of the Main ZooKeeper Site and will push changes to multiple data centers with guaranteed ordering of events. This will bring about many of the benefits of database replication such as resilience to site failure and localized serving across data centers.

In short, the goal is to provide remote (sub) data replication with guaranteed ordering, without affecting the Main ZooKeeper performance.
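
The proposal above names three properties for the Replicator: it is asynchronous, it applies to a subset of the data, and it pushes changes in guaranteed order. The sketch below shows one way those three properties could fit together. It is purely illustrative: the issue gives no design, and every name here (`ReplicatorSketch`, `Change`, `RemoteSite`, `onCommit`, `pushPending`) is hypothetical and not part of ZooKeeper.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch only; none of these names come from ZooKeeper or the proposal. */
final class ReplicatorSketch {

    /** One committed change on the main site, tagged with its commit order. */
    static final class Change {
        final long sequence;
        final String path;
        final byte[] data;

        Change(long sequence, String path, byte[] data) {
            this.sequence = sequence;
            this.path = path;
            this.data = data;
        }
    }

    /** Stand-in for a connection to a remote data center. */
    interface RemoteSite {
        void apply(Change change);
    }

    private final String subtree;
    private final List<RemoteSite> sites;
    // FIFO queue: changes leave in the order they were enqueued.
    private final BlockingQueue<Change> outbound = new LinkedBlockingQueue<>();

    ReplicatorSketch(String subtree, List<RemoteSite> sites) {
        this.subtree = subtree;
        this.sites = sites;
    }

    /** Called after a commit; only enqueues, so the commit path never waits on a remote site. */
    void onCommit(Change change) {
        boolean selected = change.path.equals(subtree)
                || change.path.startsWith(subtree + "/");
        if (selected) {
            outbound.add(change);
        }
    }

    /** Forwards everything queued so far, oldest first, to every remote site. */
    void pushPending() {
        Change change;
        while ((change = outbound.poll()) != null) {
            for (RemoteSite site : sites) {
                site.apply(change);
            }
        }
    }
}
```

In this sketch `pushPending` would be called from a dedicated background thread, which is what keeps replication off the main site's request path. Retries, remote-site failures, and recovery after a Replicator restart are left out, and any real ordering guarantee would depend on them.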
[jira] Commented: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections
[ https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919293#action_12919293 ]

Thomas Koch commented on ZOOKEEPER-823:
---------------------------------------

How should we continue with this? I now have a couple of Heisenbugs that do not appear in every test run. The most frequent are:
* org.apache.zookeeper.test.ACLTest.testAcls
* org.apache.zookeeper.test.AsyncHammerTest.testObserversHammer
* org.apache.zookeeper.test.AsyncHammerTest.testHammer

I believe the following:
- there are some issues with missing / misplaced synchronization blocks in the Netty java client code
- there may also be some bugs in the server netty code. Sometimes I get errors like this one, which could also be caused on the server side, but I have not yet learned the ZK server code:

Exception caught [id: 0x19509443, /127.0.0.1:53300 = localhost/127.0.0.1:11270] EXCEPTION: java.io.IOException: Xid out of order. Got Xid 32 with err 0 expected Xid 33 for a packet with details:

I would prefer to commit the current state to trunk and continue the bug hunting there. This would allow me to do some cleanups that could help in understanding the correct synchronization. These cleanups would be ZOOKEEPER-878 and ZOOKEEPER-879.

This exception, for example, could be better understood after solving the above-mentioned issues:

java.lang.NullPointerException
    at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:574)
    at org.apache.zookeeper.ClientCnxn.access$2300(ClientCnxn.java:76)
    at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:869)
    at org.apache.zookeeper.ClientCnxnSocketNetty.cleanup(ClientCnxnSocketNetty.java:147)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:741)

update ZooKeeper java client to optionally use Netty for connections
--------------------------------------------------------------------

Key: ZOOKEEPER-823
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-823
Project: Zookeeper
Issue Type: New Feature
Components: java client
Reporter: Patrick Hunt
Assignee: Patrick Hunt
Fix For: 3.4.0
Attachments: NettyNettySuiteTest.rtf, TEST-org.apache.zookeeper.test.NettyNettySuiteTest.txt.gz, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch

This jira will port the client side connection code to use netty rather than direct nio.
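
The "Xid out of order" error quoted in the comment above is the kind of failure an in-order reply check produces when the order in which requests are recorded as pending differs from the order in which their replies arrive. The following is a simplified model of such a check, offered as an illustration of how a missing or misplaced synchronization block could surface as this error. It is not the actual `ClientCnxn` code; the class `PendingXidQueue` and its methods are hypothetical.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Simplified, hypothetical model of in-order reply matching; not the real client code. */
final class PendingXidQueue {
    // Requests that have been sent and are still waiting for a reply, oldest first.
    private final Deque<Integer> pending = new ArrayDeque<>();

    /** Records a request as sent; replies are expected in the same order. */
    synchronized void sent(int xid) {
        pending.addLast(xid);
    }

    /** Matches a reply against the oldest outstanding request. */
    synchronized void received(int replyXid) throws IOException {
        Integer expected = pending.pollFirst();
        if (expected == null) {
            throw new IOException("Nothing pending, but got Xid " + replyXid);
        }
        if (expected != replyXid) {
            throw new IOException("Xid out of order. Got Xid " + replyXid
                    + " expected Xid " + expected);
        }
    }
}
```

If two threads race so that request 33 is recorded as pending before request 32, while the server answers 32 first, `received(32)` fails with the same wording as the logged error. Whether this is what actually happens in the Netty client is not established by the comment; it is one hypothesis consistent with the suspicion about synchronization.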