[jira] Updated: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load

2010-10-08 Thread Alexandre Hardy (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Hardy updated ZOOKEEPER-885:
--

Attachment: tracezklogs.tar.gz

I accidentally missed updating the configuration for one of the nodes when 
switching from 3.2.2 to 3.3.1. Thanks for spotting that.

Here are updated log files. I have updated the test program to terminate as 
soon as any single connection is dropped (since my goal is to have zero 
connection failures for this simple test).
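
For reference, a minimal sketch of what such a test program can look like (the 
attached WatcherTest.java is the authoritative version; the connect string, 
znode path, and 30s session timeout below are assumptions):

{code}
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Hypothetical sketch: each client watches exactly one znode and the
// process exits as soon as the first Disconnected event is delivered.
public class WatcherTestSketch implements Watcher {
    private static final CountDownLatch dropped = new CountDownLatch(1);

    public void process(WatchedEvent event) {
        if (event.getState() == Event.KeeperState.Disconnected) {
            dropped.countDown(); // fail fast on the first dropped connection
        }
    }

    public static void main(String[] args) throws Exception {
        WatcherTestSketch watcher = new WatcherTestSketch();
        ZooKeeper zk = new ZooKeeper(
                "server1:2181,server2:2181,server3:2181", 30000, watcher);
        zk.exists("/watched-node", watcher); // watch exactly one node
        dropped.await();                     // block until a disconnect occurs
        System.err.println("connection dropped, terminating");
        zk.close();
        System.exit(1);
    }
}
{code}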

The client and server clocks are a bit out of sync (about 4 minutes, with the 
servers logging in UTC and the client in UTC+2), but the test has been set up 
so that only one instability event is recorded.

 Zookeeper drops connections under moderate IO load
 --

 Key: ZOOKEEPER-885
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.2
 Environment: Debian (Lenny)
 1Gb RAM
 swap disabled
 100Mb heap for zookeeper
Reporter: Alexandre Hardy
Priority: Critical
 Attachments: tracezklogs.tar.gz, WatcherTest.java, zklogs.tar.gz


 A zookeeper server under minimal load, with a number of clients watching 
 exactly one node, will fail to maintain its connections when the machine is 
 subjected to moderate IO load.
 In a specific test we had three zookeeper servers running on dedicated 
 machines with 45 clients connected, each watching exactly one node. The 
 clients would disconnect after moderate IO load was added to each of the 
 zookeeper servers with the command:
 {noformat}
 dd if=/dev/urandom of=/dev/mapper/nimbula-test
 {noformat}
 The {{dd}} command transferred data at a rate of about 4Mb/s.
 The same thing happens with
 {noformat}
 dd if=/dev/zero of=/dev/mapper/nimbula-test
 {noformat}
 It seems strange that such a moderate load should cause connection 
 instability.
 Very few other processes were running; the machines were set up specifically 
 to test the connection instability we have experienced. Clients performed no 
 other read or mutation operations.
 Although the documentation states that minimal competing IO load should be 
 present on the zookeeper server, it seems reasonable that moderate IO should 
 not cause problems in this case.
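
 A standalone probe along the lines of the sketch below (the file path and 
 iteration count are only placeholders) can help check whether the dd traffic 
 is stalling fsync on the log device, which is a common way for disk 
 contention to turn into missed session heartbeats: run it on the same device 
 with and without the dd load and compare the worst-case latency to the 
 negotiated session timeout.
 {code}
 import java.io.FileOutputStream;
 import java.io.IOException;

 // Hypothetical probe: repeatedly append and fsync a small record to a file
 // on the same device as the zookeeper transaction log and report the worst
 // latency seen, mimicking what the server does for every write.
 public class FsyncProbe {
     public static void main(String[] args) throws IOException {
         String path = args.length > 0 ? args[0] : "/tmp/fsync-probe";
         byte[] block = new byte[512]; // roughly one small txn log record
         long worstMs = 0;
         FileOutputStream out = new FileOutputStream(path);
         try {
             for (int i = 0; i < 1000; i++) {
                 long start = System.nanoTime();
                 out.write(block);
                 out.getFD().sync(); // force to disk, as the txn log does
                 long ms = (System.nanoTime() - start) / 1000000;
                 if (ms > worstMs) worstMs = ms;
             }
         } finally {
             out.close();
         }
         System.out.println("worst fsync latency: " + worstMs + " ms");
     }
 }
 {code}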

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load

2010-10-08 Thread Alexandre Hardy (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Hardy updated ZOOKEEPER-885:
--

Attachment: tracezklogs.tar.gz

I failed to set up the debug logs correctly in the previous attachment. This 
attachment has full trace logs.




[jira] Created: (ZOOKEEPER-892) Remote replication of Zookeeper data

2010-10-08 Thread Anirban Roy (JIRA)
Remote replication of Zookeeper data


 Key: ZOOKEEPER-892
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-892
 Project: Zookeeper
  Issue Type: New Feature
  Components: quorum
Affects Versions: 3.4.0
 Environment: [r...@llf531123 Zookeeper]# uname -a
Linux llf531123.crawl.yahoo.net 2.6.9-67.0.22.ELsmp #1 SMP Fri Jul 11 10:37:57 
EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
[r...@llf531123 Zookeeper]# java -version
java version "1.6.0_03"
Java(TM) SE Runtime Environment (build 1.6.0_03-b05)
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_03-b05, mixed mode)
[r...@llf531123 Zookeeper]# 
Reporter: Anirban Roy
Assignee: Anirban Roy
 Fix For: 3.4.0


ZooKeeper is a highly available and scalable system for distributed 
synchronization and cluster management. In its current incarnation it has 
issues with cross-colo communication and data replication: the only way to 
distribute ZooKeeper across multiple data centers is to maintain a cross-colo 
quorum using Observer members, which leads to heavy bandwidth consumption and 
performance degradation. The idea behind the ZooKeeper replication feature is 
to replicate ZooKeeper data from one site to others using a new type of 
ZooKeeper member called a Replicator. The Replicator will be asynchronous and 
non-intrusive, and can be applied to a subset of the data. It will be part of 
the main ZooKeeper site and will push changes to multiple data centers with 
guaranteed ordering of events. This brings many of the benefits of database 
replication, such as resilience to site failure and localized serving across 
data centers. In short, the goal is to provide remote (sub) data replication 
with guaranteed ordering, without affecting the performance of the main 
ZooKeeper site.
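
As a minimal illustration of the concept (the names, hosts, and single-znode 
scope below are placeholders, not the proposed design; a real Replicator would 
be a quorum member handling subtrees, deletes, and reconnects), the sketch 
bridges one znode from the main site to a remote ensemble, relying on the fact 
that the ZooKeeper client delivers watch events on a single thread, which 
provides the required ordering:

{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Hypothetical sketch: asynchronous, one-way replication of a single znode.
public class ReplicatorSketch implements Watcher {
    private final ZooKeeper src;
    private final ZooKeeper dst;
    private final String path;

    ReplicatorSketch(String srcHosts, String dstHosts, String path)
            throws Exception {
        this.path = path;
        this.src = new ZooKeeper(srcHosts, 30000, this);
        this.dst = new ZooKeeper(dstHosts, 30000, null);
        replicate(); // initial copy; also arms the first watch
    }

    public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
            try {
                replicate(); // events arrive in order on the event thread
            } catch (Exception e) {
                e.printStackTrace(); // a real Replicator would retry
            }
        }
    }

    private void replicate() throws KeeperException, InterruptedException {
        byte[] data = src.getData(path, this, new Stat()); // re-arm watch
        if (dst.exists(path, false) == null) {
            dst.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                    CreateMode.PERSISTENT);
        } else {
            dst.setData(path, data, -1);
        }
    }
}
{code}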




[jira] Commented: (ZOOKEEPER-823) update ZooKeeper java client to optionally use Netty for connections

2010-10-08 Thread Thomas Koch (JIRA)

[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919293#action_12919293
 ] 

Thomas Koch commented on ZOOKEEPER-823:
---

How should we continue with this?

I now have a couple of Heisenbugs that do not appear on every test run, most 
often:

* org.apache.zookeeper.test.ACLTest.testAcls
* org.apache.zookeeper.test.AsyncHammerTest.testObserversHammer
* org.apache.zookeeper.test.AsyncHammerTest.testHammer

My current beliefs are the following:

- there are some issues with missing or misplaced synchronization blocks in 
the Netty java client code
- there may also be some bugs in the server-side Netty code. Sometimes I get 
errors like the one below, which could also originate on the server side, but 
I have not yet studied the ZK server code:

Exception caught [id: 0x19509443, /127.0.0.1:53300 => 
localhost/127.0.0.1:11270] EXCEPTION: java.io.IOException: Xid out of order. 
Got Xid 32 with err 0 expected Xid 33 for a packet with details:
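
For context: the client queues requests in send order and matches every reply 
against the oldest in-flight xid, and a mismatch produces exactly this error. 
A simplified, self-contained model of that check (the class and method names 
are hypothetical, not the actual ClientCnxn code):

{code}
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model: replies must arrive in the order requests were sent.
public class XidOrderCheck {
    private final Queue<Integer> pendingXids = new ArrayDeque<Integer>();

    void sent(int xid) {
        pendingXids.add(xid); // requests go out with increasing xids
    }

    void onReply(int replyXid) throws IOException {
        Integer expected = pendingXids.poll(); // oldest in-flight request
        if (expected == null || replyXid != expected) {
            throw new IOException("Xid out of order. Got Xid " + replyXid
                    + " expected Xid " + expected);
        }
    }
}
{code}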

I'd prefer to commit the current state to trunk and continue the bug hunt 
there. That would allow me to do some cleanups that could help in 
understanding the correct synchronization; these cleanups would be 
ZOOKEEPER-878 and ZOOKEEPER-879.

This exception, for example, might be easier to understand once the 
above-mentioned issues are solved:

java.lang.NullPointerException
    at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:574)
    at org.apache.zookeeper.ClientCnxn.access$2300(ClientCnxn.java:76)
    at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:869)
    at org.apache.zookeeper.ClientCnxnSocketNetty.cleanup(ClientCnxnSocketNetty.java:147)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:741)
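
As an illustration of the suspected class of bug (a hypothetical reduction, 
not the actual ClientCnxn code): if the send thread and the Netty 
channel-teardown path drain the pending-packet queue without a shared lock, 
cleanup can run against state that another thread is clearing at the same 
time, which yields NPEs like the one above. Guarding both paths with the same 
lock would be the corresponding fix:

{code}
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical reduction of the suspected race around pending packets.
public class CleanupRace {
    private final Queue<Runnable> pending = new ArrayDeque<Runnable>();

    // Called for every outgoing request.
    void queue(Runnable connectionLossCallback) {
        synchronized (pending) {
            pending.add(connectionLossCallback);
        }
    }

    // Called from both the send thread and the channel-closed handler;
    // without the shared lock the two callers race on the queue.
    void cleanup() {
        synchronized (pending) {
            Runnable cb;
            while ((cb = pending.poll()) != null) {
                cb.run(); // fail the pending request with connection loss
            }
        }
    }
}
{code}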

 update ZooKeeper java client to optionally use Netty for connections
 

 Key: ZOOKEEPER-823
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-823
 Project: Zookeeper
  Issue Type: New Feature
  Components: java client
Reporter: Patrick Hunt
Assignee: Patrick Hunt
 Fix For: 3.4.0

 Attachments: NettyNettySuiteTest.rtf, 
 TEST-org.apache.zookeeper.test.NettyNettySuiteTest.txt.gz, 
 ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, 
 ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, 
 ZOOKEEPER-823.patch, ZOOKEEPER-823.patch, ZOOKEEPER-823.patch


 This jira will port the client-side connection code to use Netty rather than 
 direct NIO.
