[jira] Updated: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load

2010-10-15 Thread Alexandre Hardy (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Hardy updated ZOOKEEPER-885:
--

Attachment: benchmark.csv

Here are some results with different settings of {{dirty_ratio}}, 
{{dirty_bytes}} (finer control), session timeouts, and io priorities (set with 
{{ionice}}). 

The status field indicates success or failure, 0=success and anything else is 
failure.
Where failure means a zookeeper session disconnected (but did not necessarily 
expire).

Each test ran for a maximum of 5 minutes. A test could also fail if it failed 
to connect to zookeeper servers within the first 60 seconds, unfortunately the 
return code did not differentiate properly between these cases. However pauses 
of about 4 seconds were allowed between tests, during which all IO operations 
(by the test program) were stopped. This should have allowed the system to 
stabilize somewhat.

In this test the session timeout had the most significant influence on 
stability (not too surprising) and the other system settings had far less 
influence.

We have settled on a 60s session timeout for the moment (to achieve the 
stability we need under the IO loads we are experiencing). But it would be 
great if we could reduce this a bit.



 Zookeeper drops connections under moderate IO load
 --

 Key: ZOOKEEPER-885
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.2, 3.3.1
 Environment: Debian (Lenny)
 1Gb RAM
 swap disabled
 100Mb heap for zookeeper
Reporter: Alexandre Hardy
Priority: Critical
 Attachments: benchmark.csv, tracezklogs.tar.gz, tracezklogs.tar.gz, 
 WatcherTest.java, zklogs.tar.gz


 A zookeeper server under minimum load, with a number of clients watching 
 exactly one node will fail to maintain the connection when the machine is 
 subjected to moderate IO load.
 In a specific test example we had three zookeeper servers running on 
 dedicated machines with 45 clients connected, watching exactly one node. The 
 clients would disconnect after moderate load was added to each of the 
 zookeeper servers with the command:
 {noformat}
 dd if=/dev/urandom of=/dev/mapper/nimbula-test
 {noformat}
 The {{dd}} command transferred data at a rate of about 4Mb/s.
 The same thing happens with
 {noformat}
 dd if=/dev/zero of=/dev/mapper/nimbula-test
 {noformat}
 It seems strange that such a moderate load should cause instability in the 
 connection.
 Very few other processes were running, the machines were setup to test the 
 connection instability we have experienced. Clients performed no other read 
 or mutation operations.
 Although the documents state that minimal competing IO load should present on 
 the zookeeper server, it seems reasonable that moderate IO should not cause 
 problems in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load

2010-10-12 Thread Alexandre Hardy (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Hardy updated ZOOKEEPER-885:
--

Affects Version/s: 3.3.1

 Zookeeper drops connections under moderate IO load
 --

 Key: ZOOKEEPER-885
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.2, 3.3.1
 Environment: Debian (Lenny)
 1Gb RAM
 swap disabled
 100Mb heap for zookeeper
Reporter: Alexandre Hardy
Priority: Critical
 Attachments: tracezklogs.tar.gz, tracezklogs.tar.gz, 
 WatcherTest.java, zklogs.tar.gz


 A zookeeper server under minimum load, with a number of clients watching 
 exactly one node will fail to maintain the connection when the machine is 
 subjected to moderate IO load.
 In a specific test example we had three zookeeper servers running on 
 dedicated machines with 45 clients connected, watching exactly one node. The 
 clients would disconnect after moderate load was added to each of the 
 zookeeper servers with the command:
 {noformat}
 dd if=/dev/urandom of=/dev/mapper/nimbula-test
 {noformat}
 The {{dd}} command transferred data at a rate of about 4Mb/s.
 The same thing happens with
 {noformat}
 dd if=/dev/zero of=/dev/mapper/nimbula-test
 {noformat}
 It seems strange that such a moderate load should cause instability in the 
 connection.
 Very few other processes were running, the machines were setup to test the 
 connection instability we have experienced. Clients performed no other read 
 or mutation operations.
 Although the documents state that minimal competing IO load should present on 
 the zookeeper server, it seems reasonable that moderate IO should not cause 
 problems in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load

2010-10-08 Thread Alexandre Hardy (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Hardy updated ZOOKEEPER-885:
--

Attachment: tracezklogs.tar.gz

I accidentally missed the configuration for one of the nodes (switching from 
3.2.2 to 3.3.1). Thanks for spotting that.

Here are updated log files. I have updated the test program to terminate as 
soon as any single connection is dropped (Since my goal is to have zero 
connection failures for this simple test).

The client and server clocks are a bit out of sync (about 4 minutes, with the 
servers logging in UTC and client logging in UTC+2), but the test has been set 
up so that only one instability event is recorded.

 Zookeeper drops connections under moderate IO load
 --

 Key: ZOOKEEPER-885
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.2
 Environment: Debian (Lenny)
 1Gb RAM
 swap disabled
 100Mb heap for zookeeper
Reporter: Alexandre Hardy
Priority: Critical
 Attachments: tracezklogs.tar.gz, WatcherTest.java, zklogs.tar.gz


 A zookeeper server under minimum load, with a number of clients watching 
 exactly one node will fail to maintain the connection when the machine is 
 subjected to moderate IO load.
 In a specific test example we had three zookeeper servers running on 
 dedicated machines with 45 clients connected, watching exactly one node. The 
 clients would disconnect after moderate load was added to each of the 
 zookeeper servers with the command:
 {noformat}
 dd if=/dev/urandom of=/dev/mapper/nimbula-test
 {noformat}
 The {{dd}} command transferred data at a rate of about 4Mb/s.
 The same thing happens with
 {noformat}
 dd if=/dev/zero of=/dev/mapper/nimbula-test
 {noformat}
 It seems strange that such a moderate load should cause instability in the 
 connection.
 Very few other processes were running, the machines were setup to test the 
 connection instability we have experienced. Clients performed no other read 
 or mutation operations.
 Although the documents state that minimal competing IO load should present on 
 the zookeeper server, it seems reasonable that moderate IO should not cause 
 problems in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load

2010-10-08 Thread Alexandre Hardy (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Hardy updated ZOOKEEPER-885:
--

Attachment: tracezklogs.tar.gz

I failed to set up the debug logs correctly in the previous attachment. This 
attachment has full trace logs.

 Zookeeper drops connections under moderate IO load
 --

 Key: ZOOKEEPER-885
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.2
 Environment: Debian (Lenny)
 1Gb RAM
 swap disabled
 100Mb heap for zookeeper
Reporter: Alexandre Hardy
Priority: Critical
 Attachments: tracezklogs.tar.gz, tracezklogs.tar.gz, 
 WatcherTest.java, zklogs.tar.gz


 A zookeeper server under minimum load, with a number of clients watching 
 exactly one node will fail to maintain the connection when the machine is 
 subjected to moderate IO load.
 In a specific test example we had three zookeeper servers running on 
 dedicated machines with 45 clients connected, watching exactly one node. The 
 clients would disconnect after moderate load was added to each of the 
 zookeeper servers with the command:
 {noformat}
 dd if=/dev/urandom of=/dev/mapper/nimbula-test
 {noformat}
 The {{dd}} command transferred data at a rate of about 4Mb/s.
 The same thing happens with
 {noformat}
 dd if=/dev/zero of=/dev/mapper/nimbula-test
 {noformat}
 It seems strange that such a moderate load should cause instability in the 
 connection.
 Very few other processes were running, the machines were setup to test the 
 connection instability we have experienced. Clients performed no other read 
 or mutation operations.
 Although the documents state that minimal competing IO load should present on 
 the zookeeper server, it seems reasonable that moderate IO should not cause 
 problems in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load

2010-10-07 Thread Alexandre Hardy (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Hardy updated ZOOKEEPER-885:
--

Attachment: zklogs.tar.gz

Attached are logs from the two sessions with disconnects. I have not filtered 
the logs in any way. The logs for 3.3.1 are the most clear, and have exactly 
one failure roughly 3 minutes after the logs start. Unfortunately the logs 
don't offer much information (as far as I can make out). Should I enable more 
verbose logging?

 Zookeeper drops connections under moderate IO load
 --

 Key: ZOOKEEPER-885
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.2
 Environment: Debian (Lenny)
 1Gb RAM
 swap disabled
 100Mb heap for zookeeper
Reporter: Alexandre Hardy
Priority: Critical
 Attachments: WatcherTest.java, zklogs.tar.gz


 A zookeeper server under minimum load, with a number of clients watching 
 exactly one node will fail to maintain the connection when the machine is 
 subjected to moderate IO load.
 In a specific test example we had three zookeeper servers running on 
 dedicated machines with 45 clients connected, watching exactly one node. The 
 clients would disconnect after moderate load was added to each of the 
 zookeeper servers with the command:
 {noformat}
 dd if=/dev/urandom of=/dev/mapper/nimbula-test
 {noformat}
 The {{dd}} command transferred data at a rate of about 4Mb/s.
 The same thing happens with
 {noformat}
 dd if=/dev/zero of=/dev/mapper/nimbula-test
 {noformat}
 It seems strange that such a moderate load should cause instability in the 
 connection.
 Very few other processes were running, the machines were setup to test the 
 connection instability we have experienced. Clients performed no other read 
 or mutation operations.
 Although the documents state that minimal competing IO load should present on 
 the zookeeper server, it seems reasonable that moderate IO should not cause 
 problems in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load

2010-10-01 Thread Alexandre Hardy (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Hardy updated ZOOKEEPER-885:
--

Attachment: WatcherTest.java

I have attached the client used to test the zookeeper servers.

 Zookeeper drops connections under moderate IO load
 --

 Key: ZOOKEEPER-885
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.2
 Environment: Debian (Lenny)
 1Gb RAM
 swap disabled
 100Mb heap for zookeeper
Reporter: Alexandre Hardy
Priority: Critical
 Attachments: WatcherTest.java


 A zookeeper server under minimum load, with a number of clients watching 
 exactly one node will fail to maintain the connection when the machine is 
 subjected to moderate IO load.
 In a specific test example we had three zookeeper servers running on 
 dedicated machines with 45 clients connected, watching exactly one node. The 
 clients would disconnect after moderate load was added to each of the 
 zookeeper servers with the command:
 {noformat}
 dd if=/dev/urandom of=/dev/mapper/nimbula-test
 {noformat}
 The {{dd}} command transferred data at a rate of about 4Mb/s.
 The same thing happens with
 {noformat}
 dd if=/dev/zero of=/dev/mapper/nimbula-test
 {noformat}
 transferring at a rate of about 20Mb/s. 
 It seems strange that such a moderate load should cause instability in the 
 connection.
 Very few other processes were running, the machines were setup to test the 
 connection instability we have experienced. Clients performed no other read 
 or mutation operations.
 Although the documents state that minimal competing IO load should present on 
 the zookeeper server, it seems reasonable that moderate IO should not cause 
 problems in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load

2010-10-01 Thread Alexandre Hardy (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Hardy updated ZOOKEEPER-885:
--

Description: 
A zookeeper server under minimum load, with a number of clients watching 
exactly one node will fail to maintain the connection when the machine is 
subjected to moderate IO load.

In a specific test example we had three zookeeper servers running on dedicated 
machines with 45 clients connected, watching exactly one node. The clients 
would disconnect after moderate load was added to each of the zookeeper servers 
with the command:
{noformat}
dd if=/dev/urandom of=/dev/mapper/nimbula-test
{noformat}

The {{dd}} command transferred data at a rate of about 4Mb/s.

The same thing happens with
{noformat}
dd if=/dev/zero of=/dev/mapper/nimbula-test
{noformat}
transferring at a rate of about 120Mb/s. 

It seems strange that such a moderate load should cause instability in the 
connection.

Very few other processes were running, the machines were setup to test the 
connection instability we have experienced. Clients performed no other read or 
mutation operations.

Although the documents state that minimal competing IO load should present on 
the zookeeper server, it seems reasonable that moderate IO should not cause 
problems in this case.

  was:
A zookeeper server under minimum load, with a number of clients watching 
exactly one node will fail to maintain the connection when the machine is 
subjected to moderate IO load.

In a specific test example we had three zookeeper servers running on dedicated 
machines with 45 clients connected, watching exactly one node. The clients 
would disconnect after moderate load was added to each of the zookeeper servers 
with the command:
{noformat}
dd if=/dev/urandom of=/dev/mapper/nimbula-test
{noformat}

The {{dd}} command transferred data at a rate of about 4Mb/s.

The same thing happens with
{noformat}
dd if=/dev/zero of=/dev/mapper/nimbula-test
{noformat}
transferring at a rate of about 20Mb/s. 

It seems strange that such a moderate load should cause instability in the 
connection.

Very few other processes were running, the machines were setup to test the 
connection instability we have experienced. Clients performed no other read or 
mutation operations.

Although the documents state that minimal competing IO load should present on 
the zookeeper server, it seems reasonable that moderate IO should not cause 
problems in this case.


 Zookeeper drops connections under moderate IO load
 --

 Key: ZOOKEEPER-885
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.2
 Environment: Debian (Lenny)
 1Gb RAM
 swap disabled
 100Mb heap for zookeeper
Reporter: Alexandre Hardy
Priority: Critical
 Attachments: WatcherTest.java


 A zookeeper server under minimum load, with a number of clients watching 
 exactly one node will fail to maintain the connection when the machine is 
 subjected to moderate IO load.
 In a specific test example we had three zookeeper servers running on 
 dedicated machines with 45 clients connected, watching exactly one node. The 
 clients would disconnect after moderate load was added to each of the 
 zookeeper servers with the command:
 {noformat}
 dd if=/dev/urandom of=/dev/mapper/nimbula-test
 {noformat}
 The {{dd}} command transferred data at a rate of about 4Mb/s.
 The same thing happens with
 {noformat}
 dd if=/dev/zero of=/dev/mapper/nimbula-test
 {noformat}
 transferring at a rate of about 120Mb/s. 
 It seems strange that such a moderate load should cause instability in the 
 connection.
 Very few other processes were running, the machines were setup to test the 
 connection instability we have experienced. Clients performed no other read 
 or mutation operations.
 Although the documents state that minimal competing IO load should present on 
 the zookeeper server, it seems reasonable that moderate IO should not cause 
 problems in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (ZOOKEEPER-885) Zookeeper drops connections under moderate IO load

2010-10-01 Thread Alexandre Hardy (JIRA)

 [ 
https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Hardy updated ZOOKEEPER-885:
--

Description: 
A zookeeper server under minimum load, with a number of clients watching 
exactly one node will fail to maintain the connection when the machine is 
subjected to moderate IO load.

In a specific test example we had three zookeeper servers running on dedicated 
machines with 45 clients connected, watching exactly one node. The clients 
would disconnect after moderate load was added to each of the zookeeper servers 
with the command:
{noformat}
dd if=/dev/urandom of=/dev/mapper/nimbula-test
{noformat}

The {{dd}} command transferred data at a rate of about 4Mb/s.

The same thing happens with
{noformat}
dd if=/dev/zero of=/dev/mapper/nimbula-test
{noformat}

It seems strange that such a moderate load should cause instability in the 
connection.

Very few other processes were running, the machines were setup to test the 
connection instability we have experienced. Clients performed no other read or 
mutation operations.

Although the documents state that minimal competing IO load should present on 
the zookeeper server, it seems reasonable that moderate IO should not cause 
problems in this case.

  was:
A zookeeper server under minimum load, with a number of clients watching 
exactly one node will fail to maintain the connection when the machine is 
subjected to moderate IO load.

In a specific test example we had three zookeeper servers running on dedicated 
machines with 45 clients connected, watching exactly one node. The clients 
would disconnect after moderate load was added to each of the zookeeper servers 
with the command:
{noformat}
dd if=/dev/urandom of=/dev/mapper/nimbula-test
{noformat}

The {{dd}} command transferred data at a rate of about 4Mb/s.

The same thing happens with
{noformat}
dd if=/dev/zero of=/dev/mapper/nimbula-test
{noformat}
transferring at a rate of about 120Mb/s. 

It seems strange that such a moderate load should cause instability in the 
connection.

Very few other processes were running, the machines were setup to test the 
connection instability we have experienced. Clients performed no other read or 
mutation operations.

Although the documents state that minimal competing IO load should present on 
the zookeeper server, it seems reasonable that moderate IO should not cause 
problems in this case.


 Zookeeper drops connections under moderate IO load
 --

 Key: ZOOKEEPER-885
 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885
 Project: Zookeeper
  Issue Type: Bug
  Components: server
Affects Versions: 3.2.2
 Environment: Debian (Lenny)
 1Gb RAM
 swap disabled
 100Mb heap for zookeeper
Reporter: Alexandre Hardy
Priority: Critical
 Attachments: WatcherTest.java


 A zookeeper server under minimum load, with a number of clients watching 
 exactly one node will fail to maintain the connection when the machine is 
 subjected to moderate IO load.
 In a specific test example we had three zookeeper servers running on 
 dedicated machines with 45 clients connected, watching exactly one node. The 
 clients would disconnect after moderate load was added to each of the 
 zookeeper servers with the command:
 {noformat}
 dd if=/dev/urandom of=/dev/mapper/nimbula-test
 {noformat}
 The {{dd}} command transferred data at a rate of about 4Mb/s.
 The same thing happens with
 {noformat}
 dd if=/dev/zero of=/dev/mapper/nimbula-test
 {noformat}
 It seems strange that such a moderate load should cause instability in the 
 connection.
 Very few other processes were running, the machines were setup to test the 
 connection instability we have experienced. Clients performed no other read 
 or mutation operations.
 Although the documents state that minimal competing IO load should present on 
 the zookeeper server, it seems reasonable that moderate IO should not cause 
 problems in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.