[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917987#action_12917987
 ] 

Alexandre Hardy commented on ZOOKEEPER-885:
-------------------------------------------

Thanks Patrick.

I have now performed the same test with zookeeper-3.3.1 as distributed in the 
Cloudera CDH3 distribution. The results are fairly similar, although I don't 
have hard numbers for comparison. I still experience disconnects and session 
timeouts even though {{dd}} transfers at only 5Mb per second.

Stability in this test improved after I adjusted the following settings:
{noformat}
echo 5 > /proc/sys/vm/dirty_ratio
echo 5 > /proc/sys/vm/dirty_background_ratio
{noformat}

These settings are sufficient to get stability on this simple test.
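
As a rough sketch (not part of the original test setup), the values currently 
in effect can be checked before and after the change. The function name and 
its optional directory argument are hypothetical conveniences; the output 
mimics the {{name = value}} form printed by {{sysctl}}.

```shell
#!/bin/sh
# Print the current writeback thresholds.
# The optional directory argument is a hypothetical convenience so the
# function can be pointed somewhere other than /proc/sys/vm.
show_dirty_settings() {
    dir="${1:-/proc/sys/vm}"
    for name in dirty_ratio dirty_background_ratio; do
        if [ -r "$dir/$name" ]; then
            printf 'vm.%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            printf 'vm.%s = (unreadable)\n' "$name"
        fi
    done
}

# The echo commands above are lost on reboot. The same values can be set
# as root with:
#   sysctl -w vm.dirty_ratio=5
#   sysctl -w vm.dirty_background_ratio=5
# and kept across reboots by adding the two "vm.… = 5" lines to
# /etc/sysctl.conf.
show_dirty_settings
```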

I'll work on getting some more concrete comparisons of zookeeper-3.2.2 and 
zookeeper-3.3.1.

The output of the "stat" command seems reasonable (latency min/avg/max of 
0/10/91) during {{dd}}. It seems to me that the zookeeper server runs fairly 
well for a period of time, then hits a period of instability, and then 
recovers again. Are pings also counted in the server latency figures?
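
To track these figures over time rather than reading them by eye, something 
like the following could be used. It is only a sketch: {{parse_latency}} is a 
hypothetical helper name, and it assumes the {{Latency min/avg/max:}} line 
that the "stat" command prints.

```shell
#!/bin/sh
# Extract the min/avg/max latency values from "stat" output read on stdin.
# Lines other than the latency line are ignored.
parse_latency() {
    sed -n 's|^Latency min/avg/max: *\([0-9]*\)/\([0-9]*\)/\([0-9]*\).*$|min=\1 avg=\2 max=\3|p'
}

# Example using the figures quoted above.
# Against a live server the input would come from something like
#   echo stat | nc localhost 2181
# where the host and port are assumptions about the local setup.
printf 'Latency min/avg/max: 0/10/91\n' | parse_latency
# prints: min=0 avg=10 max=91
```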

I suspect that IO scheduling / buffering may be partly responsible for the 
problem, and hence I started testing the effect of {{dirty_ratio}} etc. At 
this stage I am presuming that {{dd}} manages to write a fair amount of data 
before the data is actually flushed to disk, and that when it is flushed all 
processes attempting to write to the device are stalled until the flush 
completes.
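
One way to check this presumption would be to watch how much dirty data the 
kernel is holding while {{dd}} runs. The sketch below reads the {{Dirty}} and 
{{Writeback}} lines from {{/proc/meminfo}}; the function name and its optional 
file argument are hypothetical, and this is not something I have measured yet.

```shell
#!/bin/sh
# Print the Dirty and Writeback counters from a meminfo-style file.
# The optional file argument is a hypothetical convenience; by default the
# live /proc/meminfo is read.
dirty_snapshot() {
    file="${1:-/proc/meminfo}"
    awk '$1 == "Dirty:" || $1 == "Writeback:" { printf "%s %s %s\n", $1, $2, $3 }' "$file"
}

# To sample once per second while dd is running:
#   while sleep 1; do date +%T; dirty_snapshot; done
# If the presumption holds, Dirty should climb and then fall sharply around
# the times the clients disconnect.
```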

What is unclear to me at this point is what gets written to disk. What would 
zookeeper be writing to disk if none of the clients are submitting any 
requests? Something to do with the session?

> Zookeeper drops connections under moderate IO load
> --------------------------------------------------
>
>                 Key: ZOOKEEPER-885
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-885
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.2.2
>         Environment: Debian (Lenny)
> 1Gb RAM
> swap disabled
> 100Mb heap for zookeeper
>            Reporter: Alexandre Hardy
>            Priority: Critical
>         Attachments: WatcherTest.java
>
>
> A zookeeper server under minimal load, with a number of clients watching 
> exactly one node, will fail to maintain the connection when the machine is 
> subjected to moderate IO load.
> In a specific test example we had three zookeeper servers running on 
> dedicated machines with 45 clients connected, watching exactly one node. The 
> clients would disconnect after moderate load was added to each of the 
> zookeeper servers with the command:
> {noformat}
> dd if=/dev/urandom of=/dev/mapper/nimbula-test
> {noformat}
> The {{dd}} command transferred data at a rate of about 4Mb/s.
> The same thing happens with
> {noformat}
> dd if=/dev/zero of=/dev/mapper/nimbula-test
> {noformat}
> It seems strange that such a moderate load should cause instability in the 
> connection.
> Very few other processes were running; the machines were set up specifically 
> to test the connection instability we have experienced. Clients performed no 
> other read or mutation operations.
> Although the documents state that minimal competing IO load should be present 
> on the zookeeper server, it seems reasonable that moderate IO should not cause 
> problems in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
