[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768656#action_12768656
 ] 

Flavio Paiva Junqueira commented on ZOOKEEPER-512:
--------------------------------------------------

We have been testing this patch externally with Pat's fault injection 
framework, which uses AspectJ. It is difficult at this point to introduce his 
framework, so we have agreed to postpone adding such tests. The patch fixes 
some visible problems and passes the existing tests.

> FLE election fails to elect leader
> ----------------------------------
>
>                 Key: ZOOKEEPER-512
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-512
>             Project: Zookeeper
>          Issue Type: Bug
>          Components: quorum, server
>    Affects Versions: 3.2.0
>            Reporter: Patrick Hunt
>            Assignee: Flavio Paiva Junqueira
>            Priority: Blocker
>             Fix For: 3.3.0
>
>         Attachments: jst.txt, log3_debug.tar.gz, logs.tar.gz, logs2.tar.gz, 
> t5_aj.tar.gz, ZOOKEEPER-512.patch, ZOOKEEPER-512.patch, ZOOKEEPER-512.patch, 
> ZOOKEEPER-512.patch, ZOOKEEPER-512.patch
>
>
> I was doing some fault injection testing of 3.2.1 with ZOOKEEPER-508 patch 
> applied and noticed that after some time the ensemble failed to re-elect a 
> leader.
> See the attached log files for a 5-member ensemble. Typically server 5 is 
> the leader.
> Notice that after 16:23:50,525 no quorum is formed, even after 20 minutes 
> have elapsed.
> Environment:
> I was doing fault injection testing using AspectJ. The faults are injected 
> into SocketChannel read/write; I throw exceptions randomly at a 1/200 ratio 
> (rand.nextFloat() <= .005 => throw IOException).
> You can see when a fault is injected in the log via:
> 2009-08-19 16:57:09,568 - INFO  [Thread-74:readrequestfailsintermitten...@38] 
> - READPACKET FORCED FAIL
> vs a read/write that didn't force fail:
> 2009-08-19 16:57:09,568 - INFO  [Thread-74:readrequestfailsintermitten...@41] 
> - READPACKET OK
> Otherwise standard code/config (straight FLE quorum with 5 members).
> Also see the attached jstack trace, which is for one of the servers. Notice 
> in particular that the number of sendworkers != the number of recv workers.
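
For readers who want to reproduce the 1/200 failure ratio described in the 
report, here is a minimal sketch in plain Java. It is not Pat's AspectJ 
framework and it does not hook into SocketChannel; the class FaultInjector, 
the method maybeFail, and the failThreshold parameter are hypothetical names 
introduced only for illustration. The only parts taken from the report are 
the condition rand.nextFloat() <= .005 and the IOException.

```java
import java.io.IOException;
import java.util.Random;

/**
 * Hypothetical stand-in for the fault injection described in the report.
 * The report used AspectJ around socket channel reads and writes; this
 * sketch only models the random decision to fail.
 */
public class FaultInjector {
    private final Random rand;
    private final float failThreshold;

    public FaultInjector(Random rand, float failThreshold) {
        this.rand = rand;
        this.failThreshold = failThreshold;
    }

    /**
     * Throws IOException when rand.nextFloat() <= failThreshold.
     * With failThreshold = .005f this fails roughly 1 call in 200.
     */
    public void maybeFail(String op) throws IOException {
        if (rand.nextFloat() <= failThreshold) {
            throw new IOException(op + " FORCED FAIL");
        }
    }

    public static void main(String[] args) {
        FaultInjector injector = new FaultInjector(new Random(), .005f);
        int calls = 200_000;
        int failures = 0;
        for (int i = 0; i < calls; i++) {
            try {
                // A real harness would perform the read or write here.
                injector.maybeFail("READPACKET");
            } catch (IOException e) {
                failures++;
            }
        }
        System.out.println(failures + " forced failures in " + calls + " calls");
    }
}
```

Passing the Random instance and the threshold in through the constructor is a 
design choice of this sketch: a fixed seed makes a failing run repeatable, and 
a threshold of 1.0f or a negative value forces the always-fail and never-fail 
cases when checking the harness itself.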

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
