Re: Heisenbugs, Bohrbugs, Mandelbugs?

2010-10-23 Thread Flavio Junqueira
Thomas,

Could you open jiras and make available the logs for tests that failed for you?

Thanks,
-Flavio

On Oct 22, 2010, at 7:56 PM, Thomas Koch wrote:

 Mahadev Konar:
  Hi Thomas,
  Could you verify this by just testing the trunk without your patch? You
  might very well be right that those tests are a little flaky.

  As for the hudson builds, Nigel is working on getting the patch builds for
  zookeeper running. As soon as that gets fixed these flaky tests would show
  up more often.

  Thanks
  mahadev

  On 10/20/10 11:48 PM, "Thomas Koch" tho...@koch.ro wrote:
   Hi,

   last night I let my hudson server do 42 (sic) builds of ZooKeeper trunk.
   One of these builds failed:

   junit.framework.AssertionFailedError: Leader hasn't joined: 5
   at org.apache.zookeeper.test.FLETest.testLE(FLETest.java:312)

   I did this many builds of trunk, because in my quest to redo the client
   netty integration step by step I made one step which resulted in 2
   failed builds out of 8. The two failures were both:

 Hi Mahadev,

 as I've written, I did 42 builds of trunk over the night from which 2 failed
 and 8 builds of my patch during work time with 2 failures. I also did another
 round of builds of my patch during last night and got only 1 failure out of
 ~40 successful builds.

 So I believe that the high failure rate of 2/8 from the initial round of patch
 builds is because I did these builds over the day while other developers also
 used other virtual machines on the same host.

 Have a nice weekend,

 Thomas Koch, http://www.koch.ro

flavio junqueira
research scientist
f...@yahoo-inc.com
direct +34 93-183-8828
avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300
fax (408) 349 3301

Re: Heisenbugs, Bohrbugs, Mandelbugs?

2010-10-22 Thread Mahadev Konar
Hi Thomas,
  Could you verify this by just testing the trunk without your patch? You
might very well be right that those tests are a little flaky.

As for the hudson builds, Nigel is working on getting the patch builds for
zookeeper running. As soon as that gets fixed these flaky tests would show up
more often.

Thanks
mahadev


On 10/20/10 11:48 PM, Thomas Koch tho...@koch.ro wrote:

 Hi,
 
 last night I let my hudson server do 42 (sic) builds of ZooKeeper trunk. One
 of these builds failed:
 
 junit.framework.AssertionFailedError: Leader hasn't joined: 5
 at org.apache.zookeeper.test.FLETest.testLE(FLETest.java:312)
 
 I did this many builds of trunk, because in my quest to redo the client netty
 integration step by step I made one step which resulted in 2 failed builds out
 of 8. The two failures were both:
 
 junit.framework.AssertionFailedError: Threads didn't join
 at org.apache.zookeeper.test.FLERestartTest.testLERestart(FLERestartTest.java:198)
 
 I can't find any relationship between the above test and my changes. The test
 does not use the ZooKeeper client code at all. So I begin to believe that
 there are some Heisenbugs, Bohrbugs or Mandelbugs[1] in ZooKeeper that just
 happen to show up from time to time without any relationship to the current
 changes.
 
 I'll try to investigate the cause further, maybe there is some relationship
 I've not yet found. But if my assumption should apply, then these kinds of bugs
 would be a strong argument in favor of refactoring. These bugs are best found
 by cleaning the code, most importantly by implementing strict separation of
 concerns.
 
 Wouldn't you like to set up Hudson to build ZooKeeper trunk every half an hour?
 
 [1] http://en.wikipedia.org/wiki/Unusual_software_bug
 
 Best regards,
 
 Thomas Koch, http://www.koch.ro
 



Re: Heisenbugs, Bohrbugs, Mandelbugs?

2010-10-22 Thread Thomas Koch
Mahadev Konar:
 Hi Thomas,
   Could you verify this by just testing the trunk without your patch? You
 might very well be right that those tests are a little flaky.
 
 As for the hudson builds, Nigel is working on getting the patch builds for
 zookeeper running. As soon as that gets fixed these flaky tests would show
 up more often.
 
 Thanks
 mahadev
 
 On 10/20/10 11:48 PM, Thomas Koch tho...@koch.ro wrote:
  Hi,
  
  last night I let my hudson server do 42 (sic) builds of ZooKeeper trunk.
  One of these builds failed:
  
  junit.framework.AssertionFailedError: Leader hasn't joined: 5
  
  at org.apache.zookeeper.test.FLETest.testLE(FLETest.java:312)
  
  I did this many builds of trunk, because in my quest to redo the client
  netty integration step by step I made one step which resulted in 2
  failed builds out of 8. The two failures were both:
Hi Mahadev,

as I wrote, I ran 42 builds of trunk overnight, of which 2 failed, and 8
builds of my patch during working hours, with 2 failures. I also ran another
round of builds of my patch last night and got only 1 failure out of ~40
successful builds.

So I believe that the high failure rate of 2/8 from the initial round of patch
builds is because I ran these builds during the day, while other developers
were also using other virtual machines on the same host.

Have a nice weekend,

Thomas Koch, http://www.koch.ro
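
The reasoning above about the 2/8 failure rate can be checked with a little arithmetic. The following is only a sketch: it assumes builds fail independently and takes the 2-out-of-42 overnight trunk result as the baseline failure rate, both of which are simplifications. The class FlakyRate and its methods are hypothetical names used for illustration only.

```java
// Sketch: how likely are at least k failures in n builds if each build
// fails independently with probability p? (Binomial tail probability.)
final class FlakyRate {

    // Binomial coefficient "n choose k", computed in floating point.
    static double choose(int n, int k) {
        double c = 1.0;
        for (int i = 1; i <= k; i++) {
            c = c * (n - k + i) / i;
        }
        return c;
    }

    // P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X < k).
    static double probAtLeast(int n, int k, double p) {
        double below = 0.0;
        for (int i = 0; i < k; i++) {
            below += choose(n, i) * Math.pow(p, i) * Math.pow(1.0 - p, n - i);
        }
        return 1.0 - below;
    }

    public static void main(String[] args) {
        // Assumed baseline: 2 failures in 42 overnight trunk builds.
        double baseline = 2.0 / 42.0;
        // Chance of seeing 2 or more failures in 8 builds at that baseline.
        double tail = probAtLeast(8, 2, baseline);
        System.out.printf("P(>= 2 failures in 8 builds) = %.3f%n", tail);
    }
}
```

Under these assumptions the chance of 2 or more failures in 8 builds comes out at roughly 5%, so the daytime round does look unusual compared with the overnight runs. With samples this small that is a hint, not proof, that host load played a part.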


Heisenbugs, Bohrbugs, Mandelbugs?

2010-10-21 Thread Thomas Koch
Hi,

last night I let my hudson server do 42 (sic) builds of ZooKeeper trunk. One
of these builds failed:

junit.framework.AssertionFailedError: Leader hasn't joined: 5
at org.apache.zookeeper.test.FLETest.testLE(FLETest.java:312)

I did this many builds of trunk, because in my quest to redo the client netty 
integration step by step I made one step which resulted in 2 failed builds out 
of 8. The two failures were both:

junit.framework.AssertionFailedError: Threads didn't join
at org.apache.zookeeper.test.FLERestartTest.testLERestart(FLERestartTest.java:198)

I can't find any relationship between the above test and my changes. The test 
does not use the ZooKeeper client code at all. So I begin to believe that 
there are some Heisenbugs, Bohrbugs or Mandelbugs[1] in ZooKeeper that just 
happen to show up from time to time without any relationship to the current 
changes.

I'll try to investigate the cause further, maybe there is some relationship
I've not yet found. But if my assumption should apply, then these kinds of bugs
would be a strong argument in favor of refactoring. These bugs are best found
by cleaning the code, most importantly by implementing strict separation of
concerns.

Wouldn't you like to set up Hudson to build ZooKeeper trunk every half an hour?

[1] http://en.wikipedia.org/wiki/Unusual_software_bug

Best regards,

Thomas Koch, http://www.koch.ro


Re: Heisenbugs, Bohrbugs, Mandelbugs?

2010-10-21 Thread Patrick Hunt
On Wed, Oct 20, 2010 at 11:48 PM, Thomas Koch tho...@koch.ro wrote:

 Hi,

 last night I let my hudson server do 42 (sic) builds of ZooKeeper trunk. One
 of these builds failed:

 junit.framework.AssertionFailedError: Leader hasn't joined: 5
 at org.apache.zookeeper.test.FLETest.testLE(FLETest.java:312)

 I did this many builds of trunk, because in my quest to redo the client netty
 integration step by step I made one step which resulted in 2 failed builds out
 of 8. The two failures were both:

 junit.framework.AssertionFailedError: Threads didn't join
 at org.apache.zookeeper.test.FLERestartTest.testLERestart(FLERestartTest.java:198)


Hi Thomas, there's an open jira for this:
https://issues.apache.org/jira/browse/ZOOKEEPER-653
great if you'd like to address it.

 I can't find any relationship between the above test and my changes. The test
 does not use the ZooKeeper client code at all. So I begin to believe that
 there are some Heisenbugs, Bohrbugs or Mandelbugs[1] in ZooKeeper that just
 happen to show up from time to time without any relationship to the current
 changes.

 I'll try to investigate the cause further, maybe there is some relationship
 I've not yet found. But if my assumption should apply, then these kinds of bugs
 would be a strong argument in favor of refactoring. These bugs are best found
 by cleaning the code, most importantly by implementing strict separation of
 concerns.


I believe the bug is in the test rather than in the code. Forming a quorum
is non-deterministic: the test assumes that it allows enough time for
everyone to join, and this may not be the case. The opposite may be true as
well, and something might really be failing, but my understanding from Flavio
is that it's the former. The unfortunate thing is that, since we don't really
know which it is, we sort of ignore these failures. Really we should fix this
issue for real, whatever that means... Flavio, perhaps you could give Thomas
some insight? If you have ideas, he is motivated to help resolve this.
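
To illustrate the point about a test assuming it has allowed enough time for everyone to join: one common way to make such a test less timing-sensitive is to poll for the expected condition up to a generous deadline instead of relying on a single fixed wait. The following is a minimal sketch only. The class AwaitCondition, its await method and the timeout values are hypothetical and are not taken from FLETest or any other ZooKeeper code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Sketch: wait for a condition by polling until a deadline, so a slow
// (heavily loaded) host costs extra time instead of a spurious failure.
final class AwaitCondition {

    // Returns true as soon as the condition holds, false if the deadline
    // passes first or the waiting thread is interrupted.
    static boolean await(BooleanSupplier condition, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Sleep for the poll interval, but never past the deadline.
            long remainingMs = Math.max(1L, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
            try {
                Thread.sleep(Math.min(pollMs, remainingMs));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for a peer that joins after an unpredictable delay.
        AtomicBoolean joined = new AtomicBoolean(false);
        Thread peer = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            joined.set(true);
        });
        peer.start();

        boolean ok = await(joined::get, 5000, 20);
        System.out.println("peer joined within deadline: " + ok);
    }
}
```

A test written this way returns as soon as the condition holds, so the deadline can be set high without slowing down the normal case. It does not tell you whether a timeout is a test problem or a real failure, which is the open question here.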

Also notice that we are currently @Ignore-ing a handful of tests. These are
also broken tests that we really need to fix and bring back online. The
session moved test in particular needs to be fixed (again, a
non-deterministic test that could probably benefit from some refactoring,
though I think it's more a design-for-test issue).

Take a look at the clover output for some insight on areas that need more
testing and refactoring (coverage/complexity):
https://hudson.apache.org/hudson/view/ZooKeeper/job/ZooKeeper-trunk/clover/


 Wouldn't you like to set up Hudson to build ZooKeeper trunk every half an
 hour?


I wouldn't mind, but we'd probably get yelled at by the apache hudson
admins. :-) Hudson is a shared resource and we typically need to play
nice. Also, there have been problems with hadoop on hudson for the past few
months. Nigel is working on that, so this might be a good thing to bring up
again once that's addressed (patch queue primarily).

Patrick