Patrick,

Thank you for the background (and I hope you and Mahadev recover quickly).

On a positive note, I'm finding that this morning, @work rather than @home, the tests run to completion. However, there are other issues that I'll bring up on the dev list, such as a requirement to have autoconf installed and problems in the create-cppunit-configure task, which can't exec libtoolize. Fun stuff like that.

I need to proceed with the manual patches to branch-3.2, as I am under some time constraints to get our infrastructure deployed so that QA can start playing with it. However, I'll switch to 3.2.1 as soon as I can.

-Todd

> -----Original Message-----
> From: Patrick Hunt [mailto:ph...@apache.org]
> Sent: Friday, July 31, 2009 11:38 AM
> To: zookeeper-user@hadoop.apache.org; Todd Greenwood
> Subject: Re: test failures in branch-3.2
>
> Hi Todd,
>
> Sorry for the clutter/confusion. Usually things aren't this cumbersome ;-)
>
> In particular:
> 1 committer is on vacation
> Mahadev's been out sick for multiple days
> I'm sick but trying to hang in there, but def not 100%
>
> Hudson (CI) has been offline for effectively the past 3 weeks (that
> gates all our commits) and is just now back but flaky.
>
> 3.2 had some bugs that we are trying to address, but the aforementioned
> issues are slowing us down. Otw we'd have all this straightened out by
> now ....
>
> At this point you should move this discussion to the dev list - Apache
> doesn't really like us to discuss code changes/futures here (user list).
> On that list you'll also see the plan for upcoming releases - I mention
> all this because we are actively working toward 3.2.1 which will include
> the JIRAs slated for that release (I'm sure you've seen).
>
> If you can wait a bit you might be able to avoid some pain by using the
> upcoming 3.2.1 release. Once the patches land into that branch your
> issues will be resolved w/o you needing to manually apply patches, etc...
>
>
> I did look at the files you attached - it looks fine so I'm not sure what the
> issue is.
> The form of this test makes it harder - we are verifying that the
> log contains sufficient information when a particular error occurs. We
> fiddle with log4j in order to do this, which means that the log you are
> including doesn't specify the problem.
>
> Try instrumenting this test with a try/catch around the content of the
> test method (all the code in the failing method inside a big try/catch
> is what I mean). Then print the error to stdout as part of the catch.
> That should shed some light. If you could debug it a bit that would help
> - because we aren't seeing this in our environment.
>
> Again, sort of a moot point if you can wait a week or so...
>
> Regards,
>
> Patrick
>
> Todd Greenwood wrote:
> > Inline.
> >
> >> -----Original Message-----
> >> From: Patrick Hunt [mailto:ph...@apache.org]
> >> Sent: Thursday, July 30, 2009 10:57 PM
> >> To: zookeeper-user@hadoop.apache.org
> >> Subject: Re: test failures in branch-3.2
> >>
> >> Todd Greenwood wrote:
> >>> Starting w/ branch-3.2 (no changes) I applied patches in this order:
> >>>
> >>> 1. Apply ZOOKEEPER-479.patch. Builds, but HierarchicalQuorumTest fails.
> >>> 2. Apply ZOOKEEPER-481.patch. Fails to build, b/c of missing file -
> >>> PortAssignment.java.
> >>>
> >>> PortAssignment.java was added by Patrick as part of ZOOKEEPER-473.patch,
> >>> which is a pretty hefty patch (> 2k lines) and touches a large number of
> >>> files.
> >> Hrm, those patches were probably created against the trunk. We'll have
> >> to have separate patches for trunk and 3.2 branch on 481.
> >>
> >> If you could update the jira with this detail (481 needs two patches,
> >> one for each branch) that would be great!
> >>
> >
> > Done.
> >
> >>> 3. Apply ZOOKEEPER-473.patch. Builds, but QuorumPeerMainTest fails (jvm
> >>> crashes).
> >> 473 is "special" (unique) in the sense that it changes log4j while the
> >> VM is running. In general though it's a pretty boring test and
> >> shouldn't be failing.
> >>
> >> Are you sure you have the right patch file? There are 2 patch files on
> >> the JIRA for 473; make sure that you have the one from 7/16, NOT the one
> >> from 7/15. Check the patch file: the correct one should NOT contain
> >> changes to build.xml or conf/log4j* files. If this still happens send me
> >> your build.xml, conf/log4j* and QuorumPeerMainTest.java files in email
> >> for review. I'll take a look.
> >>
> >
> >
> > I've annotated the files w/ their date while downloading:
> > 112700 2009-07-31 11:02 ZOOKEEPER-473-7-15.patch
> > 110607 2009-07-31 11:01 ZOOKEEPER-473-7-16.patch
> >
> > It appears I applied the 7-16 patch, as that is the matching file size
> > of the patch file I applied.
> >
> > If there are to be multiple patch files for multiple branches (3.2,
> > trunk, etc.) would it make sense to label the patch files accordingly?
> >
> > Requested files in attached tar.
> >
> > -Todd
> >
> >> Patrick
> >>
> >>
> >>> [junit] Running org.apache.zookeeper.server.quorum.QuorumPeerMainTest
> >>> [junit] Running
> >>> org.apache.zookeeper.server.quorum.QuorumPeerMainTest
> >>> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
> >>> [junit] Test org.apache.zookeeper.server.quorum.QuorumPeerMainTest
> >>> FAILED (crashed)
> >>>
> >>> ------------
> >>> Test Log
> >>> ------------
> >>> Testsuite: org.apache.zookeeper.server.quorum.QuorumPeerMainTest
> >>> Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
> >>>
> >>> Testcase: testBadPeerAddressInQuorum took 0.004 sec
> >>> Caused an ERROR
> >>> Forked Java VM exited abnormally. Please note the time in the report
> >>> does not reflect the time until the VM exit.
> >>> junit.framework.AssertionFailedError: Forked Java VM exited abnormally.
> >>> Please note the time in the report does not reflect the time until the
> >>> VM exit.
> >>>
> >>> -Todd
> >>>
> >>> -----Original Message-----
> >>> From: Patrick Hunt [mailto:ph...@apache.org]
> >>> Sent: Thursday, July 30, 2009 10:13 PM
> >>> To: zookeeper-user@hadoop.apache.org
> >>> Subject: Re: test failures in branch-3.2
> >>>
> >>> Todd Greenwood wrote:
> >>>> ....
> >>>> [Todd] Yes, I believe "address in use" was the problem w/ FLETest. I
> >>>> assumed it was a timing issue w/ respect to test A not fully releasing
> >>>> resources before test B started.
> >>> Might be, but actually I think it's related to this:
> >>> http://hea-www.harvard.edu/~fine/Tech/addrinuse.html
> >>>
> >>> Patrick
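
For reference, the try/catch instrumentation suggested in the thread (wrap the whole body of the failing test method, print the error to stdout in the catch) could look roughly like the sketch below. This is only an illustration: the class `InstrumentedTest`, the interface `TestBody`, and the helper `runAndReport` are hypothetical names, not part of ZooKeeper or JUnit, and the lambda stands in for the real body of testBadPeerAddressInQuorum.

```java
import java.io.PrintStream;

public class InstrumentedTest {
    /** Hypothetical stand-in for the body of the failing test method. */
    public interface TestBody {
        void run() throws Exception;
    }

    /**
     * Runs the body inside one big try/catch, prints any failure to the
     * given stream (System.out in practice), then rethrows it so the test
     * is still reported as failed.
     */
    public static void runAndReport(String name, TestBody body, PrintStream out)
            throws Exception {
        try {
            body.run();
        } catch (Throwable t) {
            out.println("FAILURE in " + name + ": " + t);
            t.printStackTrace(out);
            out.flush();
            // Rethrow with the original type preserved where possible.
            if (t instanceof Exception) {
                throw (Exception) t;
            }
            if (t instanceof Error) {
                throw (Error) t;
            }
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) throws Exception {
        try {
            runAndReport("testBadPeerAddressInQuorum", () -> {
                // Placeholder: the real test code would go here.
                throw new IllegalStateException("simulated failure");
            }, System.out);
        } catch (IllegalStateException expected) {
            System.out.println("failure was printed and rethrown");
        }
    }
}
```

Printing directly to stdout sidesteps the log4j reconfiguration the test performs, which is why the regular log may not show the error. Note that if the forked VM terminates outright, the catch block never gets a chance to run, so an empty result here would itself be a clue.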
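
On the "address in use" point: if the cause is a port from an earlier test still lingering in the TCP TIME_WAIT state, one common mitigation is to set SO_REUSEADDR on the listening socket before binding. The sketch below shows that general technique in plain Java. It is an assumption that this applies to FLETest, and nothing in the thread says the ZooKeeper tests do this; `ReuseAddressSketch` and `openReusable` are hypothetical names.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class ReuseAddressSketch {
    /** Opens a listening socket with SO_REUSEADDR enabled before bind. */
    public static ServerSocket openReusable(int port) throws IOException {
        ServerSocket ss = new ServerSocket(); // created unbound
        try {
            ss.setReuseAddress(true); // only reliable if set before bind()
            ss.bind(new InetSocketAddress(port));
            return ss;
        } catch (IOException e) {
            ss.close(); // do not leak the socket if bind fails
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port.
        try (ServerSocket ss = openReusable(0)) {
            System.out.println("listening on port " + ss.getLocalPort());
        }
    }
}
```

Setting the option after the socket is already bound has undefined behavior per the `ServerSocket` documentation, which is why the sketch uses the no-argument constructor followed by an explicit `bind`.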