Re: test failures in branch-3.2
Todd Greenwood wrote:
> On a plus note, I'm finding that this morning, @work rather than @home, the tests continue to completion. However, there are other issues that I'll bring up on the dev list, such as a requirement to have autoconf installed, and problems in the create-cppunit-configure task that can't exec libtoolize, fun stuff like that.

Great, good to hear. At some point figuring out what's up with your @home would be interesting to us. :-)

Yes, there are some basic requirements such as autotools, cppunit, etc., but please do raise all this on the dev list.

> I need to proceed with the manual patches to branch-3.2, as I am under some time constraints to get our infrastructure deployed such that QA can start playing with it. However, I'll switch to 3.2.1 as soon as I can.

Understood.

Patrick
RE: test failures in branch-3.2
Patrick,

Thank you for the background (and I hope you and Mahadev recover quickly).

On a plus note, I'm finding that this morning, @work rather than @home, the tests continue to completion. However, there are other issues that I'll bring up on the dev list, such as a requirement to have autoconf installed, and problems in the create-cppunit-configure task that can't exec libtoolize, fun stuff like that.

I need to proceed with the manual patches to branch-3.2, as I am under some time constraints to get our infrastructure deployed such that QA can start playing with it. However, I'll switch to 3.2.1 as soon as I can.

-Todd
Re: test failures in branch-3.2
Hi Todd,

Sorry for the clutter/confusion. Usually things aren't this cumbersome ;-)

In particular:
  1 committer is on vacation
  Mahadev's been out sick for multiple days
  I'm sick but trying to hang in there, but def not 100%

Hudson (CI) has been offline for effectively the past 3 weeks (that gates all our commits) and is just now back but flaky.

3.2 had some bugs that we are trying to address, but the aforementioned issues are slowing us down. Otherwise we'd have all this straightened out by now.

At this point you should move this discussion to the dev list - Apache doesn't really like us to discuss code changes/futures here (user list). On that list you'll also see the plan for upcoming releases - I mention all this because we are actively working toward 3.2.1 which will include the JIRAs slated for that release (I'm sure you've seen).

If you can wait a bit you might be able to avoid some pain by using the upcoming 3.2.1 release. Once the patches land into that branch your issues will be resolved w/o you needing to manually apply patches, etc...

I did look at the files you attached - they look fine, so I'm not sure what the issue is. The form of this test makes it harder - we are verifying that the log contains sufficient information when a particular error occurs. We fiddle with log4j in order to do this, which means that the log you are including doesn't specify the problem.

Try instrumenting this test with a try/catch around the content of the test method (all the code in the failing method inside a big try/catch is what I mean). Then print the error to std out as part of the catch. That should shed some light. If you could debug it a bit that would help - because we aren't seeing this in our environment.

Again, sort of a moot point if you can wait a week or so...

Regards,

Patrick
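A minimal sketch of the instrumentation Patrick suggests above: put the whole body of the failing test method inside one try/catch and print the error to stdout. The names InstrumentedTestSketch, TestBody and runInstrumented are hypothetical stand-ins and are not part of the ZooKeeper test suite; in the real test the try/catch would presumably go directly inside the failing method (testBadPeerAddressInQuorum, going by the log). Note that this only surfaces Java exceptions and errors; if the forked JVM itself dies, nothing will be printed.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class InstrumentedTestSketch {
    /** Hypothetical stand-in for the code inside the failing test method. */
    interface TestBody {
        void run() throws Exception;
    }

    /**
     * Runs the body inside one big try/catch, prints any error to stdout,
     * then rethrows it so the test is still reported as failed.
     */
    static void runInstrumented(TestBody body) throws Exception {
        try {
            body.run();
        } catch (Throwable t) {
            // Write straight to stdout: the test reconfigures log4j, so the
            // normal log output may not show the underlying problem.
            StringWriter trace = new StringWriter();
            t.printStackTrace(new PrintWriter(trace, true));
            System.out.println("Test body threw: " + trace);
            System.out.flush();
            throw t;
        }
    }

    public static void main(String[] args) {
        try {
            // Simulated failure, standing in for the real test code.
            runInstrumented(() -> {
                throw new IllegalStateException("simulated failure");
            });
        } catch (Exception e) {
            System.out.println("Rethrown to caller: " + e.getMessage());
        }
    }
}
```

Rethrowing after printing is a design choice in this sketch: it keeps the junit result unchanged while adding the stack trace to the console output.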
RE: test failures in branch-3.2
Inline.

> -----Original Message-----
> From: Patrick Hunt [mailto:ph...@apache.org]
> Sent: Thursday, July 30, 2009 10:57 PM
> To: zookeeper-user@hadoop.apache.org
> Subject: Re: test failures in branch-3.2
>
> Todd Greenwood wrote:
> > Starting w/ branch-3.2 (no changes) I applied patches in this order:
> >
> > 1. Apply ZOOKEEPER-479.patch. Builds, but HierarchicalQuorumTest fails.
> > 2. Apply ZOOKEEPER-481.patch. Fails to build, b/c of missing file - PortAssignment.java.
> >
> > PortAssignment.java was added by Patrick as part of ZOOKEEPER-473.patch, which is a pretty hefty patch (> 2k lines) and touches a large number of files.
>
> Hrm, those patches were probably created against the trunk. We'll have to have separate patches for trunk and 3.2 branch on 481.
>
> If you could update the jira with this detail (481 needs two patches, one for each branch) that would be great!

Done.

> > 3. Apply ZOOKEEPER-473.patch. Builds, but QuorumPeerMainTest fails (jvm crashes).
>
> 473 is "special" (unique) in the sense that it changes log4j while the vm is running. In general though it's a pretty boring test and shouldn't be failing.
>
> Are you sure you have the right patch file? There are 2 patch files on the JIRA for 473; make sure that you have the one from 7/16, NOT the one from 7/15. Check the patch file: the correct one should NOT contain changes to build.xml or conf/log4j* files. If this still happens send me your build.xml, conf/log4j* and QuorumPeerMainTest.java files in email for review. I'll take a look.

I've annotated the files w/ their date while downloading:

112700 2009-07-31 11:02 ZOOKEEPER-473-7-15.patch
110607 2009-07-31 11:01 ZOOKEEPER-473-7-16.patch

It appears I applied the 7-16 patch, as that is the matching file size of the patch file I applied. If there are to be multiple patch files for multiple branches (3.2, trunk, etc.) would it make sense to label the patch files accordingly?

Requested files in attached tar.

-Todd

Attachment: patch-verification-473.tar.gz
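Besides comparing file sizes, the check Patrick describes (the correct 473 patch should NOT touch build.xml or conf/log4j* files) could be automated roughly as follows. This is only a sketch under stated assumptions: it assumes the patch is a unified diff whose "+++ " header lines carry paths relative to the branch root (as svn diff run from the root would produce), and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class PatchCheckSketch {
    /** Collects the target paths from the "+++ " header lines of a unified diff. */
    static List<String> touchedPaths(List<String> patchLines) {
        List<String> paths = new ArrayList<>();
        for (String line : patchLines) {
            if (line.startsWith("+++ ")) {
                // Header looks like "+++ path<TAB>(revision N)"; keep only the path.
                String rest = line.substring(4);
                int tab = rest.indexOf('\t');
                paths.add((tab >= 0 ? rest.substring(0, tab) : rest).trim());
            }
        }
        return paths;
    }

    /** Returns the paths that the correct patch is not expected to touch. */
    static List<String> unexpectedPaths(List<String> paths) {
        List<String> hits = new ArrayList<>();
        for (String p : paths) {
            // Assumption: paths are relative to the branch root.
            if (p.equals("build.xml") || p.startsWith("conf/log4j")) {
                hits.add(p);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PatchCheckSketch <patch-file>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        List<String> hits = unexpectedPaths(touchedPaths(lines));
        System.out.println(hits.isEmpty()
                ? "no build.xml or conf/log4j* changes found"
                : "unexpected changes: " + hits);
    }
}
```

Run against each downloaded file, e.g. `java PatchCheckSketch ZOOKEEPER-473-7-16.patch`. A content line that itself begins with "++ " would be misread as a header, so treat a hit as a prompt to look at the patch by hand.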
Re: test failures in branch-3.2
Todd Greenwood wrote:
> Starting w/ branch-3.2 (no changes) I applied patches in this order:
>
> 1. Apply ZOOKEEPER-479.patch. Builds, but HierarchicalQuorumTest fails.
> 2. Apply ZOOKEEPER-481.patch. Fails to build, b/c of missing file - PortAssignment.java.
>
> PortAssignment.java was added by Patrick as part of ZOOKEEPER-473.patch, which is a pretty hefty patch (> 2k lines) and touches a large number of files.

Hrm, those patches were probably created against the trunk. We'll have to have separate patches for trunk and 3.2 branch on 481.

If you could update the jira with this detail (481 needs two patches, one for each branch) that would be great!

> 3. Apply ZOOKEEPER-473.patch. Builds, but QuorumPeerMainTest fails (jvm crashes).

473 is "special" (unique) in the sense that it changes log4j while the vm is running. In general though it's a pretty boring test and shouldn't be failing.

Are you sure you have the right patch file? There are 2 patch files on the JIRA for 473; make sure that you have the one from 7/16, NOT the one from 7/15. Check the patch file: the correct one should NOT contain changes to build.xml or conf/log4j* files. If this still happens send me your build.xml, conf/log4j* and QuorumPeerMainTest.java files in email for review. I'll take a look.

Patrick
RE: test failures in branch-3.2
Patrick/Flavio -

Starting w/ branch-3.2 (no changes) I applied patches in this order:

1. Apply ZOOKEEPER-479.patch. Builds, but HierarchicalQuorumTest fails.
2. Apply ZOOKEEPER-481.patch. Fails to build, b/c of missing file - PortAssignment.java.

PortAssignment.java was added by Patrick as part of ZOOKEEPER-473.patch, which is a pretty hefty patch (> 2k lines) and touches a large number of files.

3. Apply ZOOKEEPER-473.patch. Builds, but QuorumPeerMainTest fails (jvm crashes).

    [junit] Running org.apache.zookeeper.server.quorum.QuorumPeerMainTest
    [junit] Running org.apache.zookeeper.server.quorum.QuorumPeerMainTest
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
    [junit] Test org.apache.zookeeper.server.quorum.QuorumPeerMainTest FAILED (crashed)

Test Log

Testsuite: org.apache.zookeeper.server.quorum.QuorumPeerMainTest
Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec

Testcase: testBadPeerAddressInQuorum took 0.004 sec
    Caused an ERROR
Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.

-Todd
Re: test failures in branch-3.2
Todd Greenwood wrote:
> [Todd] Yes, I believe "address in use" was the problem w/ FLETest. I assumed it was a timing issue w/ respect to test A not fully releasing resources before test B started.

Might be, but actually I think it's related to this:
http://hea-www.harvard.edu/~fine/Tech/addrinuse.html

Patrick
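For readers hitting the same "address in use" error: a common remedy for this class of bind failure is to enable SO_REUSEADDR on the listening socket before binding it, which is the kind of "address reuse" mentioned elsewhere in this thread. The sketch below shows that general technique with the standard java.net API; it is not the change made on trunk, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class ReuseAddrSketch {
    /** Opens a listening socket with SO_REUSEADDR enabled before the bind. */
    static ServerSocket bindWithReuse(int port) throws IOException {
        ServerSocket socket = new ServerSocket(); // created unbound
        socket.setReuseAddress(true);             // set before bind() so it applies to the bind
        socket.bind(new InetSocketAddress(port));
        return socket;
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port, as a test might do.
        ServerSocket first = bindWithReuse(0);
        int port = first.getLocalPort();
        first.close();

        // Bind the same port again right away, as a second test would.
        ServerSocket second = bindWithReuse(port);
        System.out.println("rebound port " + port
                + ", SO_REUSEADDR=" + second.getReuseAddress());
        second.close();
    }
}
```

The option has to be set on an unbound socket; calling setReuseAddress after the socket is already bound leaves the behaviour undefined according to the Java API documentation.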
RE: test failures in branch-3.2
Patrick, inline.

-----Original Message-----
From: Patrick Hunt [mailto:ph...@apache.org]
Sent: Thursday, July 30, 2009 9:13 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: test failures in branch-3.2

Todd Greenwood wrote:
> The build succeeds, but not all of the tests. In previous test runs, I noticed an error in org.apache.zookeeper.test.FLETest. It was not able to bind to a port or something. Now, after a machine reboot, I'm getting different failures.

"address in use"? That's a problem in the test framework pre-3.3. In 3.3 (current svn trunk) I fixed it but it's not in 3.2.x. This is a problem with the test framework though and not a real problem; it shows up occasionally (depends on timing).

[Todd] Yes, I believe "address in use" was the problem w/ FLETest. I assumed it was a timing issue w/ respect to test A not fully releasing resources before test B started.

> branch-3.2 $ ant test
>
> [junit] Test org.apache.zookeeper.server.quorum.QuorumPeerMainTest FAILED (crashed)
> [junit] Test org.apache.zookeeper.test.HierarchicalQuorumTest FAILED
>
> Test logs for these two tests attached.

This is unusual though - looking at the log it seems that the JVM itself crashed for the QPMainTest! For HQT we are seeing:

junit.framework.AssertionFailedError: Threads didn't join

which Flavio mentioned to me once can happen but is not a real problem (he can elaborate).

What version of java are you using? OS, other environment that might be interesting? (vm? etc...) You might try looking at the jvm crash dump file (I think it's in /tmp)

[Todd] ---
$ uname -a
Linux TODDG01LT 2.6.28-14-generic #47-Ubuntu SMP Sat Jul 25 01:19:55 UTC 2009 x86_64 GNU/Linux
$ which java
/home/toddg/bin/x64/java/jdk1.6.0_13/bin/java
$ java -version
java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed mode)
Memory = 4GB
[Todd] ---

If you run each of these two tests individually do they run? example:
ant -Dtestcase=FLENewEpochTest test-core-java

[Todd] Will try this once my local build is working and report back. I'll open a separate mail thread on applying patches.

> My goal here is to get to a known state (all tests succeeding or have workarounds for the failures). Following that, I plan to apply the patches Flavio recommended for a WAN deploy (479 and 481). After I verify that the tests continue to run, I'll package this up and deploy it to our WAN for testing.

Sounds like a good plan.

> So, are these known issues? Do the tests normally run en masse, or do some of the tests hold on to resources and prevent other tests from passing?

Typically they do run to completion, but occasionally on my machine (java 1.6, linux32bit, 1.6g single core cpu, 1gigmem) I'll get some random failure due to address in use, or the same "didn't join" that you saw. Usually I see this if I'm multitasking (vs just letting the tests run w/o using the box). As I said this is addressed in 3.3 (address reuse at the very least, and I haven't seen the other issues).

Patrick
Re: test failures in branch-3.2
Well, try running these two tests individually and see if they always fail or just occasionally. That will be a good start (and the env detail).

Patrick

Todd Greenwood wrote:
> No edits to conf/log4j.properties.
RE: test failures in branch-3.2
No edits to conf/log4j.properties.
Re: test failures in branch-3.2
btw, QuorumPeerMainTest uses the CONSOLE appender, which is set up in conf/log4j.properties. Now that I think of it, perhaps not such a good idea :-) If you edited conf/log4j.properties it may be causing the test to fail. Did you do this? (If you run the test by itself using -Dtestcase, does it always fail?)

I've entered a jira to address this: https://issues.apache.org/jira/browse/ZOOKEEPER-492

Patrick

Patrick Hunt wrote:
> Todd Greenwood wrote:
>> The build succeeds, but not all of the tests. In previous test runs, I noticed an error in org.apache.zookeeper.test.FLETest. It was not able to bind to a port or something. Now, after a machine reboot, I'm getting different failures.
>
> "address in use"? That's a problem in the test framework pre-3.3. In 3.3 (current svn trunk) I fixed it but it's not in 3.2.x. This is a problem with the test framework though and not a real problem, it shows up occasionally (depends on timing).
>
>> branch-3.2 $ ant test
>> [junit] Test org.apache.zookeeper.server.quorum.QuorumPeerMainTest FAILED (crashed)
>> [junit] Test org.apache.zookeeper.test.HierarchicalQuorumTest FAILED
>>
>> Test logs for these two tests attached.
>
> This is unusual though - looking at the log it seems that the JVM itself crashed for the QPMainTest! For HQT we are seeing:
>
> junit.framework.AssertionFailedError: Threads didn't join
>
> which Flavio mentioned to me once can happen but is not a real problem (he can elaborate).
>
> What version of java are you using? OS, other environment that might be interesting? (vm? etc...) You might try looking at the jvm crash dump file (I think it's in /tmp).
>
> If you run each of these two tests individually do they run? Example:
>
> ant -Dtestcase=FLENewEpochTest test-core-java
>
>> My goal here is to get to a known state (all tests succeeding or have workarounds for the failures). Following that, I plan to apply the patches Flavio recommended for a WAN deploy (479 and 481). After I verify that the tests continue to run, I'll package this up and deploy it to our WAN for testing.
>
> Sounds like a good plan.
>
>> So, are these known issues? Do the tests normally run en masse, or do some of the tests hold on to resources and prevent other tests from passing?
>
> Typically they do run to completion, but occasionally on my machine (java 1.6, linux32bit, 1.6g single core cpu, 1gigmem) I'll get some random failure due to address in use, or the same "didn't join" that you saw. Usually I see this if I'm multitasking (vs just letting the tests run w/o using the box). As I said this is addressed in 3.3 (address reuse at the very least, and I haven't seen the other issues).
>
> Patrick
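The "Threads didn't join" failure quoted in this message is described as timing-dependent. As a rough illustration of how such a check can be sensitive to machine load, here is a minimal Java sketch. It is not the actual HierarchicalQuorumTest code: the class name, method names, and timeouts are all hypothetical, and the real test may be structured quite differently.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; not part of ZooKeeper or its tests.
final class JoinWithTimeoutDemo {

    // Waits up to timeoutMs (must be > 0) for the thread to finish
    // and reports whether it actually did.
    static boolean joinedWithin(Thread t, long timeoutMs) {
        try {
            t.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
        return !t.isAlive();
    }

    // A worker that exits immediately should join well within the timeout.
    static boolean quickWorkerJoins() {
        Thread t = new Thread(() -> { /* nothing to do */ });
        t.start();
        return joinedWithin(t, 5000);
    }

    // A worker that is still blocked when the timeout expires does not join;
    // a test asserting on this result would report a failure.
    static boolean stuckWorkerJoins() {
        CountDownLatch release = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            try {
                release.await(); // simulates a thread that has not shut down yet
            } catch (InterruptedException ignored) {
                // exit quietly if interrupted
            }
        });
        t.setDaemon(true);
        t.start();
        boolean joined = joinedWithin(t, 100);
        release.countDown(); // let the worker finish after the check
        return joined;
    }

    public static void main(String[] args) {
        System.out.println("quick worker joined: " + quickWorkerJoins());
        System.out.println("stuck worker joined: " + stuckWorkerJoins());
    }
}
```

With a fixed timeout like this, a thread that is merely slow on a busy machine looks the same as one that is truly stuck, which would be consistent with the failure showing up mainly while multitasking.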
Re: test failures in branch-3.2
Todd Greenwood wrote:
> The build succeeds, but not all of the tests. In previous test runs, I noticed an error in org.apache.zookeeper.test.FLETest. It was not able to bind to a port or something. Now, after a machine reboot, I'm getting different failures.

"address in use"? That's a problem in the test framework pre-3.3. In 3.3 (current svn trunk) I fixed it but it's not in 3.2.x. This is a problem with the test framework though and not a real problem, it shows up occasionally (depends on timing).

> branch-3.2 $ ant test
> [junit] Test org.apache.zookeeper.server.quorum.QuorumPeerMainTest FAILED (crashed)
> [junit] Test org.apache.zookeeper.test.HierarchicalQuorumTest FAILED
>
> Test logs for these two tests attached.

This is unusual though - looking at the log it seems that the JVM itself crashed for the QPMainTest! For HQT we are seeing:

junit.framework.AssertionFailedError: Threads didn't join

which Flavio mentioned to me once can happen but is not a real problem (he can elaborate).

What version of java are you using? OS, other environment that might be interesting? (vm? etc...) You might try looking at the jvm crash dump file (I think it's in /tmp).

If you run each of these two tests individually do they run? Example:

ant -Dtestcase=FLENewEpochTest test-core-java

> My goal here is to get to a known state (all tests succeeding or have workarounds for the failures). Following that, I plan to apply the patches Flavio recommended for a WAN deploy (479 and 481). After I verify that the tests continue to run, I'll package this up and deploy it to our WAN for testing.

Sounds like a good plan.

> So, are these known issues? Do the tests normally run en masse, or do some of the tests hold on to resources and prevent other tests from passing?

Typically they do run to completion, but occasionally on my machine (java 1.6, linux32bit, 1.6g single core cpu, 1gigmem) I'll get some random failure due to address in use, or the same "didn't join" that you saw. Usually I see this if I'm multitasking (vs just letting the tests run w/o using the box). As I said this is addressed in 3.3 (address reuse at the very least, and I haven't seen the other issues).

Patrick
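For readers who have not run into the "address in use" failure discussed above, the following minimal Java sketch reproduces the general symptom with plain `java.net` sockets. It is not ZooKeeper code: the class and method names are hypothetical, and the thread only says that 3.3 addresses "address reuse", so treating `setReuseAddress` as related to that fix is an assumption made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical demo class; not part of ZooKeeper or its test framework.
final class PortInUseDemo {

    // Returns true if binding a second listener to an occupied port fails.
    static boolean secondBindFails() {
        try (ServerSocket first = new ServerSocket(0)) { // port 0: the OS picks a free port
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket()) {
                second.bind(new InetSocketAddress(port));
                return false; // unexpected while 'first' is still listening
            } catch (BindException e) {
                return true; // the "address in use" style failure
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if SO_REUSEADDR is reported as enabled after setting it.
    static boolean reuseFlagIsSet() {
        try (ServerSocket s = new ServerSocket()) {
            s.setReuseAddress(true); // has to be set before bind() to take effect
            s.bind(new InetSocketAddress(0));
            return s.getReuseAddress();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind fails: " + secondBindFails());
        System.out.println("reuse flag set: " + reuseFlagIsSet());
    }
}
```

In a test suite that starts many servers on fixed ports, a port still held by an earlier test (or lingering after it) can cause this kind of failure intermittently, which fits the "depends on timing" description.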
Re: test failures in branch-3.2
Todd,

On Jul 30, 2009, at 5:08 PM, Todd Greenwood wrote:
> The build succeeds, but not all of the tests. In previous test runs, I noticed an error in org.apache.zookeeper.test.FLETest. It was not able to bind to a port or something. Now, after a machine reboot, I'm getting different failures.

This issue might be fixed in trunk, but not in the 3.2 distribution.

> branch-3.2 $ ant test
> [junit] Test org.apache.zookeeper.server.quorum.QuorumPeerMainTest FAILED (crashed)
> [junit] Test org.apache.zookeeper.test.HierarchicalQuorumTest FAILED

HierarchicalQuorumTest is supposed to fail until you apply the patches I mentioned. I don't know what could have caused the crash of the jvm in the other one.

-Flavio
Re: test
Testing please ignore.

mahadev

On 8/12/08 3:46 PM, "Patrick Hunt" <[EMAIL PROTECTED]> wrote:
> just a test, please ignore