RE: Zookeeper WAN Configuration
Patrick - Thank you, I'll proceed accordingly.

-Todd

-----Original Message-----
From: Patrick Hunt [mailto:ph...@apache.org]
Sent: Wednesday, July 29, 2009 10:30 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: Zookeeper WAN Configuration

[Todd] What is the recommended policy regarding patching zookeeper locally? As an external user, should I patch and compile in the trunk or in the branch (branch-3.2)? I've looked at:

http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute
http://wiki.apache.org/hadoop/HowToRelease

Both of these seem well thought out, but they are aimed at committers committing to the trunk.

In your context (want 3.2 features) you probably want to build based on the 3.2 tag; that way you are working off a known quantity. I'd strongly suggest that, as part of your build, you document the source base and which patches/changes you have applied. Having this information will be critical for you (or someone using your build) in case bugs have to be filed, or further changes/patches have to be applied, etc...

Patrick
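Patrick's advice to document the source base and the applied patches can be made part of the build itself. The following is only a minimal sketch of that idea, not anything ZooKeeper ships: the function names, the JSON layout, and the example values are all assumptions chosen for illustration.

```python
import json


def build_manifest(source_base, patches, notes=""):
    """Describe what a local build was made from.

    source_base: the svn URL/tag (ideally with revision) that was checked out.
    patches: patch file names, in the order they were applied.
    All field names here are hypothetical; use whatever your team prefers.
    """
    return {
        "source_base": source_base,
        "patches": list(patches),  # order matters, so keep it a list
        "notes": notes,
    }


def render_manifest(manifest):
    """Serialize the manifest as stable, human-readable JSON."""
    return json.dumps(manifest, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Illustrative values only; record the exact tag and revision you built from.
    manifest = build_manifest(
        source_base="http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2",
        patches=["ZOOKEEPER-479.patch", "ZOOKEEPER-481.patch"],
        notes="local WAN test build",
    )
    print(render_manifest(manifest))
```

Shipping the rendered text alongside the packaged build gives whoever files a bug later the same "known quantity" Patrick describes.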
Re: bad svn url : test-patch
Hi Todd,

Yes, this happens with branch 3.2. The test-patch link is broken because of the hadoop split. This file is used for the hudson test environment. It isn't used anywhere else, so the svn co should otherwise be fine. We should fix it anyway.

Thanks
mahadev

On 7/30/09 2:57 PM, Todd Greenwood to...@audiencescience.com wrote:

> FYI - looks like there is a bad url in svn...
>
> $ svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 branch-3.2
> ...
> A    branch-3.2/build.xml
>
> Fetching external item into 'branch-3.2/src/java/test/bin'
> svn: URL 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' doesn't exist
>
> This does not repro w/ 3.1:
>
> $ svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.1 branch-3.1
>
> -Todd
RE: bad svn url : test-patch
Thanks Mahadev.

-----Original Message-----
From: Mahadev Konar [mailto:maha...@yahoo-inc.com]
Sent: Thursday, July 30, 2009 3:00 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: bad svn url : test-patch

Hi Todd,

Yes, this happens with branch 3.2. The test-patch link is broken because of the hadoop split. This file is used for the hudson test environment. It isn't used anywhere else, so the svn co should otherwise be fine. We should fix it anyway.

Thanks
mahadev

On 7/30/09 2:57 PM, Todd Greenwood to...@audiencescience.com wrote:

> FYI - looks like there is a bad url in svn...
>
> $ svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 branch-3.2
> ...
> A    branch-3.2/build.xml
>
> Fetching external item into 'branch-3.2/src/java/test/bin'
> svn: URL 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' doesn't exist
>
> This does not repro w/ 3.1:
>
> $ svn co http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.1 branch-3.1
>
> -Todd
Re: test failures in branch-3.2
Todd,

On Jul 30, 2009, at 5:08 PM, Todd Greenwood wrote:

> The build succeeds, but not all of the tests. In previous test runs, I noticed an error in org.apache.zookeeper.test.FLETest. It was not able to bind to a port or something. Now, after a machine reboot, I'm getting different failures.

This issue might be fixed in trunk, but not in the 3.2 distribution.

> branch-3.2 $ ant test
>
> [junit] Test org.apache.zookeeper.server.quorum.QuorumPeerMainTest FAILED (crashed)
> [junit] Test org.apache.zookeeper.test.HierarchicalQuorumTest FAILED

HierarchicalQuorumTest is supposed to fail until you apply the patches I mentioned. I don't know what could have caused the crash of the jvm in the other one.

-Flavio
Re: test failures in branch-3.2
btw QuorumPeerMainTest uses the CONSOLE appender, which is set up in conf/log4j.properties. Now that I think of it, perhaps not such a good idea :-) If you edited conf/log4j.properties it may be causing the test to fail; did you do this? (If you run the test by itself using -Dtestcase, does it always fail?)

I've entered a jira to address this: https://issues.apache.org/jira/browse/ZOOKEEPER-492

Patrick

Patrick Hunt wrote:
> Todd Greenwood wrote:
>> The build succeeds, but not all of the tests. In previous test runs, I noticed an error in org.apache.zookeeper.test.FLETest. It was not able to bind to a port or something. Now, after a machine reboot, I'm getting different failures.
>
> Address in use? That's a problem in the test framework pre-3.3. In 3.3 (current svn trunk) I fixed it, but it's not in 3.2.x. This is a problem with the test framework though and not a real problem; it shows up occasionally (depends on timing).
>
>> branch-3.2 $ ant test
>>
>> [junit] Test org.apache.zookeeper.server.quorum.QuorumPeerMainTest FAILED (crashed)
>> [junit] Test org.apache.zookeeper.test.HierarchicalQuorumTest FAILED
>>
>> Test logs for these two tests attached.
>
> This is unusual though - looking at the log it seems that the JVM itself crashed for the QPMainTest! For HQT we are seeing:
>
> junit.framework.AssertionFailedError: Threads didn't join
>
> which Flavio mentioned to me once can happen but is not a real problem (he can elaborate).
>
> What version of java are you using? OS, other environment that might be interesting? (vm? etc...) You might try looking at the jvm crash dump file (I think it's in /tmp).
>
> If you run each of these two tests individually do they run? Example:
>
> ant -Dtestcase=FLENewEpochTest test-core-java
>
>> My goal here is to get to a known state (all tests succeeding or have workarounds for the failures). Following that, I plan to apply the patches Flavio recommended for a WAN deploy (479 and 481). After I verify that the tests continue to run, I'll package this up and deploy it to our WAN for testing.
>
> Sounds like a good plan.
>
>> So, are these known issues? Do the tests normally run en masse, or do some of the tests hold on to resources and prevent other tests from passing?
>
> Typically they do run to completion, but occasionally on my machine (java 1.6, linux32bit, 1.6g single core cpu, 1gigmem) I'll get some random failure due to address in use, or the same "didn't join" that you saw. Usually I see this if I'm multitasking (vs just letting the tests run w/o using the box). As I said, this is addressed in 3.3 (address reuse at the very least, and I haven't seen the other issues).
>
> Patrick
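For readers unfamiliar with the "address in use" failure discussed above: a common mitigation when a test restarts a server on a port it has just released is to set the SO_REUSEADDR socket option before binding. The sketch below only shows where that option goes, using Python's standard socket module. It is an assumption that this resembles the 3.3 change, since the thread says no more than "address reuse", and the function names are made up for the example.

```python
import socket


def open_listener(port=0, reuse=True, host="127.0.0.1"):
    """Open a listening TCP socket; port=0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if reuse:
        # Must be set BEFORE bind(); allows rebinding a port whose previous
        # connections are still lingering in TIME_WAIT.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def restart_on_same_port():
    """Simulate one test releasing a port and the next test claiming it."""
    first = open_listener()
    port = first.getsockname()[1]
    first.close()
    second = open_listener(port)
    try:
        return port, second.getsockname()[1]
    finally:
        second.close()


if __name__ == "__main__":
    print(restart_on_same_port())
```

This example does not reproduce the failure itself, because no client connections are left lingering; it only illustrates the option a test harness would set. The Java counterpart is `ServerSocket.setReuseAddress(true)`.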
RE: test failures in branch-3.2
No edits to conf/log4j.properties.

-----Original Message-----
From: Patrick Hunt [mailto:ph...@apache.org]
Sent: Thursday, July 30, 2009 9:25 PM
To: Patrick Hunt
Cc: zookeeper-user@hadoop.apache.org
Subject: Re: test failures in branch-3.2

btw QuorumPeerMainTest uses the CONSOLE appender, which is set up in conf/log4j.properties. Now that I think of it, perhaps not such a good idea :-) If you edited conf/log4j.properties it may be causing the test to fail; did you do this? (If you run the test by itself using -Dtestcase, does it always fail?)

I've entered a jira to address this: https://issues.apache.org/jira/browse/ZOOKEEPER-492

Patrick
Re: test failures in branch-3.2
Well, try running these two tests individually and see if they always fail or just occasionally. That will be a good start (and the env detail).

Patrick

Todd Greenwood wrote:
> No edits to conf/log4j.properties.
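Running a single test repeatedly, as Patrick suggests, is easy to script. The sketch below wraps the `ant -Dtestcase=... test-core-java` invocation quoted in this thread and sorts the outcome into "always", "occasionally", or "never" fails. The helper names are hypothetical, and it assumes ant exits non-zero when the test fails, which depends on how build.xml is written.

```python
import subprocess


def ant_single_test_command(testcase):
    """Build the single-test invocation quoted in the thread."""
    return ["ant", "-Dtestcase=" + testcase, "test-core-java"]


def run_ant(testcase, cwd="."):
    """Run one test once; True means ant reported success.

    Assumption: the build is configured to exit non-zero on test failure.
    """
    result = subprocess.run(ant_single_test_command(testcase), cwd=cwd)
    return result.returncode == 0


def classify_failures(run_once, attempts=5):
    """Call run_once() repeatedly and summarize how often it failed."""
    failures = sum(0 if run_once() else 1 for _ in range(attempts))
    if failures == 0:
        return "never fails"
    if failures == attempts:
        return "always fails"
    return "fails occasionally (%d of %d runs)" % (failures, attempts)


if __name__ == "__main__":
    for name in ("QuorumPeerMainTest", "HierarchicalQuorumTest"):
        # cwd is assumed to be the branch-3.2 checkout directory.
        print(name, "->", classify_failures(lambda: run_ant(name)))
```

Passing the runner in as a function keeps the classification logic independent of ant, so it can be tried out with a stub before pointing it at a real checkout.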
RE: test failures in branch-3.2
Patrick, inline.

-----Original Message-----
From: Patrick Hunt [mailto:ph...@apache.org]
Sent: Thursday, July 30, 2009 9:13 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: test failures in branch-3.2

Todd Greenwood wrote:
> The build succeeds, but not all of the tests. In previous test runs, I noticed an error in org.apache.zookeeper.test.FLETest. It was not able to bind to a port or something. Now, after a machine reboot, I'm getting different failures.

Address in use? That's a problem in the test framework pre-3.3. In 3.3 (current svn trunk) I fixed it, but it's not in 3.2.x. This is a problem with the test framework though and not a real problem; it shows up occasionally (depends on timing).

[Todd] Yes, I believe address in use was the problem w/ FLETest. I assumed it was a timing issue w/ respect to test A not fully releasing resources before test B started.

> branch-3.2 $ ant test
>
> [junit] Test org.apache.zookeeper.server.quorum.QuorumPeerMainTest FAILED (crashed)
> [junit] Test org.apache.zookeeper.test.HierarchicalQuorumTest FAILED
>
> Test logs for these two tests attached.

This is unusual though - looking at the log it seems that the JVM itself crashed for the QPMainTest! For HQT we are seeing:

junit.framework.AssertionFailedError: Threads didn't join

which Flavio mentioned to me once can happen but is not a real problem (he can elaborate).

What version of java are you using? OS, other environment that might be interesting? (vm? etc...) You might try looking at the jvm crash dump file (I think it's in /tmp).

[Todd] ---
$ uname -a
Linux TODDG01LT 2.6.28-14-generic #47-Ubuntu SMP Sat Jul 25 01:19:55 UTC 2009 x86_64 GNU/Linux
$ which java
/home/toddg/bin/x64/java/jdk1.6.0_13/bin/java
$ java -version
java version 1.6.0_13
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed mode)
Memory = 4GB
[Todd] ---

If you run each of these two tests individually do they run? Example:

ant -Dtestcase=FLENewEpochTest test-core-java

[Todd] Will try this once my local build is working and report back. I'll open a separate mail thread on applying patches.

> My goal here is to get to a known state (all tests succeeding or have workarounds for the failures). Following that, I plan to apply the patches Flavio recommended for a WAN deploy (479 and 481). After I verify that the tests continue to run, I'll package this up and deploy it to our WAN for testing.

Sounds like a good plan.

> So, are these known issues? Do the tests normally run en masse, or do some of the tests hold on to resources and prevent other tests from passing?

Typically they do run to completion, but occasionally on my machine (java 1.6, linux32bit, 1.6g single core cpu, 1gigmem) I'll get some random failure due to address in use, or the same "didn't join" that you saw. Usually I see this if I'm multitasking (vs just letting the tests run w/o using the box). As I said, this is addressed in 3.3 (address reuse at the very least, and I haven't seen the other issues).

Patrick
RE: test failures in branch-3.2
Patrick/Flavio -

Starting w/ branch-3.2 (no changes) I applied patches in this order:

1. Apply ZOOKEEPER-479.patch. Builds, but HierarchicalQuorumTest fails.

2. Apply ZOOKEEPER-481.patch. Fails to build, b/c of missing file - PortAssignment.java. PortAssignment.java was added by Patrick as part of ZOOKEEPER-473.patch, which is a pretty hefty patch ( 2k lines) and touches a large number of files.

3. Apply ZOOKEEPER-473.patch. Builds, but QuorumPeerMainTest fails (jvm crashes).

[junit] Running org.apache.zookeeper.server.quorum.QuorumPeerMainTest
[junit] Running org.apache.zookeeper.server.quorum.QuorumPeerMainTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
[junit] Test org.apache.zookeeper.server.quorum.QuorumPeerMainTest FAILED (crashed)

Test Log

Testsuite: org.apache.zookeeper.server.quorum.QuorumPeerMainTest
Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
Testcase: testBadPeerAddressInQuorum took 0.004 sec
Caused an ERROR
Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.

-Todd

-----Original Message-----
From: Patrick Hunt [mailto:ph...@apache.org]
Sent: Thursday, July 30, 2009 10:13 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: test failures in branch-3.2

Todd Greenwood wrote:
> [Todd] Yes, I believe address in use was the problem w/ FLETest. I assumed it was a timing issue w/ respect to test A not fully releasing resources before test B started.

Might be, but actually I think it's related to this:
http://hea-www.harvard.edu/~fine/Tech/addrinuse.html

Patrick
Re: test failures in branch-3.2
Todd Greenwood wrote:
> Starting w/ branch-3.2 (no changes) I applied patches in this order:
>
> 1. Apply ZOOKEEPER-479.patch. Builds, but HierarchicalQuorumTest fails.
>
> 2. Apply ZOOKEEPER-481.patch. Fails to build, b/c of missing file - PortAssignment.java. PortAssignment.java was added by Patrick as part of ZOOKEEPER-473.patch, which is a pretty hefty patch ( 2k lines) and touches a large number of files.

Hrm, those patches were probably created against the trunk. We'll have to have separate patches for trunk and 3.2 branch on 481. If you could update the jira with this detail (481 needs two patches, one for each branch) that would be great!

> 3. Apply ZOOKEEPER-473.patch. Builds, but QuorumPeerMainTest fails (jvm crashes).

473 is special (unique) in the sense that it changes log4j while the vm is running. In general though it's a pretty boring test and shouldn't be failing. Are you sure you have the right patch file? There are 2 patch files on the JIRA for 473; make sure that you have the one from 7/16, NOT the one from 7/15. Check the patch file: the correct one should NOT contain changes to build.xml or conf/log4j* files. If this still happens, send me your build.xml, conf/log4j* and QuorumPeerMainTest.java files in email for review. I'll take a look.

Patrick

> [junit] Running org.apache.zookeeper.server.quorum.QuorumPeerMainTest
> [junit] Running org.apache.zookeeper.server.quorum.QuorumPeerMainTest
> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
> [junit] Test org.apache.zookeeper.server.quorum.QuorumPeerMainTest FAILED (crashed)
>
> Test Log
>
> Testsuite: org.apache.zookeeper.server.quorum.QuorumPeerMainTest
> Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
> Testcase: testBadPeerAddressInQuorum took 0.004 sec
> Caused an ERROR
> Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
> junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
>
> -Todd
>
> -----Original Message-----
> From: Patrick Hunt [mailto:ph...@apache.org]
> Sent: Thursday, July 30, 2009 10:13 PM
> To: zookeeper-user@hadoop.apache.org
> Subject: Re: test failures in branch-3.2
>
> Todd Greenwood wrote:
>> [Todd] Yes, I believe address in use was the problem w/ FLETest. I assumed it was a timing issue w/ respect to test A not fully releasing resources before test B started.
>
> Might be, but actually I think it's related to this:
> http://hea-www.harvard.edu/~fine/Tech/addrinuse.html
>
> Patrick
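Patrick's check for the right 473 patch file (it should not touch build.xml or conf/log4j*) can be done mechanically. The sketch below lists the paths a patch touches and flags any that match those two patterns. It assumes the patch was produced by `svn diff`, which puts an `Index: <path>` line before each file; the function names are hypothetical.

```python
import fnmatch

# Paths the correct ZOOKEEPER-473 patch should NOT modify, per the message above.
UNEXPECTED_PATTERNS = ("build.xml", "conf/log4j*")


def touched_paths(patch_text):
    """Return the file paths named on 'Index: ' lines, in order, without repeats."""
    paths = []
    for line in patch_text.splitlines():
        if line.startswith("Index: "):
            path = line[len("Index: "):].strip()
            if path not in paths:
                paths.append(path)
    return paths


def unexpected_paths(patch_text, patterns=UNEXPECTED_PATTERNS):
    """Return touched paths that match any of the given glob patterns."""
    return [
        path
        for path in touched_paths(patch_text)
        if any(fnmatch.fnmatch(path, pattern) for pattern in patterns)
    ]


if __name__ == "__main__":
    # The file name is illustrative; point this at the patch you downloaded.
    with open("ZOOKEEPER-473.patch", encoding="utf-8", errors="replace") as fh:
        bad = unexpected_paths(fh.read())
    print("wrong patch file? touches:" if bad else "looks like the right one", bad)
```

An empty result does not prove the file is the 7/16 version; it only rules out the symptom Patrick describes.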