RE: Zookeeper WAN Configuration

2009-07-30 Thread Todd Greenwood
Patrick - Thank you, I'll proceed accordingly. -Todd

-Original Message-
From: Patrick Hunt [mailto:ph...@apache.org] 
Sent: Wednesday, July 29, 2009 10:30 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: Zookeeper WAN Configuration

 [Todd] What is the recommended policy regarding patching zookeeper
 locally? As an external user, should I patch and compile in the trunk
 or in the branch (branch-3.2)?
 
 I've looked at:
 http://wiki.apache.org/hadoop/ZooKeeper/HowToContribute
 http://wiki.apache.org/hadoop/HowToRelease
 
 Both of these seem well thought out, but aimed at committers
 committing to the trunk.
 

In your context (want 3.2 features) you probably want to build based on 
the 3.2 tag, that way you are working off a known quantity. I'd suggest 
strongly that as part of your build you document the source base and 
which patches/changes you have applied. Having this information will be 
critical for you (or someone using your build) in case bugs have to be 
filed, or further changes/patches have to be applied, etc...

Patrick
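
One way to follow this advice is to have the build write a small manifest next to the artifacts. The sketch below is only an illustration: the file name BUILD-INFO.txt, the function name, and the source base and patch names in the usage comment are assumptions, not part of the ZooKeeper build.

```shell
#!/bin/sh
# Hypothetical helper: record the source base and the patches applied to a
# local build. None of these names come from the ZooKeeper build itself.
# Usage: write_build_info OUTPUT_FILE SOURCE_BASE [PATCH_FILE...]
write_build_info() (
    out=$1
    base=$2
    shift 2
    {
        printf 'source-base: %s\n' "$base"
        printf 'built-on: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
        # One line per applied patch, in the order they were applied.
        for p in "$@"; do
            printf 'patch: %s\n' "$p"
        done
    } > "$out"
)

# Example (all names are placeholders):
#   write_build_info BUILD-INFO.txt branch-3.2 ZOOKEEPER-479.patch ZOOKEEPER-481.patch
```

Keeping the patch lines in application order matters when, as later in this thread, one patch depends on files introduced by another.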


Re: bad svn url : test-patch

2009-07-30 Thread Mahadev Konar
Hi Todd,
  Yes, this happens with branch-3.2. The test-patch link is broken
because of the hadoop split. This file is used for the hudson test
environment. It isn't used anywhere else, so the svn co should otherwise
be fine. We should fix it anyway.

Thanks
mahadev


On 7/30/09 2:57 PM, Todd Greenwood to...@audiencescience.com wrote:

 FYI - looks like there is a bad url in svn...
 
 $ svn co \
     http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 \
     branch-3.2
 
 ...
 A    branch-3.2/build.xml
 
 Fetching external item into 'branch-3.2/src/java/test/bin'
 svn: URL 'http://svn.apache.org/repos/asf/hadoop/common/nightly/test-patch' doesn't exist
 
 This does not repro w/ 3.1:
 
 $ svn co \
     http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.1 \
     branch-3.1
 
 -Todd
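
Since the broken item is an svn:externals reference, one possible workaround (not suggested in the thread itself) is to skip externals during checkout with Subversion's --ignore-externals option. A minimal sketch; the function name and the SVN override variable are illustrative assumptions:

```shell
#!/bin/sh
# Check out a branch without fetching svn:externals, so the missing
# test-patch external is never requested.
# SVN can be overridden (for example SVN=echo) to preview the command.
checkout_without_externals() {
    "${SVN:-svn}" co --ignore-externals "$1" "$2"
}

# Example:
#   checkout_without_externals \
#       http://svn.apache.org/repos/asf/hadoop/zookeeper/branches/branch-3.2 \
#       branch-3.2
```

With this option the src/java/test/bin directory is simply not populated, which should only matter for the hudson test environment mentioned above.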
 



RE: bad svn url : test-patch

2009-07-30 Thread Todd Greenwood
Thanks Mahadev.




Re: test failures in branch-3.2

2009-07-30 Thread Flavio Junqueira

Todd,

On Jul 30, 2009, at 5:08 PM, Todd Greenwood wrote:

The build succeeds, but not all of the tests. In previous test runs,
I noticed an error in org.apache.zookeeper.test.FLETest. It was not able
to bind to a port or something. Now, after a machine reboot, I'm getting
different failures.



This issue might be fixed in trunk, but not in the 3.2 distribution.


branch-3.2 $ ant test

[junit] Test org.apache.zookeeper.server.quorum.QuorumPeerMainTest FAILED (crashed)
[junit] Test org.apache.zookeeper.test.HierarchicalQuorumTest FAILED



HierarchicalQuorumTest is supposed to fail until you apply the patches  
I mentioned. I don't know what could have caused the crash of the jvm  
in the other one.


-Flavio


Re: test failures in branch-3.2

2009-07-30 Thread Patrick Hunt
btw QuorumPeerMainTest uses the CONSOLE appender which is set up in
conf/log4j.properties, now that I think of it perhaps not such a good
idea :-)

If you edited conf/log4j.properties it may be causing the test to fail.
Did you do this? (If you run the test by itself using -Dtestcase does it
always fail?)


I've entered a jira to address this:
https://issues.apache.org/jira/browse/ZOOKEEPER-492

Patrick

Patrick Hunt wrote:

Todd Greenwood wrote:

The build succeeds, but not all of the tests. In previous test runs,
I noticed an error in org.apache.zookeeper.test.FLETest. It was not able
to bind to a port or something. Now, after a machine reboot, I'm getting
different failures.


address in use? That's a problem in the test framework pre-3.3. In 3.3 
(current svn trunk) I fixed it but it's not in 3.2.x. This is a problem 
with the test framework though and not a real problem, it shows up 
occasionally (depends on timing).



branch-3.2 $ ant test

[junit] Test org.apache.zookeeper.server.quorum.QuorumPeerMainTest FAILED (crashed)
[junit] Test org.apache.zookeeper.test.HierarchicalQuorumTest FAILED

Test logs for these two tests attached.


This is unusual though - looking at the log it seems that the JVM itself 
crashed for the QPMainTest! for HQT we are seeing:


junit.framework.AssertionFailedError: Threads didn't join

which Flavio mentioned to me once is possible to happen but not a real 
problem (he can elaborate).


What version of java are you using? OS, other environment that might be 
interesting? (vm? etc...) You might try looking at the jvm crash dump 
file (I think it's in /tmp)


If you run each of these two tests individually do they run? example:
ant -Dtestcase=FLENewEpochTest test-core-java


My goal here is to get to a known state (all tests succeeding or have
workarounds for the failures). Following that, I plan to apply the
patches Flavio recommended for a WAN deploy (479 and 481). After I
verify that the tests continue to run, I'll package this up and deploy
it to our WAN for testing. 


Sounds like a good plan.


So, are these known issues? Do the tests normally run en masse, or do
some of the tests hold on to resources and prevent other tests from
passing?


Typically they do run to completion, but occasionally on my machine
(java 1.6, linux32bit, 1.6g single core cpu, 1gigmem) I'll get some
random failure due to address in use, or the same "Threads didn't join"
failure that you saw. Usually I see this if I'm multitasking (vs just
letting the tests run w/o using the box). As I said this is addressed in
3.3 (address reuse at the very least, and I haven't seen the other
issues).


Patrick




RE: test failures in branch-3.2

2009-07-30 Thread Todd Greenwood
No edits to conf/log4j.properties.



Re: test failures in branch-3.2

2009-07-30 Thread Patrick Hunt
Well, try running these two tests individually and see if they always
fail or just occasionally. That will be a good start (along with the env
detail).


Patrick
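
To tell a consistent failure from an intermittent one, a test can be run several times in a row and the failures counted. A rough sketch, built around the -Dtestcase invocation quoted earlier in the thread; the function name and the run count are arbitrary, and it assumes ant exits with a non-zero status when the test fails:

```shell
#!/bin/sh
# Run a command N times and report how many runs failed.
# Usage: count_failures N COMMAND [ARGS...]
count_failures() (
    runs=$1
    shift
    failed=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard the command's output; only its exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failed=$((failed + 1))
        fi
        i=$((i + 1))
    done
    echo "$failed of $runs runs failed"
)

# Example:
#   count_failures 5 ant -Dtestcase=QuorumPeerMainTest test-core-java
#   count_failures 5 ant -Dtestcase=HierarchicalQuorumTest test-core-java
```

A result such as "5 of 5" would point at something deterministic in the environment, while a lower count would be more in line with the timing-related failures described above.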

Todd Greenwood wrote:

No edits to conf/log4j.properties.





RE: test failures in branch-3.2

2009-07-30 Thread Todd Greenwood
Patrick, inline.

-Original Message-
From: Patrick Hunt [mailto:ph...@apache.org] 
Sent: Thursday, July 30, 2009 9:13 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: test failures in branch-3.2

Todd Greenwood wrote:
 The build succeeds, but not all of the tests. In previous test runs,
 I noticed an error in org.apache.zookeeper.test.FLETest. It was not able
 to bind to a port or something. Now, after a machine reboot, I'm getting
 different failures.

address in use? That's a problem in the test framework pre-3.3. In 3.3
(current svn trunk) I fixed it but it's not in 3.2.x. This is a problem
with the test framework though and not a real problem, it shows up
occasionally (depends on timing).

[Todd] Yes, I believe address in use was the problem w/ FLETest. I
assumed it was a timing issue w/ respect to test A not fully releasing
resources before test B started.

 branch-3.2 $ ant test
 
 [junit] Test org.apache.zookeeper.server.quorum.QuorumPeerMainTest FAILED (crashed)
 [junit] Test org.apache.zookeeper.test.HierarchicalQuorumTest FAILED
 
 Test logs for these two tests attached.

This is unusual though - looking at the log it seems that the JVM itself
crashed for the QPMainTest! for HQT we are seeing:

junit.framework.AssertionFailedError: Threads didn't join

which Flavio mentioned to me once is possible to happen but not a real 
problem (he can elaborate).

What version of java are you using? OS, other environment that might be 
interesting? (vm? etc...) You might try looking at the jvm crash dump 
file (I think it's in /tmp)

[Todd] ---
$ uname -a
Linux TODDG01LT 2.6.28-14-generic #47-Ubuntu SMP Sat Jul 25 01:19:55 UTC 2009 x86_64 GNU/Linux

$ which java
/home/toddg/bin/x64/java/jdk1.6.0_13/bin/java

$ java -version
java version 1.6.0_13
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed mode)

Memory = 4GB
[Todd] ---

If you run each of these two tests individually do they run? example:
ant -Dtestcase=FLENewEpochTest test-core-java

[Todd] Will try this once my local build is working and report back.
I'll open a separate mail thread on applying patches.

 My goal here is to get to a known state (all tests succeeding or have
 workarounds for the failures). Following that, I plan to apply the
 patches Flavio recommended for a WAN deploy (479 and 481). After I
 verify that the tests continue to run, I'll package this up and deploy
 it to our WAN for testing. 

Sounds like a good plan.

 So, are these known issues? Do the tests normally run en masse, or do
 some of the tests hold on to resources and prevent other tests from
 passing?

Typically they do run to completion, but occasionally on my machine
(java 1.6, linux32bit, 1.6g single core cpu, 1gigmem) I'll get some
random failure due to address in use, or the same "Threads didn't join"
failure that you saw. Usually I see this if I'm multitasking (vs just
letting the tests run w/o using the box). As I said this is addressed in
3.3 (address reuse at the very least, and I haven't seen the other
issues).

Patrick




RE: test failures in branch-3.2

2009-07-30 Thread Todd Greenwood
Patrick/Flavio -

Starting w/ branch-3.2 (no changes) I applied patches in this order:

1. Apply ZOOKEEPER-479.patch. Builds, but HierarchicalQuorumTest fails.
2. Apply ZOOKEEPER-481.patch. Fails to build, b/c of missing file -
PortAssignment.java.

PortAssignment.java was added by Patrick as part of ZOOKEEPER-473.patch,
which is a pretty hefty patch ( 2k lines) and touches a large number of
files. 

3. Apply ZOOKEEPER-473.patch. Builds, but QuorumPeerMainTest fails (jvm
crashes).

[junit] Running org.apache.zookeeper.server.quorum.QuorumPeerMainTest
[junit] Running org.apache.zookeeper.server.quorum.QuorumPeerMainTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
[junit] Test org.apache.zookeeper.server.quorum.QuorumPeerMainTest FAILED (crashed)


Test Log

Testsuite: org.apache.zookeeper.server.quorum.QuorumPeerMainTest
Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec 

Testcase: testBadPeerAddressInQuorum took 0.004 sec 
Caused an ERROR
Forked Java VM exited abnormally. Please note the time in the report
does not reflect the time until the VM exit.
junit.framework.AssertionFailedError: Forked Java VM exited abnormally.
Please note the time in the report does not reflect the time until the
VM exit.

-Todd

-Original Message-
From: Patrick Hunt [mailto:ph...@apache.org] 
Sent: Thursday, July 30, 2009 10:13 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: test failures in branch-3.2

Todd Greenwood wrote:
 
 [Todd] Yes, I believe address in use was the problem w/ FLETest. I
 assumed it was a timing issue w/ respect to test A not fully releasing
 resources before test B started.

Might be, but actually I think it's related to this:
http://hea-www.harvard.edu/~fine/Tech/addrinuse.html

Patrick


Re: test failures in branch-3.2

2009-07-30 Thread Patrick Hunt

Todd Greenwood wrote:

Starting w/ branch-3.2 (no changes) I applied patches in this order:

1. Apply ZOOKEEPER-479.patch. Builds, but HierarchicalQuorumTest fails.
2. Apply ZOOKEEPER-481.patch. Fails to build, b/c of missing file -
PortAssignment.java.

PortAssignment.java was added by Patrick as part of ZOOKEEPER-473.patch,
which is a pretty hefty patch ( 2k lines) and touches a large number of
files. 


Hrm, those patches were probably created against the trunk. We'll have 
to have separate patches for trunk and 3.2 branch on 481.


If you could update the jira with this detail (481 needs two patches, 
one for each branch) that would be great!



3. Apply ZOOKEEPER-473.patch. Builds, but QuorumPeerMainTest fails (jvm
crashes).


473 is special (unique) in the sense that it changes log4j while the
vm is running. In general though it's a pretty boring test and
shouldn't be failing.


Are you sure you have the right patch file? There are 2 patch files on
the JIRA for 473; make sure that you have the one from 7/16, NOT the one
from 7/15. Check the patch file: the correct one should NOT contain
changes to build.xml or conf/log4j* files. If this still happens, send me
your build.xml, conf/log4j* and QuorumPeerMainTest.java files in email
for review. I'll take a look.


Patrick
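
The check on the patch file can be done mechanically by listing the files the patch touches. A sketch, assuming an svn-style unified diff in which each changed file is introduced by a line starting with "Index: " and paths are relative to the project root; the function name is made up for illustration:

```shell
#!/bin/sh
# Succeed only if the patch does not touch build.xml or conf/log4j* files.
# Assumes svn-style diffs, where each changed file is introduced by a
# line of the form "Index: path/to/file".
patch_is_clean() {
    ! grep -E '^Index: (build\.xml|conf/log4j)' "$1" >/dev/null
}

# Example:
#   patch_is_clean ZOOKEEPER-473.patch && echo "no build.xml or conf/log4j changes"
```

If the patch was produced by a different tool, the header pattern would need adjusting (for instance to match "+++ " lines instead).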


