Patrick, inline.

-----Original Message-----
From: Patrick Hunt [mailto:ph...@apache.org] 
Sent: Thursday, July 30, 2009 9:13 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: test failures in branch-3.2

Todd Greenwood wrote:
> The build succeeds, but not all of the tests pass. In previous test runs,
> I noticed an error in org.apache.zookeeper.test.FLETest. It was not able
> to bind to a port or something. Now, after a machine reboot, I'm getting
> different failures.

"address in use"? That's a problem in the test framework pre-3.3. I fixed
it in 3.3 (current svn trunk), but the fix is not in 3.2.x. It is a problem
with the test framework rather than a real problem, and it shows up only
occasionally (depends on timing).

[Todd] Yes, I believe "address in use" was the problem w/ FLETest. I
assumed it was a timing issue w/ respect to test A not fully releasing
resources before test B started.
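
A note for anyone hitting the same error: one general way to make "address
in use" less likely when a port is rebound shortly after being released is to
set SO_REUSEADDR on the socket before binding it. The sketch below assumes
plain java.net.ServerSocket; the class and method names are made up for
illustration, and it is not the actual change made to the 3.3 test framework.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical illustration only; not ZooKeeper test framework code.
class ReuseAddressSketch {

    /** Opens a listening socket with SO_REUSEADDR enabled before bind(). */
    static ServerSocket openReusable(int port) throws IOException {
        ServerSocket ss = new ServerSocket();  // created unbound
        ss.setReuseAddress(true);              // must be set before bind()
        ss.bind(new InetSocketAddress("127.0.0.1", port));
        return ss;
    }

    /**
     * Mimics "test A" releasing a port and "test B" binding it right away.
     * This only shows the call order; it does not reproduce a lingering
     * connection from a previous test.
     */
    static boolean rebindImmediately() {
        try {
            int port;
            try (ServerSocket a = openReusable(0)) {   // 0 = OS picks a free port
                port = a.getLocalPort();
            }
            try (ServerSocket b = openReusable(port)) {
                return b.isBound() && b.getLocalPort() == port;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("rebound: " + rebindImmediately());
    }
}
```

Handing each test its own unique port is another option that avoids the
overlap entirely, at the cost of some bookkeeping.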

> branch-3.2 $ ant test
> 
> [junit] Test org.apache.zookeeper.server.quorum.QuorumPeerMainTest
> FAILED (crashed)
> [junit] Test org.apache.zookeeper.test.HierarchicalQuorumTest FAILED
> 
> Test logs for these two tests attached.

This is unusual though - looking at the log it seems that the JVM itself
crashed for QuorumPeerMainTest! For HierarchicalQuorumTest we are seeing:

junit.framework.AssertionFailedError: Threads didn't join

which Flavio once mentioned to me can happen but is not a real
problem (he can elaborate).
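
For context, an assertion like "Threads didn't join" is the kind of failure a
test raises when it waits a bounded time for its worker threads and one of
them is still alive afterwards. A minimal sketch of that pattern follows; the
class, the helper names and the timeouts are hypothetical, and this is not the
HierarchicalQuorumTest code.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only; not ZooKeeper test code.
class JoinWithTimeoutSketch {

    /**
     * Waits up to timeoutMs for each thread in turn.
     * timeoutMs must be > 0, because join(0) waits forever.
     * Returns false as soon as one thread is still alive after its wait.
     */
    static boolean joinAll(List<Thread> threads, long timeoutMs)
            throws InterruptedException {
        for (Thread t : threads) {
            t.join(timeoutMs);
            if (t.isAlive()) {
                return false;   // a test would fail here, e.g. "Threads didn't join"
            }
        }
        return true;
    }

    /** Starts one worker that sleeps for sleepMs, then tries to join it. */
    static boolean demo(long sleepMs, long timeoutMs) {
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                // exit quietly when interrupted
            }
        });
        worker.setDaemon(true);   // do not keep the JVM alive for a slow worker
        worker.start();
        try {
            return joinAll(Collections.singletonList(worker), timeoutMs);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("fast worker joined: " + demo(10, 2000));
        System.out.println("slow worker joined: " + demo(5000, 50));
    }
}
```

On a loaded machine a worker can simply take longer than the timeout, which is
one way such a failure can depend on timing.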

What version of Java are you using? Which OS, and anything else about the
environment that might be interesting (a VM, etc.)? You might try looking
at the JVM crash dump file (I think it's in /tmp).
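
On the crash dump: HotSpot normally writes its fatal error log as
hs_err_pid<pid>.log in the working directory of the crashed JVM and falls back
to the system temp directory if it cannot write there. A small sketch for
looking in both places; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for locating HotSpot fatal error logs.
class CrashLogFinderSketch {

    /** Lists files named hs_err_pid*.log directly inside dir (not recursive). */
    static List<Path> findCrashLogs(Path dir) {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(dir, "hs_err_pid*.log")) {
            for (Path p : stream) {
                found.add(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        // Check the current working directory and the JVM's temp directory.
        String[] dirs = {
            System.getProperty("user.dir"),
            System.getProperty("java.io.tmpdir")
        };
        for (String d : dirs) {
            for (Path p : findCrashLogs(Paths.get(d))) {
                System.out.println(p);
            }
        }
    }
}
```

When the tests run in a forked JVM, the working directory is whatever the
build sets for the fork, so that directory is worth checking as well.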

[Todd] ---------------------------
$ uname -a
Linux TODDG01LT 2.6.28-14-generic #47-Ubuntu SMP Sat Jul 25 01:19:55 UTC 2009 x86_64 GNU/Linux

$ which java
/home/toddg/bin/x64/java/jdk1.6.0_13/bin/java

$ java -version
java version "1.6.0_13"
Java(TM) SE Runtime Environment (build 1.6.0_13-b03)
Java HotSpot(TM) 64-Bit Server VM (build 11.3-b02, mixed mode)

Memory = 4GB
[Todd] ---------------------------

If you run each of these two tests individually, do they pass? For example:
ant -Dtestcase=FLENewEpochTest test-core-java

[Todd] Will try this once my local build is working and report back.
I'll open a separate mail thread on applying patches.

> My goal here is to get to a known state (all tests succeeding or have
> workarounds for the failures). Following that, I plan to apply the
> patches Flavio recommended for a WAN deploy (479 and 481). After I
> verify that the tests continue to run, I'll package this up and deploy
> it to our WAN for testing. 

Sounds like a good plan.

> So, are these known issues? Do the tests normally run en masse, or do
> some of the tests hold on to resources and prevent other tests from
> passing?

Typically they do run to completion, but occasionally on my machine
(java 1.6, linux32bit, 1.6g single core cpu, 1gigmem) I'll get some
random failure due to address in use, or the same "didn't join" that you
saw. Usually I see this if I'm multitasking (vs just letting the tests
run w/o using the box). As I said, this is addressed in 3.3 (address
reuse at the very least, and I haven't seen the other issues).

Patrick

