Hi Bill,

Here are the details of the problem with Tomcat 3.3 and Cactus.

Apparently, Cactus's sample test suite run against Tomcat 3.3 will
occasionally fail on the main Gump system (a 300MHz system running
Linux) and reliably fails on Vincent Massol's laptop (a 1GHz+ system
running Windows XP).  The failure is the same on both.  In the
"testPostMethod" test, the following exception is thrown by the
Cactus client:

java.net.SocketException: Connection aborted by peer: JVM_recv in
            socket input stream read
    at java.net.SocketInputStream.socketRead(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:86)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:186)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:225)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:280)
    at java.io.FilterInputStream.read(FilterInputStream.java:114)
    at java.io.PushbackInputStream.read(PushbackInputStream.java:164)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:649)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:613)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:621)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream
            (HttpURLConnection.java:506)
    at org.apache.cactus.client.AutoReadHttpURLConnection.getInputStream
            (AutoReadHttpURLConnection.java;
             org/apache/cactus/util/log/LogAspect.java(1k):123)
    at org.apache.cactus.client.AbstractHttpClient.callRunTest
            (AbstractHttpClient.java;
             org/apache/cactus/util/log/LogAspect.java(1k):192)
    at org.apache.cactus.client.AbstractHttpClient.doTest$ajcPostAround10
            (AbstractHttpClient.java;
             org/apache/cactus/util/log/LogAspect.java(1k):119)
    at org.apache.cactus.client.AbstractHttpClient.doTest
            (AbstractHttpClient.java;
             org/apache/cactus/util/log/LogAspect.java(1k):1204)
    at org.apache.cactus.AbstractTestCase.runGenericTest
            (AbstractTestCase.java:437)
    at org.apache.cactus.ServletTestCase.runTest(ServletTestCase.java:130)
    at org.apache.cactus.AbstractTestCase.runBare(AbstractTestCase.java:385)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:131)
    at junit.framework.TestSuite.runTest(TestSuite.java:173)
    at junit.framework.TestSuite.run(TestSuite.java:168)
    at junit.framework.TestSuite.runTest(TestSuite.java:173)
    at junit.framework.TestSuite.run(TestSuite.java:168)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run
            (JUnitTestRunner.java:231)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main
            (JUnitTestRunner.java:409)

I have read in Microsoft documentation that it is possible for the sending end of
a socket to be closed before the receiving end retrieves all of the data.  The
receiving end can get a "socket is closed" error (I forget the symbol) and not be
able to retrieve all of the data.  I was able to see this in action when working
on the Jserv connector a while ago.  The client socket was busy trying to send
some post data that wasn't going to be read.  When Tomcat, 3.2.x in this case,
closed the socket at the end of handling the POST request, the client received an
error and wasn't able to read the response.  I believe the form of error that
occurs is one that leads to the "Connection aborted by peer" as seen above.
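
For illustration only, here is a minimal sketch of the usual server-side
guard against this kind of failure, often called a "lingering close": the
server half-closes its sending side, drains whatever the client is still
sending, and only then closes the socket.  This is not Tomcat code.  The
class name LingeringClose, the helper demo(), the 2 second timeout and the
canned response are all assumptions made up for the example, and it uses
modern Java syntax rather than anything Tomcat 3.3 could compile.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of Tomcat or Cactus.
class LingeringClose {

    // Half-close the sending side, discard unread input, then close.
    // Closing while unread data is still pending is what can make the
    // peer see a reset instead of the response.
    static void lingeringClose(Socket socket, int timeoutMillis) throws IOException {
        try {
            socket.shutdownOutput();            // signal end of response; already-written data is still delivered
            socket.setSoTimeout(timeoutMillis); // per-read timeout, so a silent peer cannot block us forever
            InputStream in = socket.getInputStream();
            byte[] scratch = new byte[4096];
            while (in.read(scratch) != -1) {
                // discard request data the handler never read
            }
        } catch (SocketTimeoutException e) {
            // peer did not finish in time; fall through and close anyway
        } finally {
            socket.close();
        }
    }

    // Loopback demo: the "server" never reads the request body, yet the
    // client still receives the complete response.
    static String demo(int unreadBodyBytes) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> {
                try {
                    Socket s = server.accept();
                    OutputStream out = s.getOutputStream();
                    out.write("HTTP/1.0 200 OK\r\n\r\nok".getBytes(StandardCharsets.US_ASCII));
                    out.flush();
                    lingeringClose(s, 2000);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.start();

            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.setSoTimeout(5000);
                OutputStream out = client.getOutputStream();
                out.write(new byte[unreadBodyBytes]); // body the server handler ignores
                out.flush();
                client.shutdownOutput();              // tell the server we are done sending

                // Read the response until end of stream.
                ByteArrayOutputStream response = new ByteArrayOutputStream();
                InputStream in = client.getInputStream();
                byte[] buf = new byte[1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    response.write(buf, 0, n);
                }
                serverThread.join();
                return new String(response.toByteArray(), StandardCharsets.US_ASCII);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo(65536));
    }
}
```

Replacing the call to lingeringClose() with a plain s.close() is the
variant that may produce the "Connection aborted by peer" style of error
on the client, though whether it actually does depends on the platform
and on timing.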

My assumption is that the same kind of thing is happening to Cactus, but in
a simpler situation.  The Tomcat side shows no errors occurring.  Inspecting
source for sun.net.www.http.HttpClient, the exception is occurring while it is
trying to read the "HTTP/..." status line at the beginning of the response.

The "testPostMethod" is executed by a controller servlet, but as far as I can tell,
no POST data is sent and a minimal response is returned.  Introducing a sleep(0)
on the Tomcat side just before the socket.close() appeared to fix the problem on
Vincent's laptop.  My assumption is that this forces the Tomcat thread to yield
and give the client thread a chance to receive the response before the socket gets
closed.  Unfortunately, this sleep(0) causes the failure on Gump to go from
infrequent to frequent.  My second round of kluging didn't help, so I reverted to
the sleep(0) last night. 

I know very little about *nix internal operation and can't explain why sleep(0)
causes harm.  On my RH Linux 7.1 at home, I'm not able to get sleep(0) to cause the
Cactus failure, but sleep(1) consistently does.  I assume the threading of Linux
sockets is just different from the way it is handled on Windows.  Unfortunately I
am unable to duplicate this failure on my Windows system as it is a measly
500MHz PIII.

I don't see that Tomcat 3.3 is doing anything wrong.  I suspect that Tomcat
3.2.x doesn't show the failure because it has enough extra overhead to make
the failure very unlikely.  I haven't investigated enough with Tomcat 4.x or
Resin to see if anything, such as "keep-alive", is coming into play to
explain not seeing the failure.  Also, my understanding of all that
Cactus is doing isn't complete yet.

If you, or anyone else, have any additional insight into what is going on or
how to investigate or address this, I would be very interested.  However,
unless something simple is found quickly, I don't plan on holding up
3.3.1-beta1 just for this.

Cheers,
Larry

P.S. Thanks for fixing Bug 6234

P.P.S. Full logs of the failure may be downloaded using:

<http://www.apache.org/~larryi/cactus_client.log>
<http://www.apache.org/~larryi/cactus_server.log>
<http://www.apache.org/~larryi/tomcat.log>
