On 12/11/2020 14:19, Eric Robinson wrote:
>> From: Mark Thomas <ma...@apache.org>

<snip/>

>> I keep coming back to this. Something triggered this problem (note that
>> the trigger is not necessarily the same as the root cause). Given that
>> the app, Tomcat and JVM versions didn't change, that again points to
>> some other component.
>>
> 
> Perfectly understandable. It's the oldest question in the diagnostic 
> playbook. What changed? I wish I had an answer. Whatever it was, it impacted 
> both upstream servers.
> 
>> Picking just one of the wild ideas I've had is there some sort of firewall, 
>> IDS,
>> IPS etc. that might be doing connection tracking and is, for some reason,
>> getting it wrong and closing the connection in error?
>>
> 
> There is no firewall or IDS software running on the upstreams. The only things 
> that come to mind that may have been installed during that timeframe are 
> Sophos antivirus and Solar Winds RMM. Sophos was the first thing I disabled 
> when I saw the packet issues.

ACK.

>>>> The aim with this logging is to provide evidence of whether or not
>>>> there is a file descriptor handling problem in the JRE. My
>>>> expectation is that with these logs we will have reached the limit of
>>>> what we can do with Tomcat but will be able to point you in the right
>>>> direction for further investigation.

I've had a chance to review these logs.

To answer your (off-list) question about HTTP/1.1 vs. HTTP/1.0 in the
Nginx logs: I *think* the Nginx logs show that the request received by
Nginx uses HTTP/1.1.

The logging does not indicate any issue with Java's handling of file
descriptors. The file descriptor associated with the socket where the
request fails is only observed to be associated with the socket where
the request fails. There is no indication that the file descriptor is
corrupted nor is there any indication that another thread tries to use
the same file descriptor.

I dug a little into the exception where the write fails:

java.net.SocketException: Bad file descriptor (Write failed)
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
        at org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write(JIoEndpoint.java:1491)
        at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:247)
        at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480)
        at org.apache.coyote.http11.InternalOutputBuffer.endRequest(InternalOutputBuffer.java:183)
...


I took a look at the JRE source code. That exception is triggered by an
OS-level error (errno 9, EBADF, "Bad file descriptor") when the JRE makes
the OS call to write to the socket.

Everything I have found online points to one of two causes for such an
error:
a) the socket has already been closed
b) the OS has run out of file descriptors

There is no indication that the JRE, Tomcat or the application is doing a),
and previous investigations have ruled out b).

The Wireshark trace indicates that the socket is closed before the write
takes place, which points rather more to a) than to b). Even so, I'd be
tempted to double check b), and maybe try running Tomcat with
-XX:+MaxFDLimit just to be sure.
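
If it helps, below is a minimal sketch of one way to double check b) from
inside the JVM. It assumes a HotSpot/OpenJDK JRE on Linux, where the
platform OperatingSystemMXBean is also a
com.sun.management.UnixOperatingSystemMXBean; the class name is just for
illustration. The same numbers are also available from
"ls /proc/<pid>/fd | wc -l" and "ulimit -n".

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

public class FdCheck {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            // Open vs. maximum file descriptors for this JVM process
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            System.out.println("Open fds: " + unix.getOpenFileDescriptorCount()
                    + " / max: " + unix.getMaxFileDescriptorCount());
        } else {
            System.out.println("UnixOperatingSystemMXBean not available here");
        }
    }
}

Logging those two values periodically around the time of a failure should
show whether the process is anywhere near its descriptor limit.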

If you haven't already, I think now is the time to follow Paul
Carter-Brown's advice from earlier in this thread and use strace to see
what is going on between the JRE and the OS, the aim being to answer the
question "what is triggering the socket close?"

I can try to help interpret that log but I am far from an expert. You
may want to seek help elsewhere.

Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
