Eric,

On 11/3/20 10:05, Eric Robinson wrote:
-----Original Message-----
From: Eric Robinson <eric.robin...@psmnv.com>
Sent: Tuesday, November 3, 2020 8:21 AM
To: Tomcat Users List <users@tomcat.apache.org>
Subject: RE: Weirdest Tomcat Behavior Ever?

From: Mark Thomas <ma...@apache.org>
Sent: Tuesday, November 3, 2020 2:06 AM
To: Tomcat Users List <users@tomcat.apache.org>
Subject: Re: Weirdest Tomcat Behavior Ever?

On 02/11/2020 12:16, Eric Robinson wrote:

<snip/>

Gotcha, thanks for the clarification. Let's see what happens when the
users start hitting it at 8:00 am Eastern.

Progress. The first attempt to write to the socket triggers the
following exception:

02-Nov-2020 14:33:54.083 FINE [http-bio-3016-exec-13] org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write [301361476]
  java.net.SocketException: Bad file descriptor (Write failed)
     at java.net.SocketOutputStream.socketWrite0(Native Method)
     at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
     at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
     at org.apache.tomcat.util.net.JIoEndpoint$DebugOutputStream.write(JIoEndpoint.java:1409)
     at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:247)
     at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480)
     at org.apache.coyote.http11.InternalOutputBuffer.endRequest(InternalOutputBuffer.java:183)
     at org.apache.coyote.http11.AbstractHttp11Processor.action(AbstractHttp11Processor.java:761)
     at org.apache.coyote.Response.action(Response.java:174)
     at org.apache.coyote.Response.finish(Response.java:274)
     at org.apache.catalina.connector.OutputBuffer.close(OutputBuffer.java:322)
     at org.apache.catalina.connector.Response.finishResponse(Response.java:537)
     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:480)
     at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1083)
     at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:640)
     at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:321)
     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
     at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
     at java.lang.Thread.run(Thread.java:748)

Because this is an instance of an IOException, Tomcat assumes it has
been caused by the client dropping the connection and silently
swallows it. I'll be changing that later today so the exception is
logged at DEBUG level for all new Tomcat versions.

The possible causes of "java.net.SocketException: Bad file descriptor"
that I've been able to find are:

1. OS running out of file descriptors.

2. Trying to use a closed socket.

I want to review the source code to see if there are any others.

I don't think we are seeing 2, as there is no indication in the logs
of the Socket, InputStream, or OutputStream being closed.

That leaves 1. Possible causes here are a file descriptor leak or
normal operations occasionally needing more than the current limit. I
don't think it is a leak as I'd expect to see many more errors of this
type after the first and we aren't seeing that. That leaves the
possibility of the current limit being a little too low.
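
If you want to rule a leak out empirically, one rough check (a sketch;
the PID below is a placeholder for the real Tomcat java process) is to
break the JVM's open descriptors down by type and watch whether sockets
pile up without ever coming back down:

PID=8730                                  # placeholder Tomcat PID
ls /proc/$PID/fd | wc -l                  # total open descriptors
ls -l /proc/$PID/fd | grep -c 'socket:'   # how many of those are sockets

A steady, monotonic climb would point at a leak; occasional spikes under
load would point at the limit simply being too low.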

My recommendation at this point is to increase the limit for file descriptors.
Meanwhile, I'll look at the JRE source to see if there are any other
possible triggers for this exception.
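
For reference, where you raise it depends on how Tomcat is started; the
usual places (a sketch - adjust the user name and values for your setup)
are:

# /etc/security/limits.conf, if Tomcat starts via a login/init script:
#   tomcat  soft  nofile  16384
#   tomcat  hard  nofile  16384
#
# or the unit file, if Tomcat runs under systemd:
#   [Service]
#   LimitNOFILE=16384

Either way, the new limit only applies to processes started afterwards,
so Tomcat needs a restart to pick it up.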

Mark



On the tomcat server, the system-wide maximum number of open file descriptors (fs.file-max) is currently 2853957:

[root@001app01a ~]# sysctl fs.file-max
fs.file-max = 2853957

Most of the time, the number of open files appears to hover around 600,000.

What do you think of watching the open file count and seeing if the
number gets up around the ceiling when the socket write failure occurs?
Something like...

[root@001app01a ~]# while [ TRUE ]; do FILES=$(lsof | wc -l); echo "$(date +%H:%M:%S) $FILES"; done
09:11:15 591671
09:11:35 627347
09:11:54 626851
09:12:11 626429
09:12:26 545748
09:12:42 548578
09:12:58 551487
09:13:14 516700
09:13:30 513312
09:13:45 512830
09:14:02 533338
09:14:18 568233
09:14:35 570158
09:14:51 566269
09:15:07 547389
09:15:23 544203
09:15:38 546395

It's not ideal, as it seems to take 15-20 seconds to count them using lsof.
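
A cheaper alternative might be the kernel's own counter, which returns
instantly. If I'm reading the proc docs right, the first field is the
number of allocated file handles and the third is the fs.file-max
ceiling:

[root@001app01a ~]# cat /proc/sys/fs/file-nr
602016  0       2853957

(Output is illustrative, not an actual capture.)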




Wait, never mind. I realized the per-process limits are what matter.
I checked, and nofile was set to 4096 for the relevant java process.
:)

Your <Connector> configuration is really simple:

    <Connector port="3016" protocol="HTTP/1.1"
               connectionTimeout="20000"
               redirectPort="8443" />

How many of these do you have in your server.xml? The default maxConnections for BIO is 8192, which was double your previous ulimit. If you have 10 of them, then you could potentially have 20 times as many incoming connections as you have file descriptors available.
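
A quick way to sanity-check that math (a sketch; adjust the path to your
actual CATALINA_BASE, and note this also counts any commented-out
connectors, so eyeball the file too):

grep -c '<Connector' /usr/local/tomcat/conf/server.xml

Multiply that count by each connector's maxConnections and compare the
result to the nofile limit the process actually runs with.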

Have you noticed whether or not a large number of (simultaneous) connections are required to trigger this behavior? Mark said "[t]he first attempt to write to the socket triggers the following exception." I'm not sure if that was the *very first* write, or just the first write we care about.

I did...

# prlimit --pid 8730 --nofile=16384:16384

That should give Java some extra breathing room if the issue is max
open files, right?

That will depend heavily on your <Connector> configurations.
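
It would also be worth confirming that the prlimit change stuck; with no
value given, prlimit just prints the current limit:

# Display (rather than set) the limit on the running JVM:
prlimit --pid 8730 --nofile

Also remember that prlimit only modifies the already-running process;
once that JVM restarts, you are back to whatever ulimit is configured.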

You might want to run your "lsof" monitor on that one single JVM process while you are instrumenting things. Perhaps we'll see that these errors are triggered as soon as you hit e.g. 4096 file descriptors.
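
Counting via /proc instead of lsof should make that loop cheap enough to
run every second or so. A sketch, again with a placeholder PID:

PID=8730   # placeholder; substitute the real java PID
while true; do
    echo "$(date +%H:%M:%S) $(ls /proc/$PID/fd | wc -l)"
    sleep 1
done

Unlike the system-wide lsof count, this counts only that one process's
descriptors, which is exactly the number the nofile limit applies to.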

You might actually want to *restore* the ulimit and then reproduce, looking at the file-descriptor count.

You said that this behavior was "new" and you've been running fine for 15 years or so. Is there any reason to suspect that you've recently been getting significantly more traffic than in the past? Maybe you've been getting close to the edge for 15 years and have finally spilled over it.

-chris

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
