from an occasional long list of CLOSE_WAIT connections). A GC will happen
from time to time (*), which will get rid of these connections. And those
CLOSE_WAIT connections do not consume a lot of resources, so you'll never
notice.
Until at some point the number of these CLOSE_WAIT connections reaches the
point where the OS cannot handle any more of them, and then you have a big
problem.
(*) and this is the "insidious squared" part: the smaller the Heap, the
more often a GC will happen, so the sooner these CLOSE_WAIT connections
will disappear. Conversely, by increasing the Heap size, you leave more
time between GCs, and make the problem more likely to happen.
"Our sites still functions normally with no cpu spikes during this build up
until around 60,000 connections, but then the server refuses further
connections and a manual Tomcat restart is required."
yes, the connection limit is a 16-bit count minus some reserved addresses:
the port number in a TCP connection is a 16-bit value, so there are at most
65535 ports. At around 60,000 connections you have effectively run out of
ports, and your system becomes unresponsive.
netstat -na will show you the connection states when this happens, and
that is helpful debugging information.
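To keep an eye on this over time, the netstat output can be tallied by TCP state. The small sketch below (my own illustration, not anything from Tomcat) runs `netstat -na` and counts each state, assuming the Linux output format where the state is the last whitespace-separated field of each `tcp` line:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class NetstatStates {
    // Tally TCP states (ESTABLISHED, CLOSE_WAIT, ...) from netstat-style lines.
    // Assumes the state is the last field of lines starting with "tcp".
    static Map<String, Long> tallyStates(Stream<String> lines) {
        return lines.map(String::trim)
                    .filter(l -> l.startsWith("tcp"))
                    .map(l -> { String[] f = l.split("\\s+"); return f[f.length - 1]; })
                    .collect(Collectors.groupingBy(s -> s, Collectors.counting()));
    }

    public static void main(String[] args) {
        try {
            Process p = new ProcessBuilder("netstat", "-na").start();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                tallyStates(r.lines())
                    .forEach((state, n) -> System.out.println(state + "  " + n));
            }
        } catch (java.io.IOException e) {
            System.err.println("could not run netstat: " + e);
        }
    }
}
```

A sudden growth in the CLOSE_WAIT count from one run to the next is exactly the symptom discussed in this thread.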
Filip
On Thu, Jun 19, 2014 at 2:44 PM, André Warnier <a...@ice-sa.com> wrote:
Konstantin Kolinko wrote:
2014-06-19 17:10 GMT+04:00 Lars Engholm Johansen <lar...@gmail.com>:
I will try to force a GC next time I am at the console about to restart a
Tomcat where one of the http-nio-80-ClientPoller-x threads has died and
the connection count is exploding.
But I do not see this as a solution - can you somehow deduce from the
outcome of a GC why this thread died?
Nobody said that a thread died because of GC.
The GC that Andre suggested was to get rid of some of the CLOSE_WAIT
connections in the netstat output, in case those are owned by abandoned,
not properly closed I/O objects that are still present in JVM memory.
Exactly, thanks Konstantin for clarifying.
I was going per the following in the original post :
"Our sites still functions normally with no cpu spikes during this build up
until around 60,000 connections, but then the server refuses further
connections and a manual Tomcat restart is required."
CLOSE_WAIT is a normal state for a TCP connection, but it should not
normally last long.
It basically indicates that the other side has closed the connection, and
that this side should do the same. But it doesn't, and as long as it
doesn't, the connection remains in the CLOSE_WAIT state. The connection is
"half-closed", but not entirely, and as long as it isn't, the OS cannot get
rid of it.
For a more precise explanation, Google for "TCP CLOSE_WAIT state".
I have noticed in the past, with some Linux versions, that when the number
of such CLOSE_WAIT connections goes above a certain level (several
hundred), the TCP/IP stack can become totally unresponsive and not accept
any new connections at all, on any port.
In my case, this was due to the following kind of scenario:
Some class Xconnection instantiates an object, and upon creation this
object opens a TCP connection to something. This object is now used as an
"alias" for this connection. Time passes, and finally the object goes out
of scope (e.g. the reference to it is set to "null"), and one may believe
that the underlying connection gets closed as a side-effect. But it
doesn't - not as long as this object is not actually garbage-collected,
which is what triggers the actual object destruction and the closing of
the underlying connection.
Forcing a GC is one way to provoke this (and restarting Tomcat another,
but more drastic).
If a forced GC gets rid of your many CLOSE_WAIT connections and makes your
Tomcat operative again, that would be a sign that something similar to the
above is occurring; and then you would need to look in your application
for the oversight (e.g. the class should have a "close" method, closing
the underlying connection, which should be invoked before letting the
object go out of scope).
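The scenario above can be sketched in a few lines. The `Xconnection` class here is hypothetical (the name is just borrowed from the description above): a wrapper that opens a socket on construction and must be closed explicitly, with the leaky pattern and the try-with-resources fix side by side:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical wrapper, as in the scenario above: it opens a TCP connection
// on construction, and the connection lives as long as the object does.
class Xconnection implements AutoCloseable {
    private final Socket socket;

    Xconnection(String host, int port) throws IOException {
        this.socket = new Socket(host, port); // opened on construction
    }

    boolean isClosed() { return socket.isClosed(); }

    @Override
    public void close() throws IOException {
        // Without an explicit close(), the peer's FIN leaves this side
        // sitting in CLOSE_WAIT until the object is garbage-collected.
        socket.close();
    }
}

public class LeakDemo {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) { // local peer for the demo
            // Leaky pattern: dropping the reference does NOT close the socket;
            // the connection lingers until a GC happens to collect the object.
            Xconnection leaked = new Xconnection("localhost", server.getLocalPort());
            leaked = null; // out of scope, but the connection is still open!

            // Correct pattern: try-with-resources guarantees close().
            try (Xconnection c = new Xconnection("localhost", server.getLocalPort())) {
                // use c ...
            } // c.close() runs here; the socket is released immediately
        }
    }
}
```

The general rule is the one stated above: anything that owns a connection needs a close method, and that method must be called before the last reference is dropped.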
The insidious part is that everything may look fine for a long time (apart
from an occasional long list of CLOSE_WAIT connections). A GC will happen
from time to time (*), which will get rid of these connections. And those
CLOSE_WAIT connections do not consume a lot of resources, so you'll never
notice.
Until at some point the number of these CLOSE_WAIT connections reaches the
point where the OS cannot handle any more of them, and then you have a big
problem.
That sounds a bit like your case, doesn't it?
(*) and this is the "insidious squared" part: the smaller the Heap, the
more often a GC will happen, so the sooner these CLOSE_WAIT connections
will disappear. Conversely, by increasing the Heap size, you leave more
time between GCs, and make the problem more likely to happen.
I believe that the rest below may be either a consequence or a red
herring, and I would first eliminate the above as a cause.
And could an Exception/Error in Tomcat thread http-nio-80-ClientPoller-0
or http-nio-80-ClientPoller-1 make the thread die with no stacktrace in
the Tomcat logs?
A critical error (java.lang.ThreadDeath,
java.lang.VirtualMachineError) will cause the death of a thread.
A subtype of the latter is java.lang.OutOfMemoryError.
As of now, such errors are passed through and are not logged by
Tomcat, but are logged by java.lang.ThreadGroup.uncaughtException().
ThreadGroup prints them to System.err (catalina.out).
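If catalina.out is hard to monitor, one way to make such silent thread deaths visible in your own logging is to install a default uncaught-exception handler. This is a generic JVM mechanism, not something specific to Tomcat's pollers; a minimal sketch:

```java
public class UncaughtLogger {
    public static void main(String[] args) throws InterruptedException {
        // Log any throwable that kills a thread, instead of relying on
        // ThreadGroup's default printout to System.err.
        Thread.setDefaultUncaughtExceptionHandler((thread, err) ->
            System.err.println("Thread " + thread.getName() + " died: " + err));

        // Demo thread that dies from an unhandled exception.
        Thread t = new Thread(() -> { throw new RuntimeException("boom"); },
                              "poller-demo");
        t.start();
        t.join(); // the handler fires as the thread terminates
    }
}
```

A handler like this would at least leave a trace when a ClientPoller-style thread dies, which is the missing evidence in the case discussed above.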
Best regards,
Konstantin Kolinko
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org