Konstantin Kolinko wrote:
2014-06-19 17:10 GMT+04:00 Lars Engholm Johansen <lar...@gmail.com>:
I will try to force a GC next time I am at the console about to restart a
Tomcat where one of the http-nio-80-ClientPoller-x threads has died and
the connection count is exploding.

But I do not see this as a solution - can you somehow deduce why this
thread died from the outcome of a GC?

Nobody said that a thread died because of GC.

The GC that Andre suggested was to get rid of some of the CLOSE_WAIT
connections in the netstat output, in case those are owned by some
abandoned and improperly closed I/O objects that are still present
in JVM memory.

Exactly, thanks Konstantin for clarifying.

I was going per the following in the original post :
"Our sites still functions normally with no cpu spikes during this build up
until around 60,000 connections, but then the server refuses further
connections and a manual Tomcat restart is required."

CLOSE_WAIT is a normal state for a TCP connection, but it should not normally last long.
It indicates that the other side has closed the connection, and that this side should do the same. Until this side's application actually closes its socket, the connection remains in the CLOSE_WAIT state. It is "half-closed", and as long as it stays that way, the OS cannot get rid of it.
For a more precise explanation, Google for "TCP CLOSE_WAIT state".

I have noticed in the past, with some Linux versions, that when the number of such CLOSE_WAIT connections goes above a certain level (several hundred), the TCP/IP stack can become totally unresponsive and not accept any new connections at all, on any port.
In my case, this was due to the following kind of scenario :
Some class Xconnection instantiates an object, and upon creation this object opens a TCP connection to something. This object is now used as an "alias" for this connection. Time passes, and finally the object is no longer referenced (e.g. the last reference to it is set to "null"), and one may believe that the underlying connection gets closed as a side-effect. But it doesn't, not until the object is actually garbage-collected, which triggers the actual object destruction and the closing of the underlying connection.
Forcing a GC is one way to provoke this (restarting Tomcat is another, but more drastic).

If a forced GC gets rid of your many CLOSE_WAIT connections and makes your Tomcat operative again, that would be a sign that something similar to the above is occurring, and you would then need to look in your application for the oversight (e.g. the class should have a "close" method that closes the underlying connection, and it should be invoked before the last reference to the object is dropped).
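
To illustrate the kind of fix meant here, a minimal sketch follows. It is not taken from the application under discussion: the class name XConnection, its methods and the throwaway local listener are all made up for the example. The point is only that the wrapper exposes a close() method and that the caller invokes it deterministically (here via try-with-resources) instead of waiting for a GC.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical wrapper standing in for the "Xconnection" class described above.
class XConnection implements AutoCloseable {
    private final Socket socket;

    XConnection(String host, int port) throws IOException {
        // The TCP connection is opened as a side effect of creating the object.
        this.socket = new Socket(host, port);
    }

    boolean isClosed() {
        return socket.isClosed();
    }

    @Override
    public void close() throws IOException {
        // Explicitly release the underlying connection.
        socket.close();
    }

    // Connects to a temporary loopback listener (an assumption made so the
    // example is self-contained) and reports whether the socket was closed
    // once the try-with-resources block ended.
    static boolean demo() {
        try (ServerSocket listener =
                 new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            XConnection ref;
            try (XConnection conn = new XConnection(
                     listener.getInetAddress().getHostAddress(),
                     listener.getLocalPort())) {
                ref = conn;
                // ... use the connection here ...
            } // conn.close() is called here, even if an exception was thrown
            return ref.isClosed();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("closed after try-with-resources: " + demo());
    }
}
```

If the object has to live longer than one block, the same idea applies with an explicit close() call in a finally clause at the place where the application is done with it.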

The insidious part is that everything may look fine for a long time (apart from an occasional long list of CLOSE_WAIT connections). A GC will happen from time to time (*), which will get rid of these connections. And those CLOSE_WAIT connections do not consume a lot of resources, so you'll never notice. Until at some point, the number of these CLOSE_WAIT connections reaches the point where the OS can't swallow any more of them, and then you have a big problem.

That sounds a bit like your case, doesn't it ?

(*) and this is the "insidious squared" part : the smaller the Heap, the more often a GC will happen, so the sooner these CLOSE_WAIT connections will disappear. Conversely, by increasing the Heap size, you leave more time between GCs, and make the problem more likely to happen.


I believe that the rest below may be either a consequence, or a red herring, and I would first eliminate the above as a cause.


And could an Exception/Error in Tomcat thread http-nio-80-ClientPoller-0
or http-nio-80-ClientPoller-1 make the thread die with no stack trace in
the Tomcat logs?


A critical error (java.lang.ThreadDeath,
java.lang.VirtualMachineError) will cause the death of a thread.

A subtype of the latter is java.lang.OutOfMemoryError.

As of now, such errors are passed through and are not logged by
Tomcat, but are logged by java.lang.ThreadGroup.uncaughtException().
ThreadGroup prints them to System.err (catalina.out).
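
As a small illustration of this mechanism (not Tomcat code; the class name, the thread name and the simulated error are made up for the example), the sketch below starts a thread that throws an Error and installs an uncaught-exception handler on it. When no handler is installed, the JVM falls back to ThreadGroup.uncaughtException(), which is what produces the System.err output mentioned above.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; it only mimics a poller thread dying from an Error.
class PollerDeathDemo {

    // Starts a thread that throws an Error and returns the Throwable that
    // was handed to the thread's uncaught-exception handler.
    static Throwable runAndCapture() {
        AtomicReference<Throwable> seen = new AtomicReference<>();

        Thread poller = new Thread(() -> {
            // Simulated failure; a real OutOfMemoryError would come from the JVM.
            throw new OutOfMemoryError("simulated");
        }, "demo-ClientPoller-0");

        // Without this handler the JVM would use ThreadGroup.uncaughtException(),
        // which prints the stack trace to System.err.
        poller.setUncaughtExceptionHandler((thread, error) -> {
            seen.set(error);
            System.err.println("Thread " + thread.getName() + " died: " + error);
        });

        poller.start();
        try {
            poller.join(); // the handler has run by the time join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("captured: " + runAndCapture());
    }
}
```

Thread.setDefaultUncaughtExceptionHandler() does the same for all threads that have no handler of their own, which may be a convenient place to log such errors somewhere other than catalina.out.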


Best regards,
Konstantin Kolinko

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



