Lars Engholm Johansen wrote:
Thanks for all the replies guys.

Have you observed a performance increase by setting
acceptorThreadCount to 4 instead of a lower number? I'm just curious.


No, but this was the consensus after lengthy discussions in my team. We
have 12 CPU cores, so better safe than sorry. I know that the official docs
say "although you would never really need more than 2" :-)

The GC that Andre suggested was to get rid of some of the CLOSE_WAIT
connections in netstat output, in case those are owned by some
abandoned and not properly closed I/O objects that are still present
in JVM memory.


Please check out the "open connections" graph at http://imgur.com/s4fOUte
As far as I can interpret it, we only see a slight growth in connection count
during the days until the poller thread dies. These connections may or may not
disappear by forcing a GC, but their number is not problematic until we hit the
http-nio-80-ClientPoller-x thread death.

Just to make sure: what kind of connections does this graph actually show? In which TCP state? Does it count only "ESTABLISHED", or also "FIN_WAIT", "CLOSE_WAIT", "LISTEN", etc.?


The insidious part is that everything may look fine for a long time (apart
from an occasional long list of CLOSE_WAIT connections).  A GC will happen
from time to time (*), which will get rid of these connections.  And those
CLOSE_WAIT connections do not consume a lot of resources, so you'll never
notice.
Until at some point, the number of these CLOSE_WAIT connections reaches
the point where the OS can't swallow any more of them, and then you have
a big problem.
(*) And this is the "insidious squared" part: the smaller the Heap, the
more often a GC will happen, so the sooner these CLOSE_WAIT connections
will disappear.  Conversely, by increasing the Heap size, you leave more
time between GCs, and make the problem more likely to happen.


You are correct. The bigger the heap size, the rarer a GC will happen, and
we have set aside 32 GiB of RAM. But again, referring to my "connection
count" graph, a missing close() in the code does not seem to be the culprit.

A critical error (java.lang.ThreadDeath,
java.lang.VirtualMachineError) will cause the death of a thread.
A subtype of the latter is java.lang.OutOfMemoryError.


I just realized that StackOverflowError is also a subclass of
VirtualMachineError, and remembered that, for historical reasons at our
company, we had configured the JVM thread stack size to 256 KiB (down from
the default of 1 MiB on 64-bit machines). This was to support a huge number
of threads on limited memory in the past.
I have now removed the -Xss JVM parameter and am excited to see if this
solves our poller thread problems.
Thanks for the hint, Konstantin.
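
For context, here is a minimal sketch (illustration only, not Tomcat code, names invented) of why a too-small -Xss is a plausible culprit: a StackOverflowError that escapes run() kills the thread, and by default the only trace is whatever ThreadGroup.uncaughtException() prints to System.err.

// Sketch only: a VirtualMachineError escaping run() ends the thread.
public class PollerDeathSketch {

    // Unbounded recursion blows a small stack (e.g. -Xss256k) quickly.
    static int recurse(int depth) {
        return recurse(depth + 1) + 1;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread poller = new Thread(() -> recurse(0), "fake-ClientPoller-0");
        poller.start();
        poller.join();
        // The thread is gone, and nothing restarts it.
        System.out.println("poller alive? " + poller.isAlive()); // false
    }
}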

I promise to report back to you guys :-)



On Fri, Jun 20, 2014 at 2:49 AM, Filip Hanik <fi...@hanik.com> wrote:

"Our sites still functions normally with no cpu spikes during this build up
until around 60,000 connections, but then the server refuses further
connections and a manual Tomcat restart is required."

Yes, the connection limit is a 16-bit short count minus some reserved
values, so your system would become unresponsive: you've run out of ports
(the port number is a 16-bit value in a TCP connection).

netstat -na should give you the connection states when this happens, and
that is helpful debugging information.

Filip




On Thu, Jun 19, 2014 at 2:44 PM, André Warnier <a...@ice-sa.com> wrote:

Konstantin Kolinko wrote:

2014-06-19 17:10 GMT+04:00 Lars Engholm Johansen <lar...@gmail.com>:

I will try to force a GC the next time I am at the console, about to
restart a Tomcat where one of the http-nio-80-ClientPoller-x threads has
died and the connection count is exploding.

But I do not see this as a solution - can you somehow deduce why the
thread died from the outcome of a GC?

Nobody said that a thread died because of GC.

The GC that Andre suggested was to get rid of some of the CLOSE_WAIT
connections in netstat output, in case those are owned by some
abandoned and not properly closed I/O objects that are still present
in JVM memory.

Exactly, thanks Konstantin for clarifying.

I was going by the following in the original post:

"Our sites still functions normally with no cpu spikes during this build
up
until around 60,000 connections, but then the server refuses further
connections and a manual Tomcat restart is required."

CLOSE_WAIT is a normal state for a TCP connection, but it should not
normally last long.
It indicates basically that the other side has closed the connection, and
that this side should do the same. But it doesn't, and as long as it
doesn't, the connection remains in the CLOSE_WAIT state.  It's like
"half-closed", but not entirely, and as long as it isn't, the OS cannot
get rid of it.
For a more precise explanation, Google for "TCP CLOSE_WAIT state".

I have noticed in the past, with some Linux versions, that when the number
of such CLOSE_WAIT connections goes above a certain level (several
hundred), the TCP/IP stack can become totally unresponsive and not accept
any new connections at all, on any port.
In my case, this was due to the following kind of scenario:
Some class Xconnection instantiates an object, and upon creation this
object opens a TCP connection to something. This object is now used as an
"alias" for this connection.  Time passes, and finally the object goes out
of scope (e.g. the reference to it is set to "null"), and one may believe
that the underlying connection gets closed as a side-effect.  But it
doesn't, not as long as this object is not actually garbage-collected,
which triggers the actual object destruction and the closing of the
underlying connection.
Forcing a GC is one way to provoke this (and restarting Tomcat is another,
more drastic, one).

If a forced GC gets rid of your many CLOSE_WAIT connections and makes your
Tomcat operational again, that would be a sign that something similar to
the above is occurring; and then you would need to look in your application
for the oversight (e.g. the class should have a "close" method, closing the
underlying connection, which should be invoked before letting the object
go out of scope).
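
To make that concrete, here is a minimal sketch of the pattern (class, method and host names are invented for illustration):

import java.io.IOException;
import java.net.Socket;

// Hypothetical wrapper, as described above: the object "is" the connection.
class XConnection {
    private final Socket socket;

    XConnection(String host, int port) throws IOException {
        this.socket = new Socket(host, port); // opens the TCP connection
    }

    // The fix: an explicit close() the caller must invoke.
    void close() throws IOException {
        socket.close();
    }
}

class Caller {
    void leaky() throws IOException {
        XConnection c = new XConnection("backend.example.com", 8080);
        // ... use c ...
        c = null; // the Socket is NOT closed here; once the peer closes,
                  // the connection sits in CLOSE_WAIT until the object
                  // happens to be garbage-collected.
    }

    void correct() throws IOException {
        XConnection c = new XConnection("backend.example.com", 8080);
        try {
            // ... use c ...
        } finally {
            c.close(); // released deterministically, no GC needed
        }
    }
}

If such a class implements java.lang.AutoCloseable, a try-with-resources block gives the same guarantee with less boilerplate.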

The insidious part is that everything may look fine for a long time (apart
from an occasional long list of CLOSE_WAIT connections).  A GC will happen
from time to time (*), which will get rid of these connections.  And those
CLOSE_WAIT connections do not consume a lot of resources, so you'll never
notice.
Until at some point, the number of these CLOSE_WAIT connections reaches
the point where the OS can't swallow any more of them, and then you have
a big problem.

That sounds a bit like your case, doesn't it?

(*) And this is the "insidious squared" part: the smaller the Heap, the
more often a GC will happen, so the sooner these CLOSE_WAIT connections
will disappear.  Conversely, by increasing the Heap size, you leave more
time between GCs, and make the problem more likely to happen.


I believe that the rest below may be either a consequence or a red
herring, and I would first eliminate the above as a cause.



And could an Exception/Error in Tomcat thread http-nio-80-ClientPoller-0
or http-nio-80-ClientPoller-1 make the thread die with no stack trace in
the Tomcat logs?


A critical error (java.lang.ThreadDeath,
java.lang.VirtualMachineError) will cause the death of a thread.

A subtype of the latter is java.lang.OutOfMemoryError.

As of now, such errors are passed through and are not logged by
Tomcat, but are logged by java.lang.ThreadGroup.uncaughtException().
ThreadGroup prints them to System.err (catalina.out).
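
If you want such a death to show up in your own logs rather than only in catalina.out, one option (a sketch, assuming you can run a bit of startup code of your own, e.g. from a ServletContextListener) is to install a JVM-wide default handler, which ThreadGroup.uncaughtException() delegates to when one is set:

import java.util.logging.Level;
import java.util.logging.Logger;

public class ThreadDeathLogger {
    private static final Logger LOG = Logger.getLogger("thread-death");

    // Call once at startup; applies to all threads without their own handler.
    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) ->
            LOG.log(Level.SEVERE,
                    "Thread " + thread.getName() + " died", error));
    }
}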


Best regards,
Konstantin Kolinko
