Hi Chris
My expectation from the backlog is:

1. Connections that can be handled directly will be accepted and work
will begin

2. Connections that cannot be handled will accumulate in the backlog

3. Connections that exceed the backlog will get "connection refused"

There are caveats, I would imagine. For instance, do the connections in
the backlog have any kind of server-side timeouts associated with them
-- that is, will they ever get discarded from the queue without ever
being handled by the bound process (assuming the bound process doesn't
terminate or anything weird like that)? Do the clients have any timeouts
associated with them?

Does the above *not* happen? On which platform? Is this only with NIO?
I am not a Linux-level TCP expert, but my understanding is that the TCP layer has its own timeouts, and older connection requests will eventually get discarded from the queue. Typically a client has a TCP-level timeout as well, i.e. the time it will wait for the other party to accept its SYN packet. My testing has been primarily on Linux / Ubuntu.
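To make that concrete, here is a minimal, self-contained sketch in plain Java showing where the two knobs live - the listen backlog given when binding the server socket, and the client-side connect timeout that bounds how long a SYN will wait. The port and the values used are made up purely for illustration:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class BacklogAndTimeoutDemo {

    public static void main(String[] args) throws IOException {
        // Server side: the second argument is the listen backlog *hint*.
        // The OS may adjust it (Linux caps it at net.core.somaxconn), so the
        // effective queue length is platform dependent.
        ServerSocket server = new ServerSocket(9000, 2);

        // Client side: the connect timeout bounds how long the client waits
        // for its SYN to be answered. If the backlog is full, the attempt may
        // hang until this expires, or be refused outright, depending on the OS.
        Socket client = new Socket();
        try {
            client.connect(new InetSocketAddress("localhost", 9000), 5000);
        } catch (SocketTimeoutException e) {
            // The connection was not accepted at the TCP level within 5 seconds.
        }
        client.close();
        server.close();
    }
}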

Leaving everything to the TCP backlog means end clients see nasty RSTs, instead of connection refused, when Tomcat is under load - and that could prevent a client from performing a clean fail-over when one Tomcat node is overloaded.
So you are eliminating the backlog entirely? Or are you allowing the
backlog to work as "expected"? Does closing and re-opening the socket
clear the existing backlog (which would cancel a number of waiting
though not technically accepted connections, I think), or does it retain
the backlog? Since you are re-binding, I would imagine that the backlog
gets flushed every time there is a "pause".
I am not sure how the backlog would work under different operating systems and conditions etc. However, the code I've shared shows how a pure Java program could take better control of the underlying TCP behavior - as visible to its clients.
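I can't re-post that code in this mail, but the rough shape of it is something like the sketch below (illustrative only, not the actual code from the earlier mail): close the listening socket once the node is saturated, so that new clients get an immediate connection refused, and re-bind once the load drops.

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Illustrative sketch only. Closing the listening channel means new clients
// are refused at the TCP level; re-binding starts accepting again. Whatever
// was sitting in the backlog at the time of the close is presumably dropped,
// which again is platform dependent.
public class PausableListener {

    private final InetSocketAddress address;
    private ServerSocketChannel channel;

    public PausableListener(InetSocketAddress address) {
        this.address = address;
    }

    public synchronized void resume() throws IOException {
        if (channel == null || !channel.isOpen()) {
            channel = ServerSocketChannel.open();
            channel.bind(address, 0); // backlog hint 0 = platform default
        }
    }

    public synchronized void pause() throws IOException {
        if (channel != null && channel.isOpen()) {
            channel.close();
        }
    }

    public SocketChannel accept() throws IOException {
        ServerSocketChannel ch = channel;
        return (ch != null && ch.isOpen()) ? ch.accept() : null;
    }
}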
What about performance effects of maintaining a connector-wide counter
of "active" connections, plus pausing and resuming the channel -- plus
re-connects by clients that have been dropped from the backlog?
What the UltraESB does by default is to stop accepting new connections after a threshold is reached (e.g. 4096) and remain paused until the number of active connections drops back to another threshold (e.g. 3073). Each of these parameters is user configurable and depends on the maximum number of connections each node is expected to handle. In my experience, maintaining connector-wide counts does not cause any performance effects, and neither do re-connects by clients - as what's expected in reality is for a hardware load balancer to forward requests that are "refused" by one node to another node, which hopefully is not loaded.
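Wired up with a pause/resume capability like the sketch above, the threshold logic could look roughly like this - again just an illustration, with made-up class and method names, not UltraESB code:

import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionGate {

    private final AtomicInteger active = new AtomicInteger();
    private final int pauseThreshold;   // e.g. 4096
    private final int resumeThreshold;  // e.g. 3073
    private final PausableListener listener;

    public ConnectionGate(PausableListener listener, int pauseThreshold, int resumeThreshold) {
        this.listener = listener;
        this.pauseThreshold = pauseThreshold;
        this.resumeThreshold = resumeThreshold;
    }

    // Call when a connection has been accepted.
    public void onOpen() throws IOException {
        if (active.incrementAndGet() >= pauseThreshold) {
            listener.pause(); // new clients now see "connection refused"
        }
    }

    // Call when a connection completes or is closed.
    public void onClose() throws IOException {
        if (active.decrementAndGet() <= resumeThreshold) {
            listener.resume();
        }
    }
}

Since pause() and resume() in the earlier sketch are idempotent, the repeated calls while the count stays above or below the thresholds are harmless.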

Such a fail-over can take place immediately, cleanly and without any confusion, even if the backend service is not idempotent. This is clearly not the case when a TCP/HTTP connection is accepted and then met with a hard RST after part or all of a request has been sent over it.
I'm concerned that all of your bench tests appear to be done using
telnet with a single acceptable connection. What if you allow 1000
simultaneous connections and test it under some real load so we can see
how such a solution would behave?
Clearly the example I shared was just to illustrate this with a pure Java program. We usually conduct performance tests over half a dozen open source ESBs at concurrency levels of 20, 40, 80, 160, 320, 640, 1280 and 2560, and payload sizes of 0.5, 1, 5, 10 and 100K bytes. You can see some of the scenarios here: http://esbperformance.org. We privately conduct performance tests well beyond 2560 concurrent connections. We have used an HttpComponents-based EchoService as our backend service all this time, and it behaved very well at all load levels. However, some weeks back we accepted a contribution of an async servlet to be deployed on Tomcat, as it was considered more "real world". The issues I noticed arose when running high load levels against this servlet deployed on Tomcat, especially when the response was being delayed to simulate realistic behavior.

Although we do not use Tomcat ourselves, our customers do. I am also not calling this a bug, but rather an area for possible improvement. If the Tomcat users, developers and the PMC think this is worthwhile to pursue, I believe it would be a good enhancement - maybe even a good GSoC project. As a fellow member of the ASF and a committer on multiple projects over the years, I believed it was my duty to bring this to the attention of the Tomcat community.

regards
asankha

--
Asankha C. Perera
AdroitLogic, http://adroitlogic.org

http://esbmagic.blogspot.com



