[ a little late here, but hey.. ]

On Fri, 12 Dec 2003, David Rees wrote:

| On Fri, December 12, 2003 at 2:12 pm, Adam Fisk wrote:
| >
| > I'd be happy to send my data to the group if people are interested.
| > Aside from memory, I was surprised to find that the effect on CPU was
| > negligible (not much of a benefit from no context-switching between
| > threads) -- CPU was virtually the same in both cases.  So, the scaling
| > benefits on Windows basically come from not having to allocate more
| > memory to new threads.  I'm unfortunately not as familiar with the
| > Tomcat code as I'd like to be, but I assume it makes intelligent use of
| > thread pooling, which may make even the memory benefits of NIO negligible.
| > At the same time, though, NIO may remove some of the constraints
| > introduced by thread pooling, possibly allowing Tomcat to handle heavier
| > loads without blowing up.  An optimized NIO server would if anything
| > out-perform a blocking server, but maybe by not that much.
|
| On current Linux systems, once you start getting 500+ processes/threads
| active on a typical machine, you will find that context switching starts
| taking up a significant amount of system time, especially if you decide to
| run any system monitoring tools (like ps, or top).  This is better with the
| upcoming 2.6 kernels, but still doesn't scale to thousands of active
| threads very well.
|
| However, given that you need a thread anyway to serve any dynamic
| content, I don't see NIO helping that much for your typical web
| application.  I could see NIO helping scale the serving of static content
| which would be useful where people are using Tomcat standalone.  Maybe
| someone can prove me wrong.  ;-)

I've also been wondering about NIO for some time, and whether a Servlet
Container ever could benefit from it.

What about this idea: if you have two types of threads, a set of Servlet
threads and then one (or more) threads that do all the IO, with a set of
-memory buffers- in between, what would you get?

See, the thing is about caching - and letting the Servlet distance itself
from the client. The normal case is that the output stream is more or less
"directly attached" to the client. But if you do this other thing, having
a memory buffer in between, the servlet thread could stash its output
there and exit immediately. Hopefully it could do several such operations
within -one- context switch, instead of having to block on IO "on every
byte output". (Btw: I do realize that the Servlet spec allows for a buffer
on the output stream; this is something slightly different: a forced
total-buffering of the servlet's output.)
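
As a rough sketch of that split (every class and method name below is made
up for illustration; none of it is Tomcat or Servlet API), the servlet
thread renders the whole page into memory and queues it, and a separate IO
thread later writes the queued bytes to the client. A real IO thread would
presumably drive non-blocking SocketChannels from a Selector; the sketch
only assumes "some writable channel" so that it stays self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: BufferedHandoff is an illustrative name only.
final class BufferedHandoff {
    /** A fully generated response paired with the client it belongs to. */
    private static final class Pending {
        final WritableByteChannel client;
        final ByteBuffer body;

        Pending(WritableByteChannel client, ByteBuffer body) {
            this.client = client;
            this.body = body;
        }
    }

    // The memory buffers sitting between servlet threads and the IO thread.
    private final BlockingQueue<Pending> outbound = new LinkedBlockingQueue<>();

    /** Step 1, on a servlet thread: buffer the whole page and return at once. */
    void generate(WritableByteChannel client, String page) {
        byte[] bytes = page.getBytes(StandardCharsets.UTF_8);
        outbound.add(new Pending(client, ByteBuffer.wrap(bytes)));
    }

    /**
     * Step 2, on the IO thread: send one queued response.
     * Returns the number of bytes written, or -1 if nothing was queued.
     * Assumption: the channel accepts data on every write; with a
     * non-blocking socket one would wait for write readiness instead
     * of looping.
     */
    int sendNext() {
        Pending p = outbound.poll();
        if (p == null) {
            return -1;
        }
        int written = 0;
        try {
            while (p.body.hasRemaining()) {
                written += p.client.write(p.body);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }

    /** Number of generated responses not yet sent. */
    int pending() {
        return outbound.size();
    }
}
```

The point of the sketch is only that generate() never touches the network,
so the servlet thread is free as soon as the page is in memory.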

So, one could tag servlets with "NIO-able" true/false. NIO-able servlets
are those that typically don't produce more than 10k of data (or something
like that), and that don't need to "hang onto the client". A typical
servlet that -does- hang onto the client is one outputting and flushing
(thus, "sending") one byte a second while it does the heavy database query.
  Thus, if a servlet needs this direct link to the client, or produces
very large "result sets", it can't be "NIOed": in the first case because
it -needs- this direct link between the server thread and the client, and
in the second case because the memory-buffer requirement would be too
heavy.
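
The tagging rule could be as small as the check below. This is only a
sketch: NioAblePolicy and its parameters are hypothetical names, and the
10k cut-off is just the ballpark figure mentioned above, not a measured
value.

```java
// Hypothetical policy sketch; not part of Tomcat or the Servlet API.
final class NioAblePolicy {
    // Rough cut-off, taken from the "10k (or something like that)" figure.
    static final int MAX_BUFFERED_BYTES = 10 * 1024;

    /**
     * A servlet is "NIO-able" when it does not need a direct link to the
     * client and its expected output fits in the memory buffer.
     */
    static boolean isNioAble(boolean needsDirectClientLink, long expectedResponseBytes) {
        if (needsDirectClientLink) {
            return false; // e.g. a servlet that flushes progress bytes while it works
        }
        return expectedResponseBytes <= MAX_BUFFERED_BYTES;
    }
}
```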

If one has loads of memory, this idea could potentially speed up
processing a bit by not spending an excessive amount of CPU time in context
switches. One has to realize that the clients' line-speed (and slowstart
on the TCP stack) often will lead to a servlet having to go through
multiple context switches before being able to exit, due to the blocking
of the IO - this could be totally eliminated in many cases using this
approach of dividing a dynamic page generation into two totally distinct
steps: the generation of the page, and the sending of that resulting page
over the network.

The connector architecture in earlier tomcats was not able to queue
requests; if no worker thread was available, it immediately returned a
"server full" response. I then advocated an approach of having a set of
worker threads (a limited set), and a line of incoming requests (that then
might have a higher number of "slots" than the number of available
workers), and then a dispatcher that stands in between. I believe that
something along the same lines has been implemented now.  This suggestion
here is something along the same lines - the whole idea is to reduce the
real CPU time spent in context switches - a context switch isn't just the
registers and memory maps, it is also flushing of the CPU's Lx caches!
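
For illustration, the worker-set-plus-queue arrangement can be sketched
with the standard java.util.concurrent.ThreadPoolExecutor. This is not how
Tomcat's connectors are written (I have not checked that); the class name
QueueingDispatcher and its parameters are invented for the example. A
fixed number of workers is fed from a bounded queue, and only when both
are full does the caller get the "server full" answer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a limited worker set behind a bounded request queue.
final class QueueingDispatcher {
    private final ThreadPoolExecutor pool;

    QueueingDispatcher(int workers, int queueSlots) {
        // Core and maximum size are equal, so the pool never grows past
        // 'workers'; extra requests wait in the bounded queue instead.
        pool = new ThreadPoolExecutor(
                workers, workers,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSlots),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Returns true if the request was accepted, false for "server full". */
    boolean dispatch(Runnable request) {
        try {
            pool.execute(request);
            return true;
        } catch (RejectedExecutionException full) {
            return false;
        }
    }

    /** Lets queued requests finish, then stops the workers. */
    void shutdown() {
        pool.shutdown();
    }
}
```

With one worker and two queue slots, three slow requests are accepted (one
running, two waiting) and a fourth is turned away until a slot frees up.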

-- 
Mvh,
Endre Stølsvik               M[+47 93054050] F[+47 51625182]
Developer @ CoreTrek AS         -  http://www.coretrek.com/
CoreTrek corporate portal / EIP -  http://www.corelets.com/
