On Feb 24, 2009, at 4:16 PM, Sam Hokin wrote:

Juha Laiho wrote:
One tool that I haven't yet seen suggested is 'strace', the Linux system call tracer. This will show all the calls your application makes to the operating system. Since you say the application is mostly idle during the delay, it is, in one way or another, waiting for some OS service to complete. 'strace' should provide you with timestamped information on which OS services were called, with which arguments, and how long they took to return. 'strace' will leave you with a huge file (or a set of huge files, depending on the options you use), and going through them will take some time, but you'll most likely also find what causes the delay.
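
Going through a huge trace by hand is tedious, so here is a minimal sketch of one way to narrow it down. It assumes the trace was recorded with strace's -T option, which appends the time spent in each call as <seconds> at the end of the line. The function name slow_syscalls and the 1-second default threshold are made up for illustration.

```shell
# Hypothetical helper: print only the traced calls that took at least
# a given number of seconds. Reads strace output (recorded with -T)
# on stdin.
#
# Example recording, attaching to an already running JVM:
#   strace -f -tt -T -o /tmp/tomcat.trace -p <pid of the java process>
#   -f   follow child processes and threads
#   -tt  prefix each line with a wall-clock time, with microseconds
#   -T   append the time spent in each call as <seconds>
slow_syscalls() {
    # $1: threshold in seconds (assumed default: 1.0)
    awk -v min="${1:-1.0}" '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            # Strip the surrounding < and > to get the duration.
            t = substr($0, RSTART + 1, RLENGTH - 2)
            if (t + 0 >= min + 0) print
        }'
}
```

Usage would be something like `slow_syscalls 0.5 < /tmp/tomcat.trace`, which should leave only the handful of calls worth reading closely.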

Thanks, Juha. Actually Pieter suggested it a little while ago, and I've been trying to get some information out of strace. The best I can do is to put strace in front of the java command that's inside catalina.sh. That's the command that shows with ps -ef when Tomcat is running. BUT, I get nothing out of strace when I make page requests on a site; it just shows output during Tomcat startup. So, I've not figured out how to get strace to say what the JVM is doing during the delay. jstack has led us to a stalled File.exists() in one case, but we don't know what file it's looking for. And I'm not convinced that File.exists() is the only method that's stalling.
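
One possible explanation for the startup-only output, offered as an assumption rather than a diagnosis: without -f, strace follows only the thread it started, while a JVM serves requests on other threads. Recording with -f (and, to cut the volume, -e trace=file, which limits the trace to calls that take a file name) should then show the path behind a stalled File.exists(), since that method typically ends up in a stat- or access-style system call. The sketch below pulls the first quoted path and the elapsed time out of such a trace; the function name traced_paths is hypothetical.

```shell
# Hypothetical helper: for each traced call that has a quoted path
# argument, print "<seconds> <path>", slowest first.
# Reads strace output recorded with -T on stdin, for example from:
#   strace -f -tt -T -e trace=file -o /tmp/tomcat.trace -p <pid>
traced_paths() {
    awk '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            secs = substr($0, RSTART + 1, RLENGTH - 2)
            # First double-quoted string on the line is taken to be the path.
            if (match($0, /"[^"]*"/)) {
                path = substr($0, RSTART + 1, RLENGTH - 2)
                print secs, path
            }
        }' | sort -rn
}
```

Running `traced_paths < /tmp/tomcat.trace | head` would list the slowest file lookups, which is where a stalled File.exists() should show up if it is the culprit.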

Since this problem exists only on a production server, a server on which I must still serve at least two customer sites (due to DNS issues) in addition to our own and any others I put on there, I'm a bit restricted in terms of how much I can muck with it (not that I haven't brought those live sites down for awkward periods of time with the diagnosis I've attempted so far). I wish I had a test environment on another server that replicates this issue, but my other two servers run Tomcat perfectly fast, and since I don't understand what's causing the problem, I cannot make one of my other servers reproduce it.

Another diagnostic problem is that undeploying a context with the Tomcat /manager app, and then starting it again, does NOT reset this problem - the response to a JSP request is immediate (provided it had been requested since the last Tomcat startup). This problem is only reset on a given JSP if I restart Tomcat entirely; I can reproduce it by creating fresh JSPs with new names and requesting them.
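
The fresh-JSP reproduction can be scripted so that each test run gets a page Tomcat has never seen. This is only a sketch: the function name fresh_jsp, the probe_ file prefix, and the webapp path and URL in the usage comment are all assumptions, not anything from the affected server.

```shell
# Hypothetical helper: create a trivial JSP with a name that has not
# been used before, and print that name.
# $1: directory of a deployed webapp (assumed to be writable)
fresh_jsp() {
    webapp_dir="$1"
    # Timestamp plus shell PID keeps the name unique across runs.
    name="probe_$(date +%s)_$$.jsp"
    printf '%s\n' '<html><body><%= new java.util.Date() %></body></html>' \
        > "$webapp_dir/$name" || return 1
    printf '%s\n' "$name"
}

# Example use (path and URL are placeholders):
#   page=$(fresh_jsp /path/to/webapps/ROOT)
#   curl -s -o /dev/null -w '%{time_total}\n' "http://localhost:8080/$page"
#   curl -s -o /dev/null -w '%{time_total}\n' "http://localhost:8080/$page"
```

Comparing the two curl timings gives the first-request delay for that page without having to restart Tomcat.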

But, clearly, the key diagnostic issue is finding out WHAT is going on during the delay that a JSP incurs when it is first requested of a given Tomcat instance. I've not been able to find out from strace. I'll give truss -f and truss -ff a try.


How about just running tcpdump during the long delay to see what the machine is doing network-wise?

man tcpdump
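
A rough sketch of what that could look like, assuming tcpdump is run as root on Linux. The function name capture_during and the one-second pauses are invented for illustration; the tcpdump options used are -i any (all interfaces), -n (no name resolution, so the capture itself adds no lookups) and -w (write raw packets to a file).

```shell
# Hypothetical helper: capture network traffic while a command runs.
# $1: file to write the capture to; remaining arguments: the command.
capture_during() {
    pcap="$1"; shift
    tcpdump -i any -n -w "$pcap" &
    cap_pid=$!
    sleep 1                       # give the capture a moment to start
    "$@"                          # e.g. a curl request for a fresh JSP
    status=$?
    sleep 1                       # let trailing packets arrive
    kill "$cap_pid" 2>/dev/null
    wait "$cap_pid" 2>/dev/null
    return "$status"              # pass through the command's exit status
}

# Example use (URL is a placeholder):
#   capture_during /tmp/delay.pcap curl -s -o /dev/null http://localhost:8080/new.jsp
#   tcpdump -n -tttt -r /tmp/delay.pcap | less
```

Reading the file back with -tttt prints a full timestamp on every packet, so any gap that lines up with the delay should stand out.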

János