I've got a very large web application (about 300 objects and about 1000 pages) that uses mostly straight JSP. It gets a reasonable amount of traffic, with approximately 200 concurrent sessions at any time.
Recently we introduced something that's causing what looks like a thread deadlock. Some unknown event occurs, and then things grind to a halt as threads get backed up. When this happens, the only way out is to hard-kill the server (e.g., Orion's shutdown doesn't work, and kill -TERM doesn't work). It only really occurs under load, and we cannot reproduce it in a development environment, even with load-testing tools.
We've crawled through every line of code carefully and found some obscure race conditions we hadn't considered (none of which we had ever actually seen occur). But so far nothing that would explain our real problem, so I'm fairly convinced I'm not going to find it easily by reading the Java code.
I've tried jdb, but of course I can only see suspended threads, which is not very useful. I've tried JProbe, but that only shows the parent thread's state. I even tried strace/truss, but that's too low-level to make out what's happening. I'm starting to use kill -3, but that again only shows the parent thread.
Does anyone have any suggestions for runtime debugging at the thread level? I'd really just like to see what's actually happening in the locked threads. Anyone?
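For anyone on a Java 6 or later JVM (an assumption: the JVM behind the Orion setup described here may well be older and lack these APIs), the standard library can produce the per-thread view being asked for without an external tool. This is a minimal sketch: `Thread.getAllStackTraces()` (Java 5+) prints every live thread's stack, and `ThreadMXBean.findDeadlockedThreads()` (Java 6+) lists threads caught in a lock cycle along with which lock each is waiting on and who holds it. The class name `ThreadDumper` is hypothetical; one might call it from an admin-only page or a background timer so it can be triggered while the server is wedged.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;

// Hypothetical helper: in-process thread diagnostics (requires Java 6+).
public class ThreadDumper {

    // Builds a text dump of every live thread's name, state, and stack frames.
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    // Lists threads the JVM reports as deadlocked on monitors or java.util.concurrent locks.
    public static String findDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no cycle exists
        if (ids == null) {
            return "no deadlock detected";
        }
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info == null) {
                continue; // thread exited between the two calls
            }
            sb.append(info.getThreadName())
              .append(" waiting on ").append(info.getLockName())
              .append(" held by ").append(info.getLockOwnerName()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dumpAllThreads());
        System.out.println(findDeadlocks());
    }
}
```

Note that `findDeadlockedThreads()` only catches true lock cycles. Threads that are backed up waiting on I/O, a connection pool, or `Object.wait()` will not appear there, but they will show up in the full dump with their state and stack. On a JDK that ships it, the `jstack` tool prints similar output from outside the process.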