Hi folks. I guess this isn't so much a question related to Avalon, but more of a general Linux question, a subject I know pretty much nothing about.
goes for most people here :D. Our little bits of limited knowledge might combine to form a solution though.....
I'm running my Fortress server app, using Event package for worker
threads (through CommandManager). 'ps aux' command shows some number
of my app's processes, in fact, these are my app's threads as I
managed to figure out. And now, my system administrator keeps
informing me that my app's processes/threads keep changing status from normal -> zombie and that I should fix the problem. I
don't even know what a 'zombie' process/thread means, and I'm a Java developer
who is supposed to resolve bugs in the form of Java exceptions, not
these system problems which I don't know anything about.
a zombie process is basically a process that is dead but not quite dead, whereas it should be quite dead, hence the name zombie: it has already exited, but its parent has not yet collected its exit status (via wait()), so the kernel keeps an entry for it in the process table. Since linux identifies it as a zombie, it's constrained to the graveyard: it no longer runs, so it can't actually mess up the system. The only real danger with zombies is when you have lots of them, because each one still occupies a process table slot. You see, the term "zombie process" fits :D
For some reason I can't remember (I read a book about OS kernels some time ago but forgot most of it again), it is not always possible to avoid zombie processes on Linux. I think they don't exist on Windows NT, whereas I think Windows 98-based stuff just doesn't bother to check for them at all; either way, you won't be able to reproduce the problem there. The trick is to watch for them and make sure they go away.
I guess that ideally, the thread pool / event manager should complain that you are not managing your threads properly when it detects a zombie process. I also guess that such functionality does not exist because the problem hasn't popped up before. Let's dig in! (I just love it when I totally don't understand stuff :D)
if your use of threads does not result in exceptions (and our use of doug lea's concurrent package means that we pretty much handle threads relatively nicely in fortress/event), that's usually a JVM problem.........
.........unless you are calling some native code somewhere, or launching external processes through Runtime.exec(). Otherwise this doesn't make much sense to me. In the Runtime.exec() case, IIRC, you need to do some cleanup:
final Runtime runtime = Runtime.getRuntime();
final Process childProcess = runtime.exec( /* ... stuff ... */ );
childProcess.waitFor(); // blocks until the child process has exited
final int result = childProcess.exitValue();
if( result != 0 )
{
    // usually bad; you need to sigkill -s 18* this process IIRC;
    // I think exitValue() also used to call some more native
    // code that would tell linux "kill this process", but
    // I hope that's changed :D
}

* I think that's the code for killing zombie processes. I dunno; google might help.
Now, does util.concurrent create Processes? There could be a bug there or in excalibur-event. Or maybe you're not returning worker threads when you're done with them. Try adding debug logging that tracks what happens to your worker threads. If you are requesting them, then just forgetting about them without ever using them again, that's your "thread leak".
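A minimal sketch of the kind of debug logging meant here, assuming your work items are plain Runnables. TrackedTask is a made-up name, not part of util.concurrent or excalibur-event, and it uses the JDK's own java.util.concurrent.atomic.AtomicInteger (so a 1.5+ JVM) for its counters. It logs when a task starts and finishes on a worker thread and counts the tasks that are still running:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: wraps a Runnable and logs its life cycle.
final class TrackedTask implements Runnable {
    private static final AtomicInteger ACTIVE = new AtomicInteger();
    private static final AtomicInteger COMPLETED = new AtomicInteger();

    private final String name;
    private final Runnable delegate;

    TrackedTask(String name, Runnable delegate) {
        this.name = name;
        this.delegate = delegate;
    }

    public void run() {
        String thread = Thread.currentThread().getName();
        System.err.println("[DEBUG] " + name + " started on " + thread
                + " (active=" + ACTIVE.incrementAndGet() + ")");
        try {
            delegate.run();
        } finally {
            // Runs even if the delegate throws, so the counters stay honest.
            COMPLETED.incrementAndGet();
            System.err.println("[DEBUG] " + name + " finished on " + thread
                    + " (active=" + ACTIVE.decrementAndGet() + ")");
        }
    }

    // Number of wrapped tasks currently inside run().
    static int activeCount() {
        return ACTIVE.get();
    }

    // Number of wrapped tasks that have left run(), normally or not.
    static int completedCount() {
        return COMPLETED.get();
    }
}
```

Wrap each Runnable before handing it to the thread pool, e.g. new TrackedTask("handle-request", task). If activeCount() keeps climbing while your app is idle, some tasks never finish and their worker threads never come back.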
Also, try the newest version of your JVM, the newest version of util.concurrent (newer than what we distribute, I think) and the newest version of excalibur-event.
If the problem doesn't go away, send lots of information our way (like logs set on DEBUG, your config files and some actual process activity logs your sysadmin should be able to provide). The problem with these semi-native environment things is that they are very difficult to debug.
So, has anyone experienced java 'zombie' threads on Linux before and noticed why this happens?
see above... Mostly it happens when you interface with native code and/or native process/thread management. If you dabble with the stuff Runtime exposes you can potentially create a lot of problems for yourself if you don't do bookkeeping. If you don't touch that stuff, the JDK should keep the books for you. Also, process issues used to happen a lot on old JDKs IIRC. I think they have all that fixed since 1.3.1 though.....at least this is the first time in a long while that I've heard of any problems regarding these.
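A rough sketch of what such bookkeeping could look like, assuming you do start external processes through Runtime.exec(). ChildProcessBook is a made-up name, not something from Avalon or the JDK. It records every child it starts and later waits for each one, so none is left unreaped. It also assumes the children write very little output: a child that fills its output pipe while nobody reads it would make waitFor() block.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: remembers every child process until it is reaped.
final class ChildProcessBook {
    private final List<Process> children = new ArrayList<Process>();

    // Starts the command and records the resulting child process.
    synchronized Process start(String[] command) throws IOException {
        Process child = Runtime.getRuntime().exec(command);
        children.add(child);
        return child;
    }

    // Number of recorded children that have not been waited for yet.
    synchronized int outstanding() {
        return children.size();
    }

    // Waits for every recorded child; returns how many were reaped.
    synchronized int reapAll() throws InterruptedException {
        int reaped = 0;
        for (Iterator<Process> it = children.iterator(); it.hasNext();) {
            Process child = it.next();
            child.waitFor(); // collects the exit status
            closeStreams(child);
            it.remove();
            reaped++;
        }
        return reaped;
    }

    // Releases the pipes that connect the JVM to the child.
    private static void closeStreams(Process child) {
        try {
            child.getInputStream().close();
            child.getErrorStream().close();
            child.getOutputStream().close();
        } catch (IOException ignored) {
            // nothing useful to do here; the child is already gone
        }
    }
}
```

Call reapAll() at a point where you know the children should be done, for example during component shutdown. If outstanding() only ever grows, that is your leak.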
Anything ? I'm really stuck ...
yeah, this is tricky stuff! Do note that, if your app doesn't create lots of zombie processes, this should be harmless. Once your application exits, its zombie processes get cleaned up as well (they are handed over to init, which reaps them). The only issue is when these zombie processes become so big in number they become a resource hog. I think processes are pretty cheap on recent linux kernels though. If it does come to that, you could opt to just restart your application once a day if you're on a deadline :D
Regardless of what you do, please do provide the exact version of the JVM you're using, as well as info on your linux version (distro, version, kernel version), so it may be possible to reproduce your problem.
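Something like the following could collect the JVM side of that information. It is only a sketch: EnvReport is a made-up name, and it reads nothing but standard system properties. On Linux, os.version reports the kernel release; the distro name and version are not available this way and still have to come from your sysadmin.

```java
// Hypothetical helper: prints the JVM and OS details worth putting in a bug report.
final class EnvReport {
    private static final String[] KEYS = {
        "java.version", "java.vendor", "java.vm.name", "java.vm.version",
        "os.name", "os.version", "os.arch"
    };

    // Returns one "key=value" line per property.
    static String describe() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < KEYS.length; i++) {
            sb.append(KEYS[i]).append('=')
              .append(System.getProperty(KEYS[i])).append('\n');
        }
        return sb.toString();
    }
}
```

Logging EnvReport.describe() once at startup puts the details right next to the DEBUG output.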
disclaimer: this is really not an area of expertise for me!
good luck (and let us know how you progress!),
- Leo
