On Fri, May 04, 2007 at 07:30:36PM +0200, Jan Ploski wrote:
> I am experimenting with UML in a HPC cluster. What I do is basically start
> up 60 instances all at once, a bunch of instances on each hardware node,
> using the resource manager TORQUE. Each instance gets a different umid.
> The instances are configured to boot up, execute a job and halt after
> that. Most of the times it works very well. However, every now and then
> some instance of the 60 will get stuck with the infamous "INIT: Id 0
> respawning too fast" message at boot and consequently neither run the job
> nor terminate.
>
> So far I have found mentions of two possible causes for this problem: 1)
> wrong name of the tty device in inittab 2) /lib/tls problem. Neither
> applies in my case (/dev/tty0 is correct, and I have already renamed
> /lib/tls, just in case).

These would cause problems all the time, not sporadically as you're
seeing.

> As I can reproduce the problem "statistically" (quite reliably in the
> cluster context) but not at will when running a single instance from the
> command line, my question is: how should I proceed about troubleshooting
> it? Are there any locations in the UML kernel code where I could insert
> some debug statements (or maybe delays? maybe the problem is
> timing-related somehow?) to gather useful diagnostic information?

Is it possible that it is caused by confusion about how quickly real
time is progressing compared to how much computation is happening in
that time? By default, UML will match its time to the host, with the
effect that, on a busy system, it will see time progressing quickly
compared to the work it's doing.

If so, then disable CONFIG_UML_REAL_TIME_CLOCK, and use 2.6.21-rc7-mm2,
which has a fix in this area, and see if that makes any difference.

				Jeff

-- 
Work email - jdike at linux dot intel dot com

_______________________________________________
User-mode-linux-user mailing list
User-mode-linux-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-user
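
For reference, a sketch of how the disabled option would normally look in
the UML kernel's .config after reconfiguring. This assumes only the
standard kernel convention for unset options; the message above does not
say where the option sits in the configuration menus, so that is left out
here.

```
# CONFIG_UML_REAL_TIME_CLOCK is not set
```

If the line instead reads CONFIG_UML_REAL_TIME_CLOCK=y, the option is
still enabled and the kernel needs to be reconfigured and rebuilt before
the test is repeated.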