Hello, I am experimenting with UML on an HPC cluster. I start up 60 instances all at once, several on each hardware node, using the TORQUE resource manager. Each instance gets a different umid. The instances are configured to boot, execute a job, and halt afterwards. Most of the time this works very well. However, every now and then one of the 60 instances gets stuck at boot with the infamous "INIT: Id 0 respawning too fast" message, and consequently neither runs the job nor terminates.
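To reproduce the many-instances-at-once condition outside TORQUE, and to pick out the stuck instances automatically, one could use a small harness along the following lines. It is a minimal sketch under stated assumptions, not a tested tool. The kernel path `./linux`, the root filesystem `root_fs`, the `mem=128M` value, the `jobNN` umid naming and the `uml-logs` directory are hypothetical placeholders. The sketch assumes that each instance writes its console output to stdout, needs no console input, and exits on its own once the job has run and the instance has halted. Only `umid=`, `ubd0=` and `mem=` are standard UML command-line options.

```python
import subprocess
import time
from pathlib import Path

# Substring of the console message reported by the stuck instances.
STUCK_MARKER = "respawning too fast"


def build_uml_cmd(umid, kernel="./linux", rootfs="root_fs", mem="128M"):
    """Command line for one UML instance.

    kernel, rootfs and mem are hypothetical placeholders; umid= gives each
    instance its own ID, as in the TORQUE setup.
    """
    return [kernel, f"umid={umid}", f"ubd0={rootfs}", f"mem={mem}"]


def log_shows_stuck(text):
    """True if a captured console log contains the init respawn message."""
    return STUCK_MARKER in text


def run_batch(count, timeout_s=600, log_dir="uml-logs", **cmd_kwargs):
    """Start `count` instances at once; return {umid: saw_respawn_msg} for
    every instance that had not exited when the shared deadline passed."""
    out = Path(log_dir)
    out.mkdir(parents=True, exist_ok=True)
    procs = {}
    for i in range(count):
        umid = f"job{i:02d}"  # hypothetical naming scheme
        log = open(out / f"{umid}.log", "w")
        proc = subprocess.Popen(
            build_uml_cmd(umid, **cmd_kwargs),
            stdin=subprocess.DEVNULL,  # assumption: unattended jobs need no console input
            stdout=log,
            stderr=subprocess.STDOUT,
        )
        procs[umid] = (proc, log)

    # One deadline shared by all instances, since they are started together.
    deadline = time.monotonic() + timeout_s
    hung = []
    for umid, (proc, log) in procs.items():
        try:
            proc.wait(timeout=max(0.0, deadline - time.monotonic()))
        except subprocess.TimeoutExpired:
            proc.kill()  # treat anything still running at the deadline as stuck
            proc.wait()
            hung.append(umid)
        finally:
            log.close()

    return {
        umid: log_shows_stuck((out / f"{umid}.log").read_text(errors="replace"))
        for umid in hung
    }
```

For example, `run_batch(60, timeout_s=900)` would start 60 instances on one machine and return a dict mapping each hung umid to whether its log contains the respawn message. A hung instance without that message would point at a different failure. Killing only the main process may leave UML helper processes behind on the host, so a real harness might need extra cleanup.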
So far I have found two possible causes mentioned for this problem: (1) a wrong tty device name in inittab, and (2) the /lib/tls problem. Neither applies in my case: /dev/tty0 is correct, and I have already renamed /lib/tls, just in case. I can reproduce the problem "statistically" (quite reliably in the cluster context), but not at will when running a single instance from the command line. So my question is: how should I go about troubleshooting it? Are there any places in the UML kernel code where I could insert debug statements (or perhaps delays, in case the problem is timing-related) to gather useful diagnostic information?

Best regards,
Jan Ploski

--
Dipl.-Inform. (FH) Jan Ploski
OFFIS
Betriebliches Informationsmanagement
Escherweg 2 - 26121 Oldenburg - Germany
Phone: +49 441 9722 - 184 Fax: +49 441 9722 - 202
E-Mail: [EMAIL PROTECTED] - URL: http://www.offis.de

_______________________________________________
User-mode-linux-user mailing list
User-mode-linux-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/user-mode-linux-user