I'm having a heck of a time figuring out why sge_execd won't stay running. On RHEL 6, the /etc/init.d/sgeexecd.myclustername script is run at startup, or via sudo after startup:

    sudo /etc/init.d/sgeexecd.myclustername start
It just says "OK" with no other output, yet the daemon isn't running.

I added the "-x" option to "#!/bin/sh -x" so I could debug it … I see it gets up to the "exec 1> /dev/null 2>&1", which effectively eliminates any further debug output… So I commented out that line and ran it again. Now I can see that it launches sge_execd and the exit status is 0, so the "touch" on the following line does indeed create the lock file. The "qping" loop immediately after that in the script … exits with status 0 on the first try. And still, there is no process running at the end of the script.

I then modified the startup script to perform the qping 5 times unconditionally. The first time it has exit value 0, and all subsequent times it has exit value 1. This means the daemon is indeed running for a very short period of time, but then it dies in less than a second.

Any ideas what the problem is? This is a machine on which we recently reinstalled the OS, and we're reinstalling sgeexecd by the same process by which it was previously installed.
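For anyone who wants to reproduce the "qping 5 times unconditionally" test without editing the init script, here is a minimal sketch of what I mean. The helper name probe_repeatedly and its count/delay arguments are made up for illustration; the command to probe should be the exact qping line already present in your own sgeexecd script, which I have deliberately not reproduced here.

```shell
#!/bin/sh
# probe_repeatedly COUNT DELAY CMD [ARGS...]
# Hypothetical helper (not part of Grid Engine): runs CMD exactly COUNT
# times, regardless of success or failure, sleeping DELAY seconds between
# attempts, and prints the exit status of every attempt.
probe_repeatedly() {
    count=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        # Discard the command's own output; only its exit status matters.
        "$@" >/dev/null 2>&1
        echo "attempt $i: exit status $?"
        i=$((i + 1))
        if [ "$i" -le "$count" ]; then
            sleep "$delay"
        fi
    done
}

# Usage (assumption: substitute the qping command line copied from your
# own init script for the placeholder below):
#   probe_repeatedly 5 1 <qping command from the init script>
```

A status of 0 on the first attempt followed by non-zero statuses on the later ones would match what I saw: the daemon answers once and is gone shortly afterwards.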
