I'm having a heck of a time figuring out why.

On RHEL 6, the /etc/init.d/sgeexecd.myclustername script is run at startup, or 
manually via sudo after startup:

    sudo /etc/init.d/sgeexecd.myclustername start

It just says "OK" and no other output, yet the daemon isn't running.

I added the "-x" option to the shebang line ("#!/bin/sh -x") so I can debug it … 
I see it gets up to the "exec 1> /dev/null 2>&1" which effectively eliminates 
any further debug output…
So I comment out that line and run again.
Now I can see it launches sge_execd, and the exit status is 0, so the "touch" 
on the following line does indeed create the lock file.
The "qping" loop immediately after that in the script … exits with 0 status, on 
the first try.
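
Rather than commenting out the "exec 1> /dev/null 2>&1" line, the same trace can be kept by pointing the redirect at a file. The following is only a minimal sketch of that idea, not the actual init script: the DEBUG_LOG variable and the default log path are hypothetical names chosen for illustration, and the echo stands in for whatever the real script runs after the redirect.

```shell
#!/bin/sh
# Hypothetical log location; DEBUG_LOG is not part of the SGE init script.
LOG=${DEBUG_LOG:-/tmp/sgeexecd-debug.$$.log}

(
    # Send stdout and stderr to the log instead of /dev/null, so the
    # "-x" trace (written to stderr) survives past this point.
    exec 1>>"$LOG" 2>&1
    set -x
    # Placeholder for the commands the real script runs after the redirect.
    echo "starting daemon here"
)
```

The subshell only keeps the redirect from affecting the calling shell in this example; in an init script the exec line would sit at the same place as the original one.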

And still, there is no process running at the end of that script.

I modified the startup script to perform the qping 5 times unconditionally.  
The first time, it has exit value 0, and all subsequent times, it has exit 
value 1. This means the daemon is indeed running for a very short period of 
time, but then it dies in less than a second.
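
The unconditional probing described above can be sketched as a small helper. This is an illustration under assumptions, not the code from the init script: probe_repeatedly is a hypothetical function name, and the probe command is passed in as arguments so that no particular qping invocation is assumed.

```shell
#!/bin/sh
# probe_repeatedly COUNT DELAY CMD [ARGS...]   (hypothetical helper)
# Runs CMD exactly COUNT times, DELAY seconds apart, and prints each exit
# status. It never stops early, so a daemon that answers once and then
# dies shows up as a 0 followed by non-zero values.
probe_repeatedly() {
    count=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        if "$@" >/dev/null 2>&1; then
            status=0
        else
            status=$?
        fi
        echo "attempt $i: exit status $status"
        # Pause between attempts, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
}

# Example use (assumes pgrep is available; substitute the qping command
# line taken from the init script to reproduce the test described above):
#   probe_repeatedly 5 1 pgrep -x sge_execd
```

Because every attempt is reported, the point at which the exit status changes from 0 to non-zero gives a rough bound on how long the daemon stayed up.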

Any ideas what the problem is?

This is a machine on which we recently reinstalled the OS, and we're 
reinstalling sgeexecd by the same process used to install it previously.
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
