In the message dated: Wed, 02 Apr 2014 16:19:14 -0400,
The pithy ruminations from [email protected] on
<[gridengine users] SGE starter_method breaks OpenMPI> were:
=>
=> We're bringing up SoGE 8.1.6 and I've run into a problem with the use of a
=> 'starter_method' that's affecting OpenMPI jobs.
=>
=> Our starter_method script is "/lab/bin/starter" and contains:
=>
=> ---------------------------------------------
=> [line 1] #!/bin/bash -l
=> [line 2] # initialize modules, then run whatever was given
=> [line 3] . /usr/share/Modules/init/bash
=> [line 4]
=> [line 5] # check if "module" is declared as a function
=> [line 6] declare -f -F module 1> /dev/null 2>&1
=> [line 7] if [ $? = 0 ] ; then
=> [line 8] # there is a module function, export it
=> [line 9] export -f module
=> [line 10] fi
=> [line 11]
=> [line 12] printf "Debugging. About to:\n\texec \"${@}\"\n" 1>&2
=> [line 12] exec "${@}"
=> ---------------------------------------------
=>
=> (The debugging statement is not normally active.)
=>
=> This works fine for serial jobs.
=>
=> However, OpenMPI (1.3.3) jobs fail to start. It appears as if the
=> starter_method is somehow corrupting the environment passed to mpirun.
=>
[SNIP!]
Following up: the problem was that OpenMPI was clearing the environment,
so $SGE_STARTER_SHELL_PATH (among others) was unset when the final exec
line of the starter ran. As a result, exec() was handed a series of
environment variable assignments as its initial 'command', instead of the
name of an executable, and failed. Non-MPI jobs were not affected. I'm
checking whether OpenMPI 1.7.2 has the same issue.
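A minimal bash sketch of why a leading assignment is fatal to exec "$@". The argument list (FOO=bar /bin/echo hello) is made up and only stands in for whatever OpenMPI actually passed. Words produced by expanding "$@" are never parsed as variable assignments, so exec searches PATH for a program literally named FOO=bar, finds none, and the shell exits with status 127.

```shell
#!/bin/bash
# Hypothetical argument list: the first word is an environment
# assignment, not a program name (a stand-in for what the starter saw).
set -- FOO=bar /bin/echo hello

# Words that come from expanding "$@" are never treated as assignments,
# so exec looks for a program literally named "FOO=bar".
# Run it in a subshell, because a failed exec ends a non-interactive shell.
( exec "$@" ) 2>/dev/null
status=$?

echo "exec \"\$@\" exited with status $status"   # 127: command not found
```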
The fix has been reported before[1] in response to the same symptoms
with qrshd. My starter method is now:
----------------------------
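#!/bin/bash -l
##########################################################
# initialize modules, then run whatever was given
. /usr/share/Modules/init/bash
# check if "module" is declared as a function
if declare -f -F module > /dev/null 2>&1 ; then
    # there is a module function, export it
    export -f module
fi
##########################################################
DEFSHELL=/bin/bash
LOGINFLAG=""
# the environment may have been cleared (e.g. by OpenMPI), so fall
# back to DEFSHELL if SGE_STARTER_SHELL_PATH is unset or unusable
if [ "X$SGE_STARTER_SHELL_PATH" = "X" ] ; then
    SHELL=$DEFSHELL
else
    # yet another sanity check
    if [ ! -x "$SGE_STARTER_SHELL_PATH" ] ; then
        SHELL=$DEFSHELL
    else
        SHELL=$SGE_STARTER_SHELL_PATH
    fi
fi
if [ "X$SGE_STARTER_USE_LOGIN_SHELL" = "Xtrue" ] ; then
    # "-l" is an option to bash's exec builtin: it prefixes argv[0]
    # with a dash so the new shell runs as a login shell
    LOGINFLAG="-l"
fi
# LOGINFLAG is deliberately unquoted so that an empty value vanishes
exec $LOGINFLAG "$SHELL" -c "$*"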
----------------------------
and this is working fine. Passing the original arguments as "$@" caused
problems with some quoting, whereas "$*" has been more successful. I'm
sure that clever users will break this with embedded spaces, semicolons, etc.
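The trade-off between "$@" and "$*" shows up in a small bash sketch; the printf command and its arguments are illustrative only. "$*" joins everything into one string that bash -c re-parses, which lets shell syntax in the job command work but also splits any argument containing a space. Quoting each argument with bash's printf %q before joining is one possible middle ground, assuming every argument is a literal word rather than a fragment of shell syntax. It is a sketch, not something taken from a working starter_method.

```shell
#!/bin/bash
# Illustrative command: printf gets a format plus two arguments,
# one of which contains a space.
set -- printf '%s.' 'a b' c

# "$@" keeps each original argument as its own word.
with_at=$("$@")

# "$*" joins the arguments into one string that bash -c re-parses,
# so 'a b' is split into two separate words.
with_star=$(bash -c "$*")

# Quoting each argument with printf %q before joining keeps the word
# boundaries, but only if every argument is a literal word.
with_q=$(bash -c "$(printf '%q ' "$@")")

echo "with \"\$@\": $with_at"     # a b.c.
echo "with \"\$*\": $with_star"   # a.b.c.
echo "with %q:   $with_q"         # a b.c.
```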
Mark
[1] http://gridengine.org/pipermail/users/2012-October/004960.html
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users