Hi,

On 02.04.2014 at 22:19, [email protected] wrote:

> We're bringing up SoGE 8.1.6 and I've run into a problem with the use of a
> 'starter_method' that's affecting OpenMPI jobs.
> 
> Following previous discussions on the list[1], we're using the
> 'environment modules' package, and using a starter_method to initialize
> the user's environment as if it was a login shell and to export the
> "module" function.
> 
> Our starter_method script is "/lab/bin/starter" and contains:
> 
> ---------------------------------------------
> [line  1]     #!/bin/bash -l
> [line  2]     # initialize modules, then run whatever was given
> [line  3]     . /usr/share/Modules/init/bash
> [line  4]     
> [line  5]     #  check if "module" is declared as a function
> [line  6]     declare -f -F module 1> /dev/null 2>&1
> [line  7]     if [ $? = 0 ] ; then
> [line  8]             # there is a module function, export it
> [line  9]             export -f module
> [line 10]     fi
> [line 11]     
> [line 12]     printf "Debugging. About to:\n\texec \"${@}\"\n" 1>&2
> [line 12]     exec "${@}"
> ---------------------------------------------
> 
> (The debugging statement is not normally active.)
> 
> This works fine for serial jobs.
> 
> However, OpenMPI (1.3.3) jobs fail to start. It appears as if the

Before digging deeper into the issue, I would suggest updating to a more 
recent version of Open MPI.


> starter_method is somehow corrupting the environment passed to mpirun.
> 
> For example, I submitted a job when I was in the directory
> /lab/home/bergman/sge_job_output.
> 
> The starter_method script reports:
> 
>       Debugging: About to:
>               exec OPAL_PREFIX=/lab/bin/openmpi/sge; export OPAL_PREFIX;\
>                        PATH=/lab/bin/openmpi/bin:$PATH ;\
>                        export PATH ;\
>                        LD_LIBRARY_PATH=/lab/bin/openmpi/lib:$LD_LIBRARY_PATH ;\
>                        export LD_LIBRARY_PATH ; /lab/bin/openmpi/bin/orted

This doesn't look like it can work. `exec` expects the path to an executable, 
not a series of bash commands. As an idea, you can try something like:

exec bash -c 'export FOO=Hallo; echo $FOO'
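A minimal sketch of that idea applied to the starter_method quoted above. It assumes, based only on the debug output in this thread and not on documented SGE behaviour, that the whole command list can arrive as one single argument. The `;` check is a crude heuristic, and `start_payload` is a hypothetical helper name.

```shell
#!/bin/bash -l
# Sketch only: starter_method with a fallback for a payload that arrives as
# ONE argument holding a whole command list (assumption, see lead-in).

# Load environment modules only if the init file exists, then export "module".
if [ -r /usr/share/Modules/init/bash ]; then
    . /usr/share/Modules/init/bash
fi
if declare -F module > /dev/null 2>&1; then
    export -f module
fi

# start_payload is a hypothetical helper, not an SGE interface.
start_payload() {
    if [ "$#" -eq 1 ] && [[ $1 == *";"* ]] && [ ! -x "$1" ]; then
        # One argument containing ';' that is not an executable file:
        # let a new bash parse it as a command list.
        exec bash -c "$1"
    fi
    # Normal case: first argument is the program, the rest are its arguments.
    exec "$@"
}

start_payload "$@"
```

Ordinary jobs still take the plain `exec "$@"` path, so their argument quoting is left untouched.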

How do you submit/start the job? The parameters forwarded to the starter 
method are usually passed on to the original executable/script as its parameters.
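To check what actually arrives, a debugging variant of the starter could print how many parameters it received and each one on its own line. Unlike the printf in the quoted script, which folds "${@}" into its format string, this shows whether the command comes in as one string or as separate words. `log_args` is a hypothetical name; the output goes to stderr like the original debug line.

```shell
#!/bin/bash
# Sketch: report the number of arguments and each argument separately,
# quoted with %q so embedded spaces and ';' stay visible.
# log_args is a hypothetical helper name.
log_args() {
    printf 'starter: %d argument(s)\n' "$#" 1>&2
    local i=0 arg
    for arg in "$@"; do
        i=$((i + 1))
        printf 'starter: arg %d = %q\n' "$i" "$arg" 1>&2
    done
}

log_args "$@"
```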

BTW: when compiled with tight SGE integration, Open MPI uses `qrsh -inherit -V ...` 
automatically and should already set these variables on its own. There is 
also the option `-x` to `mpiexec` to export any additional variables you need.
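For the `-x` route, a minimal sketch of a wrapper for the job script that passes each listed variable to `mpiexec` with its own `-x` option (one variable per option). `FORWARD_VARS`, `run_with_env` and `./my_mpi_program` are hypothetical names; setting `MPIEXEC=echo` gives a dry run that only prints the resulting arguments.

```shell
#!/bin/bash
# Sketch: forward selected variables to all ranks via Open MPI's -x option.
# FORWARD_VARS, run_with_env and ./my_mpi_program are placeholders.

MPIEXEC=${MPIEXEC:-mpiexec}              # override with MPIEXEC=echo for a dry run
FORWARD_VARS=(OPAL_PREFIX LD_LIBRARY_PATH)

run_with_env() {
    local args=() name
    for name in "${FORWARD_VARS[@]}"; do
        args+=(-x "$name")               # one -x per variable
    done
    "$MPIEXEC" "${args[@]}" "$@"
}

# In the job script itself:
#   run_with_env ./my_mpi_program
```

Keeping the list in one array means the `mpiexec` line does not change when another variable needs forwarding.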

-- Reuti


> which looks fine. However, that is followed by the error:
> 
>       /lab/bin/starter: line 12: /lab/home/bergman/sge_job_output/OPAL_PREFIX=/lab/bin/openmpi/sge;\
>                        export OPAL_PREFIX;\
>                        PATH=/lab/bin/openmpi/bin:$PATH ;\
>                        export PATH ;\
>                        LD_LIBRARY_PATH=/lab/bin/openmpi/lib:$LD_LIBRARY_PATH ;\
>                        export LD_LIBRARY_PATH ; /lab/bin/openmpi/bin/orted
> 
> [Lines broken for readability.]
> 
> 
> The odd thing is that the current working directory (where the SGE
> job was submitted) is prepended to the definition of the OPAL_PREFIX
> variable. This is consistent, regardless of where the SGE job is launched (~,
> /tmp, etc.).
> 
> Any suggestions?
> 
> Thanks,
> 
> Mark
> 
> 
> 
> [1] http://gridengine.org/pipermail/users/2014-January/007121.html
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

