Hi,
On 02.04.2014 at 22:19, [email protected] wrote:
> We're bringing up SoGE 8.1.6 and I've run into a problem with the use of a
> 'starter_method' that's affecting OpenMPI jobs.
>
> Following previous discussions on the list[1], we're using the
> 'environment modules' package, and using a starter_method to initialize
> the user's environment as if it was a login shell and to export the
> "module" function.
>
> Our starter_method script is "/lab/bin/starter" and contains:
>
> ---------------------------------------------
> [line 1] #!/bin/bash -l
> [line 2] # initialize modules, then run whatever was given
> [line 3] . /usr/share/Modules/init/bash
> [line 4]
> [line 5] # check if "module" is declared as a function
> [line 6] declare -f -F module 1> /dev/null 2>&1
> [line 7] if [ $? = 0 ] ; then
> [line 8] # there is a module function, export it
> [line 9] export -f module
> [line 10] fi
> [line 11]
> [line 12] printf "Debugging. About to:\n\texec \"${@}\"\n" 1>&2
> [line 12] exec "${@}"
> ---------------------------------------------
>
> (The debugging statement is not normally active.)
>
> This works fine for serial jobs.
>
> However, OpenMPI (1.3.3) jobs fail to start. It appears as if the
> starter_method is somehow corrupting the environment passed to mpirun.
Before digging deeper into the issue, I would suggest updating to a more
recent version of Open MPI.
>
> For example, I submitted a job when I was in the directory
> /lab/home/bergman/sge_job_output.
>
> The starter_method script reports:
>
> Debugging: About to:
> exec OPAL_PREFIX=/lab/bin/openmpi/sge; export OPAL_PREFIX;\
> PATH=/lab/bin/openmpi/bin:$PATH ;\
> export PATH ;\
> LD_LIBRARY_PATH=/lab/bin/openmpi/lib:$LD_LIBRARY_PATH
> ;\
> export LD_LIBRARY_PATH ; /lab/bin/openmpi/bin/orted
This doesn't look like it will work. `exec` expects the path to an executable,
not a series of bash commands. As an idea, you can try something like:
exec bash -c 'export FOO=Hallo; echo $FOO'
How do you submit/start the job? The parameters forwarded to the starter
method are usually passed on to the original executable/script as parameters.
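A minimal sketch of that idea, assuming the starter script may receive either a
normal argument list or one single argument holding a whole shell command line
(as in the orted case above). The helper name `run_job` is hypothetical and not
part of SGE or Open MPI; it runs the command in a subshell here only so that the
example can be tried interactively, whereas a real starter script would `exec`
directly as its last step.

```shell
#!/bin/bash
# Hypothetical helper: run whatever was handed to the starter script.
run_job() {
    if [ "$#" -eq 1 ]; then
        # One argument only: it may be a complete command line such as
        # 'FOO=Hallo; export FOO; echo $FOO', so let a shell parse it.
        # Assumption: a lone script path contains no spaces or shell
        # metacharacters, otherwise it would be split by the shell too.
        ( exec bash -c "$1" )
    else
        # Several arguments: treat them as program plus parameters.
        ( exec "$@" )
    fi
}

# A command string, similar in shape to what was passed for orted:
run_job 'FOO=Hallo; export FOO; echo $FOO'    # prints "Hallo"

# An ordinary program with parameters:
run_job printf '%s\n' 'two words'             # prints "two words"
```

With a plain `exec "$@"` the first call would fail, because the whole string is
looked up as the name of a single executable.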
BTW: Open MPI uses `qrsh -inherit -V ...` automatically when compiled with
tight integration and should already set the variables on its own. There is
also the option -x to `mpiexec` to export any additional ones you need.
-- Reuti
> which looks fine. However, that is followed by the error:
>
> /lab/bin/starter: line 12:
> /lab/home/bergman/sge_job_output/OPAL_PREFIX=/lab/bin/openmpi/sge;\
> export OPAL_PREFIX;\
> PATH=/lab/bin/openmpi/bin:$PATH ;\
> export PATH ;\
> LD_LIBRARY_PATH=/lab/bin/openmpi/lib:$LD_LIBRARY_PATH
> ;\
> export LD_LIBRARY_PATH ; /lab/bin/openmpi/bin/orted
>
> [Lines broken for readability.]
>
>
> The odd thing is that the current working directory (where the SGE
> job was submitted) is prepended to the definition of the OPAL_PREFIX
> variable. This is consistent, regardless of where the SGE job is launched (~,
> /tmp, etc.).
>
> Any suggestions?
>
> Thanks,
>
> Mark
>
>
>
> [1] http://gridengine.org/pipermail/users/2014-January/007121.html
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users