On Thu, Apr 11, 2013 at 11:31:12AM +0200, Reuti wrote:
> Am 11.04.2013 um 10:41 schrieb Bernard Massot:
> > I'm new to parallel computing and have a problem with Open MPI jobs in
> 0) Which version of Open MPI are you using?
1.4.2 
 
> This is a fine setup. I assume the setting in SGE's configuration is:
> 
> $ qconf -sconf
> ...
> qlogin_command               builtin
> qlogin_daemon                builtin
> rlogin_command               builtin
> rlogin_daemon                builtin
> rsh_command                  builtin
> rsh_daemon                   builtin
No. I have the default Debian configuration, which is:
rlogin_daemon                /usr/sbin/sshd -i
rlogin_command               /usr/bin/ssh
qlogin_daemon                /usr/sbin/sshd -i
qlogin_command               /usr/share/gridengine/qlogin-wrapper
rsh_daemon                   /usr/sbin/sshd -i
rsh_command                  /usr/bin/ssh
But I don't think this has ever been a problem.

> So Open MPI should detect it's running under SGE and issue `qrsh
> -inherit ...` in the end without the necessity to have somewhere ssh
> inside the cluster around in case it would start something on
> additional nodes. But this is not intended by your setup anyway due to
> the PE definition.
Even when a job fails, strace shows that mpirun issued `qrsh -inherit ...`.
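
For what it's worth, here is a minimal sketch of the detection logic Reuti describes. This is an assumption, not Open MPI's actual code: the idea is simply that when the usual SGE job environment variables are present, mpirun prefers `qrsh -inherit` over ssh/rsh for starting remote ranks (the exact variable set checked may differ between Open MPI versions).

```shell
#!/bin/sh
# Hedged sketch: approximate the "am I running under SGE?" test by
# checking environment variables that qsub/qrsh set inside a job.
# The exact set Open MPI inspects is an assumption here.
under_sge() {
    [ -n "$SGE_ROOT" ] && [ -n "$JOB_ID" ] && [ -n "$PE_HOSTFILE" ]
}

# Simulated job environment for illustration (hypothetical values):
export SGE_ROOT=/usr/share/gridengine
export JOB_ID=42
export PE_HOSTFILE=/tmp/pe_hostfile

if under_sge; then
    echo "SGE detected: remote ranks would be started via 'qrsh -inherit'"
else
    echo "no SGE detected: would fall back to ssh/rsh"
fi
```

The strace observation above matches the first branch: all three variables are set inside the job, so mpirun goes through qrsh rather than ssh.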

> 1) Is the `mpiexec` the users use the one you supplied or can it
> happen, that by accident they used a different `mpiexec` or even
> compiled their application with a different MPI library?
It's not another mpirun (strace confirmed that).

> 2) As the used jobscript is available on the node in
> $SGE_JOB_SPOOL_DIR/job_scripts (the directory specified during
> installation to be used by the exechosts): do they show anything
> unusual regarding the PATH settings?
No.
 
> 3) Are all slots for a job coming from one queue only, or are the
> slots collected from several queues on one and the same exechost?
> I.e.: is the same PE "orte" attached to more than one queue?
I only have one queue.

Any other ideas?
-- 
Bernard Massot
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users