On Thu, Apr 11, 2013 at 11:31:12AM +0200, Reuti wrote:
> On 11.04.2013 at 10:41, Bernard Massot wrote:
> > I'm new to parallel computing and have a problem with Open MPI jobs in
> 0) Which version of Open MPI are you using?

1.4.2

> This is a fine setup. I assume the setting in SGE's configuration is:
>
> $ qconf -sconf
> ...
> qlogin_command               builtin
> qlogin_daemon                builtin
> rlogin_command               builtin
> rlogin_daemon                builtin
> rsh_command                  builtin
> rsh_daemon                   builtin

No. I have the default Debian configuration, which is:

rlogin_daemon                /usr/sbin/sshd -i
rlogin_command               /usr/bin/ssh
qlogin_daemon                /usr/sbin/sshd -i
qlogin_command               /usr/share/gridengine/qlogin-wrapper
rsh_daemon                   /usr/sbin/sshd -i
rsh_command                  /usr/bin/ssh

But I think it has never been a problem.
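(For anyone following along: a quick way to cross-check both sides of this, assuming `ompi_info` from the same Open MPI installation is on the PATH; the grep patterns are just illustrative.)

```shell
# Check whether this Open MPI build was compiled with SGE support
# (the gridengine components should be listed if it was).
ompi_info | grep -i gridengine

# Show which remote-startup settings SGE is actually using,
# i.e. the values Reuti asked about.
qconf -sconf | grep -E '(rsh|rlogin|qlogin)_(command|daemon)'
```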
> So Open MPI should detect it's running under SGE and issue `qrsh
> -inherit ...` in the end, without needing ssh anywhere inside the
> cluster in case it would start something on additional nodes. But this
> is not intended by your setup anyway due to the PE definition.

Even when a job fails, strace shows that mpirun used "qrsh -inherit ...".

> 1) Is the `mpiexec` the users use the one you supplied, or can it
> happen that by accident they used a different `mpiexec` or even
> compiled their application with a different MPI library?

It's not another mpirun (strace confirmed that).

> 2) As the used jobscript is available on the node in
> $SGE_JOB_SPOOL_DIR/job_scripts (the directory specified during
> installation to be used by the exechosts): do they show anything
> unusual regarding the PATH settings?

No.

> 3) Are all slots for a job coming from one queue only, or are the
> slots collected from several queues on one and the same exechost?
> I.e.: is the same PE "orte" attached to more than one queue?

I only have one queue.

Any other idea?
--
Bernard Massot
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
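(A sketch of the strace check referred to above, in case others want to reproduce it; the job name `./my_app` and the slot count are hypothetical, not from the original mail.)

```shell
# Run inside a job script submitted to the PE. -f follows child
# processes, -e trace=execve limits output to process launches,
# so the grep shows whether mpirun spawns "qrsh -inherit ..."
# (tight SGE integration) or falls back to ssh.
strace -f -e trace=execve mpirun -np 4 ./my_app 2>&1 | grep -E 'qrsh|ssh'
```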
