Hi,

On 15.07.2011 at 21:14, Terry Dontje wrote:
> On 7/15/2011 1:46 PM, Paul Kapinos wrote:
>> Hi OpenMPI folks (and Oracle/Sun experts),
>>
>> we have a problem with Sun's MPI (Cluster Tools 8.2.x) on a part of our
>> cluster. In the part of the cluster where LDAP is activated, mpiexec
>> does not try to spawn tasks on remote nodes at all, but exits with an error
>> message like the one below. If we 'strace -f' the mpiexec, no exec of "ssh"
>> can be found at all. Surprisingly, mpiexec tries to look into /etc/passwd
>> (where the user is not listed, because we use LDAP!).
>>
> Note this is an area that should be no different from stock Open MPI.
> I would suspect that the message might be coming from ssh. I wouldn't
> suspect mpiexec would be looking into /etc/passwd at all; why would it
> need to?

The output you listed is titled "[unknown-user]". Maybe referring to the password file is an oversimplification. The test is also done on the master node of the parallel job by a usual `getpwuid`. Is /etc/nsswitch.conf fine on the `mpiexec` machine? Is the user known on this node too? Can they log in because they have no passphrase or because they have an agent running, or did you set up host-based authentication?

> It should just be using ssh. Can you manually ssh to the same node?
>> On the old part of the cluster, where NIS is used as the authentication
>> method, Sun MPI runs very fine.
>>
>> So, is Sun's MPI compatible with the LDAP authentication method at all?
>>
> In as far as whatever launcher you use is compatible with LDAP.
>> Best wishes,
>>
>> Paul
>>
>>
>> P.S. in both parts of the cluster, I (login marked as xxxxx here) can log in
>> to any node by ssh without needing to type the password.

From the headnode of the cluster to a node or also between nodes?

-- Reuti

>>
>> --------------------------------------------------------------------------
>> The user (xxxxx) is unknown to the system (i.e. there is no corresponding
>> entry in the password file). Please contact your system administrator
>> for a fix.
>> --------------------------------------------------------------------------
>> [cluster-beta.rz.RWTH-Aachen.DE:31535] [[57885,0],0] ORTE_ERROR_LOG: Fatal
>> in file plm_rsh_module.c at line 1058
>> --------------------------------------------------------------------------
>>
>> _______________________________________________
>> users mailing list
>> [email protected]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email [email protected]
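
To check by hand whether the user lookup works on the node where `mpiexec` runs, something like the following could be tried there. It is only a minimal sketch, assuming the failing test really is a plain `getpwuid` call for the calling user; the function name `lookup_user` is made up for illustration and is not taken from Open MPI. Python's `pwd` module goes through the system's normal name service lookup, so it should see LDAP users if /etc/nsswitch.conf is set up for that.

```python
import os
import pwd


def lookup_user(uid):
    """Return the passwd entry for uid, or None if the system cannot resolve it.

    pwd.getpwuid() wraps the C library getpwuid(), so the answer depends on
    the sources configured in /etc/nsswitch.conf (files, NIS, LDAP, ...).
    """
    try:
        return pwd.getpwuid(uid)
    except KeyError:
        # Raised when no configured source knows this uid.
        return None


if __name__ == "__main__":
    uid = os.getuid()
    entry = lookup_user(uid)
    if entry is None:
        print(f"uid {uid} is unknown to the system")
    else:
        print(f"uid {uid} resolves to login {entry.pw_name}")
```

If this reports the uid as unknown on the `mpiexec` machine but works on the nodes of the NIS part, the name service setup of that machine would be the first thing to look at.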
