Hi Terry, Reuti,

good news: we've solved (or rather worked around) the problem with CT/8.2.1c :o)

the "fix" was easy: we used the 64bit version of 'mpiexec' instead of the 32bit version [previously used as the default]. The 64bit version now works with both the NIS and the LDAP authentication modes. The 32bit version works only on the NIS-authenticated part of our cluster.
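
To check which flavour of mpiexec is actually picked up from $PATH, the ELF header can be inspected. A minimal sketch in Python, assuming a Linux ELF binary; the function name `elf_class` is made up for illustration and is not part of CT or Open MPI:

```python
import shutil


def elf_class(path):
    """Return '32bit', '64bit', or None if the file is not a recognisable ELF binary."""
    with open(path, "rb") as f:
        header = f.read(5)
    # An ELF file starts with 0x7f 'E' 'L' 'F'; the fifth byte (EI_CLASS)
    # is 1 for 32-bit objects and 2 for 64-bit objects.
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    return {1: "32bit", 2: "64bit"}.get(header[4])


if __name__ == "__main__":
    # Inspect whatever 'mpiexec' comes first in $PATH (symlinks are followed).
    target = shutil.which("mpiexec")
    if target is None:
        print("no mpiexec found in PATH")
    else:
        print(target, "->", elf_class(target))
```

If mpiexec turns out to be a wrapper script rather than a binary, the function returns None and the real executable has to be checked instead.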

Thanks for your help!

Best wishes
Paul Kapinos



Reuti wrote:
Hi,

Am 15.07.2011 um 21:14 schrieb Terry Dontje:

On 7/15/2011 1:46 PM, Paul Kapinos wrote:
Hi OpenMPI folks (and Oracle/Sun experts), we have a problem with Sun's MPI (Cluster Tools 8.2.x) on a part of our cluster. In the part of the cluster where LDAP is activated, mpiexec does not try to spawn tasks on remote nodes at all, but exits with an error message like the one below. If we run 'strace -f' on mpiexec, no exec of "ssh" can be found at all. Strangely, mpiexec tries to look into /etc/passwd (where the user has no entry, because we use LDAP!).
Note this is an area that should be no different from stock Open MPI.

"should not" but it is :o)
However, I am comparing CT/8.2.1c with a self-compiled OpenMPI/1.4.3, and these are quite different releases. They definitely behave differently: with the self-compiled OpenMPI both the 32bit and the 64bit mpiexec work with NIS and with LDAP, while the 32bit mpiexec of CT/8.2.1c works with NIS only.



I would suspect that the message might be coming from ssh. I wouldn't expect
mpiexec to look into /etc/passwd at all; why would it need to?

the output you listed is titled "[unknown-user]". Maybe referring to the
password file is a misleading simplification. The check is also done on the master
node of the parallel job by a usual `getpwuid`. Is /etc/nsswitch.conf fine on the
`mpiexec` machine?
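
The difference between "not in /etc/passwd" and "unknown to `getpwuid`" can be made visible with a small sketch. It uses Python's standard `pwd` module, which calls the C library's `getpwuid` and therefore consults the sources listed in /etc/nsswitch.conf; the helper names are illustrative only. Note the assumption that the interpreter has the same word size as the mpiexec in question, since a 32bit process loads 32bit NSS modules and a 64bit process loads 64bit ones.

```python
import os
import pwd


def name_via_nss(uid):
    """Look the uid up via libc getpwuid, i.e. via the sources in nsswitch.conf."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None


def name_in_passwd_text(uid, text):
    """Look the uid up in passwd-formatted text only (name:pw:uid:gid:...)."""
    for line in text.splitlines():
        # Skip comments and NIS compat entries such as '+::::::'.
        if line.startswith(("#", "+", "-")):
            continue
        fields = line.split(":")
        if len(fields) >= 3 and fields[2] == str(uid):
            return fields[0]
    return None


if __name__ == "__main__":
    uid = os.getuid()
    try:
        with open("/etc/passwd") as f:
            local = name_in_passwd_text(uid, f.read())
    except OSError:
        local = None
    print("uid", uid, "| local file:", local, "| getpwuid:", name_via_nss(uid))
```

If the local file yields None but `getpwuid` returns the name, the account is served by NIS or LDAP. If both yield None on the `mpiexec` machine, the name service lookup itself is failing for that process.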

Is the user known on this node too? Can they log in because they have no
passphrase, because they have an agent running, or did you set up host-based
authentication?

my user is known on each node and is allowed to log in (without a password) from any node to any node. In /etc/passwd there is no password for my user; all auth things are done by NIS or LDAP. (Sorry, I cannot tell more because this is admin stuff, but as said: "ssh" works from any node to any node without a password.) /etc/nsswitch.conf seems to be fine (it works now with the 64bit version of mpiexec :o)




 It should just be using ssh.  Can you manually ssh to the same node?
On the old part of the cluster, where NIS is used as the authentication method, Sun MPI runs just fine. So, is Sun's MPI compatible with the LDAP authentication method at all?
Insofar as whatever launcher you use is compatible with LDAP.
Best wishes, Paul

P.S. in both parts of the cluster, I (login marked as xxxxx here) can log in to any node by ssh without needing to type the password.

From the headnode of the cluster to a node or also between nodes?

-- Reuti




--------------------------------------------------------------------------
The user (xxxxx) is unknown to the system (i.e. there is no corresponding entry in the password file). Please contact your system administrator for a fix.
--------------------------------------------------------------------------
[cluster-beta.rz.RWTH-Aachen.DE:31535] [[57885,0],0] ORTE_ERROR_LOG: Fatal in file plm_rsh_module.c at line 1058
--------------------------------------------------------------------------

--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915
