Title: LAM/MPI 6.5.9

Hello,
I am using Red Hat 9 (kernel 2.4.20-8) with OSCAR 3, and
I'm trying to 'retrofit' lam-6.5.9 onto the cluster
to run Abaqus 6.4-1.

I followed the 'install' suggested in the "OSCAR
Cluster Admin w/ C3" document (i.e. I manually
compiled and pushed out the lam-6.5.9 package
rather than building an RPM. I tried the RPM route first, but it
seemed to break the cluster and I had to reinstall).

I'm now at a point where I can switch (using switcher)
to lam-6.5.9 and run lamboot on the head node with no errors,
but when I run lamboot with my hostfile (listing all the nodes)
I get the error message below.

Note that I can manually ssh to the remote nodes, but I cannot rsh.
Do I need to be able to? LAM 7.0 is set up properly, and in that
case I still cannot rsh to the remote nodes. What else should I
change to allow lam-6.5.9 to work? (I have done nothing beyond the
exact steps outlined here.)

----- error message -------

$ lamboot hostfile

LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University

computenode1.na.luk.com: Connection refused
-----------------------------------------------------------------------------
LAM failed to execute a process on the remote node "computenode1.na.luk.com".
LAM was not trying to invoke any LAM-specific commands yet -- we were
simply trying to determine what shell was being used on the remote
host.

LAM tried to use the remote agent command "/usr/bin/rsh"
to invoke "echo $SHELL" on the remote node.

This usually indicates an authentication problem with the remote
agent, or some other configuration type of error in your .cshrc or
.profile file.  The following is a list of items that you may wish to
check on the remote node:

        - You have an account and can login to the remote machine
        - Incorrect permissions on your home directory (should
          probably be 0755)
        - Incorrect permissions on your $HOME/.rhosts file (if you are
          using rsh -- they should probably be 0644)
        - You have an entry in the remote $HOME/.rhosts file (if you
          are using rsh) for the machine and username that you are
          running from
        - Your .cshrc/.profile must not print anything out to the
          standard error
        - Your .cshrc/.profile should set a correct TERM type
        - Your .cshrc/.profile should set the SHELL environment
          variable to your default shell

Try invoking the following command at the unix command line:

        /usr/bin/rsh computenode1.na.luk.com -n echo $SHELL

You will need to configure your local setup such that you will *not*
be prompted for a password to invoke this command on the remote node.
No output should be printed from the remote node before the output of
the command is displayed.

When you can get this command to execute successfully by hand, LAM
will probably be able to function properly.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
lamboot encountered some error (see above) during the boot process,
and will now attempt to kill all nodes that it was previously able to
boot (if any).

Please wait for LAM to finish; if you interrupt this process, you may
have LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------

LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University
_____________________________________________________
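
The check that lamboot asks for above can be scripted so that it is easy to repeat for each node and each remote agent. The following is only a minimal sketch: check_agent is a hypothetical helper name, not part of LAM, and it assumes the agent accepts the same "host -n command" form shown in the error text. LAM 6.5.x is documented to take its remote agent from the LAMRSH environment variable when it is set, so if ssh passes this check while rsh fails, setting LAMRSH to the ssh command before running lamboot may be worth trying.

```shell
# check_agent AGENT HOST  (hypothetical helper, not shipped with LAM)
# Runs the probe described in the lamboot error text, 'echo $SHELL' on
# the remote host, and checks that it succeeds and prints exactly one line.
check_agent() {
    agent=$1
    host=$2
    # $agent is deliberately unquoted so "ssh -x" splits into command + option.
    # 'echo $SHELL' is single-quoted so it is expanded on the remote side.
    # stdin comes from /dev/null so the probe cannot read from the terminal's
    # standard input.
    out=$($agent "$host" -n 'echo $SHELL' 2>&1 </dev/null) || {
        echo "FAIL ($agent): exit status $?: $out"
        return 1
    }
    # lamboot expects no extra output (banners, dotfile noise) before the shell.
    lines=$(printf '%s\n' "$out" | grep -c '')
    if [ -z "$out" ] || [ "$lines" -ne 1 ]; then
        echo "FAIL ($agent): expected exactly one line, got: $out"
        return 1
    fi
    echo "OK ($agent): remote shell is $out"
}
```

For example, running check_agent /usr/bin/rsh computenode1.na.luk.com should reproduce the "Connection refused" failure, while check_agent "ssh -x" computenode1.na.luk.com should print an OK line if passwordless ssh is working. With ssh, adding -o BatchMode=yes to the agent string makes it fail rather than wait at a password prompt.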




David C. Jackson
LAN Specialist, IT
LuK Incorporated
Phone: 330.202.6187
E-Mail: dave.jackson@luk-us.com

