I have my cluster behind at least two different firewalls, so I just let it use rsh 
and pummeled my security settings until everything worked.  I got really tired of 
seeing that error message before I was done though. :)

In order to use rsh, your head node needs to have a name besides localhost, and I 
would suggest putting that name, as well as the names of all your cluster 
machines, in /etc/hosts.  I also put them in /etc/hosts.equiv and /etc/hosts.bak 
(though I really don't know what the latter one does).  hosts.equiv lists machines that 
are treated as "the same machine" (that is, trusted for password-less rsh).  However, 
if you use ssh, none of this is an issue.
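
As a rough illustration of the rsh setup described above, the two files might look something like this.  The addresses and the head node name "oscarhead" are made up for the example; computenode1.na.luk.com is just the node name from the error message further down.  Substitute whatever your cluster actually uses.

```
# /etc/hosts (hypothetical addresses; the head node gets a real name,
# not just an alias on the 127.0.0.1 line)
127.0.0.1      localhost
192.168.1.1    oscarhead.na.luk.com      oscarhead
192.168.1.2    computenode1.na.luk.com   computenode1

# /etc/hosts.equiv (one trusted host per line)
oscarhead.na.luk.com
computenode1.na.luk.com
```

The same files would need to be present on every node, since rsh checks them on the machine being logged into.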

Using ssh would be the better solution, and I think the OSCAR install should make this 
much easier.  If you can get lam to use it, I think the no-password bit (which is 
non-trivial) should already be taken care of.  It might involve recompiling the 
source.  There's some documentation included with the source code, if I recall correctly. 
I saw some details on it somewhere...
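
For what it's worth, here is a minimal sketch of the ssh route.  It assumes LAM picks up its remote agent from the LAMRSH environment variable, which is my recollection of how the 6.5 series behaves; please check that against the documentation shipped with your source before relying on it.  The check_agent helper is a hypothetical name of my own, not part of LAM.  It simply runs the same "echo $SHELL" probe that lamboot uses and insists that exactly one line, an absolute path, comes back.

```shell
#!/bin/sh
# Sketch only, not an official recipe.
# Assumption: LAM honours LAMRSH as its remote agent command.
LAMRSH="ssh -x"
export LAMRSH

# check_agent HOST  (hypothetical helper)
# Succeeds only if the remote agent returns a single line holding an
# absolute shell path: no banner, no stderr noise, no failed login.
check_agent() {
    # LAMRSH is left unquoted on purpose so "ssh -x" splits into words.
    out=$($LAMRSH -n "$1" 'echo $SHELL' 2>&1 </dev/null) || return 1
    first=$(printf '%s\n' "$out" | head -n 1)
    # Any extra line means something printed before or after the shell path.
    [ "$out" = "$first" ] || return 1
    case "$out" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}
```

With that loaded, something like "check_agent computenode1.na.luk.com && lamboot hostfile" would only try to boot once the probe comes back clean.  If the variable turns out not to be honoured, recompiling with ssh as the default agent is the fallback.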

CCing the list to keep folks in the loop (and because I don't have a good answer for 
you).

Original Message -----------------------
This is all base install, I went step by step
from the OSCAR 3.0 installation manual. How can
I switch the lam(6.5.9) to use ssh instead of rsh
though (as you suggested)? Thanks so much for your help.

-----Original Message-----
From: Michael Edwards [mailto:[EMAIL PROTECTED]
Sent: Monday, April 12, 2004 7:55 PM
To: [EMAIL PROTECTED]
Subject: RE: [Oscar-users] LAM/MPI 6.5.9


Are the necessary rsh packages even included on the node installs?  I know
when I was setting things up manually that's one thing I tended to forget to
add.  I also am not sure that the packet filtering OSCAR puts in doesn't
intentionally block rsh-type packets after installation.  You can set it up
to use ssh though, like the 7.0 installation in OSCAR 3.0 does.

I had problems with getting the permissions set up so rsh would work.  If no
one else has an easy answer I still have one node set up that I did by hand
and I can look at the hosts files compared to the ones OSCAR uses (if it
does).  rsh assumes a much more trusting computing environment than the
setup used in OSCAR 3.0.

Original Message -----------------------
Hello, I am using Redhat 9 (2.4.20-8), OSCAR 3 and I'm trying to 'retrofit'
lam-6.5.9 on the cluster to run Abaqus 6.4-1.

I followed the 'install' suggested in the "OSCAR
Cluster Admin w/ C3" document (i.e. I manually
compiled and pushed out the Lam-6.5.9 package
and didn't RPM it. I tried rpm'ing first, but it seemed
to have broken the cluster and I had to reinstall).

I'm at a point now where I can switch (using switcher)
to lam-6.5.9 and lamboot on the head node w/no errors
but when I run lamboot plus my hostfile (for all the nodes)
I get the following error message (note that I can manually
ssh to remote nodes, but I cannot rsh.. do I need to be
able to? Lam-7.0 is set up properly.. and in that case,
I still cannot rsh to remote nodes).. what else should I
change to allow lam-6.5.9 to work? (note I have done
nothing beyond the exact steps I outlined here):

----- error message -------

$ lamboot hostfile

LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University

computenode1.na.luk.com: Connection refused
-----------------------------------------------------------------------------
LAM failed to execute a process on the remote node
"computenode1.na.luk.com". LAM was not trying to invoke any LAM-specific
commands yet -- we were simply trying to determine what shell was being used
on the remote host.

LAM tried to use the remote agent command "/usr/bin/rsh"
to invoke "echo $SHELL" on the remote node.

This usually indicates an authentication problem with the remote agent, or
some other configuration type of error in your .cshrc or .profile file.  The
following is a list of items that you may wish to check on the remote node:

        - You have an account and can login to the remote machine
        - Incorrect permissions on your home directory (should
          probably be 0755)
        - Incorrect permissions on your $HOME/.rhosts file (if you are
          using rsh -- they should probably be 0644)
        - You have an entry in the remote $HOME/.rhosts file (if you
          are using rsh) for the machine and username that you are
          running from
        - Your .cshrc/.profile must not print anything out to the
          standard error
        - Your .cshrc/.profile should set a correct TERM type
        - Your .cshrc/.profile should set the SHELL environment
          variable to your default shell

Try invoking the following command at the unix command line:

        /usr/bin/rsh computenode1.na.luk.com -n echo $SHELL

You will need to configure your local setup such that you will *not* be
prompted for a password to invoke this command on the remote node. No output
should be printed from the remote node before the output of the command is
displayed.

When you can get this command to execute successfully by hand, LAM will
probably be able to function properly.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
lamboot encountered some error (see above) during the boot process, and will
now attempt to kill all nodes that it was previously able to boot (if any).

Please wait for LAM to finish; if you interrupt this process, you may have
LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------

LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University
_____________________________________________________




David C. Jackson
LAN Specialist, IT
LuK Incorporated
Phone: 330.202.6187
E-Mail: [EMAIL PROTECTED]


