I have my cluster behind at least two different firewalls, so I just let it use rsh and pummeled my security settings until everything worked. I got really tired of seeing that error message before I was done though. :)
In order to use rsh, your head node needs to have a name besides localhost, and I would suggest putting that name, as well as the names of all your cluster machines, in /etc/hosts. I also put them in /etc/hosts.equiv and /etc/hosts.bak (though I really don't know what the latter one does). hosts.equiv lists the hosts that are trusted for rsh, so they are treated as "the same machine".

However, if you use ssh, none of this is an issue. Using ssh would be the better solution, and I think the OSCAR install should make this much easier. If you can get LAM to use it, I think the no-password bit (which is non-trivial) should already be taken care of. It might involve recompiling the source. There's some documentation included with the source code, if I recall correctly. I saw some details on it somewhere...

CCing the list to keep folks in the loop (and because I don't have a good answer for you).

Original Message -----------------------
This is all base install; I went step for step from the OSCAR 3.0 installation manual. How can I switch LAM (6.5.9) to use ssh instead of rsh, though (as you suggested)?

Thanks so much for your help.

-----Original Message-----
From: Michael Edwards [mailto:[EMAIL PROTECTED]
Sent: Monday, April 12, 2004 7:55 PM
To: [EMAIL PROTECTED]
Subject: RE: [Oscar-users] LAM/MPI 6.5.9

Are the necessary rsh packages even included on the node installs? I know when I was setting things up manually that's one thing I tended to forget to add. I'm also not sure that the packet filtering OSCAR puts in doesn't intentionally block rsh-type packets after installation. You can set it up to use ssh though, like the 7.0 installation in OSCAR 3.0 does.

I had problems getting the permissions set up so rsh would work. If no one else has an easy answer, I still have one node that I set up by hand, and I can compare its hosts files to the ones OSCAR uses (if it does). rsh assumes a much more trusting computing environment than the setup used in OSCAR 3.0.
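
On the question of switching LAM 6.5.9 from rsh to ssh: as far as I recall from the LAM documentation, the remote agent can be overridden at run time through the LAMRSH environment variable (for example "ssh -x"), but treat that as an assumption to verify against the docs shipped with your source tree. The sketch below is a hypothetical helper, not part of LAM or OSCAR. It builds the same probe that LAM's error text suggests running by hand, for either agent, and prepares an environment with LAMRSH set. The function names are made up for illustration.

```python
import os
import shlex
import subprocess


def probe_command(host, agent="/usr/bin/rsh"):
    """Build the probe LAM's error text suggests: <agent> <host> -n echo $SHELL."""
    # "echo $SHELL" is passed as a single argument and no local shell is
    # involved, so $SHELL is expanded on the remote node, not locally.
    return shlex.split(agent) + [host, "-n", "echo $SHELL"]


def lam_environment(agent, base=None):
    """Copy an environment and set LAMRSH (assumed to override LAM's agent)."""
    env = dict(os.environ if base is None else base)
    env["LAMRSH"] = agent
    return env


def probe(host, agent, timeout=15):
    """Run the probe with stdin closed; return (exit code, stdout, stderr)."""
    result = subprocess.run(
        probe_command(host, agent),
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout.strip(), result.stderr.strip()


if __name__ == "__main__":
    host = "computenode1.na.luk.com"  # node name taken from the error output
    for agent in ("/usr/bin/rsh", "ssh -x"):
        # Only show what would be run; call probe(host, agent) to execute it.
        print(agent, "->", probe_command(host, agent))
```

If probe() with "ssh -x" returns a shell path and no password prompt appears, running lamboot with the environment from lam_environment("ssh -x") would be the next thing to try. If LAMRSH turns out not to be honored by your build, rebuilding LAM with a different default agent (the "recompiling" guess above) is the fallback; I have not verified the exact configure option for 6.5.9.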
Original Message -----------------------
Hello,

I am using Red Hat 9 (2.4.20-8) and OSCAR 3, and I'm trying to 'retrofit' lam-6.5.9 onto the cluster to run Abaqus 6.4-1. I followed the install suggested in the "OSCAR Cluster Admin w/ C3" document (i.e. I manually compiled and pushed out the lam-6.5.9 package rather than installing it as an RPM. I tried the RPM first, but it seemed to break the cluster and I had to reinstall).

I'm at a point now where I can switch (using switcher) to lam-6.5.9 and lamboot on the head node with no errors, but when I run lamboot with my hostfile (for all the nodes) I get the following error message. Note that I can manually ssh to the remote nodes, but I cannot rsh. Do I need to be able to? LAM 7.0 is set up properly, and even in that case I still cannot rsh to the remote nodes. What else should I change to allow lam-6.5.9 to work? (Note that I have done nothing beyond the exact steps I outlined here.)

----- error message -------
$ lamboot hostfile

LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University

computenode1.na.luk.com: Connection refused
-----------------------------------------------------------------------------
LAM failed to execute a process on the remote node "computenode1.na.luk.com". LAM was not trying to invoke any LAM-specific commands yet -- we were simply trying to determine what shell was being used on the remote host.

LAM tried to use the remote agent command "/usr/bin/rsh" to invoke "echo $SHELL" on the remote node.

This usually indicates an authentication problem with the remote agent, or some other configuration type of error in your .cshrc or .profile file.
The following is a list of items that you may wish to check on the remote node:

- You have an account and can login to the remote machine
- Incorrect permissions on your home directory (should probably be 0755)
- Incorrect permissions on your $HOME/.rhosts file (if you are using rsh -- they should probably be 0644)
- You have an entry in the remote $HOME/.rhosts file (if you are using rsh) for the machine and username that you are running from
- Your .cshrc/.profile must not print anything out to the standard error
- Your .cshrc/.profile should set a correct TERM type
- Your .cshrc/.profile should set the SHELL environment variable to your default shell

Try invoking the following command at the unix command line:

        /usr/bin/rsh computenode1.na.luk.com -n echo $SHELL

You will need to configure your local setup such that you will *not* be prompted for a password to invoke this command on the remote node. No output should be printed from the remote node before the output of the command is displayed.

When you can get this command to execute successfully by hand, LAM will probably be able to function properly.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
lamboot encountered some error (see above) during the boot process, and will now attempt to kill all nodes that it was previously able to boot (if any).

Please wait for LAM to finish; if you interrupt this process, you may have LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------

LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University

_____________________________________________________
David C. Jackson
LAN Specialist, IT
LuK Incorporated
Phone: 330.202.6187
E-Mail: [EMAIL PROTECTED]

_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
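
The two permission items in LAM's checklist (home directory "probably" 0755, $HOME/.rhosts "probably" 0644) are easy to check mechanically on each node. The following is a minimal sketch under that assumption; the helper names are hypothetical, and the strict equality test is a simplification, since the error text itself only says "probably".

```python
import os
import stat

# Modes the LAM error text suggests; "" stands for the home directory itself.
EXPECTED_MODES = {"": 0o755, ".rhosts": 0o644}


def mode_mismatch(st_mode, expected):
    """Return a message if the permission bits differ from expected, else None."""
    actual = stat.S_IMODE(st_mode)  # drop the file-type bits, keep permissions
    if actual == expected:
        return None
    return "mode is %04o, expected %04o" % (actual, expected)


def check_home(home):
    """Yield (path, problem) pairs for the home directory and its .rhosts file."""
    for name, expected in EXPECTED_MODES.items():
        path = os.path.join(home, name) if name else home
        try:
            problem = mode_mismatch(os.stat(path).st_mode, expected)
        except OSError as exc:
            problem = "cannot stat (%s)" % exc.strerror
        if problem:
            yield path, problem


if __name__ == "__main__":
    # Report anything that differs from the suggested modes for this user.
    for path, problem in check_home(os.path.expanduser("~")):
        print("%s: %s" % (path, problem))
```

This only matters if rsh stays the remote agent; the .rhosts item does not apply when LAM is pointed at ssh. Run it as the user who will call lamboot, on the node named in the error.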
