Hi Jerome:

You can either:

1) Turn on rsh on the cluster nodes (it is turned off by default on
OSCAR clusters)
2) Use ssh instead of rsh: export LAMRSH='ssh -x' (I think that's the
correct environment variable).
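
Option 2 could be sketched roughly as follows. This is only a sketch, assuming passwordless ssh logins to the nodes are already set up; check_agent is a hypothetical helper name (not part of LAM), and BatchMode=yes is added just so that ssh fails instead of prompting for a password. It mirrors the "echo $SHELL" probe that lamboot reports in its debug output.

```shell
# Tell LAM to use ssh (with X11 forwarding disabled) as its remote agent.
export LAMRSH='ssh -x'

# check_agent HOST: hypothetical helper that runs the same kind of probe
# lamboot performs, but through the agent named in $LAMRSH.
#   -o BatchMode=yes  fail rather than prompt for a password
#   -n                redirect stdin from /dev/null
# $LAMRSH is left unquoted on purpose so it splits into "ssh" and "-x".
check_agent() {
    $LAMRSH -o BatchMode=yes -n "$1" 'echo $SHELL'
}
```

If something like "check_agent node1.cluster.ird.nc" prints only the remote shell path, with no prompt and no other output, then "lamboot -v" should have a much better chance of succeeding.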

Cheers,

Bernard 

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED]] On Behalf Of 
> Lefevre Jerome
> Sent: Monday, March 21, 2005 15:05
> To: [email protected]
> Subject: [Oscar-users] RE: lamboot not found
> 
> Hi,
> 
> My default MPI now propagates across the cluster. It did not before
> because my home directory was not mounted; I think that happened
> because I booted the nodes first, before the cluster. After
> "cexec mount -a" all is right now: "cexec switcher mpi" shows my
> new default MPI, "Lam-oscar-7.0-ifort".
> 
> To test my LAM configuration, I edited
> Lam-7.0-ifort/etc/lam-bhost.def by hand on my front-end to list my
> nodes and the front-end.
> 
> However, if I type lamboot -v, LAM complains:
> n-1<6228> ssi:boot:base:linear:booting n0 (node1.cluster.ird.nc)
> ERROR : node1.cluster.ird.nc : connection refused
> 
> The output that follows mentions rsh...
> 
> I have a doubt: when I configured LAM 7.0.6 from source, I
> omitted configure --with-rsh="ssh -x". Does that matter?
> 
> See below my output from "lamboot -d"
> 
> Many thanks
> 
> jerome
> 
> LAM 7.0.6/MPI 2 C++/ROMIO - Indiana University
> 
> n-1<6372> ssi:boot:base: looking for boot schema in following 
> directories:
> n-1<6372> ssi:boot:base:   <current directory>
> n-1<6372> ssi:boot:base:   $TROLLIUSHOME/etc
> n-1<6372> ssi:boot:base:   $LAMHOME/etc
> n-1<6372> ssi:boot:base:   /opt/lam-7.0-ifort/etc
> n-1<6372> ssi:boot:base: looking for boot schema file:
> n-1<6372> ssi:boot:base:   lam-bhost.def
> n-1<6372> ssi:boot:base: found boot schema: /opt/lam-7.0-ifort/etc/lam-bhost.def
> n-1<6372> ssi:boot:rsh: found the following hosts:
> n-1<6372> ssi:boot:rsh:   n0 node1.cluster.ird.nc (cpu=2)
> n-1<6372> ssi:boot:rsh:   n1 node2.cluster.ird.nc (cpu=2)
> n-1<6372> ssi:boot:rsh:   n2 node3.cluster.ird.nc (cpu=2)
> n-1<6372> ssi:boot:rsh:   n3 editr.cluster.ird.nc (cpu=2)
> n-1<6372> ssi:boot:rsh: resolved hosts:
> n-1<6372> ssi:boot:rsh:   n0 node1.cluster.ird.nc --> 192.168.150.1
> n-1<6372> ssi:boot:rsh:   n1 node2.cluster.ird.nc --> 192.168.150.2
> n-1<6372> ssi:boot:rsh:   n2 node3.cluster.ird.nc --> 192.168.150.3
> n-1<6372> ssi:boot:rsh:   n3 editr.cluster.ird.nc --> 192.168.150.50 (origin)
> n-1<6372> ssi:boot:rsh: starting RTE procs
> n-1<6372> ssi:boot:base:linear: starting
> n-1<6372> ssi:boot:base:server: opening server TCP socket
> n-1<6372> ssi:boot:base:server: opened port 37073
> n-1<6372> ssi:boot:base:linear: booting n0 (node1.cluster.ird.nc)
> n-1<6372> ssi:boot:rsh: starting lamd on (node1.cluster.ird.nc)
> n-1<6372> ssi:boot:rsh: starting on n0 (node1.cluster.ird.nc): hboot -t -c lam-conf.lamd -d -s -I "-H 192.168.150.50 -P 37073 -n 0 -o 3"
> n-1<6372> ssi:boot:rsh: launching remotely
> n-1<6372> ssi:boot:rsh: attempting to execute "rsh node1.cluster.ird.nc -n echo $SHELL"
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> node1.cluster.ird.nc: Connection refused
> -----------------------------------------------------------------------------
> LAM failed to execute a process on the remote node 
> "node1.cluster.ird.nc".
> LAM was not trying to invoke any LAM-specific commands yet -- 
> we were simply trying to determine what shell was being used 
> on the remote host.
> 
> LAM tried to use the remote agent command "rsh"
> to invoke "echo $SHELL" on the remote node.
> 
> This usually indicates an authentication problem with the 
> remote agent, or some other configuration type of error in 
> your .cshrc or .profile file.  The following is a list of 
> items that you may wish to check on the remote node:
> 
>          - You have an account and can login to the remote machine
>          - Incorrect permissions on your home directory (should
>            probably be 0755)
>          - Incorrect permissions on your $HOME/.rhosts file (if you are
>            using rsh -- they should probably be 0644)
>          - You have an entry in the remote $HOME/.rhosts file (if you
>            are using rsh) for the machine and username that you are
>            running from
>          - Your .cshrc/.profile must not print anything out to the
>            standard error
>          - Your .cshrc/.profile should set a correct TERM type
>          - Your .cshrc/.profile should set the SHELL environment
>            variable to your default shell
> 
> Try invoking the following command at the unix command line:
> 
>          rsh node1.cluster.ird.nc -n echo $SHELL
> 
> You will need to configure your local setup such that you 
> will *not* be prompted for a password to invoke this command 
> on the remote node.
> No output should be printed from the remote node before the 
> output of the command is displayed.
> 
> When you can get this command to execute successfully by 
> hand, LAM will probably be able to function properly.
> -----------------------------------------------------------------------------
> n-1<6372> ssi:boot:base:linear: Failed to boot n0 (node1.cluster.ird.nc)
> n-1<6372> ssi:boot:base:server: closing server socket
> n-1<6372> ssi:boot:base:linear: aborted!
> -----------------------------------------------------------------------------
> lamboot encountered some error (see above) during the boot 
> process, and will now attempt to kill all nodes that it was 
> previously able to boot (if any).
> 
> Please wait for LAM to finish; if you interrupt this process, 
> you may have LAM daemons still running on remote nodes.
> -----------------------------------------------------------------------------
> lamboot: wipe -- nothing to do
> lamboot did NOT complete successfully
> 
> 
> 
> -------------------------------------------------------
> SF email is sponsored by - The IT Product Guide Read honest & 
> candid reviews on hundreds of IT Products from real users.
> Discover which products truly live up to the hype. Start reading now.
> http://ads.osdn.com/?ad_id=6595&alloc_id=14396&op=click
> _______________________________________________
> Oscar-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/oscar-users
> 

