I'm cc'ing the oscar-users list so that the LAM/MPI folks can give better feedback on this problem. I've added some comments at the end, though, so scroll down.
On Tue, 2003-12-02 at 14:20, jix kicks wrote:
> hello sir.. I am currently working on mpi programs.I am using
> oscar-2.3.1 on my cluster.I am unable to run mpi program on my cluster
> through lam I am getting errors during lamboot.I am succeding at
> recon. This the error i am getting while running the lam on my
> machine.
>
> lamboot -vd myhosts
>
> LAM 6.5.8/MPI 2 C++/ROMIO - Indiana University
>
> lamboot: boot schema file: myhosts
> lamboot: opening hostfile myhosts
> lamboot: found the following hosts:
> lamboot: n0 192.168.10.203
> lamboot: n1 192.168.10.204
> lamboot: resolved hosts:
> lamboot: n0 192.168.10.203 --> 192.168.10.203
> lamboot: n1 192.168.10.204 --> 192.168.10.204
> lamboot: found 2 host node(s)
> lamboot: origin node is 0 (192.168.10.203)
> Executing hboot on n0 (192.168.10.203 - 1 CPU)...
> lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -v -I " -H 192.168.10.203 -P 53453 -n 0 -o 0 ""
> hboot: process schema = "/etc/lam/lam-conf.lam"
> hboot: found /usr/bin/lamd
> hboot: performing tkill
> hboot: tkill
> hboot: booting...
> hboot: fork /usr/bin/lamd
> hboot: attempting to execute
> [1] 15626 lamd -H 192.168.10.203 -P 53453 -n 0 -o 0 -d
> Executing hboot on n1 (192.168.10.204 - 1 CPU)...
> lamboot: attempting to execute "/usr/bin/ssh -x -a 192.168.10.204 -n echo $SHELL "
> lamboot: got remote shell /bin/bash
> lamboot: attempting to execute "/usr/bin/ssh -x -a 192.168.10.204 -n hboot -t -c lam-conf.lam -d -v -s -I "-H 192.168.10.203 -P 53453 -n 1 -o 0 ""
> base: cannot find process schema (null):
> -----------------------------------------------------------------------------
> lamboot encountered some error (see above) during the boot process,
> and will now attempt to kill all nodes that it was previously able to
> boot (if any).
>
> Please wait for LAM to finish; if you interrupt this process, you may
> have LAM daemons still running on remote nodes.
> -----------------------------------------------------------------------------
> wipe ...
>
> LAM 6.5.8/MPI 2 C++/ROMIO - Indiana University
>
> Executing tkill on n0 (192.168.10.203)...
> Executing tkill on n1 (192.168.10.204)...
> lamboot did NOT complete successfully
>
> I am getting this error please help me .I had created the users after
> the oscar-installed and configured ssh not to ask any password.reply

SSH should work automatically, letting users ssh without passwords, because /home is a shared filesystem. You may need to wait for the user information to propagate to the client nodes. This should take no longer than 15 minutes, but it can be forced by running

    /opt/opium/bin/sync_users --force

Jason
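
To confirm the passwordless ssh part before retrying lamboot, something like the following rough sketch may help. It assumes the boot schema is the "myhosts" file from the report, with one host per line and optional "#" comments. The function names list_hosts and check_hosts are made up for illustration, and BatchMode is an OpenSSH option, so this will not work with other ssh implementations.

```shell
#!/bin/sh
# Sketch only: check that every host in a LAM boot schema accepts
# non-interactive (passwordless) ssh. Helper names are hypothetical.

# Print the first field of each line that is neither blank nor a comment.
list_hosts() {
    awk 'NF && $1 !~ /^#/ { print $1 }' "$1"
}

# Try the same kind of probe lamboot uses (echo $SHELL on the remote side).
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_hosts() {
    for host in $(list_hosts "$1"); do
        if ssh -x -a -n -o BatchMode=yes "$host" 'echo $SHELL' >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: passwordless ssh failed"
        fi
    done
}
```

After sourcing the file, run "check_hosts myhosts" as the user who will run lamboot. If a node reports a failure, waiting for the user information to propagate (or forcing it with sync_users as described above) and then checking again would be the first thing to try.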
