On Thu, 28 Mar 2002, Senthil Kandasamy wrote:

> I am pretty sure that lamboot is running. I even tried to do a lamclean and
> then did another lamboot -v lamhosts. Even after that the same pbs error
> happens. Any ideas??

This is an easy one (although probably not obvious until you stop and think
about it). When using PBS, you are never supposed to do anything "outside
of PBS" - you should use PBS for everything, including starting up the LAM
runtime environment (running lamboot). This is actually pretty easy - in
your case, all you need is a PBS script that looks something like this:

----
#PBS -S /bin/sh
#PBS -l nodes=1
#PBS -q normal
#PBS -N tielema
#PBS -j oe

echo "I ran on `hostname`"

# bring up LAM environment on allocated nodes
# (which are listed in the filename contained in $PBS_NODEFILE)
lamboot $PBS_NODEFILE

# use mpirun to run my MPI binary with all allocated nodes
mpirun C ls

# shut down LAM environment
lamhalt
----

You were getting the "no lamd running" error message because LAM tries to
be smart under PBS. If set up a certain way, PBS can allocate two separate
jobs to the same node. If LAM wasn't smart, the two jobs would collide, so
we have to play some tricks to avoid that - hence the error message that
you saw. Since your lamboot was not run under the same PBS job, it was
completely invisible to the mpirun running under PBS.

> Hope to get this "computer" problem fixed ASAP so that I can worry about
> "science"

That's what we're here for. Hopefully, this should be the end of it :).

Brian

--
  Brian Barrett
  LAM/MPI developer and all around nice guy
  Have a LAM/MPI day: http://www.lam-mpi.org/
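
One way to guard against the mistake described above (booting LAM outside
of PBS) is to have the script check that it is running inside a PBS job
before it calls lamboot. What follows is only a minimal sketch, not part of
LAM or PBS: the helper name in_pbs_job is hypothetical, and the check
assumes that PBS sets $PBS_NODEFILE to a readable file inside a job, as the
script above relies on.

```shell
#!/bin/sh
# Hypothetical helper (not part of LAM or PBS): succeeds only when
# $PBS_NODEFILE is set and names a readable file, which we assume
# is the case only inside a running PBS job.
in_pbs_job() {
    [ -n "${PBS_NODEFILE:-}" ] && [ -r "${PBS_NODEFILE}" ]
}

if in_pbs_job; then
    # Same sequence as the PBS script above: boot LAM on the allocated
    # nodes, run the MPI program, then shut LAM down again.
    if lamboot "$PBS_NODEFILE"; then
        mpirun C ls
        lamhalt
    fi
else
    # Outside PBS: do not start a lamd here, because an mpirun running
    # under PBS would not be able to see it.
    echo "PBS_NODEFILE is not set; submit this script through PBS instead" >&2
fi
```

Submitted through PBS (for example with qsub), this should behave like the
script above. Run by hand from a login shell, it prints the message and
starts nothing.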
