On Thu, 28 Mar 2002, Senthil Kandasamy wrote:

> I am pretty sure that lamboot is running. I even tried to do a lamclean and
> then did another lamboot -v lamhosts.  Even after that the same pbs error
> happens.  Any ideas??

This is an easy one (although probably not obvious until you stop and 
think about it).  When using PBS, you are never supposed to do anything 
"outside of PBS" - you should use PBS for everything, including starting 
up the LAM runtime environment (running lamboot).  This is actually pretty 
easy - in your case, all you need to do is have a PBS script that looks 
something like this:

----
#PBS -S /bin/sh
#PBS -l nodes=1
#PBS -q normal
#PBS -N tielema
#PBS -j oe
echo "I ran on `hostname`"

# bring up LAM environment on allocated nodes
# (which are listed in the filename contained in $PBS_NODEFILE)
lamboot $PBS_NODEFILE

# use mpirun to run my MPI binary on all allocated CPUs (the "C")
# (ls is only a stand-in here - substitute your own program)
mpirun C ls

# shut down LAM environment
lamhalt
----

You were getting the "no lamd running" error message because LAM tries to 
be smart under PBS.  If set up a certain way, PBS can allocate two separate 
jobs to the same node.  If LAM wasn't smart, the two jobs would collide - 
so we have to play some tricks to avoid colliding - hence the error 
message that you saw.  Since your lamboot was not under the same PBS job, 
it was completely invisible to the mpirun running under PBS.
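
If you want to guard against this in your own scripts, a rough sketch 
(not anything LAM ships) is to check for the PBS environment before 
calling lamboot.  The helper name in_pbs_job is made up for this 
example, and it assumes your PBS sets $PBS_NODEFILE for every job, as 
in the script above:

```shell
#!/bin/sh
# Hypothetical helper (not part of LAM or PBS): succeeds only when
# $PBS_NODEFILE is set and points at a readable file, i.e. when we
# appear to be running inside a PBS job.
in_pbs_job() {
    [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]
}

if in_pbs_job; then
    echo "inside a PBS job; allocated nodes (count, name):"
    # one line per distinct node, prefixed by how often it is listed
    sort "$PBS_NODEFILE" | uniq -c
else
    echo "not inside a PBS job; run lamboot from the PBS script instead" >&2
fi
```

Run from an interactive login shell, this should take the "not inside 
a PBS job" branch, which is the situation your hand-started lamboot was in.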

> Hope to get this "computer" problem fixed ASAP so that I can worry about
> "science"

That's what we're here for.  Hopefully, this should be the end of it :).

Brian

-- 
  Brian Barrett
  LAM/MPI developer and all around nice guy
  Have a LAM/MPI day: http://www.lam-mpi.org/



_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
