On Thu, 5 Sep 2002, khoa nguyen wrote:

> I'm using oscar1.2 for RedHat 7.2 , and as I try to
> compile and run our programs w/ LAM, here is the error
> message when I try to run our codes w/ mpirun:  (all
> compiling steps w/ lam-mpicc working fine):

<SNIP CODE>

> ********************
> any suggestions about this?  I do run lamboot in every
> node manually before calling mpirun, so I wonder is
> that because I haven't set up LAM environment
> correctly or somehow?

First, the problem with the segfault.  The failure is occurring in one of 
the internal send functions from inside a call to MPI_Bcast().  Given the 
parameters to MPI_Bcast, there are two options for what the problem is.  
First, at some earlier point in the application, you scribbled on the 
memory that holds the MPI communicator.  Second, your buffer / length 
(which is given by both the count and Dtype parameters) was invalid, so 
when LAM went to read out of that buffer, things went "badly".
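
To make the second possibility concrete, here is a rough sketch of the kind 
of sanity check you could put in front of the broadcast.  bcast_args_ok() 
is a hypothetical helper of my own, not part of LAM or MPI, and I am 
guessing at how your buffer is allocated.  The MPI call itself appears only 
in a comment so that the snippet compiles without the MPI headers.

```c
#include <stddef.h>

/* Hypothetical helper (not part of LAM or MPI): returns 1 when a
 * (buffer, count) pair looks safe to hand to MPI_Bcast(), else 0.
 *   elem_size - size in bytes of one element of the datatype,
 *               e.g. sizeof(int) when the datatype is MPI_INT
 *   buf_bytes - number of bytes actually allocated behind buf */
int bcast_args_ok(const void *buf, int count,
                  size_t elem_size, size_t buf_bytes)
{
    if (count < 0)
        return 0;               /* a negative count is never valid */
    if (count == 0)
        return 1;               /* nothing will be read or written */
    if (buf == NULL || elem_size == 0)
        return 0;
    /* Divide rather than multiply so the check cannot overflow. */
    return (size_t)count <= buf_bytes / elem_size;
}

/* Intended use just before the broadcast (needs <mpi.h>):
 *
 *   if (!bcast_args_ok(data, n, sizeof(int), data_bytes))
 *       MPI_Abort(MPI_COMM_WORLD, 1);
 *   MPI_Bcast(data, n, MPI_INT, 0, MPI_COMM_WORLD);
 */
```

A check like this only catches the bad buffer / length case.  It will not 
notice a communicator that was overwritten earlier in the run.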

Without seeing the code, it is not possible for me to tell exactly what is 
wrong.  Using a debugger might be of some use.  Using a memory-checking 
tool like Purify would probably expose the problem.

When you say "run lamboot in every node", are you running a separate 
"lamboot" command on every node, or running lamboot once with a hostfile 
that lists all the nodes?  Lamboot should only be run once - it takes care 
of starting the LAM environment on each node in the machine.  You might 
want to take a look at the LAM/MPI FAQ:

  http://www.lam-mpi.org/faq/

One thing just occurred to me (how's that for thinking while you type...): 
it is possible that the problem is because your application has fewer 
nodes than it is expecting.  If you have hard-coded assumptions about the 
size of MPI_COMM_WORLD that aren't being met, that could cause some 
problems.  I only bring this up because if you are running separate copies 
of lamboot on each node, then your world size will be at most 1, which 
could cause problems...
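
If your code does assume a particular world size, it is cheap to check that 
assumption at startup and bail out with a message rather than a segfault.  
A minimal sketch follows.  check_world_size() and the numbers in the usage 
comment are made up for illustration, since I have not seen your code, and 
the MPI calls are again shown only in a comment.

```c
#include <stdio.h>

/* Hypothetical helper: returns 0 when the communicator size satisfies
 * the program's assumptions, -1 (with a message on stderr) otherwise.
 *   world_size - value obtained from MPI_Comm_size()
 *   required   - smallest number of processes the code can run with
 *   root       - rank used as the root of collectives such as MPI_Bcast */
int check_world_size(int world_size, int required, int root)
{
    if (world_size < required) {
        fprintf(stderr, "need at least %d MPI processes, got %d\n",
                required, world_size);
        return -1;
    }
    if (root < 0 || root >= world_size) {
        fprintf(stderr, "root rank %d is outside 0..%d\n",
                root, world_size - 1);
        return -1;
    }
    return 0;
}

/* Intended use right after MPI_Init() (needs <mpi.h>):
 *
 *   int size;
 *   MPI_Comm_size(MPI_COMM_WORLD, &size);
 *   if (check_world_size(size, 4, 0) != 0)
 *       MPI_Abort(MPI_COMM_WORLD, 1);
 */
```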


Hope this helps,

Brian

-- 
  Brian Barrett
  Graduate Student, Open Systems Lab, Indiana University
  http://www.osl.iu.edu/~brbarret/




_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
