[OMPI users] Using openmpi within python and crashes

2009-07-09 Thread John R. Cary
Our scenario is that we are running python, then importing a module 
written in Fortran.

We run via:

mpiexec -n 8 -x PYTHONPATH -x SIDL_DLL_PATH python tokHsmNP8.py

where the script calls into Fortran to call MPI_Init.

On 8 processes (but not on one) we get hangs in the code (on some machines but 
not others!). It is hard to tell precisely where, because the hang is inside a 
PETSc method.


Running with valgrind

mpiexec -n 8 -x PYTHONPATH -x SIDL_DLL_PATH valgrind python tokHsmNP8.py

gives a crash, with some salient output:

==936==
==936== Syscall param sched_setaffinity(mask) points to unaddressable 
byte(s)

==936==at 0x39336DAA79: syscall (in /lib64/libc-2.10.1.so)
==936==by 0x10BCBD58: opal_paffinity_linux_plpa_api_probe_init (in 
/usr/local/openmpi-1.3.2-notorque/lib/libopen-pal.so.0.0.0)
==936==by 0x10BCE054: opal_paffinity_linux_plpa_init (in 
/usr/local/openmpi-1.3.2-notorque/lib/libopen-pal.so.0.0.0)
==936==by 0x10BCC9F9: 
opal_paffinity_linux_plpa_have_topology_information (in 
/usr/local/openmpi-1.3.2-notorque/lib/libopen-pal.so.0.0.0)
==936==by 0x10BCBBFF: linux_module_init (in 
/usr/local/openmpi-1.3.2-notorque/lib/libopen-pal.so.0.0.0)
==936==by 0x10BC99C3: opal_paffinity_base_select (in 
/usr/local/openmpi-1.3.2-notorque/lib/libopen-pal.so.0.0.0)
==936==by 0x10B9DB83: opal_init (in 
/usr/local/openmpi-1.3.2-notorque/lib/libopen-pal.so.0.0.0)
==936==by 0x10920C6C: orte_init (in 
/usr/local/openmpi-1.3.2-notorque/lib/libopen-rte.so.0.0.0)
==936==by 0x10579D06: ompi_mpi_init (in 
/usr/local/openmpi-1.3.2-notorque/lib/libmpi.so.0.0.0)
==936==by 0x10599175: PMPI_Init (in 
/usr/local/openmpi-1.3.2-notorque/lib/libmpi.so.0.0.0)
==936==by 0x10E2BDF4: mpi_init (in 
/usr/local/openmpi-1.3.2-notorque/lib/libmpi_f77.so.0.0.0)
==936==by 0xDF30A1F: uedge_mpiinit_ (in 
/home/research/cary/projects/facetsall-iter/physics/uedge/par/build/uedgeC.so)

==936==  Address 0x0 is not stack'd, malloc'd or (recently) free'd

This makes me think that our call to mpi_init is wrong.  At

 http://www.mcs.anl.gov/research/projects/mpi/www/www3/MPI_Init.html

it says

 Because the Fortran and C versions of MPI_Init are different, there is a
 restriction on who can call MPI_Init. The version (Fortran or C) must match
 the main program. That is, if the main program is in C, then the C version of
 MPI_Init must be called. If the main program is in Fortran, the Fortran
 version must be called.


Should I infer from this that, since Python is a C program, one must call 
the C version of MPI_Init (with argc, argv)?

Or, since the module is written mostly in Fortran with MPI calls of only the 
Fortran variety, can I initialize with the Fortran MPI_Init?
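(For concreteness, the kind of thing I have in mind on the C side is below -- 
a sketch only: the wrapper name is made up, and I am assuming Open MPI accepts 
NULL for argc/argv, as MPI-2 permits.)

#include <mpi.h>

/* Hypothetical helper a Python C extension (or the SIDL glue) could call
 * before handing control to the Fortran module. */
void mymod_ensure_mpi_init(void)
{
  int initialized = 0;
  MPI_Initialized(&initialized);
  if (!initialized)
    MPI_Init(NULL, NULL);   /* C binding; no argc/argv needed under MPI-2 */
}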

Thanks.  John Cary








Re: [OMPI users] bulding rpm

2009-07-09 Thread rahmani

- Original Message -
From: "Jeff Squyres" 
To: "Open MPI Users" 
Sent: Thursday, July 9, 2009 10:10:30 AM (GMT-0500) America/New_York
Subject: Re: [OMPI users] bulding rpm

On Jul 9, 2009, at 10:22 AM, rahmani wrote:

> yes, they are intel library and all are in LD_LIBRARY_PATH
> /usr/local/openmpi/intel/1.3.2/bin/mpif90 --showme
> gfortran -I/usr/local/include -pthread -I/usr/local/lib -L/usr/local/lib -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl
>
> ldd /usr/local/openmpi/intel/1.3.2/bin/mpirun
> linux-vdso.so.1 =>  (0x7fff555fd000)
> libopen-rte.so.0 => /usr/local/lib/libopen-rte.so.0  
> (0x7fa64d154000)
> libopen-pal.so.0 => /usr/local/lib/libopen-pal.so.0  
> (0x7fa64cef2000)
> libdl.so.2 => /lib64/libdl.so.2 (0x7fa64ccee000)
> libnsl.so.1 => /lib64/libnsl.so.1 (0x7fa64cad6000)
> libutil.so.1 => /lib64/libutil.so.1 (0x7fa64c8d3000)
> libm.so.6 => /lib64/libm.so.6 (0x7fa64c67d000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7fa64c466000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x7fa64c24a000)
> libc.so.6 => /lib64/libc.so.6 (0x7fa64bef1000)
> /lib64/ld-linux-x86-64.so.2 (0x7fa64d3b2000)
>
> I had another openmpi in my computer. I configure openmpi 1.3 with  
> gnu compiler and --prefix=/usr/local
>
> I can not use both of them?
>

You can; you just need to set your LD_LIBRARY_PATH appropriately.

Specifically, when you run "mpirun" (or "mpif90" or ...), it looks to  
find libopen-rte.so.  The first one that it is finding is the "wrong"  
one -- the one in /usr/local.  You therefore get "wrong" results  
because it's behaving like the one installed in /usr/local.

You can probably prefix your LD_LIBRARY_PATH with 
/usr/local/openmpi/intel/1.3.2/lib and then it'll use the "right" libopen-rte.so.
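For example (assuming a bash-like shell; adjust the path if your Intel build
lives elsewhere):

export LD_LIBRARY_PATH=/usr/local/openmpi/intel/1.3.2/lib:$LD_LIBRARY_PATH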

Make sense?

-- 
Jeff Squyres
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

Hi,
Thanks very much for your help.
I set LD_LIBRARY_PATH=/opt/intel/Compiler/11.0/069/lib/intel64/ 
and now, when I use mpif90 and mpirun with their full paths, they work correctly:
 /usr/local/openmpi/intel/1.3.2/bin/mpif90
ifort: command line error: no files specified; for help type "ifort -help"

 /usr/local/openmpi/intel/1.3.2/bin/mpirun -np 2 hostname
suse11
suse11

Thank you
With best regards



Re: [OMPI users] Segfault when using valgrind

2009-07-09 Thread Justin Luitjens
I was able to get rid of the segfaults/invalid reads by disabling the
shared memory path.  Valgrind still reported an error about uninitialized
memory in the same spot, which I believe is due to the struct being padded
for alignment.  I added a suppression and was able to get past this part
just fine.
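For reference, roughly what that looks like (a sketch only -- the application
name is illustrative, and your own valgrind report dictates the details):

mpirun --mca btl ^sm -np 8 valgrind ./your_app     # disable the shared memory BTL

An alternative to the suppression, if the padding theory is right, is to zero
the struct before filling it so the padding bytes are initialized:

memset(&myinfo, 0, sizeof(MergeInfo));  /* then set myinfo.n, myinfo.min, ... */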

Thanks,
Justin

On Thu, Jul 9, 2009 at 5:16 AM, Jeff Squyres  wrote:

> On Jul 7, 2009, at 11:47 AM, Justin wrote:
>
>  (Sorry if this is posted twice, I sent the same email yesterday but it
>> never appeared on the list).
>>
>>
> Sorry for the delay in replying.  FWIW, I got your original message as
> well.
>
>  Hi,  I am attempting to debug a memory corruption in an mpi program
>> using valgrind.  However, when I run with valgrind I get semi-random
>> segfaults and valgrind messages with the openmpi library.  Here is an
>> example of such a seg fault:
>>
>> ==6153==
>> ==6153== Invalid read of size 8
>> ==6153==at 0x19102EA0: (within /usr/lib/openmpi/lib/openmpi/
>> mca_btl_sm.so)
>>
>>  ...
>
>> ==6153==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
>> ^G^G^GThread "main"(pid 6153) caught signal SIGSEGV at address (nil)
>> (segmentation violation)
>>
>> Looking at the code for our isend at SFC.h:298, it does not seem to have any
>> errors:
>>
>> =
>>  MergeInfo myinfo,theirinfo;
>>
>>  MPI_Request srequest, rrequest;
>>  MPI_Status status;
>>
>>  myinfo.n=n;
>>  if(n!=0)
>>  {
>>myinfo.min=sendbuf[0].bits;
>>myinfo.max=sendbuf[n-1].bits;
>>  }
>>  //cout << rank << " n:" << n << " min:" << (int)myinfo.min << "max:"
>> << (int)myinfo.max << endl;
>>
>>  MPI_Isend(&myinfo,sizeof(MergeInfo),MPI_BYTE,to,0,Comm,&srequest);
>> ==
>>
>> myinfo is a struct located on the stack, to is the rank of the processor
>> that the message is being sent to, and srequest is also on the stack.
>> In addition this message is waited on prior to exiting this block of
>> code so they still exist on the stack.  When I don't run with valgrind
>> my program runs past this point just fine.
>>
>>
> Strange.  I can't think of an immediate reason as to why this would happen
> -- does it also happen if you use a blocking send (vs. an Isend)?  Is myinfo
> a complex object, or a variable-length object?
>
>
>  I am currently using openmpi 1.3 from the debian unstable branch.  I
>> also see the same type of segfault in a different portion of the code
>> involving an MPI_Allgather which can be seen below:
>>
>> ==
>> ==22736== Use of uninitialised value of size 8
>> ==22736==at 0x19104775: mca_btl_sm_component_progress
>> (opal_list.h:322)
>> ==22736==by 0x1382CE09: opal_progress (opal_progress.c:207)
>> ==22736==by 0xB404264: ompi_request_default_wait_all (condition.h:99)
>> ==22736==by 0x1A1ADC16: ompi_coll_tuned_sendrecv_actual
>> (coll_tuned_util.c:55)
>> ==22736==by 0x1A1B61E1: ompi_coll_tuned_allgatherv_intra_bruck
>> (coll_tuned_util.h:60)
>> ==22736==by 0xB418B2E: PMPI_Allgatherv (pallgatherv.c:121)
>> ==22736==by 0x646CCF7: Uintah::Level::setBCTypes() (Level.cc:728)
>> ==22736==by 0x646D823: Uintah::Level::finalizeLevel() (Level.cc:537)
>> ==22736==by 0x6465457:
>> Uintah::Grid::problemSetup(Uintah::Handle const&,
>> Uintah::ProcessorGroup const*, bool) (Grid.cc:866)
>> ==22736==by 0x8345759: Uintah::SimulationController::gridSetup()
>> (SimulationController.cc:243)
>> ==22736==by 0x834F418: Uintah::AMRSimulationController::run()
>> (AMRSimulationController.cc:117)
>> ==22736==by 0x4089AE: main (sus.cc:629)
>> ==22736==
>> ==22736== Invalid read of size 8
>> ==22736==at 0x19104775: mca_btl_sm_component_progress
>> (opal_list.h:322)
>> ==22736==by 0x1382CE09: opal_progress (opal_progress.c:207)
>> ==22736==by 0xB404264: ompi_request_default_wait_all (condition.h:99)
>> ==22736==by 0x1A1ADC16: ompi_coll_tuned_sendrecv_actual
>> (coll_tuned_util.c:55)
>> ==22736==by 0x1A1B61E1: ompi_coll_tuned_allgatherv_intra_bruck
>> (coll_tuned_util.h:60)
>> ==22736==by 0xB418B2E: PMPI_Allgatherv (pallgatherv.c:121)
>> ==22736==by 0x646CCF7: Uintah::Level::setBCTypes() (Level.cc:728)
>> ==22736==by 0x646D823: Uintah::Level::finalizeLevel() (Level.cc:537)
>> ==22736==by 0x6465457:
>> Uintah::Grid::problemSetup(Uintah::Handle const&,
>> Uintah::ProcessorGroup const*, bool) (Grid.cc:866)
>> ==22736==by 0x8345759: Uintah::SimulationController::gridSetup()
>> (SimulationController.cc:243)
>> ==22736==by 0x834F418: Uintah::AMRSimulationController::run()
>> (AMRSimulationController.cc:117)
>> ==22736==by 0x4089AE: main (sus.cc:629)
>> 
>>
>> Are these problems with openmpi and is there any known work arounds?
>>
>>
>
> These are new to me.  The problem does seem to occur with OMPI's shared
> memory device; you might want to try a different point-to-point 

Re: [OMPI users] fault tolerance in open mpi

2009-07-09 Thread Durga Choudhury
Although I have perhaps the least experience on the topic in this
list, I will take a shot; more experienced people, please correct me:

The MPI standard specifies a communication mechanism, not fault tolerance at
any level. You may achieve network fault tolerance at the IP level by
implementing 'equal cost multipath' routes (which means two equally
capable NIC cards connecting to the same destination, with the kernel
routing table modified to use both cards; the kernel will dynamically
load balance). At the MAC level, you can achieve the same effect by
trunking multiple network cards.

You can achieve process-level fault tolerance with a checkpointing
scheme such as BLCR, which has been tested to work with Open MPI (and
with others as well).
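A rough sketch of the BLCR route (hedged -- this assumes Open MPI was built
with checkpoint/restart support, e.g. configured with --with-ft=cr and
--with-blcr, and the option names below are from memory):

mpirun -am ft-enable-cr -np 4 ./your_app     # run with C/R enabled
ompi-checkpoint <pid_of_mpirun>              # take a checkpoint
ompi-restart <snapshot_reference>            # restart from it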

Durga

On Thu, Jul 9, 2009 at 4:57 AM, vipin kumar wrote:
>
> Hi all,
>
> I want to know whether open mpi supports Network and process fault tolerance
> or not? If there is any example demonstrating these features that will be
> best.
>
> Regards,
> --
> Vipin K.
> Research Engineer,
> C-DOTB, India
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] bulding rpm

2009-07-09 Thread Jeff Squyres

On Jul 9, 2009, at 10:22 AM, rahmani wrote:


yes, they are intel library and all are in LD_LIBRARY_PATH
/usr/local/openmpi/intel/1.3.2/bin/mpif90 --showme
gfortran -I/usr/local/include -pthread -I/usr/local/lib -L/usr/local/lib -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl


ldd /usr/local/openmpi/intel/1.3.2/bin/mpirun
linux-vdso.so.1 =>  (0x7fff555fd000)
libopen-rte.so.0 => /usr/local/lib/libopen-rte.so.0  
(0x7fa64d154000)
libopen-pal.so.0 => /usr/local/lib/libopen-pal.so.0  
(0x7fa64cef2000)

libdl.so.2 => /lib64/libdl.so.2 (0x7fa64ccee000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x7fa64cad6000)
libutil.so.1 => /lib64/libutil.so.1 (0x7fa64c8d3000)
libm.so.6 => /lib64/libm.so.6 (0x7fa64c67d000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7fa64c466000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x7fa64c24a000)
libc.so.6 => /lib64/libc.so.6 (0x7fa64bef1000)
/lib64/ld-linux-x86-64.so.2 (0x7fa64d3b2000)

I had another openmpi in my computer. I configure openmpi 1.3 with  
gnu compiler and --prefix=/usr/local


I can not use both of them?



You can; you just need to set your LD_LIBRARY_PATH appropriately.

Specifically, when you run "mpirun" (or "mpif90" or ...), it looks to  
find libopen-rte.so.  The first one that it is finding is the "wrong"  
one -- the one in /usr/local.  You therefore get "wrong" results  
because it's behaving like the one installed in /usr/local.


You can probably prefix your LD_LIBRARY_PATH with 
/usr/local/openmpi/intel/1.3.2/lib and then it'll use the "right" libopen-rte.so.


Make sense?

--
Jeff Squyres
Cisco Systems



Re: [OMPI users] bulding rpm

2009-07-09 Thread rahmani

- Original Message -
From: "Jeff Squyres" 
To: "Open MPI Users" 
Sent: Thursday, July 9, 2009 7:34:49 AM (GMT-0500) America/New_York
Subject: Re: [OMPI users] bulding rpm

On Jul 7, 2009, at 1:32 AM, rahmani wrote:

> it create openmpi-1.3.2-1.x86_64.rpm  with no error, but when I  
> install it with rpm -ivh I see:
> error: Failed dependencies:
> libifcoremt.so.5()(64bit) is needed by openmpi-1.3.2-1.x86_64
> libifport.so.5()(64bit) is needed by openmpi-1.3.2-1.x86_64
> libimf.so()(64bit) is needed by openmpi-1.3.2-1.x86_64
> libintlc.so.5()(64bit) is needed by openmpi-1.3.2-1.x86_64
> libiomp5.so()(64bit) is needed by openmpi-1.3.2-1.x86_64
> libsvml.so()(64bit) is needed by openmpi-1.3.2-1.x86_64
> libtorque.so.2()(64bit) is needed by openmpi-1.3.2-1.x86_64
> but all above library are in my computer
>

Several of these look like they are the intel compiler libraries.  I  
think that libtorque is the Torque support library (resource manager)  
-- which seems weird because you explicitly built with SGE support.

Where are these library files located -- can they all be found via  
your LD_LIBRARY_PATH?

> I use rpm -ivh --nodeps and it install completely, but when I use  
> mpif90 and mpirun I see:
>   $ /usr/local/openmpi/intel/1.3.2/bin/mpif90
> gfortran: no input files   (I compile with ifort)
>

What does "/usr/local/openmpi/intel/1.3.2/bin/mpif90 --showme" show?

Does your LD_LIBRARY_PATH include a directory where another Open MPI  
is installed?  E.g., could running mpif90 be picking up the "wrong"  
MPI installation?  (and therefore picking up the other OMPI's compiler  
preference -- gfortran in this case)

>   $ /usr/local/openmpi/intel/1.3.2/bin/mpirun
> usr/local/openmpi/intel/1.3.2/bin/mpirun: symbol lookup error: /usr/local/openmpi/intel/1.3.2/bin/mpirun: undefined symbol: orted_cmd_line
>

mpirun didn't complain that it couldn't find its supporting .so  
libraries, so I'm guessing LD_LIBRARY_PATH is pointing to *some* OMPI  
libraries, but the fact that it can't find a symbol that it wants  
leads me to believe that it's pointing to the "wrong" OMPI install.   
What does "ldd /usr/local/openmpi/intel/1.3.2/bin/mpirun" show?

-- 
Jeff Squyres
Cisco Systems

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Dear Jeff Squyres
Thanks for your reply

yes, they are intel library and all are in LD_LIBRARY_PATH
/usr/local/openmpi/intel/1.3.2/bin/mpif90 --showme
gfortran -I/usr/local/include -pthread -I/usr/local/lib -L/usr/local/lib 
-lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl 
-lutil -lm -ldl

ldd /usr/local/openmpi/intel/1.3.2/bin/mpirun
linux-vdso.so.1 =>  (0x7fff555fd000)
libopen-rte.so.0 => /usr/local/lib/libopen-rte.so.0 (0x7fa64d154000)
libopen-pal.so.0 => /usr/local/lib/libopen-pal.so.0 (0x7fa64cef2000)
libdl.so.2 => /lib64/libdl.so.2 (0x7fa64ccee000)
libnsl.so.1 => /lib64/libnsl.so.1 (0x7fa64cad6000)
libutil.so.1 => /lib64/libutil.so.1 (0x7fa64c8d3000)
libm.so.6 => /lib64/libm.so.6 (0x7fa64c67d000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x7fa64c466000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x7fa64c24a000)
libc.so.6 => /lib64/libc.so.6 (0x7fa64bef1000)
/lib64/ld-linux-x86-64.so.2 (0x7fa64d3b2000)


I had another openmpi in my computer. I configure openmpi 1.3 with gnu compiler 
and --prefix=/usr/local


Can I not use both of them?
How can I solve this problem?






Re: [OMPI users] bulding rpm

2009-07-09 Thread Jeff Squyres

On Jul 7, 2009, at 1:32 AM, rahmani wrote:

it create openmpi-1.3.2-1.x86_64.rpm  with no error, but when I  
install it with rpm -ivh I see:

error: Failed dependencies:
libifcoremt.so.5()(64bit) is needed by openmpi-1.3.2-1.x86_64
libifport.so.5()(64bit) is needed by openmpi-1.3.2-1.x86_64
libimf.so()(64bit) is needed by openmpi-1.3.2-1.x86_64
libintlc.so.5()(64bit) is needed by openmpi-1.3.2-1.x86_64
libiomp5.so()(64bit) is needed by openmpi-1.3.2-1.x86_64
libsvml.so()(64bit) is needed by openmpi-1.3.2-1.x86_64
libtorque.so.2()(64bit) is needed by openmpi-1.3.2-1.x86_64
but all above library are in my computer



Several of these look like they are the intel compiler libraries.  I  
think that libtorque is the Torque support library (resource manager)  
-- which seems weird because you explicitly built with SGE support.
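If the Torque dependency crept in unintentionally, one option -- a sketch,
since the exact define name depends on the spec file shipped in your tarball --
is to pass the configure options explicitly when rebuilding the SRPM:

rpmbuild --rebuild openmpi-1.3.2-1.src.rpm \
  --define 'configure_options --with-sge --without-tm'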


Where are these library files located -- can they all be found via  
your LD_LIBRARY_PATH?


I use rpm -ivh --nodeps and it install completely, but when I use  
mpif90 and mpirun I see:

  $ /usr/local/openmpi/intel/1.3.2/bin/mpif90
gfortran: no input files   (I compile with ifort)



What does "/usr/local/openmpi/intel/1.3.2/bin/mpif90 --showme" show?

Does your LD_LIBRARY_PATH include a directory where another Open MPI  
is installed?  E.g., could running mpif90 be picking up the "wrong"  
MPI installation?  (and therefore picking up the other OMPI's compiler  
preference -- gfortran in this case)



  $ /usr/local/openmpi/intel/1.3.2/bin/mpirun
usr/local/openmpi/intel/1.3.2/bin/mpirun: symbol lookup error: /usr/local/openmpi/intel/1.3.2/bin/mpirun: undefined symbol: orted_cmd_line




mpirun didn't complain that it couldn't find its supporting .so  
libraries, so I'm guessing LD_LIBRARY_PATH is pointing to *some* OMPI  
libraries, but the fact that it can't find a symbol that it wants  
leads me to believe that it's pointing to the "wrong" OMPI install.   
What does "ldd /usr/local/openmpi/intel/1.3.2/bin/mpirun" show?


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Segfault when using valgrind

2009-07-09 Thread Jeff Squyres

On Jul 7, 2009, at 11:47 AM, Justin wrote:


(Sorry if this is posted twice, I sent the same email yesterday but it
never appeared on the list).



Sorry for the delay in replying.  FWIW, I got your original message as  
well.



Hi,  I am attempting to debug a memory corruption in an mpi program
using valgrind.  However, when I run with valgrind I get semi-random
segfaults and valgrind messages with the openmpi library.  Here is an
example of such a seg fault:

==6153==
==6153== Invalid read of size 8
==6153==at 0x19102EA0: (within /usr/lib/openmpi/lib/openmpi/
mca_btl_sm.so)


...

==6153==  Address 0x10 is not stack'd, malloc'd or (recently) free'd
^G^G^GThread "main"(pid 6153) caught signal SIGSEGV at address (nil)
(segmentation violation)

Looking at the code for our isend at SFC.h:298, it does not seem to have any
errors:

=
  MergeInfo myinfo,theirinfo;

  MPI_Request srequest, rrequest;
  MPI_Status status;

  myinfo.n=n;
  if(n!=0)
  {
myinfo.min=sendbuf[0].bits;
myinfo.max=sendbuf[n-1].bits;
  }
  //cout << rank << " n:" << n << " min:" << (int)myinfo.min << "max:"
<< (int)myinfo.max << endl;

  MPI_Isend(&myinfo,sizeof(MergeInfo),MPI_BYTE,to, 
0,Comm,&srequest);

==

myinfo is a struct located on the stack, to is the rank of the  
processor

that the message is being sent to, and srequest is also on the stack.
In addition this message is waited on prior to exiting this block of
code so they still exist on the stack.  When I don't run with valgrind
my program runs past this point just fine.



Strange.  I can't think of an immediate reason as to why this would  
happen -- does it also happen if you use a blocking send (vs. an  
Isend)?  Is myinfo a complex object, or a variable-length object?



I am currently using openmpi 1.3 from the debian unstable branch.  I
also see the same type of segfault in a different portion of the code
involving an MPI_Allgather which can be seen below:

==
==22736== Use of uninitialised value of size 8
==22736==at 0x19104775: mca_btl_sm_component_progress  
(opal_list.h:322)

==22736==by 0x1382CE09: opal_progress (opal_progress.c:207)
==22736==by 0xB404264: ompi_request_default_wait_all  
(condition.h:99)

==22736==by 0x1A1ADC16: ompi_coll_tuned_sendrecv_actual
(coll_tuned_util.c:55)
==22736==by 0x1A1B61E1: ompi_coll_tuned_allgatherv_intra_bruck
(coll_tuned_util.h:60)
==22736==by 0xB418B2E: PMPI_Allgatherv (pallgatherv.c:121)
==22736==by 0x646CCF7: Uintah::Level::setBCTypes() (Level.cc:728)
==22736==by 0x646D823: Uintah::Level::finalizeLevel() (Level.cc: 
537)

==22736==by 0x6465457:
Uintah::Grid::problemSetup(Uintah::Handle const&,
Uintah::ProcessorGroup const*, bool) (Grid.cc:866)
==22736==by 0x8345759: Uintah::SimulationController::gridSetup()
(SimulationController.cc:243)
==22736==by 0x834F418: Uintah::AMRSimulationController::run()
(AMRSimulationController.cc:117)
==22736==by 0x4089AE: main (sus.cc:629)
==22736==
==22736== Invalid read of size 8
==22736==at 0x19104775: mca_btl_sm_component_progress  
(opal_list.h:322)

==22736==by 0x1382CE09: opal_progress (opal_progress.c:207)
==22736==by 0xB404264: ompi_request_default_wait_all  
(condition.h:99)

==22736==by 0x1A1ADC16: ompi_coll_tuned_sendrecv_actual
(coll_tuned_util.c:55)
==22736==by 0x1A1B61E1: ompi_coll_tuned_allgatherv_intra_bruck
(coll_tuned_util.h:60)
==22736==by 0xB418B2E: PMPI_Allgatherv (pallgatherv.c:121)
==22736==by 0x646CCF7: Uintah::Level::setBCTypes() (Level.cc:728)
==22736==by 0x646D823: Uintah::Level::finalizeLevel() (Level.cc: 
537)

==22736==by 0x6465457:
Uintah::Grid::problemSetup(Uintah::Handle const&,
Uintah::ProcessorGroup const*, bool) (Grid.cc:866)
==22736==by 0x8345759: Uintah::SimulationController::gridSetup()
(SimulationController.cc:243)
==22736==by 0x834F418: Uintah::AMRSimulationController::run()
(AMRSimulationController.cc:117)
==22736==by 0x4089AE: main (sus.cc:629)


Are these problems with openmpi and is there any known work arounds?




These are new to me.  The problem does seem to occur with OMPI's  
shared memory device; you might want to try a different point-to-point  
device (e.g., tcp?) to see if the problem goes away.  But be aware  
that the problem "going away" does not really pinpoint the location of  
the problem -- moving to a slower transport (like tcp) may simply  
change timing such that the problem does not occur.  I.e., the problem  
could still exist in either your code or OMPI -- this would simply be  
a workaround.
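For example, something along these lines (a sketch; substitute your real
launch line) restricts the run to the TCP and self BTLs:

mpirun --mca btl tcp,self -np 8 ./your_app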


--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Configuration problem or network problem?

2009-07-09 Thread Jeff Squyres
Open MPI includes VampirTrace to generate tracing info.  In addition  
to Vampir (a commercial product), there are a few free tools that can  
read VT's Open Tracefile Format (OTF) output -- I'll leave this up to  
the VT guys to describe.


Alternatively, you could also use MPE 
(http://www.mcs.anl.gov/research/projects/mpi/www/www4/MPE.html).  IIRC, MPE is 
included in MPICH2 distributions.  There's also mpiP (http://mpip.sourceforge.net/).  
Other MPI tracing tools also exist -- you should be able to google around and 
find them.


All of these tools basically interpose themselves on the MPI library  
and intercept your function calls.  Stats are generated, and possibly  
even timelines which can be shown graphically.  Such graphical  
timelines can be quite enlightening as to what is really happening in  
your MPI application at run-time.
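As a concrete (if hedged) starting point for the VampirTrace route: if memory
serves, Open MPI 1.3 installs VT wrapper compilers next to the normal ones, so
recompiling with them and rerunning is usually enough to get OTF trace files
in the working directory:

mpicc-vt -o your_app your_app.c     # or mpif90-vt for Fortran sources
mpirun -np 8 ./your_app             # writes OTF trace files you can open in a viewer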



On Jul 9, 2009, at 5:11 AM, Zou, Lin (GE, Research, Consultant) wrote:


Hi Jeff,
I tried your suggestion of inserting an MPI_Barrier every few iterations,  
but it doesn't work; in fact it became even slower.
I want to try tracing the communication activity; can you give me some more  
details about how to use mpitrace?

Thank you for your attention.
regards
Lin

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]  
On Behalf Of Jeff Squyres

Sent: 2009年7月7日 20:42
To: Open MPI Users
Subject: Re: [OMPI users] Configuration problem or network problem?

You might want to use a tracing library to see where exactly your  
synchronization issues are occurring.  It may depend on the

communication pattern between your nodes and the timing between them.
Additionally, your network switch(es) performance characteristics  
may come into effect here: are there retransmissions, timeouts, etc.?


It can sometimes be helpful to insert an MPI_BARRIER every few  
iterations just to keep all processes well-synchronized.  It seems  
counter-intuitive, but sometimes waiting a short time in a barrier  
can increase overall throughput (rather than waiting progressively  
longer times in poorly-synchronized blocking communications).




On Jul 6, 2009, at 11:33 PM, Zou, Lin (GE, Research, Consultant)  
wrote:


>  Thank you for your suggestion, I tried this solution, but it doesn't
> work. In fact, the headnode doesn't participate in the computing and
> communication; it only mallocs a large chunk of memory, and when the loop in
> every PS3 is over, the headnode gathers the data from every PS3.
> The strange thing is that sometimes the program can work well, but
> when we reboot the system, without any change to the program, it can't
> work, so I think there must be some mechanism in OpenMPI that can be
> configured to let the program work well.
>
> Lin
>
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
> On Behalf Of Doug Reeder
> Sent: 2009年7月7日 10:49
> To: Open MPI Users
> Subject: Re: [OMPI users] Configuration problem or network problem?
>
> Lin,
>
> Try -np 16 and not running on the head node.
>
> Doug Reeder
> On Jul 6, 2009, at 7:08 PM, Zou, Lin (GE, Research, Consultant)  
wrote:

>
>> Hi all,
>> The system I use is a PS3 cluster, with 16 PS3s and a PowerPC as
>> a headnode; they are connected by a high-speed switch.
>> There are point-to-point communication functions (MPI_Send and
>> MPI_Recv), the data size is about 40KB, and a lot of computation
>> that takes a long time (about 1 sec) in a loop. The co-processor
>> in the PS3 can take care of the computation and the main processor
>> can take care of the point-to-point communication, so computing and
>> communication can overlap. The communication functions should return
>> much faster than the computing function.
>> My question is that after some cycles, the time consumed by the
>> communication functions on a PS3 increases heavily, and the whole
>> cluster's sync state breaks down. When I decrease the computing time,
>> this situation just disappears. I am very confused about this.
>> I think there is a mechanism in OpenMPI that causes this case; has
>> anyone seen this situation before?
>> I use "mpirun --mca btl tcp, self -np 17 --hostfile ...", is there
>> something I should add?
>> Lin
>> Lin
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Cisco Systems








--
Jeff Squyres
Cisco Systems




[OMPI users] MPI IO bug test case for OpenMPI 1.3

2009-07-09 Thread yvan . fournier
Hello,

Some weeks ago, I reported a problem using MPI IO in OpenMPI 1.3,
which did not occur with OpenMPI 1.2 or MPICH2.

The bug was encountered with the Code_Saturne CFD tool 
(http://www.code-saturne.org),
and seemed to be an issue with individual file pointers, as another mode using
explicit offsets worked fine.

I have finally extracted the read pattern from the complete case, so as to
generate the simple test case attached. Further testing showed that the
bug could be reproduced easily using only part of the read pattern,
so I commented out most of the patterns from the original case using #if 0 / #endif.

The test should be run with an MPI_COMM_WORLD size of 2. Initially,
rank 0 generates a simple binary file using Posix I/O and
containing the values 0, 1, 2, ... up to about 30.

The file is then opened for reading using MPI IO, and as the values
expected at a given offset are easily determined, read values are compared
to expected values, and MPI_Abort is called in case of an error.

I also added a USE_FILE_TYPE macro definition, which can be undefined
to "turn off" the bug.

Basically, I have:



#ifdef USE_FILE_TYPE
  MPI_Type_hindexed(1, lengths, disps, MPI_BYTE, &file_type);
  MPI_Type_commit(&file_type);
  MPI_File_set_view(fh, offset, MPI_BYTE, file_type, datarep, MPI_INFO_NULL);
#else
  MPI_File_set_view(fh, offset+disps[0], MPI_BYTE, MPI_BYTE, datarep, 
MPI_INFO_NULL);
#endif

retval = MPI_File_read_all(fh, buf, (int)(lengths[0]), MPI_BYTE, &status);

#if USE_FILE_TYPE
  MPI_Type_free(&file_type);
#endif

-

Using the file type indexed datatype, I exhibit the bug with both
versions 1.3.0 and 1.3.2 of OpenMPI.

Best regards,

  Yvan Fournier


#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <mpi.h>

#define USE_FILE_TYPE 1
/* #undef USE_FILE_TYPE */

static void
_create_test_data(void)
{
  int i, j;
  FILE *f;

  int buf[1024];

  f = fopen("test_data", "w");

  for (i = 0; i < 300; i++) {
for (j = 0; j < 1024; j++)
  buf[j] = i*1024 + j;
fwrite(buf, sizeof(int), 1024, f);
  }

  fclose(f);
}

static void
_mpi_io_error_message(int error_code)
{
  char buffer[MPI_MAX_ERROR_STRING];
  int  buffer_len;

  MPI_Error_string(error_code, buffer, &buffer_len);

  fprintf(stderr, "MPI IO error: %s\n", buffer);
}

static void
_test_for_corruption(int  buf[],
 int  base_offset,
 int  rank_offset,
 int  ni)
{
  int i;
  int n_ints = ni / sizeof(int);
  int int_shift = (base_offset + rank_offset) / sizeof(int);

  for (i = 0; i < n_ints; i++) {
if (buf[i] != int_shift + i) {
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("i = %d, buf = %d, ref = %d\n",
 i, buf[i], int_shift + i);
  fprintf(stderr,
  "rank %d, base offset %d, rank offset %d, size %d: corruption\n",
 rank, base_offset, rank_offset, ni);
  MPI_Abort(MPI_COMM_WORLD, 1);
}
  }
}

static void
_read_global_block(MPI_File   fh,
                   int        offset,
                   int        ni)
{
  MPI_Datatype file_type;
  MPI_Aint disps[1];
  MPI_Status status;
  int *buf;
  int lengths[1];
  char datarep[] = "native";
  int retval = 0;

  lengths[0] = ni;
  disps[0] = 0;

  buf = malloc(ni);
  assert(buf != NULL);

  MPI_Type_hindexed(1, lengths, disps, MPI_BYTE, &file_type);
  MPI_Type_commit(&file_type);
  MPI_File_set_view(fh, offset, MPI_BYTE, file_type, datarep, MPI_INFO_NULL);

  retval = MPI_File_read_all(fh, buf, ni, MPI_BYTE, &status);

  MPI_Type_free(&file_type);

  if (retval != MPI_SUCCESS)
_mpi_io_error_message(retval);

  _test_for_corruption(buf, offset, 0, ni);

  free(buf);
}

static void
_read_block_ip(MPI_File   fh,
               int        offset,
               int        displ,
               int        ni)
{
  int errcode;
  int *buf;
  int lengths[1];
  MPI_Aint disps[1];
  MPI_Status status;
  MPI_Datatype file_type;

  char datarep[] = "native";
  int retval = 0;

  buf = malloc(ni);
  assert(buf != NULL);

  lengths[0] = ni;
  disps[0] = displ;

#ifdef USE_FILE_TYPE
  MPI_Type_hindexed(1, lengths, disps, MPI_BYTE, &file_type);
  MPI_Type_commit(&file_type);

  MPI_File_set_view(fh, offset, MPI_BYTE, file_type, datarep, MPI_INFO_NULL);
#else
  MPI_File_set_view(fh, offset+displ, MPI_BYTE, MPI_BYTE, datarep, MPI_INFO_NULL);
#endif

  retval = MPI_File_read_all(fh, buf, (int)(lengths[0]), MPI_BYTE, &status);

  if (retval != MPI_SUCCESS)
_mpi_io_error_message(retval);

#if USE_FILE_TYPE
  MPI_Type_free(&file_type);
#endif

  _test_for_corruption(buf, offset, displ, ni);

  free(buf);
}

int main(int argc, char **argv)
{
  int rank;
  int retval;
  MPI_File fh;

  MPI_Init(&argc, &argv);

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
_create_test_data();
  }

  retval = MPI_File_open(MPI_COMM_WORLD,
 "test_data",
 MPI_MODE_RDONLY,
 MPI_INFO_NULL,
   

Re: [OMPI users] Configuration problem or network problem?

2009-07-09 Thread Zou, Lin (GE, Research, Consultant)
Hi Jeff,
I tried your suggestion of inserting an MPI_Barrier every few iterations, but it 
doesn't work; in fact it became even slower.
I want to try tracing the communication activity; can you give me some more 
details about how to use mpitrace?
Thank you for your attention.
regards
Lin 

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Jeff Squyres
Sent: 2009年7月7日 20:42
To: Open MPI Users
Subject: Re: [OMPI users] Configuration problem or network problem?

You might want to use a tracing library to see where exactly your 
synchronization issues are occurring.  It may depend on the  
communication pattern between your nodes and the timing between them.   
Additionally, your network switch(es) performance characteristics may come into 
effect here: are there retransmissions, timeouts, etc.?

It can sometimes be helpful to insert an MPI_BARRIER every few iterations just 
to keep all processes well-synchronized.  It seems counter-intuitive, but 
sometimes waiting a short time in a barrier can increase overall throughput 
(rather than waiting progressively longer times in poorly-synchronized blocking 
communications).
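Concretely, something like this is what I mean (a sketch -- the loop body, the
iteration count, and the interval of 10 are all placeholders):

for (iter = 0; iter < n_iters; iter++) {
    compute_and_exchange(iter);          /* your existing per-iteration work */
    if (iter % 10 == 0)
        MPI_Barrier(MPI_COMM_WORLD);     /* periodic resynchronization */
}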



On Jul 6, 2009, at 11:33 PM, Zou, Lin (GE, Research, Consultant) wrote:

>  Thank you for your suggestion, I tried this solution, but it doesn't 
> work. In fact, the headnode doesn't participate in the computing and 
> communication; it only mallocs a large chunk of memory, and when the loop in 
> every PS3 is over, the headnode gathers the data from every PS3.
> The strange thing is that sometimes the program can work well, but 
> when we reboot the system, without any change to the program, it can't 
> work, so I think there must be some mechanism in OpenMPI that can be 
> configured to let the program work well.
>
> Lin
>
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
> On Behalf Of Doug Reeder
> Sent: 2009年7月7日 10:49
> To: Open MPI Users
> Subject: Re: [OMPI users] Configuration problem or network problem?
>
> Lin,
>
> Try -np 16 and not running on the head node.
>
> Doug Reeder
> On Jul 6, 2009, at 7:08 PM, Zou, Lin (GE, Research, Consultant) wrote:
>
>> Hi all,
>> The system I use is a PS3 cluster, with 16 PS3s and a PowerPC as 
>> a headnode; they are connected by a high-speed switch.
>> There are point-to-point communication functions (MPI_Send and 
>> MPI_Recv), the data size is about 40KB, and a lot of computation 
>> that takes a long time (about 1 sec) in a loop. The co-processor 
>> in the PS3 can take care of the computation and the main processor 
>> can take care of the point-to-point communication, so computing and 
>> communication can overlap. The communication functions should return 
>> much faster than the computing function.
>> My question is that after some cycles, the time consumed by the 
>> communication functions on a PS3 increases heavily, and the whole 
>> cluster's sync state breaks down. When I decrease the computing time, 
>> this situation just disappears. I am very confused about this.
>> I think there is a mechanism in OpenMPI that causes this case; has 
>> anyone seen this situation before?
>> I use "mpirun --mca btl tcp, self -np 17 --hostfile ...", is there 
>> something I should add?
>> Lin
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Cisco Systems





[OMPI users] fault tolerance in open mpi

2009-07-09 Thread vipin kumar
Hi all,

I want to know whether open mpi supports Network and process fault tolerance
or not? If there is any example demonstrating these features that will be
best.

Regards,
-- 
Vipin K.
Research Engineer,
C-DOTB, India


Re: [OMPI users] enable-mpi-threads

2009-07-09 Thread Lenny Verkhovsky
I guess this question was already asked before:
https://svn.open-mpi.org/trac/ompi/ticket/1367
On Thu, Jul 9, 2009 at 10:35 AM, Lenny Verkhovsky <
lenny.verkhov...@gmail.com> wrote:

> BTW, what kind of thread support does Open MPI provide?
> I found in the https://svn.open-mpi.org/trac/ompi/browser/trunk/README that we
> support MPI_THREAD_MULTIPLE,
> and found a few unclear mails about MPI_THREAD_FUNNELED and
> MPI_THREAD_SERIALIZED.
> Also found nothing in the FAQ :(.
> Thanks, Lenny.
>
> On Thu, Jul 2, 2009 at 6:37 AM, rahmani  wrote:
>
>> Hi,
>> Thanks very much for your discussion
>>
>> - Original Message -
>> From: "Jeff Squyres" 
>> To: "Open MPI Users" 
>> Sent: Tuesday, June 30, 2009 7:23:13 AM (GMT-0500) America/New_York
>> Subject: Re: [OMPI users] enable-mpi-threads
>>
>> On Jun 30, 2009, at 1:29 AM, rahmani wrote:
>>
>> > I want install openmpi in a cluster with multicore processor.
>> > Is it necessary to configure with --enable-mpi-threads option?
>> > when this option should be used?
>> >
>>
>>
>> Open MPI's threading support is functional but not optimized.
>>
>> It depends on the problem you're trying to solve.  There's many ways
>> to write software, but two not-uncommon models for MPI applications are:
>>
>> 1. Write the software such that MPI will launch one process for each
>> core.  You communicate between these processes via MPI communication
>> calls such as MPI_SEND, MPI_RECV, etc.
>>
>> 2. Write the software such that MPI will launch one process per host,
>> and then spawn threads for all the cores on that host.  The threads
>> communicate with each other via typical threaded IPC mechanisms
>> (usually not MPI); MPI processes communicate across hosts via MPI
>> communication calls.  Sometimes MPI function calls are restricted to
>> one thread; sometimes they're invoked by any thread.
>>
>> So it really depends on how you want to write your software.  Make
>> sense?
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
>


Re: [OMPI users] enable-mpi-threads

2009-07-09 Thread Lenny Verkhovsky
BTW, what kind of thread support does Open MPI provide?
I found in the https://svn.open-mpi.org/trac/ompi/browser/trunk/README that
we support MPI_THREAD_MULTIPLE,
and found a few unclear mails about MPI_THREAD_FUNNELED and
MPI_THREAD_SERIALIZED.
Also found nothing in the FAQ :(.
Thanks, Lenny.
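P.S. The pattern I am asking about is the usual hybrid one -- request a level
at init time and check what the library actually granted (a minimal sketch,
assuming only the main thread will make MPI calls):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    if (provided < MPI_THREAD_FUNNELED)
        fprintf(stderr, "asked for MPI_THREAD_FUNNELED, got level %d\n", provided);
    /* ... spawn compute threads; only the main thread calls MPI ... */
    MPI_Finalize();
    return 0;
}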

On Thu, Jul 2, 2009 at 6:37 AM, rahmani  wrote:

> Hi,
> Thanks very much for your discussion
>
> - Original Message -
> From: "Jeff Squyres" 
> To: "Open MPI Users" 
> Sent: Tuesday, June 30, 2009 7:23:13 AM (GMT-0500) America/New_York
> Subject: Re: [OMPI users] enable-mpi-threads
>
> On Jun 30, 2009, at 1:29 AM, rahmani wrote:
>
> > I want install openmpi in a cluster with multicore processor.
> > Is it necessary to configure with --enable-mpi-threads option?
> > when this option should be used?
> >
>
>
> Open MPI's threading support is functional but not optimized.
>
> It depends on the problem you're trying to solve.  There's many ways
> to write software, but two not-uncommon models for MPI applications are:
>
> 1. Write the software such that MPI will launch one process for each
> core.  You communicate between these processes via MPI communication
> calls such as MPI_SEND, MPI_RECV, etc.
>
> 2. Write the software such that MPI will launch one process per host,
> and then spawn threads for all the cores on that host.  The threads
> communicate with each other via typical threaded IPC mechanisms
> (usually not MPI); MPI processes communicate across hosts via MPI
> communication calls.  Sometimes MPI function calls are restricted to
> one thread; sometimes they're invoked by any thread.
>
> So it really depends on how you want to write your software.  Make
> sense?
>
> --
> Jeff Squyres
> Cisco Systems
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>