Hi Galen,

Yes, that worked! Thanks so much!

-sophia 

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of Galen Shipman
Sent: Monday, June 11, 2007 2:15 PM
To: Open MPI Users
Subject: Re: [OMPI users] Open MPI issue with Iprobe

I think the problem is that we use MPI_STATUS_IGNORE in the C++ bindings
but don't check for it properly in mtl_mx_iprobe.

Can you try applying this diff to ompi and having the user try again? We
will also push this into the 1.2 branch.

- Galen


Index: ompi/mca/mtl/mx/mtl_mx_probe.c
===================================================================
--- ompi/mca/mtl/mx/mtl_mx_probe.c      (revision 14997)
+++ ompi/mca/mtl/mx/mtl_mx_probe.c      (working copy)
@@ -58,11 +58,12 @@
      }
      if (result) {
-        status->MPI_ERROR = OMPI_SUCCESS;
-        MX_GET_SRC(mx_status.match_info, status->MPI_SOURCE);
-        MX_GET_TAG(mx_status.match_info, status->MPI_TAG);
-        status->_count = mx_status.msg_length;
-
+        if(MPI_STATUS_IGNORE != status) {
+            status->MPI_ERROR = OMPI_SUCCESS;
+            MX_GET_SRC(mx_status.match_info, status->MPI_SOURCE);
+            MX_GET_TAG(mx_status.match_info, status->MPI_TAG);
+            status->_count = mx_status.msg_length;
+        }
          *flag = 1;
      } else {
          *flag = 0;
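
In the meantime, a possible user-level workaround (untested, just a
sketch) is to call the Iprobe overload that takes an explicit MPI::Status,
so the C++ binding never hands MPI_STATUS_IGNORE down to the MX MTL. For
example, the wait loop in probeTest.cc could become:

    // interim workaround sketch (untested): pass an explicit MPI::Status
    // so mtl_mx_iprobe never dereferences MPI_STATUS_IGNORE
    MPI::Status probeStatus;
    while (!MPI::COMM_WORLD.Iprobe(sendProc, tag, probeStatus)) {}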




On Jun 11, 2007, at 12:55 PM, Corwell, Sophia wrote:

> Hi,
>
> We are seeing the following issue with Iprobe on our clusters running 
> openmpi-1.2.2. Here is the code and related information:
>
> =======
> Modules currently loaded:
>
> (sn31)/projects>module list
>>> Currently Loaded Modulefiles:
>>>   1) /opt/modules/oscar-modulefiles/default-manpath/1.0.1
>>>   2) compilers/intel-9.1-f040-c045
>>>   3) misc/env-openmpi-1.2
>>>   4) mpi/openmpi-1.2.2_mx_intel-9.1-f040-c045
>>>   5) libraries/intel-mkl
> =======
>
> Source code:
>
>>>
>>> (sn31)/projects/>more probeTest.cc
>>>
>>> #include <mpi.h>
>>> #include <cassert>
>>>
>>> int main(int argc, char* argv[])
>>> {
>>>     MPI::Init(argc, argv);
>>>
>>>     const int rank = MPI::COMM_WORLD.Get_rank();
>>>     const int size = MPI::COMM_WORLD.Get_size();
>>>     const int sendProc = (rank + size - 1) % size;
>>>     const int recvProc = (rank + 1) % size;
>>>     const int tag = 1;
>>>
>>>     // send an asynchronous message
>>>     const int sendVal = 1;
>>>     MPI::Request sendRequest =
>>>         MPI::COMM_WORLD.Isend(&sendVal, 1, MPI_INT, recvProc, tag);
>>>
>>>     // wait for message to arrive
>>>     while (!MPI::COMM_WORLD.Iprobe(sendProc, tag)) {}  // This line causes problems
>>>
>>>     // Receive asynchronous message
>>>     int recvVal;
>>>     MPI::Request recvRequest =
>>>         MPI::COMM_WORLD.Irecv(&recvVal, 1, MPI_INT, sendProc, tag);
>>>     recvRequest.Wait();
>>>
>>>     MPI::Finalize();
>>> }
> =======
>
> Compiled with:
>
>>> (sn31)/projects>/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/bin/mpicxx -I/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/include -g -c -o probeTest.o probeTest.cc
>>>
>>> (sn31)/projects>/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/bin/mpicxx -g -o probeTest -L/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib probeTest.o -lmpi
>>> /projects/global/x86_64/compilers/intel/intel-9.1-cce-045/lib/libimf.so: warning: warning: feupdateenv is not implemented and will always fail
>>>
>
> =======
>
> Error at runtime:
>
>>>
>>> (sn31)/projects>mpiexec -n 1 ./probeTest
>>> [sn31:17616] *** Process received signal ***
>>> [sn31:17616] Signal: Segmentation fault (11)
>>> [sn31:17616] Signal code: Address not mapped (1)
>>> [sn31:17616] Failing at address: 0x8
>>> [sn31:17616] [ 0] /lib64/tls/libpthread.so.0 [0x2a9665a4f0]
>>> [sn31:17616] [ 1] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/openmpi/mca_mtl_mx.so(ompi_mtl_mx_iprobe+0x81) [0x2a9980b305]
>>> [sn31:17616] [ 2] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/openmpi/mca_pml_cm.so(mca_pml_cm_iprobe+0x1f) [0x2a995eb817]
>>> [sn31:17616] [ 3] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/libmpi.so.0(MPI_Iprobe+0xef) [0x2a956d363f]
>>> [sn31:17616] [ 4] ./probeTest(_ZNK3MPI4Comm6IprobeEii+0x3a) [0x4046aa]
>>> [sn31:17616] [ 5] ./probeTest(main+0x147) [0x40480b]
>>> [sn31:17616] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x2a967803fb]
>>> [sn31:17616] [ 7] ./probeTest(_ZNSt8ios_base4InitD1Ev+0x3a) [0x4038ca]
>>> [sn31:17616] *** End of error message ***
>>> mpiexec noticed that job rank 0 with PID 17616 on node sn31 exited on signal 11 (Segmentation fault).
>>>
>>> (sn31)/projects/ceptre/sdpautz/NWCC/temp>mpiexec -n 2 ./probeTest
>>> [sn31:17621] *** Process received signal ***
>>> [sn31:17620] *** Process received signal ***
>>> [sn31:17620] Signal: Segmentation fault (11)
>>> [sn31:17620] Signal code: Address not mapped (1)
>>> [sn31:17620] Failing at address: 0x8
>>> [sn31:17620] [ 0] /lib64/tls/libpthread.so.0 [0x2a9665a4f0]
>>> [sn31:17620] [ 1] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/openmpi/mca_mtl_mx.so(ompi_mtl_mx_iprobe+0x81) [0x2a9980b305]
>>> [sn31:17620] [ 2] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/openmpi/mca_pml_cm.so(mca_pml_cm_iprobe+0x1f) [0x2a995eb817]
>>> [sn31:17620] [ 3] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/libmpi.so.0(MPI_Iprobe+0xef) [0x2a956d363f]
>>> [sn31:17620] [ 4] ./probeTest(_ZNK3MPI4Comm6IprobeEii+0x3a) [0x4046aa]
>>> [sn31:17620] [ 5] ./probeTest(main+0x147) [0x40480b]
>>> [sn31:17620] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x2a967803fb]
>>> [sn31:17620] [ 7] ./probeTest(_ZNSt8ios_base4InitD1Ev+0x3a) [0x4038ca]
>>> [sn31:17620] *** End of error message ***
>>> [sn31:17621] Signal: Segmentation fault (11)
>>> [sn31:17621] Signal code: Address not mapped (1)
>>> [sn31:17621] Failing at address: 0x8
>>> [sn31:17621] [ 0] /lib64/tls/libpthread.so.0 [0x2a9665a4f0]
>>> [sn31:17621] [ 1] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/openmpi/mca_mtl_mx.so(ompi_mtl_mx_iprobe+0x81) [0x2a9980b305]
>>> [sn31:17621] [ 2] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/openmpi/mca_pml_cm.so(mca_pml_cm_iprobe+0x1f) [0x2a995eb817]
>>> [sn31:17621] [ 3] /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/lib/libmpi.so.0(MPI_Iprobe+0xef) [0x2a956d363f]
>>> [sn31:17621] [ 4] ./probeTest(_ZNK3MPI4Comm6IprobeEii+0x3a) [0x4046aa]
>>> [sn31:17621] [ 5] ./probeTest(main+0x1ad) [0x404871]
>>> [sn31:17621] [ 6] /lib64/tls/libc.so.6(__libc_start_main+0xdb) [0x2a967803fb]
>>> [sn31:17621] [ 7] ./probeTest(_ZNSt8ios_base4InitD1Ev+0x3a) [0x4038ca]
>>> [sn31:17621] *** End of error message ***
>>> mpiexec noticed that job rank 0 with PID 17620 on node sn31 exited on signal 11 (Segmentation fault).
>>> 1 additional process aborted (not shown)
>>>
>
> =======
>
> Additional Information:
>
>>> It appears that the call to Iprobe causes the problem; if that line is
>>> taken out, the code completes normally. Failures also occur with the
>>> gcc compilers.
>
>>> MPICH appears to work, at least with the Intel compiler.
>
> =======
>
> Hardware information:
>
> [root@spirit1 ~]# mx_info -q
> MX Version: 1.2.1-rc20
> MX Build: r...@tocc1.sandia.gov:/projects/global/src/myricom/mx-1.2.1-rc20 Thu Jun 7 17:08:02 MDT 2007
> 1 Myrinet board installed.
> The MX driver is configured to support a maximum of:
>         8 endpoints per NIC, 1024 NICs on the network, 32 NICs per host
> ===================================================================
> Instance #0:  333.2 MHz LANai, 133.3 MHz PCI bus, 4 MB SRAM
>         Status:         Running, P0: Link up, P1: Link up
>         Network:        Myrinet 2000
>
>         MAC Address:    00:60:dd:48:ba:ae
>         Product code:   M3F2-PCIXE-4
>         Part number:    09-02878
>         Serial number:  219851
>         Mapper (P0):    00:60:dd:48:c0:08, version = 0x01920f75, configured
>         Mapped hosts:   506
>         Mapper (P1):    00:60:dd:48:c0:08, version = 0x01920f75, configured
>         Mapped hosts:   506
>
>
> cat /apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx/BUILD_ENV
> # Build Environment:
> USE="doc icc modules mx torque"
> COMPILER="intel-9.1-f040-c045"
> CC="icc"
> CXX="icpc"
> CLINKER="icc"
> FC="ifort"
> F77="ifort"
> CFLAGS=" -O3 -pipe"
> CXXFLAGS=" -O3 -pipe"
> FFLAGS=" -O3"
> MODULE_DEST="/apps/modules/modulefiles/mpi"
> MODULE_FILE="openmpi-1.2.2_mx_intel-9.1-f040-c045"
> INSTALL_DEST="/apps/x86_64/mpi/openmpi/intel-9.1-f040-c045/openmpi-1.2.2_mx"
> CONF_FLAGS=" --with-mx=/opt/mx --with-tm=/apps/torque"
> =======
>
> Thanks in advance for any help/advice you can provide.
>
> -Sophia
>
>

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


