Dear François,

Thanks a lot for your report; it's a great help to us. :-)

Regarding the issues:

1) The "Conditional jump" errors normally mean that uninitialized (or undefined) values were used. The parameters passed into PMPI_Init_thread might contain uninitialized values, which could cause errors (even a segfault) later. I need some time to run your application to find out exactly where these values are; I'll post another email when it's done.
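For comparison, here is a minimal hypothetical sketch (not your code) in which everything reaching MPI_Init_thread is initialized before the call:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int provided = 0;  /* output slot for the granted thread level; pre-set defensively */

    /* argc/argv come initialized from the runtime; no undefined bytes are passed in */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        fprintf(stderr, "requested thread level not available\n");

    /* ... application ... */

    MPI_Finalize();
    return 0;
}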


2) You're right. We shouldn't check the buffer once the request has been completed and released. I'll fix that.

3) Absolutely correct. I'll fix that.

4) Well, this sounds reasonable, but according to the MPI-1 standard (see page 40 for non-blocking send/recv; a more detailed explanation is on page 30):

"A nonblocking send call indicates that the system may start copying data out of the send buffer. The sender should */not access*/ any part of the send buffer after a nonblocking send operation is called, until the send completes."

So before the MPI_Wait that completes an isend operation, any access to the send buffer is illegal. It might seem a little strict, but we have to do what the standard says.
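To illustrate the rule, a small hypothetical sketch (buffer and peer are made up):

#include <mpi.h>

void isend_rule(int peer, MPI_Comm comm)
{
    int buf[4] = { 1, 2, 3, 4 };
    MPI_Request req;

    MPI_Isend(buf, 4, MPI_INT, peer, 0, comm, &req);

    /* buf must not be touched here -- not even read -- per the text above:
     *     int x = buf[0];   <-- memchecker will (rightly) flag this
     * the same applies to a second Isend over an overlapping region. */

    MPI_Wait(&req, MPI_STATUS_IGNORE);

    buf[0] = 42;  /* legal again: the send has completed */
}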

5) Your feedback is always welcome and, most importantly, helps us make improvements. ;-) Thanks again.



Best Regards,
Shiqing



François PELLEGRINI wrote:
Hello all,


I am the main developer of the Scotch parallel graph partitioning
package, which uses both MPI and POSIX threads. I have been doing
a great deal of testing of my program on various platforms and
libraries, searching for potential bugs (there may still be some ;-) ).

The new memchecker tool proposed in v1.3 is indeed interesting
to me, so I tried it on my Linux platform. I used the following
configuration options:

% ./configure --enable-debug --enable-mem-debug --enable-memchecker
--with-valgrind=/usr/bin --enable-mpi-threads
--prefix=/home/pastix/pelegrin/openmpi

% ompi_info
                 Package: Open MPI pelegrin@laurel Distribution
                Open MPI: 1.3b2
   Open MPI SVN revision: r19927
   Open MPI release date: Nov 04, 2008
                Open RTE: 1.3b2
   Open RTE SVN revision: r19927
   Open RTE release date: Nov 04, 2008
                    OPAL: 1.3b2
       OPAL SVN revision: r19927
       OPAL release date: Nov 04, 2008
            Ident string: 1.3b2
                  Prefix: /home/pastix/pelegrin/openmpi
 Configured architecture: x86_64-unknown-linux-gnu
          Configure host: laurel
           Configured by: pelegrin
           Configured on: Wed Nov 19 00:50:50 CET 2008
          Configure host: laurel
                Built by: pelegrin
                Built on: mercredi 19 novembre 2008, 00:55:59 (UTC+0100)
              Built host: laurel
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: small
              C compiler: gcc
     C compiler absolute: /usr/bin/gcc
            C++ compiler: g++
   C++ compiler absolute: /usr/bin/g++
      Fortran77 compiler: gfortran
  Fortran77 compiler abs: /usr/bin/gfortran
      Fortran90 compiler: gfortran
  Fortran90 compiler abs: /usr/bin/gfortran
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: no
          Thread support: posix (mpi: yes, progress: no)
           Sparse Groups: no
  Internal debug support: yes
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: yes
         libltdl support: yes
   Heterogeneous support: no
 mpirun default --prefix: no
         MPI I/O support: yes
       MPI_WTIME support: gettimeofday
Symbol visibility support: yes
   FT Checkpoint support: no  (checkpoint thread: no)
           MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3)
          MCA memchecker: valgrind (MCA v2.0, API v2.0, Component v1.3)
              MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3)
           MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.3)
               MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3)
               MCA carto: file (MCA v2.0, API v2.0, Component v1.3)
           MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3)
           MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.3)
               MCA timer: linux (MCA v2.0, API v2.0, Component v1.3)
         MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3)
         MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3)
                 MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3)
              MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3)
           MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3)
           MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3)
                MCA coll: basic (MCA v2.0, API v2.0, Component v1.3)
                MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3)
                MCA coll: inter (MCA v2.0, API v2.0, Component v1.3)
                MCA coll: self (MCA v2.0, API v2.0, Component v1.3)
                MCA coll: sm (MCA v2.0, API v2.0, Component v1.3)
                MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3)
                  MCA io: romio (MCA v2.0, API v2.0, Component v1.3)
               MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3)
               MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3)
                 MCA pml: cm (MCA v2.0, API v2.0, Component v1.3)
                 MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3)
                 MCA pml: v (MCA v2.0, API v2.0, Component v1.3)
                 MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3)
              MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3)
                 MCA btl: self (MCA v2.0, API v2.0, Component v1.3)
                 MCA btl: sm (MCA v2.0, API v2.0, Component v1.3)
                 MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3)
                MCA topo: unity (MCA v2.0, API v2.0, Component v1.3)
                 MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3)
                 MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3)
                 MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3)
                 MCA iof: orted (MCA v2.0, API v2.0, Component v1.3)
                 MCA iof: tool (MCA v2.0, API v2.0, Component v1.3)
                 MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3)
                MCA odls: default (MCA v2.0, API v2.0, Component v1.3)
                 MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3)
               MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.3)
               MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.3)
               MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3)
                 MCA rml: oob (MCA v2.0, API v2.0, Component v1.3)
              MCA routed: binomial (MCA v2.0, API v2.0, Component v1.3)
              MCA routed: direct (MCA v2.0, API v2.0, Component v1.3)
              MCA routed: linear (MCA v2.0, API v2.0, Component v1.3)
                 MCA plm: rsh (MCA v2.0, API v2.0, Component v1.3)
                 MCA plm: slurm (MCA v2.0, API v2.0, Component v1.3)
               MCA filem: rsh (MCA v2.0, API v2.0, Component v1.3)
              MCA errmgr: default (MCA v2.0, API v2.0, Component v1.3)
                 MCA ess: env (MCA v2.0, API v2.0, Component v1.3)
                 MCA ess: hnp (MCA v2.0, API v2.0, Component v1.3)
                 MCA ess: singleton (MCA v2.0, API v2.0, Component v1.3)
                 MCA ess: slurm (MCA v2.0, API v2.0, Component v1.3)
                 MCA ess: tool (MCA v2.0, API v2.0, Component v1.3)
             MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.3)
             MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.3)

% gcc --version
gcc (Debian 4.3.2-1) 4.3.2
Copyright (C) 2008 Free Software Foundation, Inc.

I launched my program under valgrind on two procs and got the following report:

% mpirun -np 2 valgrind ../bin/dgord ~/paral/graph/altr4.grf.gz /dev/null -vt
==10978== Memcheck, a memory error detector.
==10978== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==10978== Using LibVEX rev 1854, a library for dynamic binary translation.
==10978== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==10978== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation 
framework.
==10978== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==10978== For more details, rerun with: -v
==10978==
==10979== Memcheck, a memory error detector.
==10979== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==10979== Using LibVEX rev 1854, a library for dynamic binary translation.
==10979== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==10979== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation 
framework.
==10979== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==10979== For more details, rerun with: -v
==10979==
==10979== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
==10978== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
==10978==    at 0x65FB269: syscall (in /lib/libc-2.7.so)

==10978==    by 0x6C8365A: opal_paffinity_linux_plpa_api_probe_init
(plpa_api_probe.c:43)
==10978==    by 0x6C83BB8: opal_paffinity_linux_plpa_init (plpa_runtime.c:36)
==10978==    by 0x6C84984: opal_paffinity_linux_plpa_have_topology_information
(plpa_map.c:501)
==10978==    by 0x6C83129: linux_module_init (paffinity_linux_module.c:119)
==10978==    by 0x5AB35EA: opal_paffinity_base_select 
(paffinity_base_select.c:64)
==10978==    by 0x5A7DE99: opal_init (opal_init.c:292)
==10978==    by 0x580087A: orte_init (orte_init.c:76)
==10978==    by 0x551675F: ompi_mpi_init (ompi_mpi_init.c:343)
==10978==    by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10978==    by 0x4067CF: main (dgord.c:123)
==10978==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==10979==    at 0x65FB269: syscall (in /lib/libc-2.7.so)
==10979==    by 0x6C8365A: opal_paffinity_linux_plpa_api_probe_init
(plpa_api_probe.c:43)
==10979==    by 0x6C83BB8: opal_paffinity_linux_plpa_init (plpa_runtime.c:36)
==10979==    by 0x6C84984: opal_paffinity_linux_plpa_have_topology_information
(plpa_map.c:501)
==10979==    by 0x6C83129: linux_module_init (paffinity_linux_module.c:119)
==10979==    by 0x5AB35EA: opal_paffinity_base_select 
(paffinity_base_select.c:64)
==10979==    by 0x5A7DE99: opal_init (opal_init.c:292)
==10979==    by 0x580087A: orte_init (orte_init.c:76)
==10979==    by 0x551675F: ompi_mpi_init (ompi_mpi_init.c:343)
==10979==    by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10979==    by 0x4067CF: main (dgord.c:123)
==10979==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==10978== Warning: set address range perms: large range 134217728 (defined)
==10979== Warning: set address range perms: large range 134217728 (defined)
==10978==
==10978== Conditional jump or move depends on uninitialised value(s)
==10978==    at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10978==    by 0x972EBD4: mpool_calloc (btl_sm.c:109)
==10978==    by 0x972F6A8: sm_btl_first_time_init (btl_sm.c:314)
==10978==    by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
==10978==    by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10978==    by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10978==    by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10978==    by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10978==    by 0x4067CF: main (dgord.c:123)
==10978==
==10978== Conditional jump or move depends on uninitialised value(s)
==10978==    at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10978==    by 0x972EBD4: mpool_calloc (btl_sm.c:109)
==10978==    by 0x972EC85: init_fifos (btl_sm.c:125)
==10978==    by 0x972F6CB: sm_btl_first_time_init (btl_sm.c:317)
==10978==    by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
==10978==    by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10978==    by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10978==    by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10978==    by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10978==    by 0x4067CF: main (dgord.c:123)
==10978==
==10978== Conditional jump or move depends on uninitialised value(s)
==10978==    at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10978==    by 0x54E660E: ompi_free_list_grow (ompi_free_list.c:198)
==10978==    by 0x54E6435: ompi_free_list_init_ex_new (ompi_free_list.c:163)
==10978==    by 0x972F9D3: ompi_free_list_init_new (ompi_free_list.h:169)
==10978==    by 0x972F864: sm_btl_first_time_init (btl_sm.c:343)
==10978==    by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
==10978==    by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10978==    by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10978==    by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10978==    by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10978==    by 0x4067CF: main (dgord.c:123)
==10979==
==10979== Conditional jump or move depends on uninitialised value(s)
==10979==    at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10979==    by 0x972EBD4: mpool_calloc (btl_sm.c:109)
==10979==    by 0x972F6A8: sm_btl_first_time_init (btl_sm.c:314)
==10979==    by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
==10979==    by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10979==    by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10979==    by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10979==    by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10979==    by 0x4067CF: main (dgord.c:123)
==10979==
==10979== Conditional jump or move depends on uninitialised value(s)
==10979==    at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10979==    by 0x972EBD4: mpool_calloc (btl_sm.c:109)
==10979==    by 0x972EC85: init_fifos (btl_sm.c:125)
==10979==    by 0x972F6CB: sm_btl_first_time_init (btl_sm.c:317)
==10979==    by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
==10979==    by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10979==    by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10979==    by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10979==    by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10979==    by 0x4067CF: main (dgord.c:123)
==10979==
==10979== Conditional jump or move depends on uninitialised value(s)
==10979==    at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10979==    by 0x54E660E: ompi_free_list_grow (ompi_free_list.c:198)
==10979==    by 0x54E6435: ompi_free_list_init_ex_new (ompi_free_list.c:163)
==10979==    by 0x972F9D3: ompi_free_list_init_new (ompi_free_list.h:169)
==10979==    by 0x972F864: sm_btl_first_time_init (btl_sm.c:343)
==10979==    by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
==10979==    by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10979==    by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10979==    by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10979==    by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10979==    by 0x4067CF: main (dgord.c:123)
==10979==
==10979== Conditional jump or move depends on uninitialised value(s)
==10979==    at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10979==    by 0x9730165: ompi_fifo_init (ompi_fifo.h:280)
==10979==    by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
==10979==    by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10979==    by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10979==    by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10979==    by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10979==    by 0x4067CF: main (dgord.c:123)
==10979==
==10979== Conditional jump or move depends on uninitialised value(s)
==10979==    at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10979==    by 0x97302C4: ompi_cb_fifo_init (ompi_circular_buffer_fifo.h:158)
==10979==    by 0x97301BA: ompi_fifo_init (ompi_fifo.h:288)
==10979==    by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
==10979==    by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10979==    by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10979==    by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10979==    by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10979==    by 0x4067CF: main (dgord.c:123)
==10979==
==10979== Conditional jump or move depends on uninitialised value(s)
==10979==    at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10979==    by 0x97303B3: ompi_cb_fifo_init (ompi_circular_buffer_fifo.h:180)
==10979==    by 0x97301BA: ompi_fifo_init (ompi_fifo.h:288)
==10979==    by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
==10979==    by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10979==    by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10979==    by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10979==    by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10979==    by 0x4067CF: main (dgord.c:123)
==10978==
==10978== Conditional jump or move depends on uninitialised value(s)
==10978==    at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10978==    by 0x9730165: ompi_fifo_init (ompi_fifo.h:280)
==10978==    by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
==10978==    by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10978==    by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10978==    by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10978==    by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10978==    by 0x4067CF: main (dgord.c:123)
==10978==
==10978== Conditional jump or move depends on uninitialised value(s)
==10978==    at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10978==    by 0x97302C4: ompi_cb_fifo_init (ompi_circular_buffer_fifo.h:158)
==10978==    by 0x97301BA: ompi_fifo_init (ompi_fifo.h:288)
==10978==    by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
==10978==    by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10978==    by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10978==    by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10978==    by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10978==    by 0x4067CF: main (dgord.c:123)
==10978==
==10978== Conditional jump or move depends on uninitialised value(s)
==10978==    at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10978==    by 0x97303B3: ompi_cb_fifo_init (ompi_circular_buffer_fifo.h:180)
==10978==    by 0x97301BA: ompi_fifo_init (ompi_fifo.h:288)
==10978==    by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
==10978==    by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10978==    by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10978==    by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10978==    by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10978==    by 0x4067CF: main (dgord.c:123)
==10979==
==10979== Uninitialised byte(s) found during client check request
==10979==    at 0x5AB2C87: valgrind_module_isdefined
(memchecker_valgrind_module.c:112)
==10979==    by 0x5AB26CB: opal_memchecker_base_isdefined
(memchecker_base_wrappers.c:34)
==10979==    by 0x553F067: memchecker_call (memchecker.h:96)
==10979==    by 0x553ECDF: PMPI_Bcast (pbcast.c:41)
==10979==    by 0x40BB82: _SCOTCHdgraphLoad (dgraph_io_load.c:226)
==10979==    by 0x406B32: main (dgord.c:265)
==10979==  Address 0x7feffff74 is on thread 1's stack
==10978==
==10978== Invalid read of size 8
==10978==    at 0x8F0D85A: memchecker_call (memchecker.h:94)
==10978==    by 0x8F0D812: mca_pml_ob1_send_request_free (pml_ob1_sendreq.c:107)
==10978==    by 0x55154DA: ompi_request_free (request.h:354)
==10978==    by 0x5515B05: ompi_request_default_wait_all (req_wait.c:344)
==10978==    by 0x556EA00: PMPI_Waitall (pwaitall.c:68)
==10978==    by 0x41FC2D: _SCOTCHdgraphCoarsen (dgraph_coarsen.c:711)
==10978==    by 0x415B34: vdgraphSeparateMl2 (vdgraph_separate_ml.c:99)
==10979==
==10979== Invalid read of size 8
==10979==    at 0x8F0D85A: memchecker_call (memchecker.h:94)
==10979==    by 0x8F0D812: mca_pml_ob1_send_request_free (pml_ob1_sendreq.c:107)
==10979==    by 0x55154DA: ompi_request_free (request.h:354)
==10979==    by 0x5515B05: ompi_request_default_wait_all (req_wait.c:344)
==10979==    by 0x556EA00: PMPI_Waitall (pwaitall.c:68)
==10979==    by 0x41FC2D: _SCOTCHdgraphCoarsen (dgraph_coarsen.c:711)
==10979==    by 0x415B34: vdgraphSeparateMl2 (vdgraph_separate_ml.c:99)
==10979==    by 0x415D96: _SCOTCHvdgraphSeparateMl (vdgraph_separate_ml.c:660)
==10979==    by 0x40EC49: _SCOTCHvdgraphSeparateSt (vdgraph_separate_st.c:327)
==10979==    by 0x412F52: _SCOTCHhdgraphOrderNd (hdgraph_order_nd.c:294)
==10979==    by 0x40EB26: _SCOTCHhdgraphOrderSt (hdgraph_order_st.c:216)
==10979==    by 0x40734A: SCOTCH_dgraphOrderComputeList
(library_dgraph_order.c:220)
==10979==  Address 0x28 is not stack'd, malloc'd or (recently) free'd

==10978==    by 0x415D96: _SCOTCHvdgraphSeparateMl (vdgraph_separate_ml.c:660)
==10978==    by 0x40EC49: _SCOTCHvdgraphSeparateSt (vdgraph_separate_st.c:327)
==10978==    by 0x412F52: _SCOTCHhdgraphOrderNd (hdgraph_order_nd.c:294)
==10978==    by 0x40EB26: _SCOTCHhdgraphOrderSt (hdgraph_order_st.c:216)
==10978==    by 0x40734A: SCOTCH_dgraphOrderComputeList
(library_dgraph_order.c:220)
==10978==  Address 0x28 is not stack'd, malloc'd or (recently) free'd
[laurel:10979] *** Process received signal ***
[laurel:10978] *** Process received signal ***
[laurel:10979] Signal: Segmentation fault (11)
[laurel:10979] Signal code: Address not mapped (1)
[laurel:10979] Failing at address: 0x28
[laurel:10978] Signal: Segmentation fault (11)
[laurel:10978] Signal code: Address not mapped (1)
[laurel:10978] Failing at address: 0x28
[laurel:10979] [ 0] /lib/libpthread.so.0 [0x6321a80]
[laurel:10979] [ 1] /home/pastix/pelegrin/openmpi/lib/openmpi/mca_pml_ob1.so
[0x8f0d85a]
[laurel:10979] [ 2] /home/pastix/pelegrin/openmpi/lib/openmpi/mca_pml_ob1.so
[0x8f0d813]
[laurel:10979] [ 3] /home/pastix/pelegrin/openmpi/lib/libmpi.so.0 [0x55154db]
[laurel:10979] [ 4] /home/pastix/pelegrin/openmpi/lib/libmpi.so.0 [0x5515b06]
[laurel:10979] [ 5]
/home/pastix/pelegrin/openmpi/lib/libmpi.so.0(PMPI_Waitall+0x15d) [0x556ea01]
[laurel:10979] [ 6] ../bin/dgord(_SCOTCHdgraphCoarsen+0x13ce) [0x41fc2e]
[laurel:10979] [ 7] ../bin/dgord [0x415b35]
[laurel:10979] [ 8] ../bin/dgord(_SCOTCHvdgraphSeparateMl+0x27) [0x415d97]
[laurel:10979] [ 9] ../bin/dgord(_SCOTCHvdgraphSeparateSt+0x5a) [0x40ec4a]
[laurel:10978] [ 0] /lib/libpthread.so.0 [0x6321a80]
[laurel:10979] [10] ../bin/dgord(_SCOTCHhdgraphOrderNd+0xe3) [0x412f53]
[laurel:10979] [11] ../bin/dgord(_SCOTCHhdgraphOrderSt+0x67) [0x40eb27]
[laurel:10978] [ 1] /home/pastix/pelegrin/openmpi/lib/openmpi/mca_pml_ob1.so
[0x8f0d85a]
[laurel:10979] [12] ../bin/dgord(SCOTCH_dgraphOrderComputeList+0xeb) [0x40734b]
[laurel:10979] [13] ../bin/dgord(main+0x3ec) [0x406b7c]
[laurel:10979] [14] /lib/libc.so.6(__libc_start_main+0xe6) [0x654d1a6]
[laurel:10979] [15] ../bin/dgord [0x406669]
[laurel:10978] [ 2] /home/pastix/pelegrin/openmpi/lib/openmpi/mca_pml_ob1.so
[0x8f0d813]
[laurel:10978] [ 3] /home/pastix/pelegrin/openmpi/lib/libmpi.so.0 [0x55154db]
[laurel:10978] [ 4] /home/pastix/pelegrin/openmpi/lib/libmpi.so.0 [0x5515b06]
[laurel:10978] [ 5]
/home/pastix/pelegrin/openmpi/lib/libmpi.so.0(PMPI_Waitall+0x15d) [0x556ea01]
[laurel:10978] [ 6] ../bin/dgord(_SCOTCHdgraphCoarsen+0x13ce) [0x41fc2e]
[laurel:10978] [ 7] ../bin/dgord [0x415b35]
[laurel:10978] [ 8] ../bin/dgord(_SCOTCHvdgraphSeparateMl+0x27) [0x415d97]
[laurel:10978] [ 9] ../bin/dgord(_SCOTCHvdgraphSeparateSt+0x5a) [0x40ec4a]
[laurel:10979] *** End of error message ***
[laurel:10978] [10] ../bin/dgord(_SCOTCHhdgraphOrderNd+0xe3) [0x412f53]
[laurel:10978] [11] ../bin/dgord(_SCOTCHhdgraphOrderSt+0x67) [0x40eb27]
[laurel:10978] [12] ../bin/dgord(SCOTCH_dgraphOrderComputeList+0xeb) [0x40734b]
[laurel:10978] [13] ../bin/dgord(main+0x3ec) [0x406b7c]
[laurel:10978] [14] /lib/libc.so.6(__libc_start_main+0xe6) [0x654d1a6]
[laurel:10978] [15] ../bin/dgord [0x406669]
==10979==
[laurel:10978] *** End of error message ***

==10979== Process terminating with default action of signal 11 (SIGSEGV)
==10979==  Access not within mapped region at address 0x29
==10979==    at 0x8F0D85A: memchecker_call (memchecker.h:94)
==10978==
==10978== Process terminating with default action of signal 11 (SIGSEGV)
==10978==  Access not within mapped region at address 0x29
==10978==    at 0x8F0D85A: memchecker_call (memchecker.h:94)
==10978==    by 0x8F0D812: mca_pml_ob1_send_request_free (pml_ob1_sendreq.c:107)
==10978==    by 0x55154DA: ompi_request_free (request.h:354)
==10978==    by 0x5515B05: ompi_request_default_wait_all (req_wait.c:344)
==10978==    by 0x556EA00: PMPI_Waitall (pwaitall.c:68)
==10978==    by 0x41FC2D: _SCOTCHdgraphCoarsen (dgraph_coarsen.c:711)
==10978==    by 0x415B34: vdgraphSeparateMl2 (vdgraph_separate_ml.c:99)
==10978==    by 0x415D96: _SCOTCHvdgraphSeparateMl (vdgraph_separate_ml.c:660)
==10978==    by 0x40EC49: _SCOTCHvdgraphSeparateSt (vdgraph_separate_st.c:327)
==10978==    by 0x412F52: _SCOTCHhdgraphOrderNd (hdgraph_order_nd.c:294)
==10978==    by 0x40EB26: _SCOTCHhdgraphOrderSt (hdgraph_order_st.c:216)
==10978==    by 0x40734A: SCOTCH_dgraphOrderComputeList
(library_dgraph_order.c:220)
==10979==    by 0x8F0D812: mca_pml_ob1_send_request_free (pml_ob1_sendreq.c:107)
==10979==    by 0x55154DA: ompi_request_free (request.h:354)
==10979==    by 0x5515B05: ompi_request_default_wait_all (req_wait.c:344)
==10979==    by 0x556EA00: PMPI_Waitall (pwaitall.c:68)
==10979==    by 0x41FC2D: _SCOTCHdgraphCoarsen (dgraph_coarsen.c:711)
==10979==    by 0x415B34: vdgraphSeparateMl2 (vdgraph_separate_ml.c:99)
==10979==    by 0x415D96: _SCOTCHvdgraphSeparateMl (vdgraph_separate_ml.c:660)
==10979==    by 0x40EC49: _SCOTCHvdgraphSeparateSt (vdgraph_separate_st.c:327)
==10979==    by 0x412F52: _SCOTCHhdgraphOrderNd (hdgraph_order_nd.c:294)
==10979==    by 0x40EB26: _SCOTCHhdgraphOrderSt (hdgraph_order_st.c:216)
==10979==    by 0x40734A: SCOTCH_dgraphOrderComputeList
(library_dgraph_order.c:220)
==10979==
==10979== ERROR SUMMARY: 14 errors from 9 contexts (suppressed: 264 from 3)
==10979== malloc/free: in use at exit: 4,626,295 bytes in 2,614 blocks.
==10978==
==10978== ERROR SUMMARY: 13 errors from 8 contexts (suppressed: 264 from 3)

==10979== malloc/free: 9,296 allocs, 6,682 frees, 11,121,335 bytes allocated.
==10979== For counts of detected errors, rerun with: -v
==10978== malloc/free: in use at exit: 4,671,068 bytes in 2,627 blocks.
==10978== malloc/free: 9,315 allocs, 6,688 frees, 13,108,494 bytes allocated.
==10978== For counts of detected errors, rerun with: -v

==10979== searching for pointers to 2,614 not-freed blocks.
==10978== searching for pointers to 2,627 not-freed blocks.
==10978== checked 138,090,848 bytes.
==10978==
==10979== checked 138,047,136 bytes.
==10979==
==10979== LEAK SUMMARY:
==10979==    definitely lost: 2,049 bytes in 25 blocks.
==10979==      possibly lost: 2,405,098 bytes in 60 blocks.
==10979==    still reachable: 2,219,148 bytes in 2,529 blocks.
==10979==         suppressed: 0 bytes in 0 blocks.
==10979== Rerun with --leak-check=full to see details of leaked memory.

==10978== LEAK SUMMARY:
==10978==    definitely lost: 2,125 bytes in 27 blocks.
==10978==      possibly lost: 2,445,353 bytes in 63 blocks.
==10978==    still reachable: 2,223,590 bytes in 2,537 blocks.
==10978==         suppressed: 0 bytes in 0 blocks.
==10978== Rerun with --leak-check=full to see details of leaked memory.
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 10979 on node laurel exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------


I want to report the following issues:

1)- The "Conditional jump or move depends on uninitialised value(s)"
    messages are quite puzzling. Do they correspond to real problems
    in OpenMPI or should they just be ignored ?

2)- The MPI_Waitall call that causes the problem spans a set of former
    receive requests already set to MPI_REQUEST_NULL, together with a
    set of matching (and hence matched) Isend requests.
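To make this concrete, a simplified, hypothetical sketch of the situation (not my actual code); as far as I can tell this is legal, since MPI_Waitall must skip null requests:

#include <mpi.h>

/* The first nrecv entries held receive requests that were completed
 * earlier and are now MPI_REQUEST_NULL; the next nsend entries hold
 * the matching, still-pending Isend requests started elsewhere. */
void wait_mixed(MPI_Request reqs[], int nrecv, int nsend)
{
    int i;

    for (i = 0; i < nrecv; i++)
        reqs[i] = MPI_REQUEST_NULL;  /* receives already completed */

    /* MPI_Waitall ignores the null entries and completes the sends */
    MPI_Waitall(nrecv + nsend, reqs, MPI_STATUSES_IGNORE);
}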

3)- Memchecker also complains (I think wrongly) in the case of a
    Bcast where the receivers have not pre-set all of their receive array.
    I guess that in the memcheck process the sender and receiver sides
    should get different treatment, since only one data array is
    passed, which is either read or written depending on the root
    process number.
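Again a simplified, hypothetical sketch: the array only needs to be defined beforehand on the root, since on every other rank it is purely an output:

#include <mpi.h>

void bcast_case(int root, MPI_Comm comm)
{
    int i, rank, data[16];

    MPI_Comm_rank(comm, &rank);
    if (rank == root) {
        for (i = 0; i < 16; i++)   /* root must initialize: its buffer is read */
            data[i] = i;
    }
    /* non-root ranks may leave data uninitialized: it is overwritten */

    MPI_Bcast(data, 16, MPI_INT, root, comm);
}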

4)- It also complains when two Isends correspond to overlapping regions
    of the same memory area. It seems that the first Isend flags the
    region as "non-readable", while it should just be "non-writable",
    shouldn't it?

5)- Keep up the good work! Congrats.  ;-)


Sincerely yours,


                                        f.p.

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

