Hello all,
I am the main developer of the Scotch parallel graph partitioning
package, which uses both MPI and POSIX threads (Pthreads). I have been doing
a great deal of testing of my program on various platforms and
libraries, searching for potential bugs (there may still be some ;-) ).
The new memchecker tool proposed in v1.3 is indeed interesting
to me, so I tried it on my Linux platform. I used the following
configuration options:
% ./configure --enable-debug --enable-mem-debug --enable-memchecker \
    --with-valgrind=/usr/bin --enable-mpi-threads \
    --prefix=/home/pastix/pelegrin/openmpi
% ompi_info
Package: Open MPI pelegrin@laurel Distribution
Open MPI: 1.3b2
Open MPI SVN revision: r19927
Open MPI release date: Nov 04, 2008
Open RTE: 1.3b2
Open RTE SVN revision: r19927
Open RTE release date: Nov 04, 2008
OPAL: 1.3b2
OPAL SVN revision: r19927
OPAL release date: Nov 04, 2008
Ident string: 1.3b2
Prefix: /home/pastix/pelegrin/openmpi
Configured architecture: x86_64-unknown-linux-gnu
Configure host: laurel
Configured by: pelegrin
Configured on: Wed Nov 19 00:50:50 CET 2008
Configure host: laurel
Built by: pelegrin
Built on: mercredi 19 novembre 2008, 00:55:59 (UTC+0100)
Built host: laurel
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: small
C compiler: gcc
C compiler absolute: /usr/bin/gcc
C++ compiler: g++
C++ compiler absolute: /usr/bin/g++
Fortran77 compiler: gfortran
Fortran77 compiler abs: /usr/bin/gfortran
Fortran90 compiler: gfortran
Fortran90 compiler abs: /usr/bin/gfortran
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: no
Thread support: posix (mpi: yes, progress: no)
Sparse Groups: no
Internal debug support: yes
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: yes
libltdl support: yes
Heterogeneous support: no
mpirun default --prefix: no
MPI I/O support: yes
MPI_WTIME support: gettimeofday
Symbol visibility support: yes
FT Checkpoint support: no (checkpoint thread: no)
MCA backtrace: execinfo (MCA v2.0, API v2.0, Component v1.3)
MCA memchecker: valgrind (MCA v2.0, API v2.0, Component v1.3)
MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.3)
MCA paffinity: linux (MCA v2.0, API v2.0, Component v1.3)
MCA carto: auto_detect (MCA v2.0, API v2.0, Component v1.3)
MCA carto: file (MCA v2.0, API v2.0, Component v1.3)
MCA maffinity: first_use (MCA v2.0, API v2.0, Component v1.3)
MCA maffinity: libnuma (MCA v2.0, API v2.0, Component v1.3)
MCA timer: linux (MCA v2.0, API v2.0, Component v1.3)
MCA installdirs: env (MCA v2.0, API v2.0, Component v1.3)
MCA installdirs: config (MCA v2.0, API v2.0, Component v1.3)
MCA dpm: orte (MCA v2.0, API v2.0, Component v1.3)
MCA pubsub: orte (MCA v2.0, API v2.0, Component v1.3)
MCA allocator: basic (MCA v2.0, API v2.0, Component v1.3)
MCA allocator: bucket (MCA v2.0, API v2.0, Component v1.3)
MCA coll: basic (MCA v2.0, API v2.0, Component v1.3)
MCA coll: hierarch (MCA v2.0, API v2.0, Component v1.3)
MCA coll: inter (MCA v2.0, API v2.0, Component v1.3)
MCA coll: self (MCA v2.0, API v2.0, Component v1.3)
MCA coll: sm (MCA v2.0, API v2.0, Component v1.3)
MCA coll: tuned (MCA v2.0, API v2.0, Component v1.3)
MCA io: romio (MCA v2.0, API v2.0, Component v1.3)
MCA mpool: rdma (MCA v2.0, API v2.0, Component v1.3)
MCA mpool: sm (MCA v2.0, API v2.0, Component v1.3)
MCA pml: cm (MCA v2.0, API v2.0, Component v1.3)
MCA pml: ob1 (MCA v2.0, API v2.0, Component v1.3)
MCA pml: v (MCA v2.0, API v2.0, Component v1.3)
MCA bml: r2 (MCA v2.0, API v2.0, Component v1.3)
MCA rcache: vma (MCA v2.0, API v2.0, Component v1.3)
MCA btl: self (MCA v2.0, API v2.0, Component v1.3)
MCA btl: sm (MCA v2.0, API v2.0, Component v1.3)
MCA btl: tcp (MCA v2.0, API v2.0, Component v1.3)
MCA topo: unity (MCA v2.0, API v2.0, Component v1.3)
MCA osc: pt2pt (MCA v2.0, API v2.0, Component v1.3)
MCA osc: rdma (MCA v2.0, API v2.0, Component v1.3)
MCA iof: hnp (MCA v2.0, API v2.0, Component v1.3)
MCA iof: orted (MCA v2.0, API v2.0, Component v1.3)
MCA iof: tool (MCA v2.0, API v2.0, Component v1.3)
MCA oob: tcp (MCA v2.0, API v2.0, Component v1.3)
MCA odls: default (MCA v2.0, API v2.0, Component v1.3)
MCA ras: slurm (MCA v2.0, API v2.0, Component v1.3)
MCA rmaps: rank_file (MCA v2.0, API v2.0, Component v1.3)
MCA rmaps: round_robin (MCA v2.0, API v2.0, Component v1.3)
MCA rmaps: seq (MCA v2.0, API v2.0, Component v1.3)
MCA rml: oob (MCA v2.0, API v2.0, Component v1.3)
MCA routed: binomial (MCA v2.0, API v2.0, Component v1.3)
MCA routed: direct (MCA v2.0, API v2.0, Component v1.3)
MCA routed: linear (MCA v2.0, API v2.0, Component v1.3)
MCA plm: rsh (MCA v2.0, API v2.0, Component v1.3)
MCA plm: slurm (MCA v2.0, API v2.0, Component v1.3)
MCA filem: rsh (MCA v2.0, API v2.0, Component v1.3)
MCA errmgr: default (MCA v2.0, API v2.0, Component v1.3)
MCA ess: env (MCA v2.0, API v2.0, Component v1.3)
MCA ess: hnp (MCA v2.0, API v2.0, Component v1.3)
MCA ess: singleton (MCA v2.0, API v2.0, Component v1.3)
MCA ess: slurm (MCA v2.0, API v2.0, Component v1.3)
MCA ess: tool (MCA v2.0, API v2.0, Component v1.3)
MCA grpcomm: bad (MCA v2.0, API v2.0, Component v1.3)
MCA grpcomm: basic (MCA v2.0, API v2.0, Component v1.3)
% gcc --version
gcc (Debian 4.3.2-1) 4.3.2
Copyright (C) 2008 Free Software Foundation, Inc.
I launched my program under Valgrind on two processes and got the following report:
% mpirun -np 2 valgrind ../bin/dgord ~/paral/graph/altr4.grf.gz /dev/null -vt
==10978== Memcheck, a memory error detector.
==10978== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==10978== Using LibVEX rev 1854, a library for dynamic binary translation.
==10978== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==10978== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation
framework.
==10978== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==10978== For more details, rerun with: -v
==10978==
==10979== Memcheck, a memory error detector.
==10979== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==10979== Using LibVEX rev 1854, a library for dynamic binary translation.
==10979== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==10979== Using valgrind-3.3.1-Debian, a dynamic binary instrumentation
framework.
==10979== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==10979== For more details, rerun with: -v
==10979==
==10979== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
==10978== Syscall param sched_setaffinity(mask) points to unaddressable byte(s)
==10978== at 0x65FB269: syscall (in /lib/libc-2.7.so)
==10978== by 0x6C8365A: opal_paffinity_linux_plpa_api_probe_init
(plpa_api_probe.c:43)
==10978== by 0x6C83BB8: opal_paffinity_linux_plpa_init (plpa_runtime.c:36)
==10978== by 0x6C84984: opal_paffinity_linux_plpa_have_topology_information
(plpa_map.c:501)
==10978== by 0x6C83129: linux_module_init (paffinity_linux_module.c:119)
==10978== by 0x5AB35EA: opal_paffinity_base_select
(paffinity_base_select.c:64)
==10978== by 0x5A7DE99: opal_init (opal_init.c:292)
==10978== by 0x580087A: orte_init (orte_init.c:76)
==10978== by 0x551675F: ompi_mpi_init (ompi_mpi_init.c:343)
==10978== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10978== by 0x4067CF: main (dgord.c:123)
==10978== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==10979== at 0x65FB269: syscall (in /lib/libc-2.7.so)
==10979== by 0x6C8365A: opal_paffinity_linux_plpa_api_probe_init
(plpa_api_probe.c:43)
==10979== by 0x6C83BB8: opal_paffinity_linux_plpa_init (plpa_runtime.c:36)
==10979== by 0x6C84984: opal_paffinity_linux_plpa_have_topology_information
(plpa_map.c:501)
==10979== by 0x6C83129: linux_module_init (paffinity_linux_module.c:119)
==10979== by 0x5AB35EA: opal_paffinity_base_select
(paffinity_base_select.c:64)
==10979== by 0x5A7DE99: opal_init (opal_init.c:292)
==10979== by 0x580087A: orte_init (orte_init.c:76)
==10979== by 0x551675F: ompi_mpi_init (ompi_mpi_init.c:343)
==10979== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10979== by 0x4067CF: main (dgord.c:123)
==10979== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==10978== Warning: set address range perms: large range 134217728 (defined)
==10979== Warning: set address range perms: large range 134217728 (defined)
==10978==
==10978== Conditional jump or move depends on uninitialised value(s)
==10978== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10978== by 0x972EBD4: mpool_calloc (btl_sm.c:109)
==10978== by 0x972F6A8: sm_btl_first_time_init (btl_sm.c:314)
==10978== by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
==10978== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10978== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10978== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10978== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10978== by 0x4067CF: main (dgord.c:123)
==10978==
==10978== Conditional jump or move depends on uninitialised value(s)
==10978== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10978== by 0x972EBD4: mpool_calloc (btl_sm.c:109)
==10978== by 0x972EC85: init_fifos (btl_sm.c:125)
==10978== by 0x972F6CB: sm_btl_first_time_init (btl_sm.c:317)
==10978== by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
==10978== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10978== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10978== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10978== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10978== by 0x4067CF: main (dgord.c:123)
==10978==
==10978== Conditional jump or move depends on uninitialised value(s)
==10978== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10978== by 0x54E660E: ompi_free_list_grow (ompi_free_list.c:198)
==10978== by 0x54E6435: ompi_free_list_init_ex_new (ompi_free_list.c:163)
==10978== by 0x972F9D3: ompi_free_list_init_new (ompi_free_list.h:169)
==10978== by 0x972F864: sm_btl_first_time_init (btl_sm.c:343)
==10978== by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
==10978== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10978== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10978== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10978== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10978== by 0x4067CF: main (dgord.c:123)
==10979==
==10979== Conditional jump or move depends on uninitialised value(s)
==10979== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10979== by 0x972EBD4: mpool_calloc (btl_sm.c:109)
==10979== by 0x972F6A8: sm_btl_first_time_init (btl_sm.c:314)
==10979== by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
==10979== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10979== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10979== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10979== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10979== by 0x4067CF: main (dgord.c:123)
==10979==
==10979== Conditional jump or move depends on uninitialised value(s)
==10979== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10979== by 0x972EBD4: mpool_calloc (btl_sm.c:109)
==10979== by 0x972EC85: init_fifos (btl_sm.c:125)
==10979== by 0x972F6CB: sm_btl_first_time_init (btl_sm.c:317)
==10979== by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
==10979== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10979== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10979== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10979== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10979== by 0x4067CF: main (dgord.c:123)
==10979==
==10979== Conditional jump or move depends on uninitialised value(s)
==10979== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10979== by 0x54E660E: ompi_free_list_grow (ompi_free_list.c:198)
==10979== by 0x54E6435: ompi_free_list_init_ex_new (ompi_free_list.c:163)
==10979== by 0x972F9D3: ompi_free_list_init_new (ompi_free_list.h:169)
==10979== by 0x972F864: sm_btl_first_time_init (btl_sm.c:343)
==10979== by 0x972FCE3: mca_btl_sm_add_procs (btl_sm.c:488)
==10979== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10979== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10979== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10979== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10979== by 0x4067CF: main (dgord.c:123)
==10979==
==10979== Conditional jump or move depends on uninitialised value(s)
==10979== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10979== by 0x9730165: ompi_fifo_init (ompi_fifo.h:280)
==10979== by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
==10979== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10979== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10979== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10979== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10979== by 0x4067CF: main (dgord.c:123)
==10979==
==10979== Conditional jump or move depends on uninitialised value(s)
==10979== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10979== by 0x97302C4: ompi_cb_fifo_init (ompi_circular_buffer_fifo.h:158)
==10979== by 0x97301BA: ompi_fifo_init (ompi_fifo.h:288)
==10979== by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
==10979== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10979== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10979== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10979== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10979== by 0x4067CF: main (dgord.c:123)
==10979==
==10979== Conditional jump or move depends on uninitialised value(s)
==10979== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10979== by 0x97303B3: ompi_cb_fifo_init (ompi_circular_buffer_fifo.h:180)
==10979== by 0x97301BA: ompi_fifo_init (ompi_fifo.h:288)
==10979== by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
==10979== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10979== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10979== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10979== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10979== by 0x4067CF: main (dgord.c:123)
==10978==
==10978== Conditional jump or move depends on uninitialised value(s)
==10978== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10978== by 0x9730165: ompi_fifo_init (ompi_fifo.h:280)
==10978== by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
==10978== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10978== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10978== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10978== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10978== by 0x4067CF: main (dgord.c:123)
==10978==
==10978== Conditional jump or move depends on uninitialised value(s)
==10978== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10978== by 0x97302C4: ompi_cb_fifo_init (ompi_circular_buffer_fifo.h:158)
==10978== by 0x97301BA: ompi_fifo_init (ompi_fifo.h:288)
==10978== by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
==10978== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10978== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10978== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10978== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10978== by 0x4067CF: main (dgord.c:123)
==10978==
==10978== Conditional jump or move depends on uninitialised value(s)
==10978== at 0x88E8F1A: mca_mpool_sm_alloc (mpool_sm_module.c:80)
==10978== by 0x97303B3: ompi_cb_fifo_init (ompi_circular_buffer_fifo.h:180)
==10978== by 0x97301BA: ompi_fifo_init (ompi_fifo.h:288)
==10978== by 0x9730044: mca_btl_sm_add_procs (btl_sm.c:538)
==10978== by 0x9322C33: mca_bml_r2_add_procs (bml_r2.c:206)
==10978== by 0x8EFD000: mca_pml_ob1_add_procs (pml_ob1.c:237)
==10978== by 0x5516F4C: ompi_mpi_init (ompi_mpi_init.c:678)
==10978== by 0x55546B9: PMPI_Init_thread (pinit_thread.c:82)
==10978== by 0x4067CF: main (dgord.c:123)
==10979==
==10979== Uninitialised byte(s) found during client check request
==10979== at 0x5AB2C87: valgrind_module_isdefined
(memchecker_valgrind_module.c:112)
==10979== by 0x5AB26CB: opal_memchecker_base_isdefined
(memchecker_base_wrappers.c:34)
==10979== by 0x553F067: memchecker_call (memchecker.h:96)
==10979== by 0x553ECDF: PMPI_Bcast (pbcast.c:41)
==10979== by 0x40BB82: _SCOTCHdgraphLoad (dgraph_io_load.c:226)
==10979== by 0x406B32: main (dgord.c:265)
==10979== Address 0x7feffff74 is on thread 1's stack
==10978==
==10978== Invalid read of size 8
==10978== at 0x8F0D85A: memchecker_call (memchecker.h:94)
==10978== by 0x8F0D812: mca_pml_ob1_send_request_free (pml_ob1_sendreq.c:107)
==10978== by 0x55154DA: ompi_request_free (request.h:354)
==10978== by 0x5515B05: ompi_request_default_wait_all (req_wait.c:344)
==10978== by 0x556EA00: PMPI_Waitall (pwaitall.c:68)
==10978== by 0x41FC2D: _SCOTCHdgraphCoarsen (dgraph_coarsen.c:711)
==10978== by 0x415B34: vdgraphSeparateMl2 (vdgraph_separate_ml.c:99)
==10979==
==10979== Invalid read of size 8
==10979== at 0x8F0D85A: memchecker_call (memchecker.h:94)
==10979== by 0x8F0D812: mca_pml_ob1_send_request_free (pml_ob1_sendreq.c:107)
==10979== by 0x55154DA: ompi_request_free (request.h:354)
==10979== by 0x5515B05: ompi_request_default_wait_all (req_wait.c:344)
==10979== by 0x556EA00: PMPI_Waitall (pwaitall.c:68)
==10979== by 0x41FC2D: _SCOTCHdgraphCoarsen (dgraph_coarsen.c:711)
==10979== by 0x415B34: vdgraphSeparateMl2 (vdgraph_separate_ml.c:99)
==10979== by 0x415D96: _SCOTCHvdgraphSeparateMl (vdgraph_separate_ml.c:660)
==10979== by 0x40EC49: _SCOTCHvdgraphSeparateSt (vdgraph_separate_st.c:327)
==10979== by 0x412F52: _SCOTCHhdgraphOrderNd (hdgraph_order_nd.c:294)
==10979== by 0x40EB26: _SCOTCHhdgraphOrderSt (hdgraph_order_st.c:216)
==10979== by 0x40734A: SCOTCH_dgraphOrderComputeList
(library_dgraph_order.c:220)
==10979== Address 0x28 is not stack'd, malloc'd or (recently) free'd
==10978== by 0x415D96: _SCOTCHvdgraphSeparateMl (vdgraph_separate_ml.c:660)
==10978== by 0x40EC49: _SCOTCHvdgraphSeparateSt (vdgraph_separate_st.c:327)
==10978== by 0x412F52: _SCOTCHhdgraphOrderNd (hdgraph_order_nd.c:294)
==10978== by 0x40EB26: _SCOTCHhdgraphOrderSt (hdgraph_order_st.c:216)
==10978== by 0x40734A: SCOTCH_dgraphOrderComputeList
(library_dgraph_order.c:220)
==10978== Address 0x28 is not stack'd, malloc'd or (recently) free'd
[laurel:10979] *** Process received signal ***
[laurel:10978] *** Process received signal ***
[laurel:10979] Signal: Segmentation fault (11)
[laurel:10979] Signal code: Address not mapped (1)
[laurel:10979] Failing at address: 0x28
[laurel:10978] Signal: Segmentation fault (11)
[laurel:10978] Signal code: Address not mapped (1)
[laurel:10978] Failing at address: 0x28
[laurel:10979] [ 0] /lib/libpthread.so.0 [0x6321a80]
[laurel:10979] [ 1] /home/pastix/pelegrin/openmpi/lib/openmpi/mca_pml_ob1.so
[0x8f0d85a]
[laurel:10979] [ 2] /home/pastix/pelegrin/openmpi/lib/openmpi/mca_pml_ob1.so
[0x8f0d813]
[laurel:10979] [ 3] /home/pastix/pelegrin/openmpi/lib/libmpi.so.0 [0x55154db]
[laurel:10979] [ 4] /home/pastix/pelegrin/openmpi/lib/libmpi.so.0 [0x5515b06]
[laurel:10979] [ 5]
/home/pastix/pelegrin/openmpi/lib/libmpi.so.0(PMPI_Waitall+0x15d) [0x556ea01]
[laurel:10979] [ 6] ../bin/dgord(_SCOTCHdgraphCoarsen+0x13ce) [0x41fc2e]
[laurel:10979] [ 7] ../bin/dgord [0x415b35]
[laurel:10979] [ 8] ../bin/dgord(_SCOTCHvdgraphSeparateMl+0x27) [0x415d97]
[laurel:10979] [ 9] ../bin/dgord(_SCOTCHvdgraphSeparateSt+0x5a) [0x40ec4a]
[laurel:10978] [ 0] /lib/libpthread.so.0 [0x6321a80]
[laurel:10979] [10] ../bin/dgord(_SCOTCHhdgraphOrderNd+0xe3) [0x412f53]
[laurel:10979] [11] ../bin/dgord(_SCOTCHhdgraphOrderSt+0x67) [0x40eb27]
[laurel:10978] [ 1] /home/pastix/pelegrin/openmpi/lib/openmpi/mca_pml_ob1.so
[0x8f0d85a]
[laurel:10979] [12] ../bin/dgord(SCOTCH_dgraphOrderComputeList+0xeb) [0x40734b]
[laurel:10979] [13] ../bin/dgord(main+0x3ec) [0x406b7c]
[laurel:10979] [14] /lib/libc.so.6(__libc_start_main+0xe6) [0x654d1a6]
[laurel:10979] [15] ../bin/dgord [0x406669]
[laurel:10978] [ 2] /home/pastix/pelegrin/openmpi/lib/openmpi/mca_pml_ob1.so
[0x8f0d813]
[laurel:10978] [ 3] /home/pastix/pelegrin/openmpi/lib/libmpi.so.0 [0x55154db]
[laurel:10978] [ 4] /home/pastix/pelegrin/openmpi/lib/libmpi.so.0 [0x5515b06]
[laurel:10978] [ 5]
/home/pastix/pelegrin/openmpi/lib/libmpi.so.0(PMPI_Waitall+0x15d) [0x556ea01]
[laurel:10978] [ 6] ../bin/dgord(_SCOTCHdgraphCoarsen+0x13ce) [0x41fc2e]
[laurel:10978] [ 7] ../bin/dgord [0x415b35]
[laurel:10978] [ 8] ../bin/dgord(_SCOTCHvdgraphSeparateMl+0x27) [0x415d97]
[laurel:10978] [ 9] ../bin/dgord(_SCOTCHvdgraphSeparateSt+0x5a) [0x40ec4a]
[laurel:10979] *** End of error message ***
[laurel:10978] [10] ../bin/dgord(_SCOTCHhdgraphOrderNd+0xe3) [0x412f53]
[laurel:10978] [11] ../bin/dgord(_SCOTCHhdgraphOrderSt+0x67) [0x40eb27]
[laurel:10978] [12] ../bin/dgord(SCOTCH_dgraphOrderComputeList+0xeb) [0x40734b]
[laurel:10978] [13] ../bin/dgord(main+0x3ec) [0x406b7c]
[laurel:10978] [14] /lib/libc.so.6(__libc_start_main+0xe6) [0x654d1a6]
[laurel:10978] [15] ../bin/dgord [0x406669]
==10979==
[laurel:10978] *** End of error message ***
==10979== Process terminating with default action of signal 11 (SIGSEGV)
==10979== Access not within mapped region at address 0x29
==10979== at 0x8F0D85A: memchecker_call (memchecker.h:94)
==10978==
==10978== Process terminating with default action of signal 11 (SIGSEGV)
==10978== Access not within mapped region at address 0x29
==10978== at 0x8F0D85A: memchecker_call (memchecker.h:94)
==10978== by 0x8F0D812: mca_pml_ob1_send_request_free (pml_ob1_sendreq.c:107)
==10978== by 0x55154DA: ompi_request_free (request.h:354)
==10978== by 0x5515B05: ompi_request_default_wait_all (req_wait.c:344)
==10978== by 0x556EA00: PMPI_Waitall (pwaitall.c:68)
==10978== by 0x41FC2D: _SCOTCHdgraphCoarsen (dgraph_coarsen.c:711)
==10978== by 0x415B34: vdgraphSeparateMl2 (vdgraph_separate_ml.c:99)
==10978== by 0x415D96: _SCOTCHvdgraphSeparateMl (vdgraph_separate_ml.c:660)
==10978== by 0x40EC49: _SCOTCHvdgraphSeparateSt (vdgraph_separate_st.c:327)
==10978== by 0x412F52: _SCOTCHhdgraphOrderNd (hdgraph_order_nd.c:294)
==10978== by 0x40EB26: _SCOTCHhdgraphOrderSt (hdgraph_order_st.c:216)
==10978== by 0x40734A: SCOTCH_dgraphOrderComputeList
(library_dgraph_order.c:220)
==10979== by 0x8F0D812: mca_pml_ob1_send_request_free (pml_ob1_sendreq.c:107)
==10979== by 0x55154DA: ompi_request_free (request.h:354)
==10979== by 0x5515B05: ompi_request_default_wait_all (req_wait.c:344)
==10979== by 0x556EA00: PMPI_Waitall (pwaitall.c:68)
==10979== by 0x41FC2D: _SCOTCHdgraphCoarsen (dgraph_coarsen.c:711)
==10979== by 0x415B34: vdgraphSeparateMl2 (vdgraph_separate_ml.c:99)
==10979== by 0x415D96: _SCOTCHvdgraphSeparateMl (vdgraph_separate_ml.c:660)
==10979== by 0x40EC49: _SCOTCHvdgraphSeparateSt (vdgraph_separate_st.c:327)
==10979== by 0x412F52: _SCOTCHhdgraphOrderNd (hdgraph_order_nd.c:294)
==10979== by 0x40EB26: _SCOTCHhdgraphOrderSt (hdgraph_order_st.c:216)
==10979== by 0x40734A: SCOTCH_dgraphOrderComputeList
(library_dgraph_order.c:220)
==10979==
==10979== ERROR SUMMARY: 14 errors from 9 contexts (suppressed: 264 from 3)
==10979== malloc/free: in use at exit: 4,626,295 bytes in 2,614 blocks.
==10978==
==10978== ERROR SUMMARY: 13 errors from 8 contexts (suppressed: 264 from 3)
==10979== malloc/free: 9,296 allocs, 6,682 frees, 11,121,335 bytes allocated.
==10979== For counts of detected errors, rerun with: -v
==10978== malloc/free: in use at exit: 4,671,068 bytes in 2,627 blocks.
==10978== malloc/free: 9,315 allocs, 6,688 frees, 13,108,494 bytes allocated.
==10978== For counts of detected errors, rerun with: -v
==10979== searching for pointers to 2,614 not-freed blocks.
==10978== searching for pointers to 2,627 not-freed blocks.
==10978== checked 138,090,848 bytes.
==10978==
==10979== checked 138,047,136 bytes.
==10979==
==10979== LEAK SUMMARY:
==10979== definitely lost: 2,049 bytes in 25 blocks.
==10979== possibly lost: 2,405,098 bytes in 60 blocks.
==10979== still reachable: 2,219,148 bytes in 2,529 blocks.
==10979== suppressed: 0 bytes in 0 blocks.
==10979== Rerun with --leak-check=full to see details of leaked memory.
==10978== LEAK SUMMARY:
==10978== definitely lost: 2,125 bytes in 27 blocks.
==10978== possibly lost: 2,445,353 bytes in 63 blocks.
==10978== still reachable: 2,223,590 bytes in 2,537 blocks.
==10978== suppressed: 0 bytes in 0 blocks.
==10978== Rerun with --leak-check=full to see details of leaked memory.
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 10979 on node laurel exited on
signal 11 (Segmentation fault).
--------------------------------------------------------------------------
I want to report the following issues:
1)- The "Conditional jump or move depends on uninitialised value(s)"
messages are quite puzzling. Do they correspond to real problems
in Open MPI, or should they just be ignored?
2)- The MPI_Waitall call that triggers the crash spans an array
containing both former receive requests that have already been set
to MPI_REQUEST_NULL and a set of matching (and hence already
matched) Isend requests.
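For illustration, the pattern in question looks roughly like this (a minimal, hypothetical sketch, not the actual Scotch code; the MPI standard explicitly permits MPI_REQUEST_NULL entries in MPI_Waitall, so this usage should be valid):

```c
#include <mpi.h>

/* Sketch: MPI_Waitall over an array mixing a request already set to
 * MPI_REQUEST_NULL (a completed/released receive) with a still-pending
 * Isend request. Null requests are legal in MPI_Waitall. */
int main(int argc, char *argv[]) {
    int rank, size, buf = 42, dst;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    reqs[0] = MPI_REQUEST_NULL;            /* former receive request */
    MPI_Isend(&buf, 1, MPI_INT, (rank + 1) % size, 0,
              MPI_COMM_WORLD, &reqs[1]);   /* matched by the Recv below */

    MPI_Recv(&dst, 1, MPI_INT, (rank + size - 1) % size, 0,
             MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* The crash reported above occurs inside PMPI_Waitall ->
     * ompi_request_free -> mca_pml_ob1_send_request_free. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    MPI_Finalize();
    return 0;
}
```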
3)- Memchecker also complains (wrongly, I think) in the case of a
Bcast where the receivers have not pre-initialized all of their receive
array. I guess that, during the memcheck process, the sender and
receiver sides should get different treatment, since only one data
array is passed, which is either read or written depending on the
root process number.
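A minimal sketch of that case (hypothetical, not the actual Scotch code): non-root ranks pass a buffer whose contents are deliberately left uninitialised, since MPI_Bcast only writes to it there; only the root's buffer is read.

```c
#include <mpi.h>

/* Sketch: legal MPI_Bcast usage where non-root ranks supply an
 * uninitialised buffer. The broadcast writes, not reads, that buffer
 * on non-root ranks, yet memchecker flags the uninitialised bytes. */
int main(int argc, char *argv[]) {
    int rank, data[4];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                  /* only the root initialises it */
        for (int i = 0; i < 4; i++)
            data[i] = i;
    }
    /* data is an input buffer on rank 0, an output buffer elsewhere */
    MPI_Bcast(data, 4, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
```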
4)- It also complains when two Isends correspond to overlapping regions
of the same memory area. It seems that the first Isend marks the
region as "non-readable", while it should just be "non-writable",
shouldn't it?
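Again a hypothetical minimal sketch of that pattern: a send buffer only needs to stay readable and unmodified until its send completes, so two pending sends that merely *read* overlapping bytes should be fine.

```c
#include <mpi.h>

/* Sketch: two non-blocking sends whose source regions overlap inside
 * the same array. Neither send modifies the buffer, so overlapping
 * reads are legal; the issue described above is that the first Isend
 * appears to mark its region non-readable instead of non-writable. */
int main(int argc, char *argv[]) {
    int rank, buf[8] = {0, 1, 2, 3, 4, 5, 6, 7};
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Isend(&buf[0], 6, MPI_INT, 1, 0, MPI_COMM_WORLD, &reqs[0]);
        /* second send reads buf[2..7], overlapping buf[2..5] above */
        MPI_Isend(&buf[2], 6, MPI_INT, 1, 1, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    } else if (rank == 1) {
        int a[6], b[6];
        MPI_Recv(a, 6, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(b, 6, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}
```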
5)- Keep up the good work! Congrats. ;-)
Sincerely yours,
f.p.
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users