Eric, can you make your test case (source + input file + how-to) available so I can try to reproduce and fix this?
Based on the stack trace, I assume this is a complete end-user application. Have you tried (or been able) to reproduce the same kind of crash with a trimmed test program?

BTW, what kind of filesystem is hosting Resultats.Eta1 (e.g. ext4 / NFS / Lustre / other)?

Cheers,

Gilles

On 2014/12/15 4:06, Eric Chamberland wrote:
> Hi,
>
> I finally (thanks for fixing oversubscribing) tested with 1.8.4rc3 for
> my problem with collective MPI I/O.
>
> The problem is still there. In this 2-process example, process rank 1
> dies with a segfault while process rank 0 waits indefinitely...
>
> Running with valgrind, I found these errors, which may give hints:
>
> *************************************************
> Rank 1:
> *************************************************
> On process rank 1, without valgrind, it ends with either a segmentation
> violation, memory corruption or an invalid free.
>
> But when running with valgrind, it reports:
>
> ==16715== Invalid write of size 2
> ==16715==    at 0x4C2E793: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
> ==16715==    by 0x1F60AA91: opal_convertor_unpack (opal_convertor.c:321)
> ==16715==    by 0x25AA8DD3: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:225)
> ==16715==    by 0x2544110C: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
> ==16715==    by 0x25443577: mca_btl_vader_component_progress (btl_vader_component.c:695)
> ==16715==    by 0x1F5F0F27: opal_progress (opal_progress.c:207)
> ==16715==    by 0x1ACB40B3: opal_condition_wait (condition.h:93)
> ==16715==    by 0x1ACB4201: ompi_request_wait_completion (request.h:381)
> ==16715==    by 0x1ACB4305: ompi_request_default_wait (req_wait.c:39)
> ==16715==    by 0x26BA2FFB: ompi_coll_tuned_bcast_intra_generic (coll_tuned_bcast.c:254)
> ==16715==    by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial (coll_tuned_bcast.c:385)
> ==16715==    by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed (coll_tuned_decision_fixed.c:258)
> ==16715==    by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
> ==16715==    by 0x2FE1CC48: ADIOI_Shfp_fname (shfp_fname.c:67)
> ==16715==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
> ==16715==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
> ==16715==    by 0x1AD52344: module_init (io_base_file_select.c:455)
> ==16715==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
> ==16715==    by 0x1ACA582F: ompi_file_open (file.c:130)
> ==16715==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
> ==16715==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
> ==16715==    by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
> ==16715==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
> ==16715==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
> ==16715==  Address 0x32ef3e50 is 0 bytes after a block of size 256 alloc'd
> ==16715==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
> ==16715==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
> ==16715==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
> ==16715==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
> ==16715==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
> ==16715==    by 0x1AD52344: module_init (io_base_file_select.c:455)
> ==16715==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
> ==16715==    by 0x1ACA582F: ompi_file_open (file.c:130)
> ==16715==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
> ==16715==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
> ==16715==    by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
> ==16715==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
> ==16715==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
> ...
> ...
> ==16715== Invalid write of size 1
> ==16715==    at 0x4C2E7BB: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
> ==16715==    by 0x1F60AA91: opal_convertor_unpack (opal_convertor.c:321)
> ==16715==    by 0x25AA8DD3: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:225)
> ==16715==    by 0x2544110C: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
> ==16715==    by 0x25443577: mca_btl_vader_component_progress (btl_vader_component.c:695)
> ==16715==    by 0x1F5F0F27: opal_progress (opal_progress.c:207)
> ==16715==    by 0x1ACB40B3: opal_condition_wait (condition.h:93)
> ==16715==    by 0x1ACB4201: ompi_request_wait_completion (request.h:381)
> ==16715==    by 0x1ACB4305: ompi_request_default_wait (req_wait.c:39)
> ==16715==    by 0x26BA2FFB: ompi_coll_tuned_bcast_intra_generic (coll_tuned_bcast.c:254)
> ==16715==    by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial (coll_tuned_bcast.c:385)
> ==16715==    by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed (coll_tuned_decision_fixed.c:258)
> ==16715==    by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
> ==16715==    by 0x2FE1CC48: ADIOI_Shfp_fname (shfp_fname.c:67)
> ==16715==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
> ==16715==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
> ==16715==    by 0x1AD52344: module_init (io_base_file_select.c:455)
> ==16715==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
> ==16715==    by 0x1ACA582F: ompi_file_open (file.c:130)
> ==16715==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
> ==16715==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
> ==16715==    by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
> ==16715==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
> ==16715==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
> ==16715==  Address 0x32ef3e60 is 16 bytes after a block of size 256 alloc'd
> ==16715==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
> ==16715==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
> ==16715==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
> ==16715==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
> ==16715==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
> ==16715==    by 0x1AD52344: module_init (io_base_file_select.c:455)
> ==16715==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
> ==16715==    by 0x1ACA582F: ompi_file_open (file.c:130)
> ==16715==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
> ==16715==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
> ==16715==    by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
> ==16715==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
> ==16715==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
> ==16715==
>
> *************************************************
> Rank 0:
> *************************************************
>
> ==16714== Invalid read of size 1
> ==16714==    at 0x4C2CA74: __strrchr_sse42 (vg_replace_strmem.c:194)
> ==16714==    by 0x2FE1CAB7: ADIOI_Shfp_fname (shfp_fname.c:51)
> ==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
> ==16714==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
> ==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
> ==16714==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
> ==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
> ==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
> ==16714==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
> ==16714==    by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
> ==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
> ==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
> ==16714==  Address 0x219377d0 is 0 bytes after a block of size 256 alloc'd
> ==16714==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
> ==16714==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
> ==16714==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
> ==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
> ==16714==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
> ==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
> ==16714==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
> ==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
> ==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
> ==16714==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
> ==16714==    by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
> ==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
> ==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
> ==16714==
> ...
> ==16714== Invalid read of size 1
> ==16714==    at 0x4C2D034: strlen (vg_replace_strmem.c:412)
> ==16714==    by 0x2FE1CB81: ADIOI_Shfp_fname (shfp_fname.c:61)
> ==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
> ==16714==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
> ==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
> ==16714==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
> ==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
> ==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
> ==16714==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
> ==16714==    by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
> ==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
> ==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
> ==16714==  Address 0x219377d0 is 0 bytes after a block of size 256 alloc'd
> ==16714==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
> ==16714==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
> ==16714==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
> ==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
> ==16714==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
> ==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
> ==16714==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
> ==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
> ==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
> ==16714==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
> ==16714==    by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
> ==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
> ==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
> ...
> ==16714== Invalid read of size 2
> ==16714==    at 0x4C2E79E: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
> ==16714==    by 0x2543FADC: vader_prepare_src (btl_vader_module.c:590)
> ==16714==    by 0x25AB17AA: mca_bml_base_prepare_src (bml.h:341)
> ==16714==    by 0x25AB4207: mca_pml_ob1_send_request_start_prepare (pml_ob1_sendreq.c:620)
> ==16714==    by 0x25AA3519: mca_pml_ob1_send_request_start_btl (pml_ob1_sendreq.h:397)
> ==16714==    by 0x25AA3766: mca_pml_ob1_send_request_start_seq (pml_ob1_sendreq.h:460)
> ==16714==    by 0x25AA41E1: mca_pml_ob1_isend (pml_ob1_isend.c:171)
> ==16714==    by 0x26BA2AF5: ompi_coll_tuned_bcast_intra_generic (coll_tuned_bcast.c:112)
> ==16714==    by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial (coll_tuned_bcast.c:385)
> ==16714==    by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed (coll_tuned_decision_fixed.c:258)
> ==16714==    by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
> ==16714==    by 0x2FE1CBE5: ADIOI_Shfp_fname (shfp_fname.c:63)
> ==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
> ==16714==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
> ==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
> ==16714==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
> ==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
> ==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
> ==16714==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
> ==16714==    by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
> ==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
> ==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
> ==16714==  Address 0x219377d0 is 0 bytes after a block of size 256 alloc'd
> ==16714==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
> ==16714==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
> ==16714==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
> ==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
> ==16714==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
> ==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
> ==16714==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
> ==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
> ==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
> ==16714==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
> ==16714==    by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
> ==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
> ==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
> ...
> ==16714== Invalid read of size 2
> ==16714==    at 0x4C2E790: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
> ==16714==    by 0x2543FADC: vader_prepare_src (btl_vader_module.c:590)
> ==16714==    by 0x25AB17AA: mca_bml_base_prepare_src (bml.h:341)
> ==16714==    by 0x25AB4207: mca_pml_ob1_send_request_start_prepare (pml_ob1_sendreq.c:620)
> ==16714==    by 0x25AA3519: mca_pml_ob1_send_request_start_btl (pml_ob1_sendreq.h:397)
> ==16714==    by 0x25AA3766: mca_pml_ob1_send_request_start_seq (pml_ob1_sendreq.h:460)
> ==16714==    by 0x25AA41E1: mca_pml_ob1_isend (pml_ob1_isend.c:171)
> ==16714==    by 0x26BA2AF5: ompi_coll_tuned_bcast_intra_generic (coll_tuned_bcast.c:112)
> ==16714==    by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial (coll_tuned_bcast.c:385)
> ==16714==    by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed (coll_tuned_decision_fixed.c:258)
> ==16714==    by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
> ==16714==    by 0x2FE1CBE5: ADIOI_Shfp_fname (shfp_fname.c:63)
> ==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
> ==16714==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
> ==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
> ==16714==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
> ==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
> ==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
> ==16714==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
> ==16714==    by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
> ==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
> ==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
> ==16714==  Address 0x219377d2 is 2 bytes after a block of size 256 alloc'd
> ==16714==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
> ==16714==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
> ==16714==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
> ==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
> ==16714==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
> ==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
> ==16714==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
> ==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
> ==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
> ==16714==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
> ==16714==    by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
> ==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
> ==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
> ...
> ==16714== Invalid read of size 1
> ==16714==    at 0x4C2E7B8: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
> ==16714==    by 0x2543FADC: vader_prepare_src (btl_vader_module.c:590)
> ==16714==    by 0x25AB17AA: mca_bml_base_prepare_src (bml.h:341)
> ==16714==    by 0x25AB4207: mca_pml_ob1_send_request_start_prepare (pml_ob1_sendreq.c:620)
> ==16714==    by 0x25AA3519: mca_pml_ob1_send_request_start_btl (pml_ob1_sendreq.h:397)
> ==16714==    by 0x25AA3766: mca_pml_ob1_send_request_start_seq (pml_ob1_sendreq.h:460)
> ==16714==    by 0x25AA41E1: mca_pml_ob1_isend (pml_ob1_isend.c:171)
> ==16714==    by 0x26BA2AF5: ompi_coll_tuned_bcast_intra_generic (coll_tuned_bcast.c:112)
> ==16714==    by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial (coll_tuned_bcast.c:385)
> ==16714==    by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed (coll_tuned_decision_fixed.c:258)
> ==16714==    by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
> ==16714==    by 0x2FE1CBE5: ADIOI_Shfp_fname (shfp_fname.c:63)
> ==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
> ==16714==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
> ==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
> ==16714==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
> ==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
> ==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
> ==16714==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
> ==16714==    by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
> ==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
> ==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
> ==16714==  Address 0x219377e0 is 16 bytes after a block of size 256 alloc'd
> ==16714==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
> ==16714==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
> ==16714==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
> ==16714==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
> ==16714==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
> ==16714==    by 0x1AD52344: module_init (io_base_file_select.c:455)
> ==16714==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
> ==16714==    by 0x1ACA582F: ompi_file_open (file.c:130)
> ==16714==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
> ==16714==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
> ==16714==    by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
> ==16714==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
> ==16714==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
> ...
>
> I should point out that with MPICH 3.1.3, I can't reproduce the same
> bad behavior.
>
> Also, the segfault does not always occur: running the same code with
> other inputs gave me trouble-free results, with or without valgrind.
> I noticed the problem appears more frequently with longer "paths".
>
> Please help!
>
> Thanks,
>
> Eric
>
> ompi_info -all:
> http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.184rc3.txt.gz
> config.log:
> http://www.giref.ulaval.ca/~ericc/ompi_bug/config.184rc3.log.gz
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/12/25983.php
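One plausible reading of the traces in this thread: every invalid access lands 0 to 16 bytes past a 256-byte block that ADIOI_Shfp_fname allocates at shfp_fname.c:25. On rank 0 this shows up first in strrchr/strlen and then in the MPI_Bcast of that name; on rank 1 the same broadcast writes past its own 256-byte block. That is consistent with, though not proof of, a shared-file-pointer file name that outgrows a fixed-size buffer, which would also fit the remark that longer paths trigger the crash more often. The C sketch below shows the general bounds-checked alternative. `build_shfp_name` and the `<dir>/.<base>.shfp.<id>` naming scheme are hypothetical, chosen only for illustration; this is not ROMIO's actual code or fix.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, NOT ROMIO's code: builds a hidden companion file
 * name "<dir>/.<base>.shfp.<id>" next to `path`. The naming scheme is an
 * assumption made for illustration only.
 * Returns the string length on success, or -1 if the result (plus its
 * terminating NUL) would not fit in `cap` bytes. */
static int build_shfp_name(char *out, size_t cap, const char *path, int id)
{
    assert(out != NULL && cap > 0 && path != NULL);

    const char *slash = strrchr(path, '/');
    int n;
    if (slash == NULL) {
        /* No directory component: the companion file goes in the cwd. */
        n = snprintf(out, cap, ".%s.shfp.%d", path, id);
    } else {
        /* Keep "<dir>/" as-is and prefix the base name with a dot. */
        int dirlen = (int)(slash - path) + 1;
        n = snprintf(out, cap, "%.*s.%s.shfp.%d", dirlen, path, slash + 1, id);
    }
    /* snprintf never writes more than `cap` bytes and NUL-terminates when
     * cap > 0, so a too-long path is reported instead of overflowing. */
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}
```

For instance, with a 256-byte buffer, a 399-character path makes the helper return -1 and leaves a truncated but NUL-terminated string, rather than writing past the end of the block the way the valgrind reports suggest. Sizing the buffer from the real string length (or at least `PATH_MAX`) would avoid the truncation altogether.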