Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
On 01/14/2015 05:57 PM, Rob Latham wrote:

On 12/17/2014 07:04 PM, Eric Chamberland wrote:

Hi!

Here is a "poor man's fix" that works for me (the idea is not from me, thanks to Thomas H.):

#1- char* lCwd = getcwd(0,0);
#2- chdir(lPathToFile);
#3- MPI_File_open(...,lFileNameWithoutTooLongPath,...);
#4- chdir(lCwd);
#5- ...

I think there are some limitations, but it works very well for our uses... until a "real" fix is proposed...

Thanks for the bug report and test cases. I just pushed two fixes for master that fix the problem you were seeing:

http://git.mpich.org/mpich.git/commit/ed39c901
http://git.mpich.org/mpich.git/commit/a30a4721a2

==rob

Great! Thank you for the follow-up (and both messages)!

Eric
Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
On 12/17/2014 07:04 PM, Eric Chamberland wrote:

Hi!

Here is a "poor man's fix" that works for me (the idea is not from me, thanks to Thomas H.):

#1- char* lCwd = getcwd(0,0);
#2- chdir(lPathToFile);
#3- MPI_File_open(...,lFileNameWithoutTooLongPath,...);
#4- chdir(lCwd);
#5- ...

I think there are some limitations, but it works very well for our uses... until a "real" fix is proposed...

Thanks for the bug report and test cases. I just pushed two fixes for master that fix the problem you were seeing:

http://git.mpich.org/mpich.git/commit/ed39c901
http://git.mpich.org/mpich.git/commit/a30a4721a2

==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
On 12/17/2014 07:04 PM, Eric Chamberland wrote:

Hi!

Here is a "poor man's fix" that works for me (the idea is not from me, thanks to Thomas H.):

#1- char* lCwd = getcwd(0,0);
#2- chdir(lPathToFile);
#3- MPI_File_open(...,lFileNameWithoutTooLongPath,...);
#4- chdir(lCwd);
#5- ...

I think there are some limitations, but it works very well for our uses... until a "real" fix is proposed...

A bit of a delay on my part due to the winter break, but I have returned to this topic.

I have an approach that will at least tell you something went wrong in processing the shared file pointer name: the string is so long it truncates the error message, but it leaves enough to tell you what went wrong.

ERROR Returned by MPI: 1006695702
ERROR_string Returned by MPI: Invalid file name, error stack:
ADIOI_Shfp_fname(60): Pathname this/is/a_very/long/path/that/contains/a/not/so/long/filename/but/trying/to/collectively/mpi_file_open/it/you/will/have/a/memory/corruption/resulting/of/invalide/writing/or/reading/past/the/end/of/one/or/some/hidden/strings/in/mpio/Simpimple/user

At least you get "invalid file name".

Furthermore, I'm changing that code to use PATH_MAX, not 256, which would have fixed the specific problem you encountered (and might be sufficient to get us 10 more years, at which point someone might try to create a file with 1000 characters in it).

==rob

Thanks for helping!

Eric

On 12/15/2014 11:42 PM, Gilles Gouaillardet wrote:

Eric and all,

That is clearly a limitation in ROMIO, and this is being tracked at https://trac.mpich.org/projects/mpich/ticket/2212

In the meantime, what we can do in Open MPI is update mca_io_romio_file_open() so that it fails with a user-friendly error message if strlen(filename) is larger than 255.

Cheers,

Gilles

On 2014/12/16 12:43, Gilles Gouaillardet wrote:

Eric,

thanks for the simple test program. I think I see what is going wrong, and I will make some changes to avoid the memory overflow.

That being said, there is a hard-coded limit of 256 characters, and your path is longer than 300 characters. Bottom line: even once the memory overflow is fixed, that cannot work as expected.

I will report this to the MPICH folks, since ROMIO is currently imported from MPICH.

Cheers,

Gilles

On 2014/12/16 0:16, Eric Chamberland wrote:

Hi Gilles,

just created a very simple test case! With this setup, you will see the bug with valgrind:

export too_long=./this/is/a_very/long/path/that/contains/a/not/so/long/filename/but/trying/to/collectively/mpi_file_open/it/you/will/have/a/memory/corruption/resulting/of/invalide/writing/or/reading/past/the/end/of/one/or/some/hidden/strings/in/mpio/Simple/user/would/like/to/have/the/parameter/checked/and/an/error/returned/or/this/limit/removed

mpicc -o bug_MPI_File_open_path_too_long bug_MPI_File_open_path_too_long.c
mkdir -p $too_long
echo "header of a text file" > $too_long/toto.txt
mpirun -np 2 valgrind ./bug_MPI_File_open_path_too_long $too_long/toto.txt

and watch the errors! Unfortunately, the memory corruption here doesn't seem to segfault this simple test case, but in my real case it is fatal, and with valgrind it is reported...

OpenMPI 1.6.5 and 1.8.3rc3 are affected; MPICH-3.1.3 also has the error!

thanks,

Eric

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2014/12/26005.php

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2014/12/26006.php

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
Hi!

Here is a "poor man's fix" that works for me (the idea is not from me, thanks to Thomas H.):

#1- char* lCwd = getcwd(0,0);
#2- chdir(lPathToFile);
#3- MPI_File_open(...,lFileNameWithoutTooLongPath,...);
#4- chdir(lCwd);
#5- ...

I think there are some limitations, but it works very well for our uses... until a "real" fix is proposed...

Thanks for helping!

Eric

On 12/15/2014 11:42 PM, Gilles Gouaillardet wrote:

Eric and all,

That is clearly a limitation in ROMIO, and this is being tracked at https://trac.mpich.org/projects/mpich/ticket/2212

In the meantime, what we can do in Open MPI is update mca_io_romio_file_open() so that it fails with a user-friendly error message if strlen(filename) is larger than 255.

Cheers,

Gilles

On 2014/12/16 12:43, Gilles Gouaillardet wrote:

Eric,

thanks for the simple test program. I think I see what is going wrong, and I will make some changes to avoid the memory overflow.

That being said, there is a hard-coded limit of 256 characters, and your path is longer than 300 characters. Bottom line: even once the memory overflow is fixed, that cannot work as expected.

I will report this to the MPICH folks, since ROMIO is currently imported from MPICH.

Cheers,

Gilles

On 2014/12/16 0:16, Eric Chamberland wrote:

Hi Gilles,

just created a very simple test case! With this setup, you will see the bug with valgrind:

export too_long=./this/is/a_very/long/path/that/contains/a/not/so/long/filename/but/trying/to/collectively/mpi_file_open/it/you/will/have/a/memory/corruption/resulting/of/invalide/writing/or/reading/past/the/end/of/one/or/some/hidden/strings/in/mpio/Simple/user/would/like/to/have/the/parameter/checked/and/an/error/returned/or/this/limit/removed

mpicc -o bug_MPI_File_open_path_too_long bug_MPI_File_open_path_too_long.c
mkdir -p $too_long
echo "header of a text file" > $too_long/toto.txt
mpirun -np 2 valgrind ./bug_MPI_File_open_path_too_long $too_long/toto.txt

and watch the errors! Unfortunately, the memory corruption here doesn't seem to segfault this simple test case, but in my real case it is fatal, and with valgrind it is reported...

OpenMPI 1.6.5 and 1.8.3rc3 are affected; MPICH-3.1.3 also has the error!

thanks,

Eric

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2014/12/26005.php

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2014/12/26006.php
Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
Eric and all,

That is clearly a limitation in ROMIO, and this is being tracked at https://trac.mpich.org/projects/mpich/ticket/2212

In the meantime, what we can do in Open MPI is update mca_io_romio_file_open() so that it fails with a user-friendly error message if strlen(filename) is larger than 255.

Cheers,

Gilles

On 2014/12/16 12:43, Gilles Gouaillardet wrote:
> Eric,
>
> thanks for the simple test program.
>
> i think i see what is going wrong and i will make some changes to avoid
> the memory overflow.
>
> that being said, there is a hard coded limit of 256 characters, and your
> path is bigger than 300 characters.
> bottom line, and even if there is no more memory overflow, that cannot
> work as expected.
>
> i will report this to the mpich folks, since romio is currently imported
> from mpich.
>
> Cheers,
>
> Gilles
>
> On 2014/12/16 0:16, Eric Chamberland wrote:
>> Hi Gilles,
>>
>> just created a very simple test case!
>>
>> with this setup, you will see the bug with valgrind:
>>
>> export
>> too_long=./this/is/a_very/long/path/that/contains/a/not/so/long/filename/but/trying/to/collectively/mpi_file_open/it/you/will/have/a/memory/corruption/resulting/of/invalide/writing/or/reading/past/the/end/of/one/or/some/hidden/strings/in/mpio/Simple/user/would/like/to/have/the/parameter/checked/and/an/error/returned/or/this/limit/removed
>>
>> mpicc -o bug_MPI_File_open_path_too_long
>> bug_MPI_File_open_path_too_long.c
>>
>> mkdir -p $too_long
>> echo "header of a text file" > $too_long/toto.txt
>>
>> mpirun -np 2 valgrind ./bug_MPI_File_open_path_too_long
>> $too_long/toto.txt
>>
>> and watch the errors!
>>
>> unfortunately, the memory corruptions here doesn't seem to segfault
>> this simple test case, but in my case, it is fatal and with valgrind,
>> it is reported...
>>
>> OpenMPI 1.6.5, 1.8.3rc3 are affected
>>
>> MPICH-3.1.3 also have the error!
>>
>> thanks,
>>
>> Eric
>>
> ___
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/12/26005.php
Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
Eric,

thanks for the simple test program. I think I see what is going wrong, and I will make some changes to avoid the memory overflow.

That being said, there is a hard-coded limit of 256 characters, and your path is longer than 300 characters. Bottom line: even once the memory overflow is fixed, that cannot work as expected.

I will report this to the MPICH folks, since ROMIO is currently imported from MPICH.

Cheers,

Gilles

On 2014/12/16 0:16, Eric Chamberland wrote:
> Hi Gilles,
>
> just created a very simple test case!
>
> with this setup, you will see the bug with valgrind:
>
> export
> too_long=./this/is/a_very/long/path/that/contains/a/not/so/long/filename/but/trying/to/collectively/mpi_file_open/it/you/will/have/a/memory/corruption/resulting/of/invalide/writing/or/reading/past/the/end/of/one/or/some/hidden/strings/in/mpio/Simple/user/would/like/to/have/the/parameter/checked/and/an/error/returned/or/this/limit/removed
>
> mpicc -o bug_MPI_File_open_path_too_long
> bug_MPI_File_open_path_too_long.c
>
> mkdir -p $too_long
> echo "header of a text file" > $too_long/toto.txt
>
> mpirun -np 2 valgrind ./bug_MPI_File_open_path_too_long
> $too_long/toto.txt
>
> and watch the errors!
>
> unfortunately, the memory corruptions here doesn't seem to segfault
> this simple test case, but in my case, it is fatal and with valgrind,
> it is reported...
>
> OpenMPI 1.6.5, 1.8.3rc3 are affected
>
> MPICH-3.1.3 also have the error!
>
> thanks,
>
> Eric
>
Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
Hi Gilles,

here is a simple setup to have valgrind complain now:

export too_long=./this/is/a_very/long/path/that/contains/a/not/so/long/filename/but/trying/to/collectively/mpi_file_open/it/you/will/have/a/memory/corruption/resulting/of/invalide/writing/or/reading/past/the/end/of/one/or/some/hidden/strings/in/mpio/Simple/user/would/like/to/have/the/parameter/checked/and/an/error/returned/or/this/limit/removed

mkdir -p $too_long
echo "hello world." > $too_long/toto.txt
mpicc -o bug_MPI_File_open_path_too_long bug_MPI_File_open_path_too_long.c
mpirun -np 2 valgrind ./bug_MPI_File_open_path_too_long $too_long/toto.txt

and look at the valgrind errors for invalid reads/writes on ranks 0/1. This particular simple case doesn't segfault without valgrind, but as reported in my real case, it does!

Thanks!

Eric

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void abortOnError(int ierr) {
  if (ierr != MPI_SUCCESS) {
    printf("ERROR Returned by MPI: %d\n", ierr);
    char* lCharPtr = (char*) malloc(sizeof(char)*MPI_MAX_ERROR_STRING);
    int lLongueur = 0;
    MPI_Error_string(ierr, lCharPtr, &lLongueur);
    printf("ERROR_string Returned by MPI: %s\n", lCharPtr);
    free(lCharPtr);
    MPI_Abort(MPI_COMM_WORLD, 1);
  }
}

int openFileCollectivelyAndReadMyFormat(char* pFileName) {
  int lReturnValue = 0;
  MPI_File lFile = 0;
  printf("Opening the file by MPI_file_open : %s\n", pFileName);
  abortOnError(MPI_File_open(MPI_COMM_WORLD, pFileName, MPI_MODE_RDONLY, MPI_INFO_NULL, &lFile));
  /*printf("ierr=%d, lFile=%ld, lFile == MPI_FILE_NULL ? %d", ierr, lFile, lFile == MPI_FILE_NULL);*/
  long int lTrois = 0;
  char lCharGIS[] = "123\0";
  long int lOnze = 0;
  char lCharVersion10[] = "12345678901\0";
  abortOnError(MPI_File_read_all(lFile, &lTrois, 1, MPI_LONG, MPI_STATUS_IGNORE));
  abortOnError(MPI_File_read_all(lFile, lCharGIS, 3, MPI_CHAR, MPI_STATUS_IGNORE));
  if (3 != lTrois) {
    lReturnValue = 1;
  }
  if (0 == lReturnValue && 0 != strcmp(lCharGIS, "123\0")) {
    lReturnValue = 2;
  }
  if (lFile) {
    printf(" ...closing the file %s\n", pFileName);
    abortOnError(MPI_File_close(&lFile));
  }
  return lReturnValue;
}

int main(int argc, char *argv[]) {
  char lValeur[1024];
  char *lHints[] = {"cb_nodes", "striping_factor", "striping_unit"};
  int flag;
  MPI_Init(&argc, &argv);
  if (2 != argc) {
    printf("ERROR: you must specify a filename to create.\n");
    MPI_Finalize();
    return 1;
  }
  if (strlen(argv[1]) < 256) {
    printf("ERROR: you must specify a path+filename longer than 256 to have the bug!\n");
    MPI_Finalize();
    return 1;
  }
  int lResult = 0;
  int i;
  for (i = 0; i < 10; ++i) {
    lResult |= openFileCollectivelyAndReadMyFormat(argv[1]);
  }
  MPI_Finalize();
  return lResult;
}
Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
Hi Gilles,

OK, I patched the file; without valgrind, it exploded at MPI_File_close:

*** Error in `/pmi/cmpbib/compilation_BIB_gcc-4.5.1_64bit/COMPILE_AUTO/GIREF/bin/Test.NormesEtProjectionChamp.dev': free(): invalid next size (normal): 0x04b6c950 ***
=== Backtrace: =
/lib64/libc.so.6(+0x7ac56)[0x7fab5692bc56]
/lib64/libc.so.6(+0x7b9d3)[0x7fab5692c9d3]
/opt/openmpi-1.8.4rc3_debug/lib64/openmpi/mca_io_romio.so(ADIOI_Free_fn+0x5f)[0x7fab4c1b9920]
/opt/openmpi-1.8.4rc3_debug/lib64/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xc6)[0x7fab4c185afa]
/opt/openmpi-1.8.4rc3_debug/lib64/openmpi/mca_io_romio.so(mca_io_romio_file_close+0x2be)[0x7fab4c180e88]
/opt/openmpi-1.8.4rc3_debug/lib64/libmpi.so.1(+0x4c09c)[0x7fab574ed09c]
/opt/openmpi-1.8.4rc3_debug/lib64/libmpi.so.1(+0x4af4b)[0x7fab574ebf4b]
/opt/openmpi-1.8.4rc3_debug/lib64/libmpi.so.1(ompi_file_close+0xd7)[0x7fab574eca0d]
/opt/openmpi-1.8.4rc3_debug/lib64/libmpi.so.1(PMPI_File_close+0xc1)[0x7fab57572e62]
/pmi/cmpbib/compilation_BIB_gcc-4.5.1_64bit/COMPILE_AUTO/GIREF/lib/libgiref_dev_Champs.so(_ZN18GISLectureEcritureIdE9litGISMPIESsR13GroupeInfoSurIdERSs+0x258f)[0x7fab658bb637]
/pmi/cmpbib/compilation_BIB_gcc-4.5.1_64bit/COMPILE_AUTO/GIREF/lib/libgiref_dev_Champs.so(_ZN5Champ16importeParalleleERKSs+0x2ae)[0x7fab65898f0e]
/pmi/cmpbib/compilation_BIB_gcc-4.5.1_64bit/COMPILE_AUTO/GIREF/bin/Test.NormesEtProjectionChamp.dev[0x4d0def]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fab568d2a15]
/pmi/cmpbib/compilation_BIB_gcc-4.5.1_64bit/COMPILE_AUTO/GIREF/bin/Test.NormesEtProjectionChamp.dev[0x4b1429]

I will launch it in valgrind now... but since it lasts 20 minutes, I will send the result tomorrow only...

anyway, merci beaucoup! :-)

Eric

On 12/14/2014 10:26 PM, Gilles Gouaillardet wrote:

Eric,

here is a patch for the v1.8 series; it fixes a one-byte overflow. valgrind should stop complaining, and assuming this is the root cause of the memory corruption, that could also fix your program.

That being said, shared_fp_fname is limited to 255 characters (this is hard-coded), so even if it gets truncated to 255 characters (instead of 256), the behavior could be kind of random.

/* from ADIOI_Shfp_fname: if the real file is /tmp/thakur/testfile, the shared-file-pointer file will be /tmp/thakur/.testfile.shfp.<xxxx> */

FWIW, <xxxx> is a random number that takes between 1 and 10 characters.

could you please give this patch a try and let us know the results?

Cheers,

Gilles

On 2014/12/15 11:52, Eric Chamberland wrote:

Hi again,

some new hints that might help:

1- With valgrind: if I run the same test case, same data, but moved to a shorter path+filename, then *valgrind* does *not* complain!!
2- Without valgrind: *sometimes*, the test case with the long path+filename passes without segfaulting!
3- It seems to happen at the fourth file I try to open using the following described procedure:

Also, I was wondering about this: in this 2-process test case (running on the same node), I:

1- open the file collectively (which resides on the same SSD drive on my computer)
2- MPI_File_read_at_all a long int and 3 chars (11 bytes)
3- stop (because I detect I am not reading my MPIIO file format)
4- close the file

A guess (FWIW): can process rank 0, for example, close the file too quickly, which destroys the string reserved for the filename that is used by process rank 1, which could be using shared memory on the same node?

Thanks,

Eric

On 12/14/2014 02:06 PM, Eric Chamberland wrote:

Hi,

I finally (thanks for fixing oversubscribing) tested with 1.8.4rc3 for my problem with collective MPI I/O. A problem is still there. In this 2-process example, process rank 1 dies with a segfault while process rank 0 waits indefinitely...

Running with valgrind, I found these errors which may give hints:

* Rank 1: *

On process rank 1, without valgrind, it ends with either a segmentation violation, memory corruption, or an invalid free.
But running with valgrind, it tells:

==16715== Invalid write of size 2
==16715==    at 0x4C2E793: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==16715==    by 0x1F60AA91: opal_convertor_unpack (opal_convertor.c:321)
==16715==    by 0x25AA8DD3: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:225)
==16715==    by 0x2544110C: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
==16715==    by 0x25443577: mca_btl_vader_component_progress (btl_vader_component.c:695)
==16715==    by 0x1F5F0F27: opal_progress (opal_progress.c:207)
==16715==    by 0x1ACB40B3: opal_condition_wait (condition.h:93)
==16715==    by 0x1ACB4201: ompi_request_wait_completion (request.h:381)
==16715==    by 0x1ACB4305: ompi_request_default_wait (req_wait.c:39)
==16715==    by 0x26BA2FFB: ompi_coll_tuned_bcast_intra_generic (coll_tuned_bcast.c:254)
==16715==    by 0x26BA36F7:
Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
On 12/14/2014 09:55 PM, Gilles Gouaillardet wrote:

Eric,

i checked the source code (v1.8) and the limit for the shared_fp_fname is 256 (hard coded).

Oh my god! Is it that simple? By the way, my filename is shorter than 256 characters, but the whole path is not:

echo "/pmi/cmpbib/compilation_BIB_gcc-4.5.1_64bit/COMPILE_AUTO/TestValidation/Ressources/dev/Test.NormesEtProjectionChamp/Ressources.champscalhermite2dordre5incarete_elemtri_2proc/Resultats.Etalons/champscalhermite2dordre5incarete_elemtri_2procReinterpole_UAna.U0" | wc -c

258 characters.

If this is it, I hope it can be fixed in some manner... or at least produce an error message!...

Thanks,

Eric

i am now checking if the overflow is correctly detected (that could explain the one-byte overflow reported by valgrind)

Cheers,

Gilles

On 2014/12/15 11:52, Eric Chamberland wrote:

Hi again,

some new hints that might help:

1- With valgrind: if I run the same test case, same data, but moved to a shorter path+filename, then *valgrind* does *not* complain!!
2- Without valgrind: *sometimes*, the test case with the long path+filename passes without segfaulting!
3- It seems to happen at the fourth file I try to open using the following described procedure:

Also, I was wondering about this: in this 2-process test case (running on the same node), I:

1- open the file collectively (which resides on the same ssd drive on my computer)
2- MPI_File_read_at_all a long int and 3 chars (11 bytes)
3- stop (because I detect I am not reading my MPIIO file format)
4- close the file

A guess (FWIW): can process rank 0, for example, close the file too quickly, which destroys the string reserved for the filename that is used by process rank 1, which could be using shared memory on the same node?

Thanks,

Eric

On 12/14/2014 02:06 PM, Eric Chamberland wrote:

Hi,

I finally (thanks for fixing oversubscribing) tested with 1.8.4rc3 for my problem with collective MPI I/O. A problem is still there.
In this 2-process example, process rank 1 dies with a segfault while process rank 0 waits indefinitely...

Running with valgrind, I found these errors which may give hints:

* Rank 1: *

On process rank 1, without valgrind, it ends with either a segmentation violation, memory corruption, or an invalid free.

But running with valgrind, it tells:

==16715== Invalid write of size 2
==16715==    at 0x4C2E793: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==16715==    by 0x1F60AA91: opal_convertor_unpack (opal_convertor.c:321)
==16715==    by 0x25AA8DD3: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:225)
==16715==    by 0x2544110C: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
==16715==    by 0x25443577: mca_btl_vader_component_progress (btl_vader_component.c:695)
==16715==    by 0x1F5F0F27: opal_progress (opal_progress.c:207)
==16715==    by 0x1ACB40B3: opal_condition_wait (condition.h:93)
==16715==    by 0x1ACB4201: ompi_request_wait_completion (request.h:381)
==16715==    by 0x1ACB4305: ompi_request_default_wait (req_wait.c:39)
==16715==    by 0x26BA2FFB: ompi_coll_tuned_bcast_intra_generic (coll_tuned_bcast.c:254)
==16715==    by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial (coll_tuned_bcast.c:385)
==16715==    by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed (coll_tuned_decision_fixed.c:258)
==16715==    by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
==16715==    by 0x2FE1CC48: ADIOI_Shfp_fname (shfp_fname.c:67)
==16715==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16715==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
==16715==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16715==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
==16715==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16715==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16715==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
==16715==    by 0xCA44252: GISLectureEcriture::litGISMPI(std::string, GroupeInfoSur&, std::string&) (GISLectureEcriture.icc:411)
==16715==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
==16715==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
==16715== Address 0x32ef3e50 is 0 bytes after a block of size 256 alloc'd
==16715==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
==16715==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
==16715==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
==16715==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16715==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
==16715==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16715==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
==16715==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16715==    by 0x1AD30DA3:
Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
Eric,

i checked the source code (v1.8) and the limit for the shared_fp_fname is 256 (hard coded).

i am now checking if the overflow is correctly detected (that could explain the one byte overflow reported by valgrind)

Cheers,

Gilles

On 2014/12/15 11:52, Eric Chamberland wrote:
> Hi again,
>
> some new hints that might help:
>
> 1- With valgrind : If I run the same test case, same data, but
> moved to a shorter path+filename, then *valgrind* does *not*
> complains!!
> 2- Without valgrind: *Sometimes*, the test case with long
> path+filename passes without "segfaulting"!
> 3- It seems to happen at the fourth file I try to open using the
> following described procedure:
>
> Also, I was wondering about this: In this 2 processes test case
> (running in the same node), I :
>
> 1- open the file collectively (which resides on the same ssd drive on
> my computer)
> 2- MPI_File_read_at_all a long int and 3 chars (11 bytes)
> 3- stop (because I detect I am not reading my MPIIO file format)
> 4- close the file
>
> A guess (FWIW): Can process rank 0, for example close the file too
> quickly, which destroys the string reserved for the filename that is
> used by process rank 1 which could be using shared memory on the same
> node?
>
> Thanks,
>
> Eric
>
> On 12/14/2014 02:06 PM, Eric Chamberland wrote:
>> Hi,
>>
>> I finally (thanks for fixing oversubscribing) tested with 1.8.4rc3 for
>> my problem with collective MPI I/O.
>>
>> A problem still there. In this 2 processes example, process rank 1
>> dies with segfault while process rank 0 wait indefinitely...
>>
>> Running with valgrind, I found these errors which may gives hints:
>>
>> *
>> Rank 1:
>> *
>> On process rank 1, without valgrind it ends with either a segmentation
>> violation or memory corruption or invalide free without valgrind).
>>
>> But running with valgrind, it tells:
>>
>> ==16715== Invalid write of size 2
>> ==16715==at 0x4C2E793: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
>> ==16715==by 0x1F60AA91: opal_convertor_unpack (opal_convertor.c:321)
>> ==16715==by 0x25AA8DD3: mca_pml_ob1_recv_frag_callback_match
>> (pml_ob1_recvfrag.c:225)
>> ==16715==by 0x2544110C: mca_btl_vader_check_fboxes
>> (btl_vader_fbox.h:220)
>> ==16715==by 0x25443577: mca_btl_vader_component_progress
>> (btl_vader_component.c:695)
>> ==16715==by 0x1F5F0F27: opal_progress (opal_progress.c:207)
>> ==16715==by 0x1ACB40B3: opal_condition_wait (condition.h:93)
>> ==16715==by 0x1ACB4201: ompi_request_wait_completion (request.h:381)
>> ==16715==by 0x1ACB4305: ompi_request_default_wait (req_wait.c:39)
>> ==16715==by 0x26BA2FFB: ompi_coll_tuned_bcast_intra_generic
>> (coll_tuned_bcast.c:254)
>> ==16715==by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial
>> (coll_tuned_bcast.c:385)
>> ==16715==by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed
>> (coll_tuned_decision_fixed.c:258)
>> ==16715==by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
>> ==16715==by 0x2FE1CC48: ADIOI_Shfp_fname (shfp_fname.c:67)
>> ==16715==by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
>> ==16715==by 0x2FDE3B0D: mca_io_romio_file_open
>> (io_romio_file_open.c:40)
>> ==16715==by 0x1AD52344: module_init (io_base_file_select.c:455)
>> ==16715==by 0x1AD51DFA: mca_io_base_file_select
>> (io_base_file_select.c:238)
>> ==16715==by 0x1ACA582F: ompi_file_open (file.c:130)
>> ==16715==by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
>> ==16715==by 0x13F9B36F:
>> PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int,
>> ompi_file_t*&, bool) (PAIO.cc:290)
>> ==16715==by 0xCA44252:
>> GISLectureEcriture::litGISMPI(std::string,
>> GroupeInfoSur&, std::string&) (GISLectureEcriture.icc:411)
>> ==16715==by 0xCA23F0D: Champ::importeParallele(std::string const&)
>> (Champ.cc:951)
>> ==16715==by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
>> ==16715== Address 0x32ef3e50 is 0 bytes after a block of size 256
>> alloc'd
>> ==16715==at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
>> ==16715==by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
>> ==16715==by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
>> ==16715==by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
>> ==16715==by 0x2FDE3B0D: mca_io_romio_file_open
>> (io_romio_file_open.c:40)
>> ==16715==by 0x1AD52344: module_init (io_base_file_select.c:455)
>> ==16715==by 0x1AD51DFA: mca_io_base_file_select
>> (io_base_file_select.c:238)
>> ==16715==by 0x1ACA582F: ompi_file_open (file.c:130)
>> ==16715==by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
>> ==16715==by 0x13F9B36F:
>> PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int,
>> ompi_file_t*&, bool) (PAIO.cc:290)
>> ==16715==by 0xCA44252:
>> GISLectureEcriture::litGISMPI(std::string,
>> GroupeInfoSur&, std::string&)
Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
Hi again,

some new hints that might help:

1- With valgrind: if I run the same test case, same data, but moved to a shorter path+filename, then *valgrind* does *not* complain!!
2- Without valgrind: *sometimes*, the test case with the long path+filename passes without segfaulting!
3- It seems to happen at the fourth file I try to open using the following described procedure:

Also, I was wondering about this: in this 2-process test case (running on the same node), I:

1- open the file collectively (which resides on the same ssd drive on my computer)
2- MPI_File_read_at_all a long int and 3 chars (11 bytes)
3- stop (because I detect I am not reading my MPIIO file format)
4- close the file

A guess (FWIW): can process rank 0, for example, close the file too quickly, which destroys the string reserved for the filename that is used by process rank 1, which could be using shared memory on the same node?

Thanks,

Eric

On 12/14/2014 02:06 PM, Eric Chamberland wrote:

Hi,

I finally (thanks for fixing oversubscribing) tested with 1.8.4rc3 for my problem with collective MPI I/O. A problem is still there. In this 2-process example, process rank 1 dies with a segfault while process rank 0 waits indefinitely...

Running with valgrind, I found these errors which may give hints:

* Rank 1: *

On process rank 1, without valgrind, it ends with either a segmentation violation, memory corruption, or an invalid free.

But running with valgrind, it tells:

==16715== Invalid write of size 2
==16715==    at 0x4C2E793: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==16715==    by 0x1F60AA91: opal_convertor_unpack (opal_convertor.c:321)
==16715==    by 0x25AA8DD3: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:225)
==16715==    by 0x2544110C: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
==16715==    by 0x25443577: mca_btl_vader_component_progress (btl_vader_component.c:695)
==16715==    by 0x1F5F0F27: opal_progress (opal_progress.c:207)
==16715==    by 0x1ACB40B3: opal_condition_wait (condition.h:93)
==16715==    by 0x1ACB4201: ompi_request_wait_completion (request.h:381)
==16715==    by 0x1ACB4305: ompi_request_default_wait (req_wait.c:39)
==16715==    by 0x26BA2FFB: ompi_coll_tuned_bcast_intra_generic (coll_tuned_bcast.c:254)
==16715==    by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial (coll_tuned_bcast.c:385)
==16715==    by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed (coll_tuned_decision_fixed.c:258)
==16715==    by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
==16715==    by 0x2FE1CC48: ADIOI_Shfp_fname (shfp_fname.c:67)
==16715==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16715==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
==16715==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16715==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
==16715==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16715==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16715==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
==16715==    by 0xCA44252: GISLectureEcriture::litGISMPI(std::string, GroupeInfoSur&, std::string&) (GISLectureEcriture.icc:411)
==16715==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
==16715==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
==16715== Address 0x32ef3e50 is 0 bytes after a block of size 256 alloc'd
==16715==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
==16715==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
==16715==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
==16715==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16715==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
==16715==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16715==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
==16715==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16715==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16715==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
==16715==    by 0xCA44252: GISLectureEcriture::litGISMPI(std::string, GroupeInfoSur&, std::string&) (GISLectureEcriture.icc:411)
==16715==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
==16715==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
...
...
==16715== Invalid write of size 1
==16715==    at 0x4C2E7BB: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==16715==    by 0x1F60AA91: opal_convertor_unpack (opal_convertor.c:321)
==16715==    by 0x25AA8DD3: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:225)
==16715==    by 0x2544110C: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
==16715==    by
Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
Hi Gilles,

On 12/14/2014 09:20 PM, Gilles Gouaillardet wrote:
Eric, can you make your test case (source + input file + howto) available so I can try to reproduce and fix this?

I would like to, but the complete app is big (and not public), sits on top of PETSc with MKL, and is in C++... :-( I can certainly send you binaries if you have any of the following platforms (RedHat 6.6, openSUSE 13.1, openSUSE 12.3, Fedora 19, RedHat 5.7 or openSUSE 11.3) and input files (maybe we could get it to run in a chrooted environment? but I have never tried that), but I don't think I can send our source code... I would, however, like to post a simple example showing the problem...

Based on the stack trace, I assume this is a complete end-user application. Have you tried/been able to reproduce the same kind of crash with a trimmed test program?

I am trying to do so right now... ;-) I am trying to reproduce the exact order of open/close of files by MPI, followed by a "normal" open of the file, etc... If I can reproduce the problem, I will send it immediately to the list. It is an intermittent problem, but valgrind seems to catch it every time! I will work on this this evening and in the following days, hoping to send it in time before the final release...

BTW, what kind of filesystem is hosting Resultats.Eta1? (e.g. ext4 / nfs / lustre / other)

It is a local hard drive with ext4.

Oh, I just noticed that one of my mails didn't make it to the list... I will try to resend it now... it contains a few hints...

Thanks! :-)

Eric
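[Editorial note] While a trimmed C program is being prepared, the long-path trigger itself is easy to stage. A sketch under stated assumptions (the directory names and the ./trimmed_test binary are hypothetical) that builds a path longer than the 256-byte buffer seen in the valgrind trace:

```shell
# Build a directory chain whose absolute path exceeds 256 characters,
# then create the file that the (assumed) trimmed test would MPI_File_open.
base=$(mktemp -d)
long="$base"
for i in 1 2 3 4 5 6 7 8; do
    long="$long/a_deliberately_long_directory_name_component_$i"
done
mkdir -p "$long"
: > "$long/data.bin"          # the file to open collectively
echo "${#long} characters in the directory path"
# mpirun -np 2 ./trimmed_test "$long/data.bin"   # hypothetical test binary
# rm -rf "$base"                                 # cleanup when done
```

Running the same binary against a copy of data.bin placed directly under $base (a short path) should then reproduce hint #1 above: valgrind stays silent.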
Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
Eric,

can you make your test case (source + input file + howto) available so I can try to reproduce and fix this?

Based on the stack trace, I assume this is a complete end-user application. Have you tried/been able to reproduce the same kind of crash with a trimmed test program?

BTW, what kind of filesystem is hosting Resultats.Eta1? (e.g. ext4 / nfs / lustre / other)

Cheers,

Gilles

On 2014/12/15 4:06, Eric Chamberland wrote:
> Hi,
>
> I finally (thanks for fixing oversubscribing) tested with 1.8.4rc3 for
> my problem with collective MPI I/O.
>
> The problem is still there. In this 2-process example, process rank 1
> dies with a segfault while process rank 0 waits indefinitely...
>
> Running with valgrind, I found these errors, which may give hints:
>
> [valgrind trace snipped]
Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close
Hi,

I finally (thanks for fixing oversubscribing) tested with 1.8.4rc3 for my problem with collective MPI I/O.

The problem is still there. In this 2-process example, process rank 1 dies with a segfault while process rank 0 waits indefinitely...

Running with valgrind, I found these errors, which may give hints:

* Rank 1: *

On process rank 1, without valgrind it ends with either a segmentation violation, memory corruption, or an invalid free.

But running with valgrind, it tells:

==16715== Invalid write of size 2
==16715==    at 0x4C2E793: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==16715==    by 0x1F60AA91: opal_convertor_unpack (opal_convertor.c:321)
==16715==    by 0x25AA8DD3: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:225)
==16715==    by 0x2544110C: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
==16715==    by 0x25443577: mca_btl_vader_component_progress (btl_vader_component.c:695)
==16715==    by 0x1F5F0F27: opal_progress (opal_progress.c:207)
==16715==    by 0x1ACB40B3: opal_condition_wait (condition.h:93)
==16715==    by 0x1ACB4201: ompi_request_wait_completion (request.h:381)
==16715==    by 0x1ACB4305: ompi_request_default_wait (req_wait.c:39)
==16715==    by 0x26BA2FFB: ompi_coll_tuned_bcast_intra_generic (coll_tuned_bcast.c:254)
==16715==    by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial (coll_tuned_bcast.c:385)
==16715==    by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed (coll_tuned_decision_fixed.c:258)
==16715==    by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
==16715==    by 0x2FE1CC48: ADIOI_Shfp_fname (shfp_fname.c:67)
==16715==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16715==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
==16715==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16715==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
==16715==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16715==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16715==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
==16715==    by 0xCA44252: GISLectureEcriture::litGISMPI(std::string, GroupeInfoSur&, std::string&) (GISLectureEcriture.icc:411)
==16715==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
==16715==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
==16715== Address 0x32ef3e50 is 0 bytes after a block of size 256 alloc'd
==16715==    at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
==16715==    by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
==16715==    by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
==16715==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16715==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
==16715==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16715==    by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
==16715==    by 0x1ACA582F: ompi_file_open (file.c:130)
==16715==    by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
==16715==    by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
==16715==    by 0xCA44252: GISLectureEcriture::litGISMPI(std::string, GroupeInfoSur&, std::string&) (GISLectureEcriture.icc:411)
==16715==    by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
==16715==    by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
...
...
==16715== Invalid write of size 1
==16715==    at 0x4C2E7BB: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
==16715==    by 0x1F60AA91: opal_convertor_unpack (opal_convertor.c:321)
==16715==    by 0x25AA8DD3: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:225)
==16715==    by 0x2544110C: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
==16715==    by 0x25443577: mca_btl_vader_component_progress (btl_vader_component.c:695)
==16715==    by 0x1F5F0F27: opal_progress (opal_progress.c:207)
==16715==    by 0x1ACB40B3: opal_condition_wait (condition.h:93)
==16715==    by 0x1ACB4201: ompi_request_wait_completion (request.h:381)
==16715==    by 0x1ACB4305: ompi_request_default_wait (req_wait.c:39)
==16715==    by 0x26BA2FFB: ompi_coll_tuned_bcast_intra_generic (coll_tuned_bcast.c:254)
==16715==    by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial (coll_tuned_bcast.c:385)
==16715==    by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed (coll_tuned_decision_fixed.c:258)
==16715==    by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
==16715==    by 0x2FE1CC48: ADIOI_Shfp_fname (shfp_fname.c:67)
==16715==    by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
==16715==    by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
==16715==    by 0x1AD52344: module_init (io_base_file_select.c:455)
==16715==