Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2015-01-16 Thread Eric Chamberland
On 01/14/2015 05:57 PM, Rob Latham wrote: On 12/17/2014 07:04 PM, Eric Chamberland wrote: Hi! Here is a "poor man's fix" that works for me (the idea is not from me, thanks to Thomas H.): #1- char* lCwd = getcwd(0,0); #2- chdir(lPathToFile); #3-

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2015-01-14 Thread Rob Latham
On 12/17/2014 07:04 PM, Eric Chamberland wrote: Hi! Here is a "poor man's fix" that works for me (the idea is not from me, thanks to Thomas H.): #1- char* lCwd = getcwd(0,0); #2- chdir(lPathToFile); #3- MPI_File_open(...,lFileNameWithoutTooLongPath,...); #4- chdir(lCwd); #5- ... I think

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2015-01-12 Thread Rob Latham
On 12/17/2014 07:04 PM, Eric Chamberland wrote: Hi! Here is a "poor man's fix" that works for me (the idea is not from me, thanks to Thomas H.): #1- char* lCwd = getcwd(0,0); #2- chdir(lPathToFile); #3- MPI_File_open(...,lFileNameWithoutTooLongPath,...); #4- chdir(lCwd); #5- ... I think

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-17 Thread Eric Chamberland
Hi! Here is a "poor man's fix" that works for me (the idea is not from me, thanks to Thomas H.): #1- char* lCwd = getcwd(0,0); #2- chdir(lPathToFile); #3- MPI_File_open(...,lFileNameWithoutTooLongPath,...); #4- chdir(lCwd); #5- ... I think there are some limitations but it works very well
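The steps quoted above can be sketched as a small C helper. This is only an illustration of the chdir() workaround, not code from the thread: `open_with_short_name`, `open_fn` and `posix_open_readonly` are hypothetical names, and a plain POSIX open() stands in for MPI_File_open() so the sketch compiles without MPI. In a real MPI program the callback would wrap MPI_File_open(), and since that call is collective, every rank in the communicator would have to go through the same steps. Note also that the working directory is process-wide, so the helper is not safe if other threads use relative paths at the same time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical callback type: opens `name` relative to the current working
 * directory and returns 0 on success. With MPI, an implementation of this
 * callback would call MPI_File_open() on the short name. */
typedef int (*open_fn)(const char *name, void *ctx);

/* Stand-in opener used for illustration only: plain POSIX open().
 * `ctx` must point to an int that receives the file descriptor. */
static int posix_open_readonly(const char *name, void *ctx)
{
    int fd = open(name, O_RDONLY);
    if (fd < 0)
        return -1;
    *(int *)ctx = fd;
    return 0;
}

/* Hypothetical helper: chdir() into the directory part of `path`, call the
 * opener with only the base name, then restore the previous directory.
 * Returns the opener's result, or -1 if the directory could not be entered. */
static int open_with_short_name(const char *path, open_fn opener, void *ctx)
{
    assert(path != NULL && opener != NULL);

    const char *slash = strrchr(path, '/');
    if (slash == NULL)
        return opener(path, ctx);            /* no directory part to strip */

    /* Copy the directory part; a leading "/" alone means the root. */
    size_t dir_len = (slash == path) ? 1 : (size_t)(slash - path);
    char *dir = malloc(dir_len + 1);
    if (dir == NULL)
        return -1;
    memcpy(dir, path, dir_len);
    dir[dir_len] = '\0';

    /* #1: remember where we are. getcwd(NULL, 0) allocating the buffer is
     * an extension (glibc and others), as in the original message. */
    char *saved_cwd = getcwd(NULL, 0);
    int rc = -1;

    /* #2: enter the directory, #3: open with the short name. */
    if (saved_cwd != NULL && chdir(dir) == 0) {
        rc = opener(slash + 1, ctx);
        /* #4: go back; a failure here leaves the process in `dir`. */
        if (chdir(saved_cwd) != 0)
            perror("chdir back to saved directory");
    }

    free(saved_cwd);
    free(dir);
    return rc;
}
```

A call such as `open_with_short_name("/some/very/long/path/file.dat", posix_open_readonly, &fd)` would then hand only `file.dat` to the opener. Passing the opener as a callback keeps the directory handling separate from the I/O library being used.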

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-15 Thread Gilles Gouaillardet
Eric and all, That is clearly a limitation in ROMIO, and this is being tracked at https://trac.mpich.org/projects/mpich/ticket/2212. In the meantime, what we can do in OpenMPI is update mca_io_romio_file_open() to fail with a user-friendly error message if strlen(filename) is larger than 225.
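The kind of guard described in this message might look roughly like the following. It is a sketch under stated assumptions, not the actual Open MPI change: `check_filename_length` is a hypothetical name, and the limit is passed in as a parameter because the thread mentions both 225 and a hard-coded 256, so no particular value is assumed here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical guard: returns 0 if `filename` fits within `max_len`
 * characters, otherwise returns -1 and writes a readable message into
 * `errbuf`. The real limit depends on the ROMIO/Open MPI version and is
 * an assumption supplied by the caller. */
static int check_filename_length(const char *filename, size_t max_len,
                                 char *errbuf, size_t errbuf_size)
{
    assert(filename != NULL && errbuf != NULL && errbuf_size > 0);

    size_t len = strlen(filename);
    if (len <= max_len) {
        errbuf[0] = '\0';                 /* no error to report */
        return 0;
    }

    /* snprintf truncates safely if the message does not fit. */
    snprintf(errbuf, errbuf_size,
             "file name is %zu characters long, but at most %zu are supported",
             len, max_len);
    return -1;
}
```

Calling such a check before the file is opened turns a later heap corruption into an immediate, explainable failure, which is the point of the proposal.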

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-15 Thread Gilles Gouaillardet
Eric, thanks for the simple test program. i think i see what is going wrong and i will make some changes to avoid the memory overflow. that being said, there is a hard coded limit of 256 characters, and your path is bigger than 300 characters. bottom line, and even if there is no more memory

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-15 Thread Eric Chamberland
Hi Gilles, here is a simple setup to make valgrind complain now: export

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Eric Chamberland
Hi Gilles, ok I patched the file, without valgrind it exploded at MPI_File_close: *** Error in `/pmi/cmpbib/compilation_BIB_gcc-4.5.1_64bit/COMPILE_AUTO/GIREF/bin/Test.NormesEtProjectionChamp.dev': free(): invalid next size (normal): 0x04b6c950 *** === Backtrace: =

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Eric Chamberland
On 12/14/2014 09:55 PM, Gilles Gouaillardet wrote: Eric, i checked the source code (v1.8) and the limit for the shared_fp_fname is 256 (hard coded). Oh my god! Is it that simple? By the way, my filename is shorter than 256, but the whole path is: echo

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Gilles Gouaillardet
Eric, i checked the source code (v1.8) and the limit for the shared_fp_fname is 256 (hard coded). i am now checking if the overflow is correctly detected (that could explain the one byte overflow reported by valgrind) Cheers, Gilles On 2014/12/15 11:52, Eric Chamberland wrote: > Hi again, > >

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Eric Chamberland
Hi again, some new hints that might help: 1- With valgrind: If I run the same test case, same data, but moved to a shorter path+filename, then *valgrind* does *not* complain!! 2- Without valgrind: *Sometimes*, the test case with long path+filename passes without "segfaulting"! 3- It

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Eric Chamberland
Hi Gilles, On 12/14/2014 09:20 PM, Gilles Gouaillardet wrote: Eric, can you make your test case (source + input file + howto) available so i can try to reproduce and fix this ? I would like to, but the complete app is big (and not public), is on top of PETSc with mkl, and in C++... :-( I

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Gilles Gouaillardet
Eric, can you make your test case (source + input file + howto) available so i can try to reproduce and fix this ? Based on the stack trace, i assume this is a complete end user application. have you tried/been able to reproduce the same kind of crash with a trimmed test program ? BTW, what

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Eric Chamberland
Hi, I finally (thanks for fixing oversubscribing) tested with 1.8.4rc3 for my problem with collective MPI I/O. The problem is still there. In this 2-process example, process rank 1 dies with a segfault while process rank 0 waits indefinitely... Running with valgrind, I found these errors which