A simple test would be to run it under valgrind, so that out-of-bounds reads and writes become obvious.
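For example, reusing the command line from your mail (the exact valgrind flags are up to you):

    mpirun -npernode 2 valgrind --track-origins=yes ./transpose2 8

Valgrind will print some noise coming from Open MPI itself, but an invalid write into one of your own buffers should stand out clearly.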
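Note that your backtrace shows the crash inside free() (Open MPI's ptmalloc2 hooks), which is the classic signature of heap corruption from an earlier out-of-bounds write, exactly what valgrind is good at pinpointing. I haven't gone through your gist in detail, but one frequent culprit when moving to dynamically allocated 2D arrays is handing MPI a pointer-per-row array instead of one contiguous block. Here is a minimal sketch of the contiguous layout, just to illustrate the idea; the names wrows/N/wsize mirror yours, everything else is made up, it assumes N is a multiple of wsize, and it only shuffles contiguous chunks of MPI_INT rather than using your mpi_all_t type, so it is not the transpose itself:

    /* Hypothetical sketch, not your actual code: the point is only that
     * the buffer handed to an MPI collective must be contiguous storage,
     * not an array of per-row malloc()s. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int wrank, wsize;
        MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
        MPI_Comm_size(MPI_COMM_WORLD, &wsize);

        int N = (argc > 1) ? atoi(argv[1]) : 8;
        int wrows = N / wsize;              /* local block of rows */

        /* one contiguous allocation for the whole local block ... */
        int *data = malloc((size_t)wrows * N * sizeof *data);
        /* ... plus optional row pointers for matrix[i][j]-style indexing */
        int **matrix = malloc((size_t)wrows * sizeof *matrix);
        for (int i = 0; i < wrows; i++)
            matrix[i] = data + (size_t)i * N;

        for (int i = 0; i < wrows; i++)
            for (int j = 0; j < N; j++)
                matrix[i][j] = (wrank * wrows + i) * N + j;

        /* a separate, equally sized receive buffer avoids any aliasing */
        int *recv = malloc((size_t)wrows * N * sizeof *recv);

        /* pass the contiguous storage 'data', not the pointer array
         * 'matrix', as the buffer argument */
        MPI_Alltoall(data, wrows * N / wsize, MPI_INT,
                     recv, wrows * N / wsize, MPI_INT, MPI_COMM_WORLD);

        if (wrank == 0)
            printf("recv[0] = %d\n", recv[0]);

        free(recv);
        free(matrix);
        free(data);
        MPI_Finalize();
        return 0;
    }

If your code already allocates the matrix contiguously, ignore this and let valgrind point at the offending write.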
Cheers,

Matthieu

2014-05-08 21:16 GMT+02:00 Spenser Gilliland <spen...@gillilanding.com>:
> George & Matthieu,
>
>> The Alltoall should only return when all data is sent and received on
>> the current rank, so there shouldn't be any race condition.
>
> You're right, this is MPI, not pthreads. That should never happen. Duh!
>
>> I think the issue is with the way you define the send and receive
>> buffers in the MPI_Alltoall. You have to keep in mind that the
>> all-to-all pattern will overwrite the entire data in the receive
>> buffer. Thus, starting from a relative displacement in the data (in
>> this case matrix[wrank*wrows]) begs for trouble, as you will write
>> outside the receive buffer.
>
> The submatrix corresponding to matrix[wrank*wrows][0] to
> matrix[(wrank+1)*wrows-1][:] is valid only on the wrank process. This
> is a block distribution of the rows, like what MPI_Scatter would
> produce. As wrows is equal to N (the matrix width/height) divided by
> wsize, the number of mpi_all_t blocks in each message is equal to
> wsize. Therefore, there should be no writing outside the bounds of
> the submatrix.
>
> On another note,
> I just ported the example to use dynamic memory and now I'm getting
> segfaults when I call MPI_Finalize(). Any idea what in the code could
> have caused this?
>
> It's pastebinned here: https://gist.github.com/anonymous/a80e0679c3cbffb82e39
>
> The result is
>
> [sgillila@jarvis src]$ mpirun -npernode 2 transpose2 8
> N = 8
> Matrix =
> 0:  0  1  2  3  4  5  6  7
> 0:  8  9 10 11 12 13 14 15
> 0: 16 17 18 19 20 21 22 23
> 0: 24 25 26 27 28 29 30 31
> 1: 32 33 34 35 36 37 38 39
> 1: 40 41 42 43 44 45 46 47
> 1: 48 49 50 51 52 53 54 55
> 1: 56 57 58 59 60 61 62 63
> Matrix =
> 0:  0  8 16 24 32 40 48 56
> 0:  1  9 17 25 33 41 49 57
> 0:  2 10 18 26 34 42 50 58
> 0:  3 11 19 27 35 43 51 59
> 1:  4 12 20 28 36 44 52 60
> 1:  5 13 21 29 37 45 53 61
> 1:  6 14 22 30 38 46 54 62
> 1:  7 15 23 31 39 47 55 63
> [jarvis:09314] *** Process received signal ***
> [jarvis:09314] Signal: Segmentation fault (11)
> [jarvis:09314] Signal code: Address not mapped (1)
> [jarvis:09314] Failing at address: 0x21da228
> [jarvis:09314] [ 0] /lib64/libpthread.so.0() [0x371480f500]
> [jarvis:09314] [ 1] /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_int_free+0x75) [0x7f2e85452575]
> [jarvis:09314] [ 2] /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_free+0xd3) [0x7f2e85452bc3]
> [jarvis:09314] [ 3] transpose2(main+0x160) [0x4012a0]
> [jarvis:09314] [ 4] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3713c1ecdd]
> [jarvis:09314] [ 5] transpose2() [0x400d49]
> [jarvis:09314] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 9314 on node
> jarvis.cs.iit.edu exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
> --
> Spenser Gilliland
> Computer Engineer
> Doctoral Candidate
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/