A simple test would be to run it under valgrind, so that out-of-bounds
reads and writes will be obvious.
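With Open MPI you can launch every rank under valgrind directly; the binary name and rank count below are taken from the run in the quoted message, and the valgrind flag is just one reasonable choice:

```shell
# Each rank runs inside its own valgrind instance, so every process
# reports its own out-of-bounds reads/writes on the MPI buffers.
mpirun -npernode 2 valgrind --track-origins=yes ./transpose2 8
```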

Cheers,

Matthieu

2014-05-08 21:16 GMT+02:00 Spenser Gilliland <spen...@gillilanding.com>:
> George & Matthieu,
>
>> The Alltoall should only return when all data is sent and received on
>> the current rank, so there shouldn't be any race condition.
>
> You're right, this is MPI, not pthreads.  That should never happen. Duh!
>
>> I think the issue is with the way you define the send and receive
>> buffer in the MPI_Alltoall. You have to keep in mind that the
>> all-to-all pattern will overwrite the entire data in the receive
> buffer. Thus, starting from a relative displacement in the data (in
> this case matrix[wrank*wrows]) is asking for trouble, as you will
> write outside the receive buffer.
>
> The submatrix corresponding to matrix[wrank*wrows][0] to
> matrix[(wrank+1)*wrows-1][:] is valid only on the wrank process.  This
> is a block distribution of the rows like what MPI_Scatter would
> produce.  As wrows is equal to N (matrix width/height) divided by
> wsize, the number of mpi_all_t blocks in each message is equal to
> wsize.  Therefore, there should be no writing outside the bounds of
> the submatrix.
>
> On another note,
> I just ported the example to use dynamic memory and now I'm getting
> segfaults when I call MPI_Finalize().  Any idea what in the code could
> have caused this?
>
> It's pastebinned here: https://gist.github.com/anonymous/a80e0679c3cbffb82e39
>
> The result is
>
> [sgillila@jarvis src]$ mpirun -npernode 2 transpose2 8
> N = 8
> Matrix =
>  0:     0     1     2     3     4     5     6     7
>  0:     8     9    10    11    12    13    14    15
>  0:    16    17    18    19    20    21    22    23
>  0:    24    25    26    27    28    29    30    31
>  1:    32    33    34    35    36    37    38    39
>  1:    40    41    42    43    44    45    46    47
>  1:    48    49    50    51    52    53    54    55
>  1:    56    57    58    59    60    61    62    63
> Matrix =
>  0:     0     8    16    24    32    40    48    56
>  0:     1     9    17    25    33    41    49    57
>  0:     2    10    18    26    34    42    50    58
>  0:     3    11    19    27    35    43    51    59
>  1:     4    12    20    28    36    44    52    60
>  1:     5    13    21    29    37    45    53    61
>  1:     6    14    22    30    38    46    54    62
>  1:     7    15    23    31    39    47    55    63
> [jarvis:09314] *** Process received signal ***
> [jarvis:09314] Signal: Segmentation fault (11)
> [jarvis:09314] Signal code: Address not mapped (1)
> [jarvis:09314] Failing at address: 0x21da228
> [jarvis:09314] [ 0] /lib64/libpthread.so.0() [0x371480f500]
> [jarvis:09314] [ 1]
> /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_int_free+0x75)
> [0x7f2e85452575]
> [jarvis:09314] [ 2]
> /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_free+0xd3)
> [0x7f2e85452bc3]
> [jarvis:09314] [ 3] transpose2(main+0x160) [0x4012a0]
> [jarvis:09314] [ 4] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3713c1ecdd]
> [jarvis:09314] [ 5] transpose2() [0x400d49]
> [jarvis:09314] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 9314 on node
> jarvis.cs.iit.edu exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
>
> --
> Spenser Gilliland
> Computer Engineer
> Doctoral Candidate
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/
