The segfault indicates that you write outside of the allocated memory, which 
corrupts the bookkeeping of the ptmalloc allocator that Open MPI links in 
(that is where your backtrace ends up). I’m quite certain that you write 
outside the allocated array …
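
Something like the sketch below is what I have in mind. To be clear, this is
not the code from your gist, and it uses plain MPI_INT blocks instead of your
mpi_all_t derived type; it only illustrates the buffer layout. The rows a rank
owns have to live in one contiguous allocation of wrows*N ints, because that is
the region the all-to-all overwrites. If the dynamic version builds the matrix
from per-row malloc()s, the collective runs past the end of a single row and
tramples the allocator metadata, which is exactly what ptmalloc later trips
over in free().

/* Sketch only (my code, not the gist): a contiguous per-rank block for the
 * all-to-all, with row pointers for matrix[i][j]-style indexing.  Assumes N
 * is divisible by wsize and uses MPI_INT blocks in place of the mpi_all_t
 * derived type from the original program. */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int wrank, wsize, N = 8;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);
    int wrows = N / wsize;                 /* rows owned by this rank */

    /* one contiguous block per buffer: exactly what the collective touches */
    int *send_data = malloc((size_t)wrows * N * sizeof(int));
    int *recv_data = malloc((size_t)wrows * N * sizeof(int));

    /* row pointers into the block keep 2D indexing without fragmenting it */
    int **send = malloc(wrows * sizeof(int *));
    for (int i = 0; i < wrows; i++)
        send[i] = send_data + (size_t)i * N;

    /* fill the owned rows with their global values */
    for (int i = 0; i < wrows; i++)
        for (int j = 0; j < N; j++)
            send[i][j] = (wrank * wrows + i) * N + j;

    /* wrows*wrows ints go to each of the wsize peers;
     * wsize * wrows*wrows == wrows*N, so both blocks are filled exactly */
    MPI_Alltoall(send_data, wrows * wrows, MPI_INT,
                 recv_data, wrows * wrows, MPI_INT, MPI_COMM_WORLD);

    free(send);
    free(send_data);
    free(recv_data);
    MPI_Finalize();
    return 0;
}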

  George.

On May 8, 2014, at 15:16 , Spenser Gilliland <spen...@gillilanding.com> wrote:

> George & Matthieu,
> 
>> The Alltoall should only return when all data is sent and received on
>> the current rank, so there shouldn't be any race condition.
> 
> You're right, this is MPI, not pthreads, so that should never happen.  Duh!
> 
>> I think the issue is with the way you define the send and receive
>> buffers in the MPI_Alltoall. You have to keep in mind that the
>> all-to-all pattern will overwrite the entire contents of the receive
>> buffer. Thus, starting from a relative displacement into the data (in
>> this case matrix[wrank*wrows]) begs for trouble, as you will write
>> outside the receive buffer.
> 
> The submatrix from matrix[wrank*wrows][0] through
> matrix[(wrank+1)*wrows-1][:] is valid only on process wrank.  This is
> a block distribution of the rows, like what MPI_Scatter would
> produce.  Since wrows equals N (the matrix width/height) divided by
> wsize, the number of mpi_all_t blocks in each message equals wsize.
> Therefore, there should be no writes outside the bounds of the
> submatrix.
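> For example, in the N = 8, wsize = 2 run below: wrows = 4, each message
> carries 2 blocks of 4*4 = 16 ints, and 2 * 16 = 32 = wrows * N ints,
> which is exactly the size of the local submatrix.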
> 
> On another note,
> I just ported the example to use dynamic memory and now I'm getting
> segfaults when I call MPI_Finalize().  Any idea what in the code could
> have caused this?
> 
> It's posted as a gist here: https://gist.github.com/anonymous/a80e0679c3cbffb82e39
> 
> The result is
> 
> [sgillila@jarvis src]$ mpirun -npernode 2 transpose2 8
> N = 8
> Matrix =
> 0:     0     1     2     3     4     5     6     7
> 0:     8     9    10    11    12    13    14    15
> 0:    16    17    18    19    20    21    22    23
> 0:    24    25    26    27    28    29    30    31
> 1:    32    33    34    35    36    37    38    39
> 1:    40    41    42    43    44    45    46    47
> 1:    48    49    50    51    52    53    54    55
> 1:    56    57    58    59    60    61    62    63
> Matrix =
> 0:     0     8    16    24    32    40    48    56
> 0:     1     9    17    25    33    41    49    57
> 0:     2    10    18    26    34    42    50    58
> 0:     3    11    19    27    35    43    51    59
> 1:     4    12    20    28    36    44    52    60
> 1:     5    13    21    29    37    45    53    61
> 1:     6    14    22    30    38    46    54    62
> 1:     7    15    23    31    39    47    55    63
> [jarvis:09314] *** Process received signal ***
> [jarvis:09314] Signal: Segmentation fault (11)
> [jarvis:09314] Signal code: Address not mapped (1)
> [jarvis:09314] Failing at address: 0x21da228
> [jarvis:09314] [ 0] /lib64/libpthread.so.0() [0x371480f500]
> [jarvis:09314] [ 1]
> /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_int_free+0x75)
> [0x7f2e85452575]
> [jarvis:09314] [ 2]
> /opt/openmpi/lib/libmpi.so.1(opal_memory_ptmalloc2_free+0xd3)
> [0x7f2e85452bc3]
> [jarvis:09314] [ 3] transpose2(main+0x160) [0x4012a0]
> [jarvis:09314] [ 4] /lib64/libc.so.6(__libc_start_main+0xfd) [0x3713c1ecdd]
> [jarvis:09314] [ 5] transpose2() [0x400d49]
> [jarvis:09314] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 1 with PID 9314 on node
> jarvis.cs.iit.edu exited on signal 11 (Segmentation fault).
> --------------------------------------------------------------------------
> 
> -- 
> Spenser Gilliland
> Computer Engineer
> Doctoral Candidate
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
