MPI_Alltoall exchanges data between every pair of participants,
including the local one, so every participant writes the same amount
of data.
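
To make the arithmetic concrete, here is a rough sketch in plain C (no MPI
calls) of how far past the buffer pointer a receive with recvcount = 1 and a
resized vector type reaches. The helper names are hypothetical, and the
layout is an assumption inferred from the numbers quoted below: an 8x8 array
of 4-byte floats, 2 processes, a vector of 8 blocks of 4 elements with a
stride of 8, resized to an extent of 16 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes spanned by one vector type with a positive stride: from the first
 * byte of the first block to the last byte of the last block.
 * count, blocklen and stride are in elements, elem_size in bytes. */
size_t vector_true_extent(size_t count, size_t blocklen,
                          size_t stride, size_t elem_size)
{
    assert(count > 0 && blocklen > 0 && stride >= blocklen);
    return ((count - 1) * stride + blocklen) * elem_size;
}

/* Bytes past the buffer pointer touched with nprocs ranks and a count of 1:
 * the block for rank i starts at i * extent (the resized extent), and the
 * last block still spans true_extent bytes. */
size_t alltoall_footprint(size_t nprocs, size_t extent, size_t true_extent)
{
    assert(nprocs > 0);
    return (nprocs - 1) * extent + true_extent;
}

/* Bytes that fall beyond an allocation of alloc bytes when the pointer
 * passed to the call sits offset bytes into it; 0 if everything fits. */
size_t overrun_bytes(size_t offset, size_t footprint, size_t alloc)
{
    size_t end = offset + footprint;
    return end > alloc ? end - alloc : 0;
}
```

Under those assumptions, vector_true_extent(8, 4, 8, 4) is 240 bytes and
alltoall_footprint(2, 16, 240) is 256 bytes, which is the whole 8x8 float
array. The data fits only when the pointer is at the origin of the
allocation: overrun_bytes(16, 256, 256) already reports 16 bytes written
past the end.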

  George.


On Thu, May 8, 2014 at 6:16 PM, Spenser Gilliland
<spen...@gillilanding.com> wrote:
> George,
>
>> Here is basically what is happening. On the top left, I depicted the 
>> datatype resulting from the vector type. The two arrows point to the lower 
>> bound and upper bound (thus the extent) of the datatype. On the top right, 
>> the resized datatype, where the ub is now moved 2 elements after the lb, 
>> allowing for a nice interleaving of the data. Then the next line is the 
>> unrolled datatype representation, flattened to 1D. Again, it contains in red
>> the data touched by the defined memory layout, as well as the extent (lb and 
>> ub).
>>
>> Now, let’s move on to the MPI_Alltoall call. The array is the one without
>> colors, and then I put the datatype starting from the position you specified 
>> in the alltoall. As you can see as soon as you don’t start at the origin of 
>> the allocated memory, you end up writing outside of your data. This happens
>> deep inside the MPI_Alltoall call (no validation at the MPI level).
>
> Why are the last two elements in the 1D view present?  If that's the
> case I would have to define a new MPI Type for each set of columns
> within a matrix.  Why would it be defined in this manner?  Also, why
> is the extent of the initial vector type equal to 12 when it is
> actually accessing 16 elements (for the 4x4 example).
>
> So, is this a bug in Alltoall or openmpi?
>
> I believe it is Alltoall causing the bug and not the vector type, because
> the following
>
> MPI_Aint lb, extent, true_lb, true_extent;
> MPI_Type_get_extent(mpi_all_t, &lb, &extent);
> MPI_Type_get_true_extent(mpi_all_t, &true_lb, &true_extent);
> /* MPI_Aint is usually wider than int, so cast and print with %ld */
> printf("mpi_all_t - lb = %ld, extent = %ld, true_lb = %ld, true_extent = %ld\n",
>        (long)lb, (long)extent, (long)true_lb, (long)true_extent);
>
> produces
>
> mpi_all_t - lb = 0, extent = 16, true_lb = 0, true_extent = 240
>
> Which means that the size is correct (using 4-byte floats with 2
> processors on an 8x8 array, this would be the 30th element).
>
> There's a similar drawing to what you made attached that's more
> focused on the specific instance in this code.  Hopefully, this clears
> up the algorithm a bit.
>
> Thanks,
> Spenser
>
> --
> Spenser Gilliland
> Computer Engineer
> Doctoral Candidate
