Bruce,

Which version of Open MPI are you using?
Out of curiosity, did you try your program with another MPI implementation
such as MPICH or one of its derivatives?
When using derived datatypes (ddt) in one-sided communication, the
ddt description must be sent along with the data.
Two protocols are used internally:
- inline, for a "short" description
- in a separate message, for a "long" description
Assuming your program is correct, my guess is there is a bug in the way
the "long" ddt description is handled, and I will investigate that.

That being said, it is very likely that MPI_Type_create_struct invoked with a
high count will internally generate a long description, so it will always
be suboptimal compared to MPI_Type_create_subarray or any other constructor
that can exploit the "regular shape" of your ddt.
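
For a regular stride, the difference in description size is easy to see.
Below is a minimal sketch (my own illustration, not code from your program;
I use MPI_Type_vector for the "regular" version, but the idea is the same
with MPI_Type_create_subarray):

#include <mpi.h>
#include <stdlib.h>

/* one (blocklength, displacement, type) triplet per element:
 * the internal description grows linearly with nelem */
static MPI_Datatype strided_struct(int nelem, int stride)
{
    int          *blens = malloc(nelem * sizeof(int));
    MPI_Aint     *disps = malloc(nelem * sizeof(MPI_Aint));
    MPI_Datatype *types = malloc(nelem * sizeof(MPI_Datatype));
    MPI_Datatype  newtype;
    for (int i = 0; i < nelem; i++) {
        blens[i] = 1;
        disps[i] = (MPI_Aint)i * stride * sizeof(int);
        types[i] = MPI_INT;
    }
    MPI_Type_create_struct(nelem, blens, disps, types, &newtype);
    MPI_Type_commit(&newtype);
    free(blens); free(disps); free(types);
    return newtype;
}

/* same memory layout, but described by three integers, so the
 * description stays "short" no matter how large nelem gets */
static MPI_Datatype strided_vector(int nelem, int stride)
{
    MPI_Datatype newtype;
    MPI_Type_vector(nelem, 1, stride, MPI_INT, &newtype);
    MPI_Type_commit(&newtype);
    return newtype;
}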

Cheers,

Gilles

On Saturday, April 30, 2016, Palmer, Bruce J <bruce.pal...@pnnl.gov> wrote:

> I’ve been trying to recreate the semantics of the Global Arrays gather and
> scatter operations using MPI RMA routines and I’ve run into some issues
> with MPI Datatypes. I’ve been focusing on building MPI versions of the GA
> gather and scatter calls, which I’ve been trying to implement using MPI
> data types built with the MPI_Type_create_struct call. I’ve developed a
> test program that simulates copying data into and out of a 1D distributed
> array of size NSIZE. Each processor contains a segment of approximately
> size NSIZE/nproc and is responsible for assigning every nproc-th value in
> the array, starting with the value whose index equals the processor's rank. After
> assigning values and synchronizing the distributed data structure, each
> processor then reads the values set by the processor of next higher rank
> (the process with rank nproc-1 reads the values set by process 0).
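
If I read the decomposition correctly, the index arithmetic is roughly the
following (my own reconstruction with made-up names, not code from the
attached test):

/* map a global index onto (owning rank, offset inside that rank's
 * window), assuming each rank holds a block of about NSIZE/nproc
 * elements and the last rank absorbs any remainder */
static void global_to_local(long i, long nsize, int nproc,
                            int *owner, long *offset)
{
    long nper = nsize / nproc;          /* nominal block size per rank */
    int  p    = (int)(i / nper);
    if (p >= nproc) p = nproc - 1;      /* remainder goes to the last rank */
    *owner  = p;
    *offset = i - (long)p * nper;
}

Each rank would then call this for the indices rank, rank + nproc,
rank + 2*nproc, ... and group the resulting offsets per target before
building one datatype per target.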
>
>
>
> The distributed array is represented by an MPI window created using a
> standard MPI_Win_create call. The values in the array are set and read
> using MPI RMA operations, either MPI_Get/MPI_Put or MPI_Rget/MPI_Rput.
> Three different protocols have been used. The first is to call MPI_Win_lock
> and create a shared lock on the remote processor, then call MPI_Put/MPI_Get
> and then call MPI_Win_unlock to clear the lock. The second protocol is to
> use MPI request-based calls. After the call to MPI_Win_create,
> MPI_Win_lock_all is called to start a passive synchronization epoch on the
> window. Data is written to and read from the distributed array using
> MPI_Rput/MPI_Rget immediately followed by a call to MPI_Wait, using the
> handle returned by the MPI_Rput/MPI_Rget call. The third protocol also
> immediately creates a passive synchronization epoch after window creation,
> but uses calls to MPI_Put/MPI_Get immediately followed by a call to
> MPI_Win_flush_local. These three protocols seem to cover all the
> possibilities that I have seen in other MPI/RMA based implementations of
> ARMCI/GA.
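
For reference, the three schemes as I understand them look roughly like
this (a sketch with placeholder names for the window, target rank and
datatypes, not your actual code; each variant is meant to be used on its
own, not mixed on the same window):

#include <mpi.h>

/* 1. per-operation shared lock around the RMA call */
static void put_lock_unlock(const void *buf, MPI_Datatype origin_dt,
                            MPI_Datatype target_dt, int tgt, MPI_Win win)
{
    MPI_Win_lock(MPI_LOCK_SHARED, tgt, 0, win);
    MPI_Put(buf, 1, origin_dt, tgt, 0, 1, target_dt, win);
    MPI_Win_unlock(tgt, win);
}

/* 2. MPI_Win_lock_all(0, win) is assumed to have been called once right
 *    after MPI_Win_create; request-based MPI_Rput is immediately
 *    followed by MPI_Wait on the returned request */
static void put_rput_wait(const void *buf, MPI_Datatype origin_dt,
                          MPI_Datatype target_dt, int tgt, MPI_Win win)
{
    MPI_Request req;
    MPI_Rput(buf, 1, origin_dt, tgt, 0, 1, target_dt, win, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}

/* 3. MPI_Win_lock_all(0, win) is assumed to have been called once right
 *    after MPI_Win_create; plain MPI_Put is immediately followed by a
 *    local flush of the target */
static void put_flush_local(const void *buf, MPI_Datatype origin_dt,
                            MPI_Datatype target_dt, int tgt, MPI_Win win)
{
    MPI_Put(buf, 1, origin_dt, tgt, 0, 1, target_dt, win);
    MPI_Win_flush_local(tgt, win);
}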
>
>
>
> The issue that I’ve run into is that these tests seem to work reliably if
> I build the data type using the MPI_Type_create_subarray function but fail
> for larger arrays (NSIZE ~ 10000) when I use MPI_Type_create_struct.
> Because the values being set by each processor are evenly spaced, I can use
> either function in this case (this is not generally true in applications).
> The struct data type hangs on 2 processors using lock/unlock, crashes for
> the request-based protocol and does not get the correct values in the Get
> phase of the data transfer when using flush_local. These tests are done on
> a Linux cluster using an Infiniband interconnect and the value of NSIZE is
> 10000. For comparison, the same test using MPI_Type_create_subarray seems
> to function reliably for all three protocols for NSIZE=1000000 using 1, 2,
> and 8 processors on 1 and 2 SMP nodes.
>
>
>
> I’ve attached the test program for these test cases. Does anyone have a
> suggestion about what might be going on here?
>
>
>
> Bruce
>
