Jonas,

In case I misunderstood your question and you want to print

v_glob on P0: 9x2
0 9
1 10
2 11
3 12
4 13
5 14
6 15
7 16
8 17

then you have to fix the print invocation

    // note: print an additional column to show the displacement error we get:
    if (!rank) print("v_glob", rank, n, m, v_glob);

and also resize rtype so the second element starts at v_glob[3][0] => upper bound = (3*sizeof(int))

By the way, since this question is not Open MPI specific, sites such as Stack Overflow are a better fit.

Cheers,

Gilles

On Thu, Dec 16, 2021 at 6:46 PM Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

> Jonas,
>
> Assuming v_glob is what you expect, you will need to
> `MPI_Type_create_resized()` the received type so the block received
> from process 1 will be placed at the right position (v_glob[3][1] => upper
> bound = ((4*3+1) * sizeof(int)))
>
> Cheers,
>
> Gilles
>
> On Thu, Dec 16, 2021 at 6:33 PM Jonas Thies via users <users@lists.open-mpi.org> wrote:
>
>> Dear OpenMPI community,
>>
>> Here's a little puzzle for the Christmas holidays (although I would
>> really appreciate a quick solution!).
>>
>> I'm stuck with the following relatively basic problem: given a local nloc
>> x m matrix X_p in column-major ordering on each MPI process p, perform a
>> single MPI_Gather operation to construct the matrix
>>
>> X_0
>> X_1
>> ...
>> X_nproc
>>
>> again, in col-major ordering. My approach is to use MPI_Type_vector to
>> define an stype and an rtype, where stype has stride nloc, and rtype has
>> stride nproc*nloc. The observation is that there is an unexpected
>> displacement of (m-1)*n*p in the result array for the part arriving from
>> process p.
>>
>> The MFE code is attached, and I use OpenMPI 4.0.5 with GCC 11.2 (although
>> other versions and even distributions seem to display the same behavior).
>>
>> Example (nloc=3, nproc=3, m=2, with some additional columns printed for the
>> sake of demonstration):
>>
>> > mpicxx -o matrix_gather matrix_gather.cpp
>> mpirun -np 3 ./matrix_gather
>>
>> v_loc on P0: 3x2
>> 0 9
>> 1 10
>> 2 11
>>
>> v_loc on P1: 3x2
>> 3 12
>> 4 13
>> 5 14
>>
>> v_loc on P2: 3x2
>> 6 15
>> 7 16
>> 8 17
>>
>> v_glob on P0: 9x4
>> 0 9 0 0
>> 1 10 0 0
>> 2 11 0 0
>> 0 3 12 0
>> 0 4 13 0
>> 0 5 14 0
>> 0 0 6 15
>> 0 0 7 16
>> 0 0 8 17
>>
>> Any ideas?
>>
>> Thanks,
>>
>> Jonas
>>
>> --
>> *J. Thies*
>> Assistant Professor
>>
>> TU Delft
>> Faculty Electrical Engineering, Mathematics and Computer Science
>> Institute of Applied Mathematics and High Performance Computing Center
>> Mekelweg 4
>> 2628 CD Delft
>>
>> T +31 15 27 XXXX
>> *j.th...@tudelft.nl <j.th...@tudelft.nl>*
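
For reference, the effect of the receive type's extent on the gathered layout can be reproduced without MPI. The sketch below is not the attached code: `gather_layout` and its parameters are hypothetical names, and it only emulates the rule that MPI_Gather places the block from rank r at r times the extent of the receive type. It assumes the vector type's default extent is ((m-1)*n + nloc) ints, and that the resize is done with something like `MPI_Type_create_resized(rtype, 0, nloc*sizeof(int), &rtype_resized)`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from the attached code): emulates where the root
// would store each rank's block when the receive type is a vector type with
// m blocks of nloc ints and a stride of n = nproc*nloc ints.
// extent_ints is the extent of that receive type, counted in ints:
//   default extent of the vector type : (m-1)*n + nloc
//   after resizing as suggested above : nloc
// cols is the number of columns allocated for the column-major result.
std::vector<int> gather_layout(int nproc, int nloc, int m,
                               int extent_ints, int cols) {
    const int n = nproc * nloc;
    std::vector<int> v_glob(static_cast<std::size_t>(n) * cols, 0);
    for (int r = 0; r < nproc; ++r) {
        for (int j = 0; j < m; ++j) {
            for (int i = 0; i < nloc; ++i) {
                // Same test data as in the example: entry (i, j) of v_loc on rank r.
                const int value = j * n + r * nloc + i;
                // Block r starts r * extent_ints ints into the receive buffer.
                const int index = r * extent_ints + j * n + i;
                assert(index < n * cols);
                v_glob[index] = value;
            }
        }
    }
    return v_glob;
}
```

Calling `gather_layout(3, 3, 2, 12, 4)` uses the default extent and gives the displaced 9x4 pattern from the original post, while `gather_layout(3, 3, 2, 3, 2)` uses the resized extent and fills the 9x2 matrix contiguously with 0 to 17.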