Re: [OMPI users] unexpected behavior when combining MPI_Gather and MPI_Type_vector
Hi George,

thanks, I'm starting to understand this now. Still not quite intuitive that "Type_create_resized" allows me to reset the extent but not the size (just from a naming perspective). The man page is talking about extent, upper and lower bounds, but the upper bound cannot be specified:

    NAME
        MPI_Type_create_resized - Returns a new data type with new extent and upper and lower bounds.

    SYNTAX
    C Syntax
        #include <mpi.h>
        int MPI_Type_create_resized(MPI_Datatype oldtype, MPI_Aint lb, MPI_Aint extent, MPI_Datatype *newtype)

Jonas

On 16-12-2021 22:39, George Bosilca wrote:

You are confusing the size and extent of the datatype. The size (aka the physical number of bytes described by the memory layout) would be m*nloc*sizeof(type), while the extent will be related to where you expect the second element of the same type to start. If you do resize, you will incorporate the leading dimension in your pointer computation, and will see the gaps you were reporting.

George.

On Thu, Dec 16, 2021 at 3:03 PM Jonas Thies via users <users@lists.open-mpi.org> wrote:

Dear Gilles,

thanks, the resizing fixes the issue, it seems. It is not really intuitive, though, because the actual extent of the data type is m*nloc*sizeof(int) and I have to make MPI believe that it is nloc*sizeof(int). And indeed, this seems to be not OpenMPI-specific, sorry for that.

Best,
Jonas

MPI_Type_vector (Gilles Gouaillardet)
--
Message: 1
Date: Thu, 16 Dec 2021 10:29:27 +0100
From: Jonas Thies <j.th...@tudelft.nl>
To: users@lists.open-mpi.org
Subject: [OMPI users] unexpected behavior when combining MPI_Gather and MPI_Type_vector
Message-ID: <64075574-7a58-b194-208f-d455c10c8...@tudelft.nl>
Content-Type: text/plain; charset="utf-8"; Format="flowed"

Dear OpenMPI community,

Here's a little puzzle for the Christmas holidays (although I would really appreciate a quick solution!).

I'm stuck with the following relatively basic problem: given a local nloc x m matrix X_p in column-major ordering on each MPI process p, perform a single MPI_Gather operation to construct the matrix X_0 X_1 ... X_nproc again, in col-major ordering. My approach is to use MPI_Type_vector to define an stype and an rtype, where stype has stride nloc, and rtype has stride nproc*nloc. The observation is that there is an unexpected displacement of (m-1)*n*p in the result array for the part arriving from process p.

The MFE code is attached, and I use OpenMPI 4.0.5 with GCC 11.2 (although other versions and even distributions seem to display the same behavior). Example (nloc=3, nproc=3, m=2, with some additional columns printed for the sake of demonstration):

    > mpicxx -o matrix_gather matrix_gather.cpp
    > mpirun -np 3 ./matrix_gather
    v_loc on P0: 3x2
    0 9
    1 10
    2 11
    v_loc on P1: 3x2
    3 12
    4 13
    5 14
    v_loc on P2: 3x2
    6 15
    7 16
    8 17
    v_glob on P0: 9x4
    0 9 0 0
    1 10 0 0
    2 11 0 0
    0 3 12 0
    0 4 13 0
    0 5 14 0
    0 0 6 15
    0 0 7 16
    0 0 8 17

Any ideas?

Thanks,
Jonas

--
*J. Thies*
Assistant Professor
TU Delft
Faculty Electrical Engineering, Mathematics and Computer Science
Institute of Applied Mathematics and High Performance Computing Center
Mekelweg 4
2628 CD Delft
T +31 15 27
*j.th...@tudelft.nl*
Re: [OMPI users] unexpected behavior when combining MPI_Gather and MPI_Type_vector
Dear Gilles,

thanks, the resizing fixes the issue, it seems. It is not really intuitive, though, because the actual extent of the data type is m*nloc*sizeof(int) and I have to make MPI believe that it is nloc*sizeof(int). And indeed, this seems to be not OpenMPI-specific, sorry for that.

Best,
Jonas

MPI_Type_vector (Gilles Gouaillardet)
--
Message: 1
Date: Thu, 16 Dec 2021 10:29:27 +0100
From: Jonas Thies
To: users@lists.open-mpi.org
Subject: [OMPI users] unexpected behavior when combining MPI_Gather and MPI_Type_vector
Message-ID: <64075574-7a58-b194-208f-d455c10c8...@tudelft.nl>
Content-Type: text/plain; charset="utf-8"; Format="flowed"

Dear OpenMPI community,

Here's a little puzzle for the Christmas holidays (although I would really appreciate a quick solution!).

I'm stuck with the following relatively basic problem: given a local nloc x m matrix X_p in column-major ordering on each MPI process p, perform a single MPI_Gather operation to construct the matrix X_0 X_1 ... X_nproc again, in col-major ordering. My approach is to use MPI_Type_vector to define an stype and an rtype, where stype has stride nloc, and rtype has stride nproc*nloc. The observation is that there is an unexpected displacement of (m-1)*n*p in the result array for the part arriving from process p.

The MFE code is attached, and I use OpenMPI 4.0.5 with GCC 11.2 (although other versions and even distributions seem to display the same behavior). Example (nloc=3, nproc=3, m=2, with some additional columns printed for the sake of demonstration):

    > mpicxx -o matrix_gather matrix_gather.cpp
    > mpirun -np 3 ./matrix_gather
    v_loc on P0: 3x2
    0 9
    1 10
    2 11
    v_loc on P1: 3x2
    3 12
    4 13
    5 14
    v_loc on P2: 3x2
    6 15
    7 16
    8 17
    v_glob on P0: 9x4
    0 9 0 0
    1 10 0 0
    2 11 0 0
    0 3 12 0
    0 4 13 0
    0 5 14 0
    0 0 6 15
    0 0 7 16
    0 0 8 17

Any ideas?

Thanks,
Jonas

--
*J. Thies*
Assistant Professor
TU Delft
Faculty Electrical Engineering, Mathematics and Computer Science
Institute of Applied Mathematics and High Performance Computing Center
Mekelweg 4
2628 CD Delft
T +31 15 27
*j.th...@tudelft.nl*
[OMPI users] unexpected behavior when combining MPI_Gather and MPI_Type_vector
Dear OpenMPI community,

Here's a little puzzle for the Christmas holidays (although I would really appreciate a quick solution!).

I'm stuck with the following relatively basic problem: given a local nloc x m matrix X_p in column-major ordering on each MPI process p, perform a single MPI_Gather operation to construct the matrix X_0 X_1 ... X_nproc again, in col-major ordering. My approach is to use MPI_Type_vector to define an stype and an rtype, where stype has stride nloc, and rtype has stride nproc*nloc. The observation is that there is an unexpected displacement of (m-1)*n*p in the result array for the part arriving from process p.

The MFE code is attached, and I use OpenMPI 4.0.5 with GCC 11.2 (although other versions and even distributions seem to display the same behavior). Example (nloc=3, nproc=3, m=2, with some additional columns printed for the sake of demonstration):

    > mpicxx -o matrix_gather matrix_gather.cpp
    > mpirun -np 3 ./matrix_gather
    v_loc on P0: 3x2
    0 9
    1 10
    2 11
    v_loc on P1: 3x2
    3 12
    4 13
    5 14
    v_loc on P2: 3x2
    6 15
    7 16
    8 17
    v_glob on P0: 9x4
    0 9 0 0
    1 10 0 0
    2 11 0 0
    0 3 12 0
    0 4 13 0
    0 5 14 0
    0 0 6 15
    0 0 7 16
    0 0 8 17

Any ideas?

Thanks,
Jonas

--
*J. Thies*
Assistant Professor
TU Delft
Faculty Electrical Engineering, Mathematics and Computer Science
Institute of Applied Mathematics and High Performance Computing Center
Mekelweg 4
2628 CD Delft
T +31 15 27
*j.th...@tudelft.nl*

    #include <mpi.h>
    #include <iostream>
    #include <sstream>
    #include <string>

    void print(std::string label, int rank, int nloc, int m, int* array)
    {
      std::ostringstream oss;
      oss << label << " on P"<