My mistake, yes, rank 0 does not need to put any values of x into slvec0. Ranks 1 and 2 need to put parts of y and w in because those values will be needed by the other ranks.
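To make the bookkeeping concrete, here is a small self-contained illustration (plain C, scalar stand-ins for the blocks, no MPI or PETSc calls, all numbers made up) of the idea in the quoted three-rank example: each rank does its local product, packs what the others need into one buffer, and finishes with the received pieces. It is only a sketch of the idea, not how the actual slvec0/slvec1 code is organized.

   #include <math.h>
   #include <stdio.h>

   int main(void)
   {
     /* Symmetric matrix   ( A  B  D )     stored blocks per "rank":
                           ( B' C  E )     rank 0: A, B, D
                           ( D' E' F )     rank 1: C, E
                                           rank 2: F                        */
     double A = 4, B = 1, D = 2, C = 5, E = 3, F = 6;
     double x = 1, y = 2, w = 3;  /* rank 0 owns x, rank 1 owns y, rank 2 owns w */

     /* step 1: purely local products with the diagonal blocks */
     double a = A * x;            /* rank 0 */
     double b = C * y;            /* rank 1 */
     double c = F * w;            /* rank 2 */

     /* step 2: what each "rank" packs for the others (in the real code this
        all goes into slvec0 and moves in a single scatter):
        rank 0 packs B'x and D'x; rank 1 packs E'y and its y values;
        rank 2 packs its w values                                            */
     double Btx = B * x, Dtx = D * x;  /* computed on rank 0 */
     double Ety = E * y;               /* computed on rank 1 */
     double y_h = y, w_h = w;          /* owned values shipped to the other ranks */

     /* step 3: finish with the received pieces */
     a += B * y_h + D * w_h;           /* rank 0 */
     b += Btx + E * w_h;               /* rank 1 */
     c += Dtx + Ety;                   /* rank 2 */

     /* check against the full symmetric product */
     printf("errors: %g %g %g\n",
            fabs(a - (A * x + B * y + D * w)),
            fabs(b - (B * x + C * y + E * w)),
            fabs(c - (D * x + E * y + F * w)));
     return 0;
   }

All three errors print as 0; the point is only that the contributions crossing rank boundaries can be packed and exchanged once.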
> On Mar 23, 2022, at 4:51 PM, Sam Guo <[email protected]> wrote:
>
> Hi Barry,
> I am trying to understand your example. Why does rank 0 put the values of x
> into slvec0? x is needed on rank 0; the other ranks need B'x and D'x. Is it
> because we need slvec0 to be the same size as (x,y,w) on all ranks?
>
> Thanks,
> Sam
>
> On Tue, Mar 22, 2022 at 2:24 PM Sam Guo <[email protected]> wrote:
> Hi Barry,
> This is the total memory summed over all the ranks.
> Same problem size on different np.
> I call MUMPS in parallel with distributed input and a centralized rhs.
>
> Thanks,
> Sam
>
> On Tue, Mar 22, 2022 at 2:11 PM Barry Smith <[email protected]> wrote:
>
> I don't understand the numbers in the table.
>
> Is this memory summed over all the ranks or the maximum over the ranks?
>
> Is this for the same problem size on different np, or are you increasing the
> problem size with more ranks?
>
> Are you storing and factoring the matrix on each rank, or is the solution
> of a single linear system done in parallel?
>
>> On Mar 22, 2022, at 4:16 PM, Sam Guo <[email protected]> wrote:
>>
>> Here is one memory comparison (memory in MB):
>>
>>                                   np=1  np=2  np=4  np=8  np=16
>>   shell                           1614  1720  1874  1673   1248
>>   PETSc (using full matrix)       2108  2260  2364  2215   1734
>>   PETSc (using symmetric matrix)  1750  2100  2189  2094   1727
>>
>> Those are the total high-water-mark memory figures, added up over the ranks.
>>
>> On Tue, Mar 22, 2022 at 1:10 PM Barry Smith <[email protected]> wrote:
>>
>> Sam,
>>
>> MUMPS is a direct solver; as such, it requires much more memory than the
>> original matrix (stored as a PETSc matrix) to store the factored matrix. The
>> savings you will get by not having a PETSc copy of the matrix and a MUMPS
>> copy of the matrix at the same time are unlikely to be significant. Do you
>> have memory footprint measurements indicating that not having the PETSc copy
>> of the matrix in memory will allow you to run measurably larger simulations?
>>
>> Barry
>>
>>> On Mar 22, 2022, at 3:58 PM, Sam Guo <[email protected]> wrote:
>>>
>>> The reason I want to use a shell matrix is to reduce the memory footprint.
>>> If I create a PETSc matrix and use MUMPS, I understand PETSc will create
>>> another copy of the matrix for MUMPS. Is there any way to avoid the extra
>>> copy for MUMPS?
>>>
>>> On Tue, Mar 22, 2022 at 12:53 PM Sam Guo <[email protected]> wrote:
>>> Barry,
>>> Thanks for the illustration. Is there an easy way to mimic the
>>> implementation using a shell matrix? I have been studying how the sMvctx
>>> is created and it seems pretty involved.
>>>
>>> Thanks,
>>> Sam
>>>
>>> On Mon, Mar 21, 2022 at 2:48 PM Barry Smith <[email protected]> wrote:
>>>
>>>> On Mar 21, 2022, at 4:36 PM, Sam Guo <[email protected]> wrote:
>>>>
>>>> Barry,
>>>> Thanks. Could you elaborate? I am trying to implement the matrix-vector
>>>> multiplication for a symmetric matrix using a shell matrix.
>>>
>>> Consider, with three ranks,
>>>
>>>   (a)   ( A  B  D ) (x)
>>>   (b) = ( B' C  E ) (y)
>>>   (c)   ( D' E' F ) (w)
>>>
>>> Only the blocks without the ' are stored on a rank, so for example B
>>> is stored on rank 0.
>>>
>>> Rank 0 computes Ax and keeps it in a. Rank 1 computes Cy and keeps
>>> it in b. Rank 2 computes Fw and keeps it in c.
>>>
>>> Rank 0 computes B'x and D'x.
>>> It puts the nonzero entries of these values, as well as the values of x,
>>> into slvec0.
>>>
>>> Rank 1 computes E'y and puts the nonzero entries, as well as the values
>>> of y, into slvec0.
>>>
>>> Rank 2 puts the values of w needed by the other ranks into slvec0.
>>>
>>> Rank 0 does B y_h + D w_h, where it gets the y_h and w_h values from
>>> slvec1, and adds the result to a.
>>>
>>> Rank 1 takes the B'x from slvec1 and adds it to b; it then computes E w_h,
>>> where the w_h values are pulled from slvec1, and adds that to b.
>>>
>>> Rank 2 takes the D'x and E'y from slvec1 and adds them to c.
>>>
>>>> Thanks,
>>>> Sam
>>>>
>>>> On Mon, Mar 21, 2022 at 12:56 PM Barry Smith <[email protected]> wrote:
>>>>
>>>> The "trick" is that, though "more" communication is needed to complete
>>>> the product, the communication can still be done in a single VecScatter
>>>> instead of two separate calls to VecScatter. We simply pack both pieces
>>>> of information that need to be sent into a single vector.
>>>>
>>>>   /* copy x into the vec slvec0 */
>>>>   VecGetArray(a->slvec0,&from);
>>>>   VecGetArrayRead(xx,&x);
>>>>
>>>>   PetscArraycpy(from,x,bs*mbs);
>>>>   VecRestoreArray(a->slvec0,&from);
>>>>   VecRestoreArrayRead(xx,&x);
>>>>
>>>>   VecScatterBegin(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES,SCATTER_FORWARD);
>>>>   VecScatterEnd(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES,SCATTER_FORWARD);
>>>>
>>>> If you create two symmetric matrices, one with SBAIJ and one with BAIJ,
>>>> and compare the time to do the product, you will find that SBAIJ is not
>>>> significantly slower but does save memory.
>>>>
>>>>> On Mar 21, 2022, at 3:26 PM, Sam Guo <[email protected]> wrote:
>>>>>
>>>>> Using the following example from the MatCreateSBAIJ documentation:
>>>>>
>>>>>            0 1 2 3 4 5 6 7 8 9 10 11
>>>>>           --------------------------
>>>>>   row 3  |. . . d d d o o o o o  o
>>>>>   row 4  |. . . d d d o o o o o  o
>>>>>   row 5  |. . . d d d o o o o o  o
>>>>>           --------------------------
>>>>>
>>>>> On a processor that owns rows 3, 4 and 5, the information from rows 0-2
>>>>> is still needed. Is it that the processor that owns rows 0-2 will apply B
>>>>> symmetrically and then send the result to the processor that owns rows 3-5?
>>>>>
>>>>> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams <[email protected]> wrote:
>>>>> PETSc stores parallel matrices as two serial matrices: one for the
>>>>> diagonal (d or A) block and one for the rest (o or B).
>>>>> I would guess that for symmetric matrices it has a symmetric matrix for
>>>>> the diagonal and a full AIJ matrix for the (upper) off-diagonal.
>>>>> So the multtranspose is applying B symmetrically. The lower off-diagonal
>>>>> and the diagonal block can be done without communication.
>>>>> Then the off-processor values are collected, and the upper off-diagonal
>>>>> is applied.
>>>>>
>>>>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo <[email protected]> wrote:
>>>>> I am most interested in how the lower triangular part is redistributed.
>>>>> It seems that SBAIJ saves memory but requires more communication than
>>>>> BAIJ.
>>>>>
>>>>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo <[email protected]> wrote:
>>>>> Mark, thanks for the quick response. I am more interested in the parallel
>>>>> implementation of MatMult for SBAIJ. I found the following:
>>>>>
>>>>>   PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)
>>>>>   {
>>>>>     Mat_MPISBAIJ      *a = (Mat_MPISBAIJ*)A->data;
>>>>>     PetscErrorCode    ierr;
>>>>>     PetscInt          mbs=a->mbs,bs=A->rmap->bs;
>>>>>     PetscScalar       *from;
>>>>>     const PetscScalar *x;
>>>>>
>>>>>     /* diagonal part */
>>>>>     (*a->A->ops->mult)(a->A,xx,a->slvec1a);
>>>>>     VecSet(a->slvec1b,0.0);
>>>>>
>>>>>     /* subdiagonal part */
>>>>>     (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b);
>>>>>
>>>>>     /* copy x into the vec slvec0 */
>>>>>     VecGetArray(a->slvec0,&from);
>>>>>     VecGetArrayRead(xx,&x);
>>>>>
>>>>>     PetscArraycpy(from,x,bs*mbs);
>>>>>     VecRestoreArray(a->slvec0,&from);
>>>>>     VecRestoreArrayRead(xx,&x);
>>>>>
>>>>>     VecScatterBegin(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES,SCATTER_FORWARD);
>>>>>     VecScatterEnd(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES,SCATTER_FORWARD);
>>>>>
>>>>>     /* superdiagonal part */
>>>>>     (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);
>>>>>     return(0);
>>>>>   }
>>>>>
>>>>> I am trying to understand the algorithm.
>>>>>
>>>>> Thanks,
>>>>> Sam
>>>>>
>>>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams <[email protected]> wrote:
>>>>> This code looks fine to me, and the code is in
>>>>> src/mat/impls/sbaij/seq/sbaij2.c.
>>>>>
>>>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo <[email protected]> wrote:
>>>>> Dear PETSc dev team,
>>>>> The documentation about MatCreateSBAIJ has the following:
>>>>> "It is recommended that one use the MatCreate(), MatSetType() and/or
>>>>> MatSetFromOptions(), MatXXXXSetPreallocation() paradigm instead of this
>>>>> routine directly. [MatXXXXSetPreallocation() is, for example,
>>>>> MatSeqAIJSetPreallocation]"
>>>>> I currently call MatCreateSBAIJ directly as follows:
>>>>>   MatCreateSBAIJ (with d_nnz and o_nnz)
>>>>>   MatSetValues (to add row by row)
>>>>>   MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
>>>>>   MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);
>>>>>   MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE);
>>>>>
>>>>> Two questions:
>>>>> (1) I am wondering whether what I am doing is the most efficient.
>>>>> (2) I am trying to find out how the matrix-vector multiplication is
>>>>> implemented in PETSc for SBAIJ storage.
>>>>>
>>>>> Thanks,
>>>>> Sam
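Regarding question (1) above: here is a minimal sketch (not code from this thread; the matrix is a made-up 1D Laplacian, and the block size, local size, and nonzero counts are placeholders) of the MatCreate()/MatSetType()/MatXXXXSetPreallocation() paradigm the quoted documentation recommends, applied to SBAIJ. Only the upper triangle is inserted.

   #include <petscmat.h>

   int main(int argc, char **argv)
   {
     Mat            A;
     PetscErrorCode ierr;
     PetscMPIInt    size;
     PetscInt       bs = 1, mlocal = 100, M, i, col, rstart, rend;
     PetscScalar    v;

     ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;
     ierr = MPI_Comm_size(PETSC_COMM_WORLD, &size);CHKERRQ(ierr);
     M    = mlocal * size;                      /* global size for this toy problem */

     ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
     ierr = MatSetSizes(A, mlocal, mlocal, M, M);CHKERRQ(ierr);
     ierr = MatSetType(A, MATSBAIJ);CHKERRQ(ierr);   /* seq or mpi variant chosen by the communicator */
     ierr = MatSetFromOptions(A);CHKERRQ(ierr);
     /* preallocate both variants; the call matching the actual type is used */
     ierr = MatSeqSBAIJSetPreallocation(A, bs, 2, NULL);CHKERRQ(ierr);
     ierr = MatMPISBAIJSetPreallocation(A, bs, 2, NULL, 1, NULL);CHKERRQ(ierr);

     /* assemble a 1D Laplacian, setting only the diagonal and superdiagonal */
     ierr = MatGetOwnershipRange(A, &rstart, &rend);CHKERRQ(ierr);
     for (i = rstart; i < rend; i++) {
       v = 2.0;  ierr = MatSetValues(A, 1, &i, 1, &i, &v, INSERT_VALUES);CHKERRQ(ierr);
       col = i + 1;
       if (col < M) { v = -1.0; ierr = MatSetValues(A, 1, &i, 1, &col, &v, INSERT_VALUES);CHKERRQ(ierr); }
     }
     ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
     ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
     ierr = MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE);CHKERRQ(ierr);  /* as in the code quoted above */

     ierr = MatDestroy(&A);CHKERRQ(ierr);
     ierr = PetscFinalize();
     return ierr;
   }

Because MatSetFromOptions() is in the sequence, the type can also be switched at run time (for example to BAIJ) for the timing comparison mentioned earlier in the thread.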
