The matrix only requires 175 MB (upper triangular part). I am not sure where the extra memory comes from for np > 1.
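To narrow that down, I plan to query the Mat storage and the process memory directly on each rank, along the lines of the sketch below (untested; MatGetInfo() and PetscMemoryGetCurrentUsage() are the calls I have in mind, and the reporting format is my own):

  #include <petscmat.h>

  /* Sketch: print this rank's share of the assembled Mat and the current
     process memory, to separate the PETSc matrix from whatever MUMPS
     allocates on top of it. */
  static PetscErrorCode ReportMatMemory(Mat A)
  {
    MatInfo        info;
    PetscLogDouble rss;
    PetscMPIInt    rank;
    MPI_Comm       comm;
    PetscErrorCode ierr;

    PetscFunctionBeginUser;
    ierr = PetscObjectGetComm((PetscObject)A,&comm);CHKERRQ(ierr);
    ierr = MPI_Comm_rank(comm,&rank);CHKERRQ(ierr);
    ierr = MatGetInfo(A,MAT_LOCAL,&info);CHKERRQ(ierr);      /* this rank's Mat storage and nonzeros */
    ierr = PetscMemoryGetCurrentUsage(&rss);CHKERRQ(ierr);   /* resident set size of the whole process */
    ierr = PetscSynchronizedPrintf(comm,"[%d] Mat memory %.1f MB, nz used %.0f, process RSS %.1f MB\n",
                                   rank,(double)info.memory/1048576.0,(double)info.nz_used,(double)rss/1048576.0);CHKERRQ(ierr);
    ierr = PetscSynchronizedFlush(comm,PETSC_STDOUT);CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }

Calling this before and after the factorization should show how much of the difference is the factored matrix rather than the assembled one.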
On Tue, Mar 22, 2022 at 1:21 PM Matthew Knepley <[email protected]> wrote:

On Tue, Mar 22, 2022 at 4:16 PM Sam Guo <[email protected]> wrote:

Here is one memory comparison (memory in MB):

                                  np=1   np=2   np=4   np=8   np=16
  shell                           1614   1720   1874   1673   1248
  PETSc (using full matrix)       2108   2260   2364   2215   1734
  PETSc (using symmetric matrix)  1750   2100   2189   2094   1727

Those are the total high-water-mark memory increases.

You should be able to directly read off the memory in the PETSc Mat structures (it is at the end of the log). With a tool like massif you could also directly measure the MUMPS memory.

Thanks,

Matt

On Tue, Mar 22, 2022 at 1:10 PM Barry Smith <[email protected]> wrote:

Sam,

MUMPS is a direct solver; as such, it requires much more memory than the original matrix (stored as a PETSc matrix) to store the factored matrix. The savings you will get by not having a PETSc copy of the matrix and a MUMPS copy of the matrix in memory at the same time are unlikely to be significant. Do you have memory footprint measurements indicating that not having the PETSc copy of the matrix in memory will allow you to run measurably larger simulations?

Barry

On Mar 22, 2022, at 3:58 PM, Sam Guo <[email protected]> wrote:

The reason I want to use a shell matrix is to reduce the memory footprint. If I create a PETSc matrix and use MUMPS, I understand PETSc will create another copy of the matrix for MUMPS. Is there any way to avoid the extra copy for MUMPS?

On Tue, Mar 22, 2022 at 12:53 PM Sam Guo <[email protected]> wrote:

Barry,
Thanks for the illustration. Is there an easy way to mimic the implementation using a shell matrix? I have been studying how the sMvctx is created and it seems pretty involved.

Thanks,
Sam

On Mon, Mar 21, 2022 at 2:48 PM Barry Smith <[email protected]> wrote:

On Mar 21, 2022, at 4:36 PM, Sam Guo <[email protected]> wrote:

Barry,
Thanks. Could you elaborate? I am trying to implement matrix-vector multiplication for a symmetric matrix using a shell matrix.

Thanks,
Sam

Consider, with three ranks,

  (a) = ( A  B  D ) (x)
  (b)   ( B' C  E ) (y)
  (c)   ( D' E' F ) (w)

Only the blocks without the ' are stored on a rank, so, for example, B is stored on rank 0.

Rank 0 computes A x and keeps it in a. Rank 1 computes C y and keeps it in b. Rank 2 computes F w and keeps it in c.

Rank 0 computes B'x and D'x. It puts the nonzero entries of these values, as well as the values of x, into slvec0.

Rank 1 computes E'y and puts the nonzero entries, as well as the values of y, into slvec0.

Rank 2 puts the values of w needed by the other ranks into slvec0.

Rank 0 does B y_h + D w_h, where it gets the y_h and w_h values from slvec1, and adds the result to a.

Rank 1 takes the B'x from slvec1 and adds it to b; it then takes the E w_h values, where the w_h are pulled from slvec1, and adds them to b.

Rank 2 takes the D'x and E'y from slvec1 and adds them to c.
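If it helps, here is a tiny standalone C check of the same bookkeeping with 1x1 "blocks", so A through F are just scalars. It only verifies the algebra of the steps above against the full symmetric product; it is not the PETSc code, and the "scatter" is just local variables here.

  #include <stdio.h>
  #include <math.h>

  int main(void)
  {
    /* upper-triangle blocks, each stored on exactly one "rank":
       rank 0 owns A, B, D;  rank 1 owns C, E;  rank 2 owns F */
    double A = 2, B = 3, D = 4, C = 5, E = 6, F = 7;
    double x = 1, y = 2, w = 3;      /* the owned pieces of the input vector */

    /* reference: the full symmetric matrix applied to (x, y, w) */
    double aref = A*x + B*y + D*w;
    double bref = B*x + C*y + E*w;
    double cref = D*x + E*y + F*w;

    /* step 1: each rank applies its diagonal block to its own piece */
    double a = A*x, b = C*y, c = F*w;

    /* step 2: owners of the upper-triangle blocks form the transpose
       products and pack them, together with their piece of the input,
       into the exchange (the slvec0 -> slvec1 scatter) */
    double Btx = B*x, Dtx = D*x;     /* computed on rank 0 */
    double Ety = E*y;                /* computed on rank 1 */

    /* step 3: each rank adds the contributions it received */
    a += B*y + D*w;                  /* rank 0 uses y and w from slvec1     */
    b += Btx + E*w;                  /* rank 1 uses B'x and w from slvec1   */
    c += Dtx + Ety;                  /* rank 2 uses D'x and E'y from slvec1 */

    printf("a = %g (ref %g), b = %g (ref %g), c = %g (ref %g)\n",
           a, aref, b, bref, c, cref);
    return (fabs(a-aref) + fabs(b-bref) + fabs(c-cref) > 1e-12);
  }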
On Mon, Mar 21, 2022 at 12:56 PM Barry Smith <[email protected]> wrote:

The "trick" is that, though "more" communication is needed to complete the product, the communication can still be done in a single VecScatter instead of two separate calls to VecScatter. We simply pack both pieces of information that need to be sent into a single vector.

  /* copy x into the vec slvec0 */
  VecGetArray(a->slvec0,&from);
  VecGetArrayRead(xx,&x);

  PetscArraycpy(from,x,bs*mbs);
  VecRestoreArray(a->slvec0,&from);
  VecRestoreArrayRead(xx,&x);

  VecScatterBegin(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES,SCATTER_FORWARD);
  VecScatterEnd(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES,SCATTER_FORWARD);

If you create two symmetric matrices, one with SBAIJ and one with BAIJ, and compare the time to do the product, you will find that the SBAIJ is not significantly slower but does save memory.
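To make the packing idea concrete in plain MPI terms: instead of one exchange for the transpose-product contributions and a second one for the needed vector entries, each rank sends both in one buffer. The toy program below is only an illustration of that pattern, not the PETSc code; a real VecScatter sends each rank only what it needs instead of gathering everything everywhere.

  #include <stdio.h>
  #include <stdlib.h>
  #include <mpi.h>

  int main(int argc,char **argv)
  {
    int rank,size,i;
    MPI_Init(&argc,&argv);
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);
    MPI_Comm_size(MPI_COMM_WORLD,&size);

    /* two logically different pieces this rank has to send:
       a B'x-type partial result and its own piece of x */
    double contrib = 10.0*rank + 1.0;
    double xpiece  = 0.1*rank;

    /* pack both into one buffer so a single exchange suffices
       (the analogue of filling slvec0 and doing one scatter) */
    double  packed[2] = {contrib, xpiece};
    double *all       = (double*)malloc(2*(size_t)size*sizeof(double));
    MPI_Allgather(packed,2,MPI_DOUBLE,all,2,MPI_DOUBLE,MPI_COMM_WORLD);

    if (!rank) {
      for (i = 0; i < size; i++)
        printf("from rank %d: contribution %g, x piece %g\n",i,all[2*i],all[2*i+1]);
    }
    free(all);
    MPI_Finalize();
    return 0;
  }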
On Mar 21, 2022, at 3:26 PM, Sam Guo <[email protected]> wrote:

Using the following example from the MatCreateSBAIJ documentation:

             0 1 2 3 4 5 6 7 8 9 10 11
            --------------------------
   row 3   |. . . d d d o o o o  o  o
   row 4   |. . . d d d o o o o  o  o
   row 5   |. . . d d d o o o o  o  o
            --------------------------

On the processor that owns rows 3, 4, and 5, information from rows 0-2 is still needed. Is it that the processor that owns rows 0-2 will apply B symmetrically and then send the result to the processor that owns rows 3-5?

On Mon, Mar 21, 2022 at 12:14 PM Mark Adams <[email protected]> wrote:

PETSc stores parallel matrices as two serial matrices: one for the diagonal (d or A) block and one for the rest (o or B). I would guess that for symmetric matrices it has a symmetric matrix for the diagonal and a full AIJ matrix for the (upper) off-diagonal. So the multtranspose is applying B symmetrically. The lower off-diagonal and the diagonal block can be done without communication. Then the off-processor values are collected, and the upper off-diagonal is applied.

On Mon, Mar 21, 2022 at 2:35 PM Sam Guo <[email protected]> wrote:

I am most interested in how the lower triangular part is redistributed. It seems that SBAIJ saves memory but requires more communication than BAIJ.

On Mon, Mar 21, 2022 at 11:27 AM Sam Guo <[email protected]> wrote:

Mark, thanks for the quick response. I am more interested in the parallel implementation of MatMult for SBAIJ. I found the following:

  PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)
  {
    Mat_MPISBAIJ      *a = (Mat_MPISBAIJ*)A->data;
    PetscErrorCode    ierr;
    PetscInt          mbs=a->mbs,bs=A->rmap->bs;
    PetscScalar       *from;
    const PetscScalar *x;

    /* diagonal part */
    (*a->A->ops->mult)(a->A,xx,a->slvec1a);
    VecSet(a->slvec1b,0.0);

    /* subdiagonal part */
    (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b);

    /* copy x into the vec slvec0 */
    VecGetArray(a->slvec0,&from);
    VecGetArrayRead(xx,&x);

    PetscArraycpy(from,x,bs*mbs);
    VecRestoreArray(a->slvec0,&from);
    VecRestoreArrayRead(xx,&x);

    VecScatterBegin(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES,SCATTER_FORWARD);
    VecScatterEnd(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES,SCATTER_FORWARD);

    /* superdiagonal part */
    (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);
    return(0);
  }

I am trying to understand the algorithm.

Thanks,
Sam
On Mon, Mar 21, 2022 at 11:14 AM Mark Adams <[email protected]> wrote:

This code looks fine to me, and the code is in src/mat/impls/sbaij/seq/sbaij2.c.

On Mon, Mar 21, 2022 at 2:02 PM Sam Guo <[email protected]> wrote:

Dear PETSc dev team,
The documentation about MatCreateSBAIJ has the following: "It is recommended that one use the MatCreate(), MatSetType() and/or MatSetFromOptions(), MatXXXXSetPreallocation() paradigm instead of this routine directly. [MatXXXXSetPreallocation() is, for example, MatSeqAIJSetPreallocation]"

I currently call MatCreateSBAIJ directly as follows:

  MatCreateSBAIJ (with d_nnz and o_nnz)
  MatSetValues (to add row by row)
  MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);
  MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE);

Two questions:
(1) I am wondering whether what I am doing is the most efficient.
(2) I am trying to find out how matrix-vector multiplication is implemented in PETSc for SBAIJ storage.

Thanks,
Sam
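P.S. For reference, my understanding of the recommended paradigm is roughly the sketch below (untested; the sizes are placeholders and d_nnz/o_nnz stand for the preallocation arrays I already compute):

  #include <petscmat.h>

  int main(int argc,char **argv)
  {
    Mat            A;
    PetscInt       bs = 1, mlocal = 100;            /* placeholder block size and local rows */
    PetscInt       *d_nnz = NULL, *o_nnz = NULL;    /* fill these as in the current code     */
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;

    ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
    ierr = MatSetSizes(A,mlocal,mlocal,PETSC_DETERMINE,PETSC_DETERMINE);CHKERRQ(ierr);
    ierr = MatSetType(A,MATSBAIJ);CHKERRQ(ierr);      /* resolves to seq or mpi SBAIJ */
    ierr = MatSetFromOptions(A);CHKERRQ(ierr);        /* allow -mat_type overrides    */
    /* call both preallocation routines; only the one matching the type takes effect */
    ierr = MatSeqSBAIJSetPreallocation(A,bs,PETSC_DEFAULT,d_nnz);CHKERRQ(ierr);
    ierr = MatMPISBAIJSetPreallocation(A,bs,PETSC_DEFAULT,d_nnz,PETSC_DEFAULT,o_nnz);CHKERRQ(ierr);

    /* ... MatSetValues row by row (upper triangle only), then assemble ... */
    ierr = MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    ierr = MatDestroy(&A);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }

I assume the MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE) call then becomes redundant, since SBAIJ only stores the upper triangle and is symmetric by construction.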
