I am using SLEPc to solve a generalized eigenvalue problem.
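[Editor's note: for context, a minimal sketch of the SLEPc setup implied here; the operator names K and M and the helper SolveGEVP are illustrative assumptions, not taken from the thread, and error checking is omitted.]

  #include <slepceps.h>

  /* Solve the generalized problem K x = lambda M x. K and M (illustrative names) may be
     assembled matrices (e.g. MATSBAIJ) or shell matrices. */
  static PetscErrorCode SolveGEVP(Mat K, Mat M)
  {
    EPS      eps;
    PetscInt nconv;

    PetscFunctionBeginUser;
    EPSCreate(PetscObjectComm((PetscObject)K), &eps);
    EPSSetOperators(eps, K, M);
    EPSSetProblemType(eps, EPS_GHEP);  /* generalized Hermitian eigenproblem */
    EPSSetFromOptions(eps);            /* e.g. -eps_nev 10 -st_type sinvert */
    EPSSolve(eps);
    EPSGetConverged(eps, &nconv);
    PetscPrintf(PetscObjectComm((PetscObject)K), "converged eigenpairs: %" PetscInt_FMT "\n", nconv);
    EPSDestroy(&eps);
    PetscFunctionReturn(0);
  }

With shift-and-invert, the factorization inside the ST is where MUMPS, discussed below, would come in (e.g. -st_pc_factor_mat_solver_type mumps), and the shell-versus-assembled choice for K and M is what the rest of the thread is about.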
On Tue, Mar 22, 2022 at 1:16 PM Sam Guo <[email protected]> wrote:

Here is one memory comparison (memory in MB); the numbers are the total high-water-mark memory added:

                                     np=1   np=2   np=4   np=8   np=16
  shell                              1614   1720   1874   1673   1248
  PETSc (using full matrix)          2108   2260   2364   2215   1734
  PETSc (using symmetric matrix)     1750   2100   2189   2094   1727

On Tue, Mar 22, 2022 at 1:10 PM Barry Smith <[email protected]> wrote:

Sam,

  MUMPS is a direct solver; as such, it requires much more memory than the original matrix (stored as a PETSc matrix) to store the factored matrix. The savings you will get by not having a PETSc copy of the matrix and a MUMPS copy of the matrix in memory at the same time are unlikely to be significant. Do you have memory-footprint measurements indicating that not having the PETSc copy of the matrix in memory would allow you to run measurably larger simulations?

  Barry

On Mar 22, 2022, at 3:58 PM, Sam Guo <[email protected]> wrote:

The reason I want to use a shell matrix is to reduce the memory footprint. If I create a PETSc matrix and use MUMPS, I understand PETSc will create another copy of the matrix for MUMPS. Is there any way to avoid the extra MUMPS copy?

On Tue, Mar 22, 2022 at 12:53 PM Sam Guo <[email protected]> wrote:

Barry,
  Thanks for the illustration. Is there an easy way to mimic the implementation using a shell matrix? I have been studying how the sMvctx is created and it seems pretty involved.

Thanks,
Sam

On Mon, Mar 21, 2022 at 2:48 PM Barry Smith <[email protected]> wrote:

> On Mar 21, 2022, at 4:36 PM, Sam Guo <[email protected]> wrote:
>
> Barry,
>   Thanks. Could you elaborate? I am trying to implement the matrix-vector
>   multiplication for a symmetric matrix using a shell matrix.
>
> Thanks,
> Sam

  Consider, with three ranks,

    (a)   ( A  B  D ) (x)
    (b) = ( B' C  E ) (y)
    (c)   ( D' E' F ) (w)

  Only the blocks without the ' are stored on a rank; for example, B is stored on rank 0.

  Rank 0 computes A x and keeps it in a. Rank 1 computes C y and keeps it in b. Rank 2 computes F w and keeps it in c.

  Rank 0 computes B'x and D'x. It puts the nonzero entries of these values, as well as the values of x, into slvec0.

  Rank 1 computes E'y and puts the nonzero entries, as well as the values of y, into slvec0.

  Rank 2 puts the values of w needed by the other ranks into slvec0.

  Rank 0 does B y_h + D w_h, where it gets the y_h and w_h values from slvec1, and adds the result to a.

  Rank 1 takes the B'x from slvec1 and adds it to b; it then takes the E w_h values, where the w_h are pulled from slvec1, and adds them to b.

  Rank 2 takes the D'x and E'y from slvec1 and adds them to c.
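[Editor's note: one way to mimic this with a shell matrix, without rebuilding the sMvctx machinery, is to store only the upper triangle U (including the diagonal) in an ordinary (MPI)AIJ matrix and apply y = U x + U'x - diag(U).*x inside the shell MatMult. Below is a minimal sketch under that assumption; the names SymShellCtx, MatMult_SymShell, and CreateSymShell are illustrative, error checking is omitted, and, unlike MATMPISBAIJ, this pays for two communication rounds (MatMult plus MatMultTransposeAdd) rather than the single fused VecScatter Barry shows below.]

  #include <petscmat.h>

  /* Context for a shell matrix that applies a symmetric matrix stored only as its
     upper triangle U (including the diagonal) in an ordinary (MPI)AIJ matrix. */
  typedef struct {
    Mat U;     /* upper triangle of the symmetric matrix */
    Vec diag;  /* diag(U), cached */
    Vec work;  /* scratch vector */
  } SymShellCtx;

  /* y = U x + U' x - diag(U) .* x  (subtract the diagonal, which is counted twice) */
  static PetscErrorCode MatMult_SymShell(Mat S, Vec x, Vec y)
  {
    SymShellCtx *ctx;

    PetscFunctionBeginUser;
    MatShellGetContext(S, &ctx);
    MatMult(ctx->U, x, y);                      /* y  = U x   (first communication)  */
    MatMultTransposeAdd(ctx->U, x, y, y);       /* y += U' x  (second communication) */
    VecPointwiseMult(ctx->work, ctx->diag, x);  /* work = diag(U) .* x               */
    VecAXPY(y, -1.0, ctx->work);                /* y -= work                         */
    PetscFunctionReturn(0);
  }

  /* Wrap an already-assembled upper-triangular matrix U in a shell matrix *S */
  static PetscErrorCode CreateSymShell(Mat U, SymShellCtx *ctx, Mat *S)
  {
    PetscInt m, n;

    PetscFunctionBeginUser;
    ctx->U = U;
    MatCreateVecs(U, &ctx->work, &ctx->diag);
    MatGetDiagonal(U, ctx->diag);
    MatGetLocalSize(U, &m, &n);
    MatCreateShell(PetscObjectComm((PetscObject)U), m, n, PETSC_DETERMINE, PETSC_DETERMINE, ctx, S);
    MatShellSetOperation(*S, MATOP_MULT, (void (*)(void))MatMult_SymShell);
    PetscFunctionReturn(0);
  }

The resulting shell matrix S can then be passed to EPSSetOperators like any assembled matrix.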
On Mon, Mar 21, 2022 at 12:56 PM Barry Smith <[email protected]> wrote:

  The "trick" is that, though "more" communication is needed to complete the product, the communication can still be done in a single VecScatter instead of two separate calls to VecScatter: we simply pack both pieces of information that need to be sent into a single vector.

  /* copy x into the vec slvec0 */
  VecGetArray(a->slvec0,&from);
  VecGetArrayRead(xx,&x);
  PetscArraycpy(from,x,bs*mbs);
  VecRestoreArray(a->slvec0,&from);
  VecRestoreArrayRead(xx,&x);

  VecScatterBegin(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES,SCATTER_FORWARD);
  VecScatterEnd(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES,SCATTER_FORWARD);

  If you create two symmetric matrices, one with SBAIJ and one with BAIJ, and compare the time to do the product, you will find that SBAIJ is not significantly slower but does save memory.
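[Editor's note: to make that comparison concrete, here is a minimal timing sketch; the helper name TimeMatMult is illustrative, it assumes A, x, and y have already been created and assembled for whichever format is being tested, and error checking is omitted.]

  #include <petscmat.h>
  #include <petsctime.h>

  /* Time nreps products on an already-assembled matrix A with compatible vectors x, y.
     Run the same code with the matrix created as SBAIJ and as BAIJ (or AIJ) to compare. */
  static PetscErrorCode TimeMatMult(Mat A, Vec x, Vec y, PetscInt nreps)
  {
    PetscLogDouble t0, t1;
    PetscInt       i;

    PetscFunctionBeginUser;
    MPI_Barrier(PetscObjectComm((PetscObject)A));  /* so all ranks start timing together */
    PetscTime(&t0);
    for (i = 0; i < nreps; i++) MatMult(A, x, y);
    PetscTime(&t1);
    PetscPrintf(PetscObjectComm((PetscObject)A), "average MatMult time: %g s\n",
                (double)((t1 - t0) / nreps));
    PetscFunctionReturn(0);
  }

Running the same program with -log_view reports the MatMult time and flop rate without any extra code, and -memory_view or MatGetInfo() can be used for the memory side of the comparison.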
On Mar 21, 2022, at 3:26 PM, Sam Guo <[email protected]> wrote:

Using the following example from the MatCreateSBAIJ documentation:

            0 1 2 3 4 5 6 7 8 9 10 11
           --------------------------
   row 3  | . . . d d d o o o o o  o
   row 4  | . . . d d d o o o o o  o
   row 5  | . . . d d d o o o o o  o
           --------------------------

On a processor that owns rows 3, 4 and 5, the rows 0-2 info is still needed. Is it that the processor that owns rows 0-2 will apply B symmetrically and then send the result to the processor that owns rows 3-5?

On Mon, Mar 21, 2022 at 12:14 PM Mark Adams <[email protected]> wrote:

PETSc stores parallel matrices as two serial matrices, one for the diagonal (d or A) block and one for the rest (o or B).
I would guess that for symmetric matrices it has a symmetric matrix for the diagonal block and a full AIJ matrix for the (upper) off-diagonal part.
So the multtranspose is applying B symmetrically. This lower off-diagonal part and the diagonal block can be done without communication. Then the off-processor values are collected and the upper off-diagonal part is applied.

On Mon, Mar 21, 2022 at 2:35 PM Sam Guo <[email protected]> wrote:

I am most interested in how the lower triangular part is redistributed. It seems that SBAIJ saves memory but requires more communication than BAIJ.

On Mon, Mar 21, 2022 at 11:27 AM Sam Guo <[email protected]> wrote:

Mark, thanks for the quick response. I am more interested in the parallel implementation of MatMult for SBAIJ. I found the following:

PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)
{
  Mat_MPISBAIJ      *a = (Mat_MPISBAIJ*)A->data;
  PetscErrorCode    ierr;
  PetscInt          mbs=a->mbs,bs=A->rmap->bs;
  PetscScalar       *from;
  const PetscScalar *x;

  /* diagonal part */
  (*a->A->ops->mult)(a->A,xx,a->slvec1a);
  VecSet(a->slvec1b,0.0);

  /* subdiagonal part */
  (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b);

  /* copy x into the vec slvec0 */
  VecGetArray(a->slvec0,&from);
  VecGetArrayRead(xx,&x);
  PetscArraycpy(from,x,bs*mbs);
  VecRestoreArray(a->slvec0,&from);
  VecRestoreArrayRead(xx,&x);

  VecScatterBegin(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES,SCATTER_FORWARD);
  VecScatterEnd(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES,SCATTER_FORWARD);

  /* superdiagonal part */
  (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);
  return(0);
}

I am trying to understand the algorithm.
Thanks,
Sam

On Mon, Mar 21, 2022 at 11:14 AM Mark Adams <[email protected]> wrote:

This code looks fine to me, and the code is in src/mat/impls/sbaij/seq/sbaij2.c.

On Mon, Mar 21, 2022 at 2:02 PM Sam Guo <[email protected]> wrote:

Dear PETSc dev team,
  The documentation for MatCreateSBAIJ says:

  "It is recommended that one use the MatCreate(), MatSetType() and/or MatSetFromOptions(), MatXXXXSetPreallocation() paradigm instead of this routine directly. [MatXXXXSetPreallocation() is, for example, MatSeqAIJSetPreallocation()]"

  I currently call MatCreateSBAIJ directly, as follows:

  MatCreateSBAIJ (with d_nnz and o_nnz)
  MatSetValues (to add values row by row)
  MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);
  MatSetOption(A,MAT_SYMMETRIC,PETSC_TRUE);

  Two questions:
  (1) I am wondering whether what I am doing is the most efficient.
  (2) I am trying to find out how the matrix-vector multiplication is implemented in PETSc for SBAIJ storage.

Thanks,
Sam
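[Editor's note: on question (1), a minimal sketch of the recommended paradigm for this case; the helper name CreateSymmetricMatrix is illustrative, bs, m (local rows, a multiple of bs) and the per-block-row upper-triangular counts d_nnz/o_nnz are the same quantities already passed to MatCreateSBAIJ above, and error checking is omitted.]

  #include <petscmat.h>

  static PetscErrorCode CreateSymmetricMatrix(MPI_Comm comm, PetscInt m, PetscInt bs,
                                              const PetscInt d_nnz[], const PetscInt o_nnz[], Mat *A)
  {
    PetscFunctionBeginUser;
    MatCreate(comm, A);
    MatSetSizes(*A, m, m, PETSC_DETERMINE, PETSC_DETERMINE);
    MatSetType(*A, MATSBAIJ);              /* resolves to MATSEQSBAIJ or MATMPISBAIJ */
    MatSetFromOptions(*A);                 /* allows runtime overrides, e.g. -mat_type */
    MatSeqSBAIJSetPreallocation(*A, bs, 0, d_nnz);
    MatMPISBAIJSetPreallocation(*A, bs, 0, d_nnz, 0, o_nnz);
    /* only the preallocation call matching the actual type takes effect; the other is a no-op */

    /* the caller then does MatSetValues row by row (upper triangle only) and
       MatAssemblyBegin/End(MAT_FINAL_ASSEMBLY) exactly as before */
    PetscFunctionReturn(0);
  }

This should be no less efficient than calling MatCreateSBAIJ directly with good d_nnz/o_nnz; the main benefit is being able to change the matrix type at runtime. With an SBAIJ type the symmetry is implied by the storage format, so the trailing MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE) is presumably redundant, though harmless.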
