My mistake, yes: rank 0 does not need to put any values of x into slvec0. Ranks 
1 and 2 do need to put parts of y and w in, because those values are needed by 
the other ranks.
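
To make the corrected data movement concrete, here is a rough per-rank sketch 
for the three-rank example quoted below (x, y, w owned by ranks 0, 1, 2). This 
is only an illustration of what logically has to travel, not the actual PETSc 
code, which, as far as I can tell, always copies the local part of the input 
vector into slvec0:

   Rank 0:  pack  B'x and D'x                           into slvec0
   Rank 1:  pack  E'y and the y values rank 0 needs     into slvec0
   Rank 2:  pack  the w values ranks 0 and 1 need       into slvec0

   one VecScatter:  slvec0 -> slvec1   (ADD_VALUES, SCATTER_FORWARD)

   Rank 0:  a = A x + B y_h + D w_h    (y_h, w_h read from slvec1)
   Rank 1:  b = C y + B'x   + E w_h    (B'x, w_h read from slvec1)
   Rank 2:  c = F w + D'x   + E'y      (D'x, E'y read from slvec1)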



> On Mar 23, 2022, at 4:51 PM, Sam Guo <[email protected]> wrote:
> 
> Hi Barry,
>    I am trying to understand your example. Why does Rank 0 put the values of x 
> into slvec0? x is only needed on rank 0; the other ranks need B'x and D'x. Is it 
> because we need slvec0 to be the same size as (x,y,w) on all ranks?
> 
> Thanks,
> Sam
> 
> 
> On Tue, Mar 22, 2022 at 2:24 PM Sam Guo <[email protected]> wrote:
> Hi Barry,
>    This is the total memory summed over all the ranks.
>    Same problem size on different np.
>    I call MUMPS in parallel with distributed input and centralized rhs.
> 
> Thanks,
> Sam
> 
> On Tue, Mar 22, 2022 at 2:11 PM Barry Smith <[email protected]> wrote:
> 
>    I don't understand the numbers in the table.
> 
>   Is this memory summed over all the ranks or the maximum over the ranks?
> 
>   Is this for the same problem size on different np or are you increasing the 
> problem size with more ranks?
> 
>    Are you storing and factoring the matrix on each rank or is the solution 
> of a single linear system done in parallel?
> 
> 
> 
> 
>> On Mar 22, 2022, at 4:16 PM, Sam Guo <[email protected]> wrote:
>> 
>> Here is one memory comparison (memory in MB):
>> 
>>                                    np=1    np=2    np=4    np=8    np=16
>>   shell                           1614    1720    1874    1673    1248
>>   PETSc (using full matrix)       2108    2260    2364    2215    1734
>>   PETSc (using symmetric matrix)  1750    2100    2189    2094    1727
>> These are the total high-water-mark memory figures.
>> 
>> On Tue, Mar 22, 2022 at 1:10 PM Barry Smith <[email protected]> wrote:
>> 
>>   Sam,
>> 
>>   MUMPS is a direct solver; as such, it requires much more memory than the 
>> original matrix (stored as a PETSc matrix) to hold the factored matrix. The 
>> savings you would get by not having a PETSc copy of the matrix and a MUMPS 
>> copy of the matrix in memory at the same time are unlikely to be significant. 
>> Do you have memory footprint measurements indicating that not having the PETSc 
>> copy of the matrix in memory would allow you to run measurably larger simulations? 
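>> 
>>   (For reference, one way to gather such measurements is PETSc's own memory 
>> logging; a minimal sketch, assuming tracking of the high-water mark is turned 
>> on early in the run:
>> 
>>     PetscMemorySetGetMaximumUsage();    /* enable tracking, right after PetscInitialize() */
>>     ...
>>     PetscLogDouble mem;
>>     PetscMemoryGetMaximumUsage(&mem);   /* per-process high-water mark, in bytes */
>>     PetscPrintf(PETSC_COMM_WORLD,"max memory %g bytes\n",(double)mem);
>> 
>>   or simply run with the -memory_view option to have a summary printed at 
>> PetscFinalize().)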
>> 
>>  Barry
>> 
>> 
>>   
>> 
>>> On Mar 22, 2022, at 3:58 PM, Sam Guo <[email protected]> wrote:
>>> 
>>> The reason I want to use a shell matrix is to reduce the memory footprint. If 
>>> I create a PETSc matrix and use MUMPS, I understand PETSc will create another 
>>> copy of the matrix for MUMPS. Is there any way to avoid that extra copy for 
>>> MUMPS?
>>> 
>>> On Tue, Mar 22, 2022 at 12:53 PM Sam Guo <[email protected]> wrote:
>>> Barry,
>>>    Thanks for the illustration. Is there an easy way to mimic that 
>>> implementation using a shell matrix? I have been studying how sMvctx is 
>>> created, and it seems pretty involved.
>>> 
>>> Thanks,
>>> Sam
>>> 
>>> On Mon, Mar 21, 2022 at 2:48 PM Barry Smith <[email protected]> wrote:
>>> 
>>> 
>>>> On Mar 21, 2022, at 4:36 PM, Sam Guo <[email protected]> wrote:
>>>> 
>>>> Barry,
>>>>    Thanks. Could you elaborate? I am trying to implement matrix-vector 
>>>> multiplication for a symmetric matrix using a shell matrix.
>>> 
>>>      Consider the case with three ranks:
>>> 
>>>    (a)     ( A   B   D ) (x)
>>>    (b)  =  ( B'  C   E ) (y)
>>>    (c)     ( D'  E'  F ) (w)
>>> 
>>>       Only the blocks without the ' are stored on a rank; so, for example, B 
>>> is stored on rank 0.
>>> 
>>>        Rank 0 computes Ax and keeps it in a. Rank 1 computes Cy and keeps 
>>> it in b. Rank 2 computes Fw and keeps it in c.
>>> 
>>>        Rank 0 computes B'x and D'x. It puts the nonzero entries of these 
>>> products, as well as the values of x, into slvec0.
>>> 
>>>        Rank 1 computes E'y and puts its nonzero entries, as well as the 
>>> values of y, into slvec0.
>>> 
>>>        Rank 2 puts the values of w needed by the other ranks into slvec0.
>>> 
>>>        Rank 0 computes B y_h + D w_h, where it gets the y_h and w_h values from 
>>> slvec1, and adds the result to a.
>>> 
>>>        Rank 1 takes the B'x from slvec1 and adds it to b; it then computes 
>>> E w_h, where the w_h values are pulled from slvec1, and adds that to b.
>>> 
>>>        Rank 2 takes the D'x and E'y from slvec1 and adds them to c. 
>>> 
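>>>     As far as I can tell from MatMult_MPISBAIJ (quoted further down), slvec0 
>>> and slvec1 are each split into two pieces so that the single scatter moves 
>>> both kinds of data at once. A rough reading, not a statement of the exact 
>>> layout:
>>> 
>>>        slvec0 = [ local part of the input vector | B'x-type products to send ]
>>>        slvec1 = [ local diagonal product  A x    | ghost values received     ]
>>> 
>>>     The scatter (with ADD_VALUES) accumulates the transpose products into the 
>>> first block of slvec1 on the owning ranks and delivers the needed ghost values 
>>> into the second block; the final multadd then finishes the product.
>>> 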
>>> 
>>>> 
>>>> Thanks,
>>>> Sam
>>>> 
>>>> On Mon, Mar 21, 2022 at 12:56 PM Barry Smith <[email protected]> wrote:
>>>> 
>>>>   The "trick" is that though "more" communication is needed to complete 
>>>> the product the communication can still be done in a single VecScatter 
>>>> instead of two separate calls to VecScatter. We simply pack both pieces of 
>>>> information that needs to be sent into a single vector. 
>>>> 
>>>>   /* copy x into the vec slvec0 */
>>>>   VecGetArray(a->slvec0,&from);
>>>>   VecGetArrayRead(xx,&x);
>>>> 
>>>>   PetscArraycpy(from,x,bs*mbs);
>>>>   VecRestoreArray(a->slvec0,&from);
>>>>   VecRestoreArrayRead(xx,&x);
>>>> 
>>>>   VecScatterBegin(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES,SCATTER_FORWARD);
>>>>   VecScatterEnd(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES,SCATTER_FORWARD);
>>>> If you create two symmetric matrices, one with SBAIJ and one with BAIJ, and 
>>>> compare the time to do the product, you will find that SBAIJ is not 
>>>> significantly slower but does save memory.
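>>>> 
>>>> One way to try such a comparison without code changes (a sketch, assuming the 
>>>> assembly goes through MatSetFromOptions() and only sets upper-triangular 
>>>> entries, or ignores the lower triangle via MAT_IGNORE_LOWER_TRIANGULAR):
>>>> 
>>>>    mpiexec -n 4 ./your_app -mat_type sbaij -log_view
>>>>    mpiexec -n 4 ./your_app -mat_type baij  -log_view
>>>> 
>>>> and then compare the MatMult timings and the memory summary in the two 
>>>> -log_view reports. (Here ./your_app is a placeholder for any code that creates 
>>>> its matrix with MatCreate()/MatSetFromOptions().)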
>>>> 
>>>> 
>>>>> On Mar 21, 2022, at 3:26 PM, Sam Guo <[email protected]> wrote:
>>>>> 
>>>>> Using the following example from the MatCreateSBAIJ documentation:
>>>>>           0 1 2 3 4 5 6 7 8 9 10 11
>>>>>           --------------------------
>>>>>    row 3  |. . . d d d o o o o  o  o
>>>>>    row 4  |. . . d d d o o o o  o  o
>>>>>    row 5  |. . . d d d o o o o  o  o
>>>>>           --------------------------
>>>>> 
>>>>> On a processor that owns rows 3, 4 and 5, information from rows 0-2 is still 
>>>>> needed. Is it that the processor that owns rows 0-2 will apply B symmetrically 
>>>>> and then send the result to the processor that owns rows 3-5?
>>>>> 
>>>>> On Mon, Mar 21, 2022 at 12:14 PM Mark Adams <[email protected]> wrote:
>>>>> PETSc stores parallel matrices as two serial matrices: one for the 
>>>>> diagonal (d or A) block and one for the rest (o or B).
>>>>> I would guess that for symmetric matrices it has a symmetric matrix for 
>>>>> the diagonal block and a full AIJ matrix for the (upper) off-diagonal.
>>>>> So the multtranspose is applying B symmetrically. The lower off-diagonal 
>>>>> and the diagonal block can be applied without communication.
>>>>> Then the off-processor values are collected, and the upper off-diagonal 
>>>>> is applied.
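>>>>> 
>>>>> In block form, my understanding is roughly the following (a sketch only, with 
>>>>> A_d the local diagonal block and A_o the stored off-diagonal block):
>>>>> 
>>>>>    BAIJ :  y_local = A_d x_local + A_o x_ghost    (A_o holds all off-diagonal columns)
>>>>>    SBAIJ:  y_local = A_d x_local + A_o x_ghost, plus contributions 
>>>>>            A_o' x_local that are sent to the ranks owning those rows 
>>>>>                                                   (A_o holds only the upper part)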
>>>>> 
>>>>> On Mon, Mar 21, 2022 at 2:35 PM Sam Guo <[email protected]> wrote:
>>>>> I am most interested in how the lower triangular part is redistributed. It 
>>>>> seems that SBAIJ saves memory but requires more communication than BAIJ.
>>>>> 
>>>>> On Mon, Mar 21, 2022 at 11:27 AM Sam Guo <[email protected]> wrote:
>>>>> Mark, thanks for the quick response. I am more interested in the parallel 
>>>>> implementation of MatMult for SBAIJ. I found the following:
>>>>> PetscErrorCode MatMult_MPISBAIJ(Mat A,Vec xx,Vec yy)
>>>>> {
>>>>>   Mat_MPISBAIJ      *a = (Mat_MPISBAIJ*)A->data;
>>>>>   PetscErrorCode    ierr;
>>>>>   PetscInt          mbs=a->mbs,bs=A->rmap->bs;
>>>>>   PetscScalar       *from;
>>>>>   const PetscScalar *x;
>>>>> 
>>>>>   /* diagonal part */
>>>>>   (*a->A->ops->mult)(a->A,xx,a->slvec1a);
>>>>>   VecSet(a->slvec1b,0.0);
>>>>> 
>>>>>   /* subdiagonal part */
>>>>>   (*a->B->ops->multtranspose)(a->B,xx,a->slvec0b);
>>>>> 
>>>>>   /* copy x into the vec slvec0 */
>>>>>   VecGetArray(a->slvec0,&from);
>>>>>   VecGetArrayRead(xx,&x);
>>>>> 
>>>>>   PetscArraycpy(from,x,bs*mbs);
>>>>>   VecRestoreArray(a->slvec0,&from);
>>>>>   VecRestoreArrayRead(xx,&x);
>>>>> 
>>>>>   VecScatterBegin(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES,SCATTER_FORWARD);
>>>>>   VecScatterEnd(a->sMvctx,a->slvec0,a->slvec1,ADD_VALUES,SCATTER_FORWARD);
>>>>>   /* superdiagonal part */
>>>>>   (*a->B->ops->multadd)(a->B,a->slvec1b,a->slvec1a,yy);
>>>>>   return(0);
>>>>> }
>>>>>   I am trying to understand the algorithm.
>>>>> 
>>>>> Thanks,
>>>>> Sam
>>>>> 
>>>>> On Mon, Mar 21, 2022 at 11:14 AM Mark Adams <[email protected]> wrote:
>>>>> This code looks fine to me and the code is in 
>>>>> src/mat/impls/sbaij/seq/sbaij2.c
>>>>> 
>>>>> On Mon, Mar 21, 2022 at 2:02 PM Sam Guo <[email protected]> wrote:
>>>>> Dear PETSc dev team,
>>>>>    The documentation for MatCreateSBAIJ has the following:
>>>>> "It is recommended that one use the MatCreate(), MatSetType() and/or 
>>>>> MatSetFromOptions(), MatXXXXSetPreallocation() paradigm instead of this 
>>>>> routine directly. [MatXXXXSetPreallocation() is, for example, 
>>>>> MatSeqAIJSetPreallocation]"
>>>>>    I currently call MatCreateSBAIJ directly as follows:
>>>>> MatCreateSBAIJ (with d_nnz and o_nnz)
>>>>> MatSetValues (to add row by row)
>>>>> MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
>>>>> MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);
>>>>> MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE);
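>>>>> 
>>>>> (For comparison, the recommended paradigm quoted above would look roughly 
>>>>> like the following; this is just a sketch, reusing the same m/M, bs, d_nnz 
>>>>> and o_nnz I already pass to MatCreateSBAIJ:)
>>>>> 
>>>>> MatCreate(PETSC_COMM_WORLD,&A);
>>>>> MatSetSizes(A,m,m,M,M);
>>>>> MatSetType(A,MATSBAIJ);
>>>>> MatSetFromOptions(A);
>>>>> MatSeqSBAIJSetPreallocation(A,bs,0,d_nnz);          /* used if the type ends up sequential */
>>>>> MatMPISBAIJSetPreallocation(A,bs,0,d_nnz,0,o_nnz);  /* used if the type ends up parallel   */
>>>>> /* then MatSetValues(), MatAssemblyBegin/End(), MatSetOption() as above */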
>>>>> 
>>>>>    Two questions:
>>>>>    (1) I am wondering whether what I am doing is the most efficient approach.
>>>>> 
>>>>>    (2) I would like to find out how matrix-vector multiplication is 
>>>>> implemented in PETSc for SBAIJ storage.
>>>>> 
>>>>> Thanks,
>>>>> Sam
>>>> 
>>> 
>> 
> 
