OK, this work is still part of my Schur complement approach using the full
Schur complement, but with a block-diagonal A00^-1.
I implemented the computation of A00^-1 by extracting each diagonal
block and inverting it individually.
This works quite well and does not cost too much, especially since I
can still use threads to accelerate the process (I might send a
question about this in the future...).
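For reference, the per-block inversion amounts to something like the following plain-C sketch. The function name is hypothetical and the block size is hard-coded to 2x2 purely for illustration (my actual block size varies, and a small LU per block would replace the closed-form inverse):

```c
/* Invert an n-by-n block-diagonal matrix stored as nb dense 2x2 blocks
 * in row-major order (block b occupies blocks[4*b .. 4*b+3]).  Each
 * block is inverted in place with the closed-form 2x2 inverse.
 * Returns 0 on success, -1 if a diagonal block is singular. */
static int invert_block_diagonal_2x2(double *blocks, int nb)
{
    for (int b = 0; b < nb; ++b) {
        double *B = blocks + 4 * b;
        double det = B[0] * B[3] - B[1] * B[2];
        if (det == 0.0) return -1;   /* singular diagonal block */
        double i0 =  B[3] / det, i1 = -B[1] / det;
        double i2 = -B[2] / det, i3 =  B[0] / det;
        B[0] = i0; B[1] = i1; B[2] = i2; B[3] = i3;
    }
    return 0;
}
```

Since the blocks are independent, the loop over b is exactly the part that threads can divide up.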
At the moment the most expensive part of the procedure is inverting S
(I'm using LU for now to make sure that everything is implemented
correctly), and the second most expensive is MatMatMult. I'm
doing two of these: A10 * A00^-1, followed by a right multiplication by A01.
Decreasing that cost would be nice (I attached the output of
-log_summary for reference).
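In dense terms, the two products form T = A10 * A00^-1 and then S = A11 - T * A01. A tiny plain-C stand-in for the two MatMatMult calls, just to pin down the operations (the names, row-major storage, and square sizes are hypothetical):

```c
#include <stdlib.h>

/* C = A * B with A m-by-k, B k-by-n, all dense row-major. */
static void matmul(const double *A, const double *B, double *C,
                   int m, int k, int n)
{
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            double s = 0.0;
            for (int p = 0; p < k; ++p) s += A[i*k + p] * B[p*n + j];
            C[i*n + j] = s;
        }
}

/* S = A11 - A10 * A00inv * A01, with A10 m-by-k, A00inv k-by-k,
 * A01 k-by-m, A11 and S m-by-m; two products, matching the two
 * MatMatMult's in the actual code. */
static void schur_dense(const double *A11, const double *A10,
                        const double *A00inv, const double *A01,
                        double *S, int m, int k)
{
    double *T  = malloc(sizeof(double) * m * k);  /* T  = A10 * A00inv */
    double *TA = malloc(sizeof(double) * m * m);  /* TA = T * A01      */
    matmul(A10, A00inv, T, m, k, k);
    matmul(T, A01, TA, m, k, m);
    for (int i = 0; i < m * m; ++i) S[i] = A11[i] - TA[i];
    free(T);
    free(TA);
}
```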
I also think I need to look for the objects that are not destroyed.
Finally, I would now like to split the Schur complement into two
submatrices. I have IS's that track the location of these submatrices in
the global system:
    [ A00 A01 A02 ] --> IS(0)
A = [ A10 A11 A12 ] --> IS(1)
    [ A20 A21 A22 ] --> IS(2)
How can I use IS(1) and IS(2) to track:
S = [ A11 A12 ] - [ A10 ] * [ A00 ]^-1 * [ A01 A02 ] = [ S11 S12 ] --> IS(1)'
    [ A21 A22 ]   [ A20 ]                              [ S21 S22 ] --> IS(2)'
or is there a simple way to compute IS(1)' and IS(2)' based on IS(1) and
IS(2)?
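To make the question concrete, here is a plain-C sketch of the mapping I think I need: the position of each global index within the merged union of IS(1) and IS(2), assuming both sets are sorted and disjoint (the function and all names are hypothetical):

```c
/* Given the sorted, disjoint global index sets is1 (length n1) and
 * is2 (length n2) whose union defines the rows of S, compute the
 * local position of each global index inside S by merging the two
 * sorted sets: is1p and is2p receive the ranks of is1 and is2
 * entries in the union, i.e. the candidate IS(1)' and IS(2)'. */
static void schur_local_indices(const int *is1, int n1,
                                const int *is2, int n2,
                                int *is1p, int *is2p)
{
    int i = 0, j = 0, pos = 0;
    while (i < n1 || j < n2) {          /* standard two-way merge */
        if (j >= n2 || (i < n1 && is1[i] < is2[j]))
            is1p[i++] = pos++;
        else
            is2p[j++] = pos++;
    }
}
```

If S is instead laid out simply as the IS(1) rows followed by the IS(2) rows, this degenerates to the contiguous ranges 0..n1-1 and n1..n1+n2-1, which I imagine could be created directly with ISCreateStride().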
Thanks!
Best,
Luc
On 03/26/2015 04:12 PM, Matthew Knepley wrote:
On Thu, Mar 26, 2015 at 3:07 PM, Luc Berger-Vergiat
<[email protected]> wrote:
Hi all,
I want to multiply two matrices together; one is MATAIJ and the
second is MATBAIJ. Is there a way to leverage the properties of
the blocked matrix in the BAIJ format, or should I just assemble
the BAIJ matrix as AIJ?
I am afraid you are currently stuck with the latter.
Thanks,
Matt
--
Best,
Luc
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/home/luc/research/feap_repo/ShearBands/parfeap/feap on a arch-opt named euler with 1 processor, by luc Thu Mar 26 16:37:21 2015
Using Petsc Release Version 3.5.2, Sep, 08, 2014
Max Max/Min Avg Total
Time (sec): 8.338e+01 1.00000 8.338e+01
Objects: 2.251e+03 1.00000 2.251e+03
Flops: 7.704e+09 1.00000 7.704e+09 7.704e+09
Flops/sec: 9.240e+07 1.00000 9.240e+07 9.240e+07
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 7.0975e+01 85.1% 1.8346e+07 0.2% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
1: Linear solve: 1.2404e+01 14.9% 7.6855e+09 99.8% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDot 31 1.0 1.6618e-03 1.0 2.70e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 15 0 0 0 1622
VecNorm 534 1.0 8.6057e-03 1.0 1.57e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 85 0 0 0 1819
VecSet 477 1.0 6.4998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyBegin 118 1.0 1.0848e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 118 1.0 2.5749e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 472 1.0 1.2344e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 31 1.0 2.0504e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 31 1.0 9.4979e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 31 1.0 2.2029e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 31 1.0 1.9977e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: Linear solve
VecSet 62 1.0 2.6948e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 62 1.0 2.9297e-03 1.0 2.70e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 920
MatMult 124 1.0 1.9394e-01 1.0 1.38e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 2 2 0 0 0 709
MatSolve 31 1.0 7.2679e-02 1.0 5.81e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 1 0 0 0 799
MatLUFactorSym 31 1.0 6.4250e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 5 0 0 0 0 0
MatLUFactorNum 31 1.0 3.8881e+00 1.0 5.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00 5 73 0 0 0 31 73 0 0 0 1441
MatAssemblyBegin 310 1.0 1.5259e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 310 1.0 3.4911e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0
MatGetValues 251100 1.0 7.5529e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatGetRow 184512 1.0 1.4527e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 31 1.0 6.7954e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 124 1.0 1.3467e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 11 0 0 0 0 0
MatGetOrdering 31 1.0 3.2172e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 62 1.0 6.0837e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 31 1.0 3.3731e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0
MatMatMult 62 1.0 5.2516e+00 1.0 1.89e+09 1.0 0.0e+00 0.0e+00 0.0e+00 6 24 0 0 0 42 25 0 0 0 359
MatMatMultSym 62 1.0 3.0043e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 24 0 0 0 0 0
MatMatMultNum 62 1.0 2.2463e+00 1.0 1.89e+09 1.0 0.0e+00 0.0e+00 0.0e+00 3 24 0 0 0 18 25 0 0 0 839
KSPSetUp 62 1.0 2.7657e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 31 1.0 1.2404e+01 1.0 7.69e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15100 0 0 0 100100 0 0 0 620
PCSetUp 62 1.0 6.5023e+00 1.0 5.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00 8 73 0 0 0 52 73 0 0 0 862
PCApply 31 1.0 1.0462e+01 1.0 7.69e+09 1.0 0.0e+00 0.0e+00 0.0e+00 13100 0 0 0 84100 0 0 0 735
Invert Jee 31 1.0 5.5238e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 4 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Index Set 480 476 396992 0
Section 4 0 0 0
Container 6 3 1716 0
Vector 477 476 42452864 0
Vector Scatter 472 472 303968 0
Matrix 1 5 40070680 0
Distributed Mesh 1 0 0 0
Star Forest Bipartite Graph 2 0 0 0
Discrete System 1 0 0 0
Krylov Solver 31 1 1160 0
Preconditioner 31 1 1000 0
Viewer 32 31 23064 0
--- Event Stage 1: Linear solve
Index Set 217 214 1135968 0
Vector 124 122 183488 0
Matrix 279 123 387141316 0
Krylov Solver 31 30 34800 0
Preconditioner 31 30 30000 0
Viewer 31 31 23064 0
========================================================================================================================
Average time to get PetscTime(): 4.76837e-08
#PETSc Option Table entries:
-ksp_type preonly
-log_summary time.log
-pc_shell_type luc_schur
-pc_type shell
-schur_ksp_type preonly
-schur_pc_type lu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=gcc --with-fc=gfortran --with-cxx=g++ --with-debugging=0 --with-shared-libraries=0 --download-fblaslapack --download-mpich --download-parmetis --download-metis --download-ml=yes --download-hypre --download-superlu_dist --download-mumps --download-scalapack --download-suitesparse
-----------------------------------------
Libraries compiled on Mon Mar 9 10:58:10 2015 on euler
Machine characteristics: Linux-3.13.0-46-generic-x86_64-with-Ubuntu-14.04-trusty
Using PETSc directory: /home/luc/research/petsc-3.5.2
Using PETSc arch: arch-opt
-----------------------------------------
Using C compiler: /home/luc/research/petsc-3.5.2/arch-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/luc/research/petsc-3.5.2/arch-opt/bin/mpif90 -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/luc/research/petsc-3.5.2/arch-opt/include -I/home/luc/research/petsc-3.5.2/include -I/home/luc/research/petsc-3.5.2/include -I/home/luc/research/petsc-3.5.2/arch-opt/include
-----------------------------------------
Using C linker: /home/luc/research/petsc-3.5.2/arch-opt/bin/mpicc
Using Fortran linker: /home/luc/research/petsc-3.5.2/arch-opt/bin/mpif90
Using libraries: -Wl,-rpath,/home/luc/research/petsc-3.5.2/arch-opt/lib -L/home/luc/research/petsc-3.5.2/arch-opt/lib -lpetsc -Wl,-rpath,/home/luc/research/petsc-3.5.2/arch-opt/lib -L/home/luc/research/petsc-3.5.2/arch-opt/lib -lml -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpichcxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lHYPRE -lmpichcxx -lstdc++ -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist_3.3 -lflapack -lfblas -lparmetis -lmetis -lX11 -lpthread -lssl -lcrypto -lm -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -L/home/luc/research/petsc-3.5.2/arch-opt/lib -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/home/luc/research/petsc-3.5.2/arch-opt/lib -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl
-----------------------------------------