Ok,
this work is still part of my Schur complement approach: I use the full Schur complement, but with a block-diagonal A00^-1. I implemented the computation of A00^-1 by extracting each diagonal block and inverting it individually. This works quite well and does not cost too much, especially since I can still use threads to accelerate the process (I might send a question about that in the future...).

At the moment the most expensive part of the procedure is inverting S (I'm using LU for now to make sure everything is implemented correctly), and the second most expensive operation is MatMatMult. I perform two of these: A10 * A00^-1, followed by a right multiplication by A01. Decreasing that cost would be nice (I attached the output of -log_summary for reference).
I also think I need to track down the objects that are not Destroyed.

Finally, I would now like to split the Schur complement into two submatrices. I have index sets (IS) that track the location of these sub-matrices in the global system:

    [ A00  A01  A02 ]  --> IS(0)
A = [ A10  A11  A12 ]  --> IS(1)
    [ A20  A21  A22 ]  --> IS(2)

How can I use IS(1) and IS(2) to track:

S = [ A11  A12 ] - [ A10 ] * [ A00 ]^-1 * [ A01  A02 ] = [ S11  S12 ]  --> IS(1)'
    [ A21  A22 ]   [ A20 ]                               [ S21  S22 ]  --> IS(2)'

or is there a simple way to compute IS(1)' and IS(2)' based on IS(1) and IS(2)?

Thanks!

Best,
Luc

On 03/26/2015 04:12 PM, Matthew Knepley wrote:
On Thu, Mar 26, 2015 at 3:07 PM, Luc Berger-Vergiat <[email protected] <mailto:[email protected]>> wrote:

    Hi all,
    I want to multiply two matrices together, one is MATAIJ and the
    second is MATBAIJ, is there a way to leverage the properties of
    the blocked matrix in the BAIJ format or should I just assemble
    the BAIJ matrix as AIJ?


I am afraid you are currently stuck with the latter.

  Thanks,

    Matt


-- Best,
    Luc





--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r -fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

/home/luc/research/feap_repo/ShearBands/parfeap/feap on a arch-opt named euler with 1 processor, by luc Thu Mar 26 16:37:21 2015
Using Petsc Release Version 3.5.2, Sep, 08, 2014 

                         Max       Max/Min        Avg      Total 
Time (sec):           8.338e+01      1.00000   8.338e+01
Objects:              2.251e+03      1.00000   2.251e+03
Flops:                7.704e+09      1.00000   7.704e+09  7.704e+09
Flops/sec:            9.240e+07      1.00000   9.240e+07  9.240e+07
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total 
 0:      Main Stage: 7.0975e+01  85.1%  1.8346e+07   0.2%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0% 
 1:    Linear solve: 1.2404e+01  14.9%  7.6855e+09  99.8%  0.000e+00   0.0%  0.000e+00        0.0%  0.000e+00   0.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecDot                31 1.0 1.6618e-03 1.0 2.70e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0 15  0  0  0  1622
VecNorm              534 1.0 8.6057e-03 1.0 1.57e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0 85  0  0  0  1819
VecSet               477 1.0 6.4998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyBegin     118 1.0 1.0848e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd       118 1.0 2.5749e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin      472 1.0 1.2344e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin      31 1.0 2.0504e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd        31 1.0 9.4979e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatZeroEntries        31 1.0 2.2029e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView               31 1.0 1.9977e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0

--- Event Stage 1: Linear solve

VecSet                62 1.0 2.6948e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAXPY               62 1.0 2.9297e-03 1.0 2.70e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   920
MatMult              124 1.0 1.9394e-01 1.0 1.38e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   2  2  0  0  0   709
MatSolve              31 1.0 7.2679e-02 1.0 5.81e+07 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   1  1  0  0  0   799
MatLUFactorSym        31 1.0 6.4250e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   5  0  0  0  0     0
MatLUFactorNum        31 1.0 3.8881e+00 1.0 5.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00  5 73  0  0  0  31 73  0  0  0  1441
MatAssemblyBegin     310 1.0 1.5259e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd       310 1.0 3.4911e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3  0  0  0  0     0
MatGetValues      251100 1.0 7.5529e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   1  0  0  0  0     0
MatGetRow         184512 1.0 1.4527e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRowIJ           31 1.0 6.7954e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetSubMatrice     124 1.0 1.3467e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0  11  0  0  0  0     0
MatGetOrdering        31 1.0 3.2172e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatView               62 1.0 6.0837e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAXPY               31 1.0 3.3731e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   3  0  0  0  0     0
MatMatMult            62 1.0 5.2516e+00 1.0 1.89e+09 1.0 0.0e+00 0.0e+00 0.0e+00  6 24  0  0  0  42 25  0  0  0   359
MatMatMultSym         62 1.0 3.0043e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  4  0  0  0  0  24  0  0  0  0     0
MatMatMultNum         62 1.0 2.2463e+00 1.0 1.89e+09 1.0 0.0e+00 0.0e+00 0.0e+00  3 24  0  0  0  18 25  0  0  0   839
KSPSetUp              62 1.0 2.7657e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve              31 1.0 1.2404e+01 1.0 7.69e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15100  0  0  0 100100  0  0  0   620
PCSetUp               62 1.0 6.5023e+00 1.0 5.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00  8 73  0  0  0  52 73  0  0  0   862
PCApply               31 1.0 1.0462e+01 1.0 7.69e+09 1.0 0.0e+00 0.0e+00 0.0e+00 13100  0  0  0  84100  0  0  0   735
Invert Jee            31 1.0 5.5238e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   4  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

           Index Set   480            476       396992     0
             Section     4              0            0     0
           Container     6              3         1716     0
              Vector   477            476     42452864     0
      Vector Scatter   472            472       303968     0
              Matrix     1              5     40070680     0
    Distributed Mesh     1              0            0     0
Star Forest Bipartite Graph     2              0            0     0
     Discrete System     1              0            0     0
       Krylov Solver    31              1         1160     0
      Preconditioner    31              1         1000     0
              Viewer    32             31        23064     0

--- Event Stage 1: Linear solve

           Index Set   217            214      1135968     0
              Vector   124            122       183488     0
              Matrix   279            123    387141316     0
       Krylov Solver    31             30        34800     0
      Preconditioner    31             30        30000     0
              Viewer    31             31        23064     0
========================================================================================================================
Average time to get PetscTime(): 4.76837e-08
#PETSc Option Table entries:
-ksp_type preonly
-log_summary time.log
-pc_shell_type luc_schur
-pc_type shell
-schur_ksp_type preonly
-schur_pc_type lu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=gcc --with-fc=gfortran --with-cxx=g++ --with-debugging=0 --with-shared-libraries=0 --download-fblaslapack --download-mpich --download-parmetis --download-metis --download-ml=yes --download-hypre --download-superlu_dist --download-mumps --download-scalapack --download-suitesparse
-----------------------------------------
Libraries compiled on Mon Mar  9 10:58:10 2015 on euler 
Machine characteristics: Linux-3.13.0-46-generic-x86_64-with-Ubuntu-14.04-trusty
Using PETSc directory: /home/luc/research/petsc-3.5.2
Using PETSc arch: arch-opt
-----------------------------------------

Using C compiler: /home/luc/research/petsc-3.5.2/arch-opt/bin/mpicc  -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/luc/research/petsc-3.5.2/arch-opt/bin/mpif90   -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O  ${FOPTFLAGS} ${FFLAGS} 
-----------------------------------------

Using include paths: -I/home/luc/research/petsc-3.5.2/arch-opt/include -I/home/luc/research/petsc-3.5.2/include -I/home/luc/research/petsc-3.5.2/include -I/home/luc/research/petsc-3.5.2/arch-opt/include
-----------------------------------------

Using C linker: /home/luc/research/petsc-3.5.2/arch-opt/bin/mpicc
Using Fortran linker: /home/luc/research/petsc-3.5.2/arch-opt/bin/mpif90
Using libraries: -Wl,-rpath,/home/luc/research/petsc-3.5.2/arch-opt/lib -L/home/luc/research/petsc-3.5.2/arch-opt/lib -lpetsc -Wl,-rpath,/home/luc/research/petsc-3.5.2/arch-opt/lib -L/home/luc/research/petsc-3.5.2/arch-opt/lib -lml -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpichcxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lHYPRE -lmpichcxx -lstdc++ -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist_3.3 -lflapack -lfblas -lparmetis -lmetis -lX11 -lpthread -lssl -lcrypto -lm -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -L/home/luc/research/petsc-3.5.2/arch-opt/lib -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/home/luc/research/petsc-3.5.2/arch-opt/lib -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl  
-----------------------------------------
