OK, this work is still part of my Schur complement approach using the full
Schur complement, but with a block-diagonal A00^-1.
I implemented the computation of A00^-1 by extracting each diagonal
block and inverting it individually.
This works quite well and does not cost too much, especially since I
can still use threads to accelerate the process (I might send a
question about this in the future...).
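For reference, the per-block inversion amounts to something like the following plain-C sketch. The function name is hypothetical and the block size is hard-coded to 2x2 purely for illustration (my actual block size varies, and a small LU per block would replace the closed-form inverse):

```c
/* Invert an n-by-n block-diagonal matrix stored as nb dense 2x2 blocks
 * in row-major order (block b occupies blocks[4*b .. 4*b+3]).  Each
 * block is inverted in place with the closed-form 2x2 inverse.
 * Returns 0 on success, -1 if a diagonal block is singular. */
static int invert_block_diagonal_2x2(double *blocks, int nb)
{
    for (int b = 0; b < nb; ++b) {
        double *B = blocks + 4 * b;
        double det = B[0] * B[3] - B[1] * B[2];
        if (det == 0.0) return -1;   /* singular diagonal block */
        double i0 =  B[3] / det, i1 = -B[1] / det;
        double i2 = -B[2] / det, i3 =  B[0] / det;
        B[0] = i0; B[1] = i1; B[2] = i2; B[3] = i3;
    }
    return 0;
}
```

Since the blocks are independent, the loop over b is exactly the part that threads can divide up.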
At the moment the most expensive part of the procedure is inverting S
(I'm using LU for now to make sure that everything is implemented
correctly), and the second most expensive is MatMatMult. I'm
doing two of these: A10 * A00^-1, followed by a right multiplication by A01.
Decreasing that cost would be nice (I attached the output of
-log_summary for reference).
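In dense terms, the two products form T = A10 * A00^-1 and then S = A11 - T * A01. A tiny plain-C stand-in for the two MatMatMult calls, just to pin down the operations (the names, row-major storage, and square sizes are hypothetical):

```c
#include <stdlib.h>

/* C = A * B with A m-by-k, B k-by-n, all dense row-major. */
static void matmul(const double *A, const double *B, double *C,
                   int m, int k, int n)
{
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j) {
            double s = 0.0;
            for (int p = 0; p < k; ++p) s += A[i*k + p] * B[p*n + j];
            C[i*n + j] = s;
        }
}

/* S = A11 - A10 * A00inv * A01, with A10 m-by-k, A00inv k-by-k,
 * A01 k-by-m, A11 and S m-by-m; two products, matching the two
 * MatMatMult's in the actual code. */
static void schur_dense(const double *A11, const double *A10,
                        const double *A00inv, const double *A01,
                        double *S, int m, int k)
{
    double *T  = malloc(sizeof(double) * m * k);  /* T  = A10 * A00inv */
    double *TA = malloc(sizeof(double) * m * m);  /* TA = T * A01      */
    matmul(A10, A00inv, T, m, k, k);
    matmul(T, A01, TA, m, k, m);
    for (int i = 0; i < m * m; ++i) S[i] = A11[i] - TA[i];
    free(T);
    free(TA);
}
```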
I also think I need to look for the objects that are not destroyed.
Finally, I would now like to split the Schur complement into two
submatrices. I have IS's that track the location of these submatrices in
the global system:
    [ A00 A01 A02 ] --> IS(0)
A = [ A10 A11 A12 ] --> IS(1)
    [ A20 A21 A22 ] --> IS(2)
How can I use IS(1) and IS(2) to track:
S = [ A11 A12 ] - [ A10 ] * [ A00 ]^-1 * [ A01 A02 ] = [ S11 S12 ] --> IS(1)'
    [ A21 A22 ]   [ A20 ]                              [ S21 S22 ] --> IS(2)'
or is there a simple way to compute IS(1)' and IS(2)' based on IS(1) and
IS(2)?
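To make the question concrete, here is a plain-C sketch of the mapping I think I need: the position of each global index within the merged union of IS(1) and IS(2), assuming both sets are sorted and disjoint (the function and all names are hypothetical):

```c
/* Given the sorted, disjoint global index sets is1 (length n1) and
 * is2 (length n2) whose union defines the rows of S, compute the
 * local position of each global index inside S by merging the two
 * sorted sets: is1p and is2p receive the ranks of is1 and is2
 * entries in the union, i.e. the candidate IS(1)' and IS(2)'. */
static void schur_local_indices(const int *is1, int n1,
                                const int *is2, int n2,
                                int *is1p, int *is2p)
{
    int i = 0, j = 0, pos = 0;
    while (i < n1 || j < n2) {          /* standard two-way merge */
        if (j >= n2 || (i < n1 && is1[i] < is2[j]))
            is1p[i++] = pos++;
        else
            is2p[j++] = pos++;
    }
}
```

If S is instead laid out simply as the IS(1) rows followed by the IS(2) rows, this degenerates to the contiguous ranges 0..n1-1 and n1..n1+n2-1, which I imagine could be created directly with ISCreateStride().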
Thanks!
Best,
Luc
On 03/26/2015 04:12 PM, Matthew Knepley wrote:
On Thu, Mar 26, 2015 at 3:07 PM, Luc Berger-Vergiat
<[email protected]> wrote:
Hi all,
I want to multiply two matrices together; one is MATAIJ and the
second is MATBAIJ. Is there a way to leverage the properties of
the blocked matrix in the BAIJ format, or should I just assemble
the BAIJ matrix as AIJ?
I am afraid you are currently stuck with the latter.
Thanks,
Matt
--
Best,
Luc
--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
/home/luc/research/feap_repo/ShearBands/parfeap/feap on a arch-opt named euler with 1 processor, by luc Thu Mar 26 16:37:21 2015
Using Petsc Release Version 3.5.2, Sep, 08, 2014
Max Max/Min Avg Total
Time (sec): 8.338e+01 1.00000 8.338e+01
Objects: 2.251e+03 1.00000 2.251e+03
Flops: 7.704e+09 1.00000 7.704e+09 7.704e+09
Flops/sec: 9.240e+07 1.00000 9.240e+07 9.240e+07
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N flops
and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- -- Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total Avg %Total counts %Total
0: Main Stage: 7.0975e+01 85.1% 1.8346e+07 0.2% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
1: Linear solve: 1.2404e+01 14.9% 7.6855e+09 99.8% 0.000e+00 0.0% 0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops --- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecDot 31 1.0 1.6618e-03 1.0 2.70e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 15 0 0 0 1622
VecNorm 534 1.0 8.6057e-03 1.0 1.57e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 85 0 0 0 1819
VecSet 477 1.0 6.4998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyBegin 118 1.0 1.0848e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 118 1.0 2.5749e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 472 1.0 1.2344e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 31 1.0 2.0504e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 31 1.0 9.4979e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 31 1.0 2.2029e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 31 1.0 1.9977e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
--- Event Stage 1: Linear solve
VecSet 62 1.0 2.6948e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 62 1.0 2.9297e-03 1.0 2.70e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 920
MatMult 124 1.0 1.9394e-01 1.0 1.38e+08 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 2 2 0 0 0 709
MatSolve 31 1.0 7.2679e-02 1.0 5.81e+07 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 1 1 0 0 0 799
MatLUFactorSym 31 1.0 6.4250e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 5 0 0 0 0 0
MatLUFactorNum 31 1.0 3.8881e+00 1.0 5.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00 5 73 0 0 0 31 73 0 0 0 1441
MatAssemblyBegin 310 1.0 1.5259e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 310 1.0 3.4911e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0
MatGetValues 251100 1.0 7.5529e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 1 0 0 0 0 0
MatGetRow 184512 1.0 1.4527e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 31 1.0 6.7954e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 124 1.0 1.3467e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 11 0 0 0 0 0
MatGetOrdering 31 1.0 3.2172e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 62 1.0 6.0837e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 31 1.0 3.3731e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 3 0 0 0 0 0
MatMatMult 62 1.0 5.2516e+00 1.0 1.89e+09 1.0 0.0e+00 0.0e+00 0.0e+00 6 24 0 0 0 42 25 0 0 0 359
MatMatMultSym 62 1.0 3.0043e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 4 0 0 0 0 24 0 0 0 0 0
MatMatMultNum 62 1.0 2.2463e+00 1.0 1.89e+09 1.0 0.0e+00 0.0e+00 0.0e+00 3 24 0 0 0 18 25 0 0 0 839
KSPSetUp 62 1.0 2.7657e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 31 1.0 1.2404e+01 1.0 7.69e+09 1.0 0.0e+00 0.0e+00 0.0e+00 15100 0 0 0 100100 0 0 0 620
PCSetUp 62 1.0 6.5023e+00 1.0 5.60e+09 1.0 0.0e+00 0.0e+00 0.0e+00 8 73 0 0 0 52 73 0 0 0 862
PCApply 31 1.0 1.0462e+01 1.0 7.69e+09 1.0 0.0e+00 0.0e+00 0.0e+00 13100 0 0 0 84100 0 0 0 735
Invert Jee 31 1.0 5.5238e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 4 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Index Set 480 476 396992 0
Section 4 0 0 0
Container 6 3 1716 0
Vector 477 476 42452864 0
Vector Scatter 472 472 303968 0
Matrix 1 5 40070680 0
Distributed Mesh 1 0 0 0
Star Forest Bipartite Graph 2 0 0 0
Discrete System 1 0 0 0
Krylov Solver 31 1 1160 0
Preconditioner 31 1 1000 0
Viewer 32 31 23064 0
--- Event Stage 1: Linear solve
Index Set 217 214 1135968 0
Vector 124 122 183488 0
Matrix 279 123 387141316 0
Krylov Solver 31 30 34800 0
Preconditioner 31 30 30000 0
Viewer 31 31 23064 0
========================================================================================================================
Average time to get PetscTime(): 4.76837e-08
#PETSc Option Table entries:
-ksp_type preonly
-log_summary time.log
-pc_shell_type luc_schur
-pc_type shell
-schur_ksp_type preonly
-schur_pc_type lu
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-cc=gcc --with-fc=gfortran --with-cxx=g++ --with-debugging=0 --with-shared-libraries=0 --download-fblaslapack --download-mpich --download-parmetis --download-metis --download-ml=yes --download-hypre --download-superlu_dist --download-mumps --download-scalapack --download-suitesparse
-----------------------------------------
Libraries compiled on Mon Mar 9 10:58:10 2015 on euler
Machine characteristics: Linux-3.13.0-46-generic-x86_64-with-Ubuntu-14.04-trusty
Using PETSc directory: /home/luc/research/petsc-3.5.2
Using PETSc arch: arch-opt
-----------------------------------------
Using C compiler: /home/luc/research/petsc-3.5.2/arch-opt/bin/mpicc -Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /home/luc/research/petsc-3.5.2/arch-opt/bin/mpif90 -Wall -Wno-unused-variable -ffree-line-length-0 -Wno-unused-dummy-argument -O ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/luc/research/petsc-3.5.2/arch-opt/include -I/home/luc/research/petsc-3.5.2/include -I/home/luc/research/petsc-3.5.2/include -I/home/luc/research/petsc-3.5.2/arch-opt/include
-----------------------------------------
Using C linker: /home/luc/research/petsc-3.5.2/arch-opt/bin/mpicc
Using Fortran linker: /home/luc/research/petsc-3.5.2/arch-opt/bin/mpif90
Using libraries: -Wl,-rpath,/home/luc/research/petsc-3.5.2/arch-opt/lib -L/home/luc/research/petsc-3.5.2/arch-opt/lib -lpetsc -Wl,-rpath,/home/luc/research/petsc-3.5.2/arch-opt/lib -L/home/luc/research/petsc-3.5.2/arch-opt/lib -lml -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -lmpichcxx -lstdc++ -lcmumps -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lscalapack -lHYPRE -lmpichcxx -lstdc++ -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd -lsuitesparseconfig -lsuperlu_dist_3.3 -lflapack -lfblas -lparmetis -lmetis -lX11 -lpthread -lssl -lcrypto -lm -lmpichf90 -lgfortran -lm -lgfortran -lm -lquadmath -lm -lmpichcxx -lstdc++ -L/home/luc/research/petsc-3.5.2/arch-opt/lib -L/usr/lib/gcc/x86_64-linux-gnu/4.8 -L/usr/lib/x86_64-linux-gnu -L/lib/x86_64-linux-gnu -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/home/luc/research/petsc-3.5.2/arch-opt/lib -lmpich -lopa -lmpl -lrt -lpthread -lgcc_s -ldl
-----------------------------------------