> On Jan 11, 2017, at 9:21 PM, Matthew Knepley <knep...@gmail.com> wrote:
>
> On Wed, Jan 11, 2017 at 8:31 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>
>    Thanks, this is very useful information. It means that
>
>    1) the approximate Sp is actually a very good approximation to the true
>    Schur complement S, since using Sp^-1 to precondition S gives iteration
>    counts from 8 to 13.
>
>    2) using ilu(0) as a preconditioner for Sp is not good, since replacing
>    Sp^-1 with ilu(0) of Sp gives absurd iteration counts. This is actually
>    not very surprising, since ilu(0) is generally "not so good" for
>    elasticity.
>
>    So the next step is to try using
>
>       -fieldsplit_FE_split_ksp_monitor -fieldsplit_FE_split_pc_type gamg
>
>    The one open question is whether any options should be passed to GAMG to
>    tell it that the underlying problem comes from "elasticity"; that is,
>    something about the near null space.
>
>    Mark Adams, since the GAMG is coming from inside another preconditioner,
>    it may not be easy for the user to attach the near null space to that
>    inner matrix. Would it make sense for there to be a GAMG command line
>    option to indicate that it is a 3D elasticity problem, so that GAMG could
>    set up the near null space for itself? Or does that not make sense?
>
> We could do that if somehow we knew the problem geometry, which is the
> origin of Mark's PCSetCoordinates() interface.
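For concreteness, attaching the near null space by hand is already possible once the split solvers exist, since PETSc provides MatNullSpaceCreateRigidBody() to build the rigid-body modes from nodal coordinates. The following is only a sketch, not code from this thread; it assumes a Vec named coords holding the interleaved (x,y,z) coordinates of the FE_split nodes, and that subksp[1] is the Schur-complement solve:

    /* Sketch: tell GAMG about 3D elasticity by attaching the 6 rigid-body
       modes as a near null space on Sp, the assembled matrix it coarsens.
       "coords" is an assumed Vec of interleaved nodal coordinates. */
    KSP          *subksp;
    PetscInt      nsplits;
    Mat           S, Sp;
    MatNullSpace  nearnull;

    PCFieldSplitGetSubKSP(pc, &nsplits, &subksp);    /* call after KSPSetUp() */
    KSPGetOperators(subksp[1], &S, &Sp);             /* Sp preconditions S    */
    MatNullSpaceCreateRigidBody(coords, &nearnull);  /* 6 modes in 3D         */
    MatSetNearNullSpace(Sp, nearnull);
    MatNullSpaceDestroy(&nearnull);
    PetscFree(subksp);

The open question is whether GAMG (or fieldsplit) could do the equivalent from a command line option, without the user reaching into the inner matrix like this.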
Ah, so conveying Mat coordinates down to sub matrices?

>    Matt

   Barry

> > On Jan 11, 2017, at 7:47 PM, David Knezevic <david.kneze...@akselos.com> wrote:
> >
> > I've attached the two log files. Using cholesky for "FE_split" seems to have helped a lot!
> >
> > David
> >
> > --
> > David J. Knezevic | CTO
> > Akselos | 210 Broadway, #201 | Cambridge, MA | 02139
> > Phone: +1-617-599-4755
> >
> > On Wed, Jan 11, 2017 at 8:32 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> >
> >    Can you please run with all the monitoring on, so we can see the convergence of all the inner solvers?
> >
> >       -fieldsplit_FE_split_ksp_monitor
> >
> >    Then run again with
> >
> >       -fieldsplit_FE_split_ksp_monitor -fieldsplit_FE_split_pc_type cholesky
> >
> >    and send both sets of results.
> >
> >    Barry
> >
> > > On Jan 11, 2017, at 6:32 PM, David Knezevic <david.kneze...@akselos.com> wrote:
> > >
> > > On Wed, Jan 11, 2017 at 5:52 PM, Dave May <dave.mayhe...@gmail.com> wrote:
> > > so I gather that I'll have to look into a user-defined approximation to S.
> > >
> > > Where does the 2x2 block system come from?
> > > Maybe someone on the list knows the right approximation to use for S.
> > >
> > > The model is 3D linear elasticity using a finite element discretization. I applied substructuring to part of the system to "condense" it, and that results in the small A00 block. The A11 block is just standard 3D elasticity; no substructuring was applied there. There are constraints to connect the degrees of freedom on the interface of the substructured and non-substructured regions.
> > >
> > > If anyone has suggestions for a good way to precondition this type of system, I'd be most appreciative!
> > >
> > > Thanks,
> > > David
> > >
> > > -----------------------------------------
> > >
> > > 0 KSP Residual norm 5.405528187695e+04
> > > 1 KSP Residual norm 2.187814910803e+02
> > > 2 KSP Residual norm 1.019051577515e-01
> > > 3 KSP Residual norm 4.370464012859e-04
> > > KSP Object: 1 MPI processes
> > >   type: cg
> > >   maximum iterations=1000
> > >   tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
> > >   left preconditioning
> > >   using nonzero initial guess
> > >   using PRECONDITIONED norm type for convergence test
> > > PC Object: 1 MPI processes
> > >   type: fieldsplit
> > >     FieldSplit with Schur preconditioner, factorization FULL
> > >     Preconditioner for the Schur complement formed from Sp, an assembled approximation to S, which uses (lumped, if requested) A00's diagonal's inverse
> > >     Split info:
> > >     Split number 0 Defined by IS
> > >     Split number 1 Defined by IS
> > >     KSP solver for A00 block
> > >       KSP Object: (fieldsplit_RB_split_) 1 MPI processes
> > >         type: preonly
> > >         maximum iterations=10000, initial guess is zero
> > >         tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> > >         left preconditioning
> > >         using NONE norm type for convergence test
> > >       PC Object: (fieldsplit_RB_split_) 1 MPI processes
> > >         type: cholesky
> > >           Cholesky: out-of-place factorization
> > >           tolerance for zero pivot 2.22045e-14
> > >           matrix ordering: natural
> > >           factor fill ratio given 0., needed 0.
> > > Factored matrix follows: > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=324, cols=324 > > > package used to perform factorization: mumps > > > total: nonzeros=3042, allocated nonzeros=3042 > > > total number of mallocs used during MatSetValues calls =0 > > > MUMPS run parameters: > > > SYM (matrix type): 2 > > > PAR (host participation): 1 > > > ICNTL(1) (output for error): 6 > > > ICNTL(2) (output of diagnostic msg): 0 > > > ICNTL(3) (output for global info): 0 > > > ICNTL(4) (level of printing): 0 > > > ICNTL(5) (input mat struct): 0 > > > ICNTL(6) (matrix prescaling): 7 > > > ICNTL(7) (sequentia matrix ordering):7 > > > ICNTL(8) (scalling strategy): 77 > > > ICNTL(10) (max num of refinements): 0 > > > ICNTL(11) (error analysis): 0 > > > ICNTL(12) (efficiency control): > > > 0 > > > ICNTL(13) (efficiency control): > > > 0 > > > ICNTL(14) (percentage of estimated workspace > > > increase): 20 > > > ICNTL(18) (input mat struct): > > > 0 > > > ICNTL(19) (Shur complement info): > > > 0 > > > ICNTL(20) (rhs sparse pattern): > > > 0 > > > ICNTL(21) (solution struct): > > > 0 > > > ICNTL(22) (in-core/out-of-core facility): > > > 0 > > > ICNTL(23) (max size of memory can be allocated > > > locally):0 > > > ICNTL(24) (detection of null pivot rows): > > > 0 > > > ICNTL(25) (computation of a null space basis): > > > 0 > > > ICNTL(26) (Schur options for rhs or solution): > > > 0 > > > ICNTL(27) (experimental parameter): > > > -24 > > > ICNTL(28) (use parallel or sequential ordering): > > > 1 > > > ICNTL(29) (parallel ordering): > > > 0 > > > ICNTL(30) (user-specified set of entries in inv(A)): > > > 0 > > > ICNTL(31) (factors is discarded in the solve phase): > > > 0 > > > ICNTL(33) (compute determinant): > > > 0 > > > CNTL(1) (relative pivoting threshold): 0.01 > > > CNTL(2) (stopping criterion of refinement): > > > 1.49012e-08 > > > CNTL(3) (absolute pivoting threshold): 0. > > > CNTL(4) (value of static pivoting): -1. > > > CNTL(5) (fixation for null pivots): 0. > > > RINFO(1) (local estimated flops for the elimination > > > after analysis): > > > [0] 29394. > > > RINFO(2) (local estimated flops for the assembly > > > after factorization): > > > [0] 1092. > > > RINFO(3) (local estimated flops for the elimination > > > after factorization): > > > [0] 29394. > > > INFO(15) (estimated size of (in MB) MUMPS internal > > > data for running numerical factorization): > > > [0] 1 > > > INFO(16) (size of (in MB) MUMPS internal data used > > > during numerical factorization): > > > [0] 1 > > > INFO(23) (num of pivots eliminated on this processor > > > after factorization): > > > [0] 324 > > > RINFOG(1) (global estimated flops for the elimination > > > after analysis): 29394. > > > RINFOG(2) (global estimated flops for the assembly > > > after factorization): 1092. > > > RINFOG(3) (global estimated flops for the elimination > > > after factorization): 29394. 
> > > (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): > > > (0.,0.)*(2^0) > > > INFOG(3) (estimated real workspace for factors on all > > > processors after analysis): 3888 > > > INFOG(4) (estimated integer workspace for factors on > > > all processors after analysis): 2067 > > > INFOG(5) (estimated maximum front size in the > > > complete tree): 12 > > > INFOG(6) (number of nodes in the complete tree): 53 > > > INFOG(7) (ordering option effectively use after > > > analysis): 2 > > > INFOG(8) (structural symmetry in percent of the > > > permuted matrix after analysis): 100 > > > INFOG(9) (total real/complex workspace to store the > > > matrix factors after factorization): 3888 > > > INFOG(10) (total integer space store the matrix > > > factors after factorization): 2067 > > > INFOG(11) (order of largest frontal matrix after > > > factorization): 12 > > > INFOG(12) (number of off-diagonal pivots): 0 > > > INFOG(13) (number of delayed pivots after > > > factorization): 0 > > > INFOG(14) (number of memory compress after > > > factorization): 0 > > > INFOG(15) (number of steps of iterative refinement > > > after solution): 0 > > > INFOG(16) (estimated size (in MB) of all MUMPS > > > internal data for factorization after analysis: value on the most memory > > > consuming processor): 1 > > > INFOG(17) (estimated size of all MUMPS internal data > > > for factorization after analysis: sum over all processors): 1 > > > INFOG(18) (size of all MUMPS internal data allocated > > > during factorization: value on the most memory consuming processor): 1 > > > INFOG(19) (size of all MUMPS internal data allocated > > > during factorization: sum over all processors): 1 > > > INFOG(20) (estimated number of entries in the > > > factors): 3042 > > > INFOG(21) (size in MB of memory effectively used > > > during factorization - value on the most memory consuming processor): 1 > > > INFOG(22) (size in MB of memory effectively used > > > during factorization - sum over all processors): 1 > > > INFOG(23) (after analysis: value of ICNTL(6) > > > effectively used): 5 > > > INFOG(24) (after analysis: value of ICNTL(12) > > > effectively used): 1 > > > INFOG(25) (after factorization: number of pivots > > > modified by static pivoting): 0 > > > INFOG(28) (after factorization: number of null pivots > > > encountered): 0 > > > INFOG(29) (after factorization: effective number of > > > entries in the factors (sum over all processors)): 3042 > > > INFOG(30, 31) (after solution: size in Mbytes of > > > memory used during solution phase): 0, 0 > > > INFOG(32) (after analysis: type of analysis done): 1 > > > INFOG(33) (value used for ICNTL(8)): -2 > > > INFOG(34) (exponent of the determinant if determinant > > > is requested): 0 > > > linear system matrix = precond matrix: > > > Mat Object: (fieldsplit_RB_split_) 1 MPI processes > > > type: seqaij > > > rows=324, cols=324 > > > total: nonzeros=5760, allocated nonzeros=5760 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node routines: found 108 nodes, limit used is 5 > > > KSP solver for S = A11 - A10 inv(A00) A01 > > > KSP Object: (fieldsplit_FE_split_) 1 MPI processes > > > type: cg > > > maximum iterations=10000, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > PC Object: (fieldsplit_FE_split_) 1 MPI processes > > > type: bjacobi > > > block Jacobi: number of blocks = 1 > > > Local solve is same for all blocks, in the following KSP and PC > > > objects: > > > KSP Object: (fieldsplit_FE_split_sub_) 1 MPI > > > processes > > > type: preonly > > > maximum iterations=10000, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > > left preconditioning > > > using NONE norm type for convergence test > > > PC Object: (fieldsplit_FE_split_sub_) 1 MPI > > > processes > > > type: ilu > > > ILU: out-of-place factorization > > > 0 levels of fill > > > tolerance for zero pivot 2.22045e-14 > > > matrix ordering: natural > > > factor fill ratio given 1., needed 1. > > > Factored matrix follows: > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=28476, cols=28476 > > > package used to perform factorization: petsc > > > total: nonzeros=1037052, allocated nonzeros=1037052 > > > total number of mallocs used during MatSetValues > > > calls =0 > > > using I-node routines: found 9489 nodes, limit used > > > is 5 > > > linear system matrix = precond matrix: > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=28476, cols=28476 > > > total: nonzeros=1037052, allocated nonzeros=1037052 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node routines: found 9489 nodes, limit used is 5 > > > linear system matrix followed by preconditioner matrix: > > > Mat Object: (fieldsplit_FE_split_) 1 MPI processes > > > type: schurcomplement > > > rows=28476, cols=28476 > > > Schur complement A11 - A10 inv(A00) A01 > > > A11 > > > Mat Object: (fieldsplit_FE_split_) > > > 1 MPI processes > > > type: seqaij > > > rows=28476, cols=28476 > > > total: nonzeros=1017054, allocated nonzeros=1017054 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node routines: found 9492 nodes, limit used is 5 > > > A10 > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=28476, cols=324 > > > total: nonzeros=936, allocated nonzeros=936 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node routines: found 5717 nodes, limit used is 5 > > > KSP of A00 > > > KSP Object: (fieldsplit_RB_split_) > > > 1 MPI processes > > > type: preonly > > > maximum iterations=10000, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, > > > divergence=10000. > > > left preconditioning > > > using NONE norm type for convergence test > > > PC Object: (fieldsplit_RB_split_) > > > 1 MPI processes > > > type: cholesky > > > Cholesky: out-of-place factorization > > > tolerance for zero pivot 2.22045e-14 > > > matrix ordering: natural > > > factor fill ratio given 0., needed 0. 
> > > Factored matrix follows: > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=324, cols=324 > > > package used to perform factorization: mumps > > > total: nonzeros=3042, allocated nonzeros=3042 > > > total number of mallocs used during MatSetValues > > > calls =0 > > > MUMPS run parameters: > > > SYM (matrix type): 2 > > > PAR (host participation): 1 > > > ICNTL(1) (output for error): 6 > > > ICNTL(2) (output of diagnostic msg): 0 > > > ICNTL(3) (output for global info): 0 > > > ICNTL(4) (level of printing): 0 > > > ICNTL(5) (input mat struct): 0 > > > ICNTL(6) (matrix prescaling): 7 > > > ICNTL(7) (sequentia matrix ordering):7 > > > ICNTL(8) (scalling strategy): 77 > > > ICNTL(10) (max num of refinements): 0 > > > ICNTL(11) (error analysis): 0 > > > ICNTL(12) (efficiency control): > > > 0 > > > ICNTL(13) (efficiency control): > > > 0 > > > ICNTL(14) (percentage of estimated workspace > > > increase): 20 > > > ICNTL(18) (input mat struct): > > > 0 > > > ICNTL(19) (Shur complement info): > > > 0 > > > ICNTL(20) (rhs sparse pattern): > > > 0 > > > ICNTL(21) (solution struct): > > > 0 > > > ICNTL(22) (in-core/out-of-core facility): > > > 0 > > > ICNTL(23) (max size of memory can be > > > allocated locally):0 > > > ICNTL(24) (detection of null pivot rows): > > > 0 > > > ICNTL(25) (computation of a null space > > > basis): 0 > > > ICNTL(26) (Schur options for rhs or > > > solution): 0 > > > ICNTL(27) (experimental parameter): > > > -24 > > > ICNTL(28) (use parallel or sequential > > > ordering): 1 > > > ICNTL(29) (parallel ordering): > > > 0 > > > ICNTL(30) (user-specified set of entries in > > > inv(A)): 0 > > > ICNTL(31) (factors is discarded in the solve > > > phase): 0 > > > ICNTL(33) (compute determinant): > > > 0 > > > CNTL(1) (relative pivoting threshold): > > > 0.01 > > > CNTL(2) (stopping criterion of refinement): > > > 1.49012e-08 > > > CNTL(3) (absolute pivoting threshold): 0. > > > CNTL(4) (value of static pivoting): > > > -1. > > > CNTL(5) (fixation for null pivots): 0. > > > RINFO(1) (local estimated flops for the > > > elimination after analysis): > > > [0] 29394. > > > RINFO(2) (local estimated flops for the > > > assembly after factorization): > > > [0] 1092. > > > RINFO(3) (local estimated flops for the > > > elimination after factorization): > > > [0] 29394. > > > INFO(15) (estimated size of (in MB) MUMPS > > > internal data for running numerical factorization): > > > [0] 1 > > > INFO(16) (size of (in MB) MUMPS internal data > > > used during numerical factorization): > > > [0] 1 > > > INFO(23) (num of pivots eliminated on this > > > processor after factorization): > > > [0] 324 > > > RINFOG(1) (global estimated flops for the > > > elimination after analysis): 29394. > > > RINFOG(2) (global estimated flops for the > > > assembly after factorization): 1092. > > > RINFOG(3) (global estimated flops for the > > > elimination after factorization): 29394. 
> > > (RINFOG(12) RINFOG(13))*2^INFOG(34) > > > (determinant): (0.,0.)*(2^0) > > > INFOG(3) (estimated real workspace for > > > factors on all processors after analysis): 3888 > > > INFOG(4) (estimated integer workspace for > > > factors on all processors after analysis): 2067 > > > INFOG(5) (estimated maximum front size in the > > > complete tree): 12 > > > INFOG(6) (number of nodes in the complete > > > tree): 53 > > > INFOG(7) (ordering option effectively use > > > after analysis): 2 > > > INFOG(8) (structural symmetry in percent of > > > the permuted matrix after analysis): 100 > > > INFOG(9) (total real/complex workspace to > > > store the matrix factors after factorization): 3888 > > > INFOG(10) (total integer space store the > > > matrix factors after factorization): 2067 > > > INFOG(11) (order of largest frontal matrix > > > after factorization): 12 > > > INFOG(12) (number of off-diagonal pivots): 0 > > > INFOG(13) (number of delayed pivots after > > > factorization): 0 > > > INFOG(14) (number of memory compress after > > > factorization): 0 > > > INFOG(15) (number of steps of iterative > > > refinement after solution): 0 > > > INFOG(16) (estimated size (in MB) of all > > > MUMPS internal data for factorization after analysis: value on the most > > > memory consuming processor): 1 > > > INFOG(17) (estimated size of all MUMPS > > > internal data for factorization after analysis: sum over all processors): > > > 1 > > > INFOG(18) (size of all MUMPS internal data > > > allocated during factorization: value on the most memory consuming > > > processor): 1 > > > INFOG(19) (size of all MUMPS internal data > > > allocated during factorization: sum over all processors): 1 > > > INFOG(20) (estimated number of entries in the > > > factors): 3042 > > > INFOG(21) (size in MB of memory effectively > > > used during factorization - value on the most memory consuming > > > processor): 1 > > > INFOG(22) (size in MB of memory effectively > > > used during factorization - sum over all processors): 1 > > > INFOG(23) (after analysis: value of ICNTL(6) > > > effectively used): 5 > > > INFOG(24) (after analysis: value of ICNTL(12) > > > effectively used): 1 > > > INFOG(25) (after factorization: number of > > > pivots modified by static pivoting): 0 > > > INFOG(28) (after factorization: number of > > > null pivots encountered): 0 > > > INFOG(29) (after factorization: effective > > > number of entries in the factors (sum over all processors)): 3042 > > > INFOG(30, 31) (after solution: size in Mbytes > > > of memory used during solution phase): 0, 0 > > > INFOG(32) (after analysis: type of analysis > > > done): 1 > > > INFOG(33) (value used for ICNTL(8)): -2 > > > INFOG(34) (exponent of the determinant if > > > determinant is requested): 0 > > > linear system matrix = precond matrix: > > > Mat Object: (fieldsplit_RB_split_) > > > 1 MPI processes > > > type: seqaij > > > rows=324, cols=324 > > > total: nonzeros=5760, allocated nonzeros=5760 > > > total number of mallocs used during MatSetValues calls > > > =0 > > > using I-node routines: found 108 nodes, limit used is > > > 5 > > > A01 > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=324, cols=28476 > > > total: nonzeros=936, allocated nonzeros=936 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node routines: found 67 nodes, limit used is 5 > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=28476, cols=28476 > > > total: nonzeros=1037052, allocated nonzeros=1037052 > > > total number of mallocs used 
during MatSetValues calls =0 > > > using I-node routines: found 9489 nodes, limit used is 5 > > > linear system matrix = precond matrix: > > > Mat Object: () 1 MPI processes > > > type: seqaij > > > rows=28800, cols=28800 > > > total: nonzeros=1024686, allocated nonzeros=1024794 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node routines: found 9600 nodes, limit used is 5 > > > > > > ---------------------------------------------- PETSc Performance Summary: > > > ---------------------------------------------- > > > > > > /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a > > > arch-linux2-c-opt named david-Lenovo with 1 processor, by dknez Wed Jan > > > 11 17:22:10 2017 > > > Using Petsc Release Version 3.7.3, unknown > > > > > > Max Max/Min Avg Total > > > Time (sec): 9.638e+01 1.00000 9.638e+01 > > > Objects: 2.030e+02 1.00000 2.030e+02 > > > Flops: 1.732e+11 1.00000 1.732e+11 1.732e+11 > > > Flops/sec: 1.797e+09 1.00000 1.797e+09 1.797e+09 > > > MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > > > MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > > > MPI Reductions: 0.000e+00 0.00000 > > > > > > Flop counting convention: 1 flop = 1 real number operation of type > > > (multiply/divide/add/subtract) > > > e.g., VecAXPY() for real vectors of length N > > > --> 2N flops > > > and VecAXPY() for complex vectors of length N > > > --> 8N flops > > > > > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > > > --- -- Message Lengths -- -- Reductions -- > > > Avg %Total Avg %Total counts > > > %Total Avg %Total counts %Total > > > 0: Main Stage: 9.6379e+01 100.0% 1.7318e+11 100.0% 0.000e+00 > > > 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > > > > > ------------------------------------------------------------------------------------------------------------------------ > > > See the 'Profiling' chapter of the users' manual for details on > > > interpreting output. > > > Phase summary info: > > > Count: number of times phase was executed > > > Time and Flops: Max - maximum over all processors > > > Ratio - ratio of maximum to minimum over all processors > > > Mess: number of messages sent > > > Avg. len: average message length (bytes) > > > Reduct: number of global reductions > > > Global: entire computation > > > Stage: stages of a computation. Set stages with PetscLogStagePush() > > > and PetscLogStagePop(). 
> > > %T - percent time in this phase %F - percent flops in this > > > phase > > > %M - percent messages in this phase %L - percent message > > > lengths in this phase > > > %R - percent reductions in this phase > > > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time > > > over all processors) > > > ------------------------------------------------------------------------------------------------------------------------ > > > Event Count Time (sec) Flops > > > --- Global --- --- Stage --- Total > > > Max Ratio Max Ratio Max Ratio Mess Avg len > > > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > > ------------------------------------------------------------------------------------------------------------------------ > > > > > > --- Event Stage 0: Main Stage > > > > > > VecDot 42 1.0 2.2411e-05 1.0 8.53e+03 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 380 > > > VecTDot 77761 1.0 1.4294e+00 1.0 4.43e+09 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 1 3 0 0 0 1 3 0 0 0 3098 > > > VecNorm 38894 1.0 9.1002e-01 1.0 2.22e+09 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 1 1 0 0 0 1 1 0 0 0 2434 > > > VecScale 38882 1.0 3.7314e-01 1.0 1.11e+09 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 1 0 0 0 0 1 0 0 0 2967 > > > VecCopy 38908 1.0 2.1655e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > VecSet 77887 1.0 3.2034e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > VecAXPY 77777 1.0 1.8382e+00 1.0 4.43e+09 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 2 3 0 0 0 2 3 0 0 0 2409 > > > VecAYPX 38875 1.0 1.2884e+00 1.0 2.21e+09 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 1 1 0 0 0 1 1 0 0 0 1718 > > > VecAssemblyBegin 68 1.0 1.9407e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > VecAssemblyEnd 68 1.0 2.6941e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > VecScatterBegin 48 1.0 4.6349e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatMult 38891 1.0 4.3045e+01 1.0 8.03e+10 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 45 46 0 0 0 45 46 0 0 0 1866 > > > MatMultAdd 38889 1.0 3.5360e+01 1.0 7.91e+10 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 37 46 0 0 0 37 46 0 0 0 2236 > > > MatSolve 77769 1.0 4.8780e+01 1.0 7.95e+10 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 51 46 0 0 0 51 46 0 0 0 1631 > > > MatLUFactorNum 1 1.0 1.9575e-02 1.0 2.49e+07 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 1274 > > > MatCholFctrSym 1 1.0 9.4891e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatCholFctrNum 1 1.0 3.7885e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatILUFactorSym 1 1.0 4.1780e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatConvert 1 1.0 3.0041e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatScale 2 1.0 2.7180e-05 1.0 2.53e+04 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 930 > > > MatAssemblyBegin 32 1.0 4.0531e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatAssemblyEnd 32 1.0 1.2032e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatGetRow 114978 1.0 5.9254e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatGetRowIJ 2 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatGetSubMatrice 6 1.0 1.5707e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatGetOrdering 2 1.0 3.2425e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 
0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatZeroEntries 6 1.0 3.0580e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatView 7 1.0 3.5119e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatAXPY 1 1.0 1.9384e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatMatMult 1 1.0 2.7120e-03 1.0 3.16e+05 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 117 > > > MatMatMultSym 1 1.0 1.8010e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatMatMultNum 1 1.0 6.1703e-04 1.0 3.16e+05 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 513 > > > KSPSetUp 4 1.0 9.8944e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > KSPSolve 1 1.0 9.3380e+01 1.0 1.73e+11 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 97100 0 0 0 97100 0 0 0 1855 > > > PCSetUp 4 1.0 6.6326e-02 1.0 2.53e+07 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 381 > > > PCSetUpOnBlocks 5 1.0 2.4082e-02 1.0 2.49e+07 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 1036 > > > PCApply 5 1.0 9.3376e+01 1.0 1.73e+11 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 97100 0 0 0 97100 0 0 0 1855 > > > KSPSolve_FS_0 5 1.0 7.0214e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > KSPSolve_FS_Schu 5 1.0 9.3372e+01 1.0 1.73e+11 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 97100 0 0 0 97100 0 0 0 1855 > > > KSPSolve_FS_Low 5 1.0 2.1377e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > ------------------------------------------------------------------------------------------------------------------------ > > > > > > Memory usage is given in bytes: > > > > > > Object Type Creations Destructions Memory Descendants' > > > Mem. > > > Reports information only for process 0. > > > > > > --- Event Stage 0: Main Stage > > > > > > Vector 92 92 9698040 0. > > > Vector Scatter 24 24 15936 0. > > > Index Set 51 51 537876 0. > > > IS L to G Mapping 3 3 240408 0. > > > Matrix 16 16 77377776 0. > > > Krylov Solver 6 6 7888 0. > > > Preconditioner 6 6 6288 0. > > > Viewer 1 0 0 0. > > > Distributed Mesh 1 1 4624 0. > > > Star Forest Bipartite Graph 2 2 1616 0. > > > Discrete System 1 1 872 0. > > > ======================================================================================================================== > > > Average time to get PetscTime(): 0. 
> > > #PETSc Option Table entries: > > > -ksp_monitor > > > -ksp_view > > > -log_view > > > #End of PETSc Option Table entries > > > Compiled without FORTRAN kernels > > > Compiled with full precision matrices (default) > > > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > > > sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > > > Configure options: --with-shared-libraries=1 --with-debugging=0 > > > --download-suitesparse --download-blacs --download-ptscotch=yes > > > --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl > > > --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps > > > --download-metis > > > --prefix=/home/dknez/software/libmesh_install/opt_real/petsc > > > --download-hypre --download-ml > > > ----------------------------------------- > > > Libraries compiled on Wed Sep 21 17:38:52 2016 on david-Lenovo > > > Machine characteristics: > > > Linux-4.4.0-38-generic-x86_64-with-Ubuntu-16.04-xenial > > > Using PETSc directory: /home/dknez/software/petsc-src > > > Using PETSc arch: arch-linux2-c-opt > > > ----------------------------------------- > > > > > > Using C compiler: mpicc -fPIC -Wall -Wwrite-strings > > > -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O > > > ${COPTFLAGS} ${CFLAGS} > > > Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 > > > -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS} > > > ----------------------------------------- > > > > > > Using include paths: > > > -I/home/dknez/software/petsc-src/arch-linux2-c-opt/include > > > -I/home/dknez/software/petsc-src/include > > > -I/home/dknez/software/petsc-src/include > > > -I/home/dknez/software/petsc-src/arch-linux2-c-opt/include > > > -I/home/dknez/software/libmesh_install/opt_real/petsc/include > > > -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent > > > -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include > > > -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi > > > ----------------------------------------- > > > > > > Using C linker: mpicc > > > Using Fortran linker: mpif90 > > > Using libraries: > > > -Wl,-rpath,/home/dknez/software/petsc-src/arch-linux2-c-opt/lib > > > -L/home/dknez/software/petsc-src/arch-linux2-c-opt/lib -lpetsc > > > -Wl,-rpath,/home/dknez/software/libmesh_install/opt_real/petsc/lib > > > -L/home/dknez/software/libmesh_install/opt_real/petsc/lib -lcmumps > > > -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lmetis -lHYPRE > > > -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib > > > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 > > > -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu > > > -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu > > > -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -lscalapack -lml -lmpi_cxx > > > -lstdc++ -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd > > > -lsuitesparseconfig > > > -Wl,-rpath,/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 > > > -L/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 -lmkl_intel_lp64 > > > -lmkl_sequential -lmkl_core -lpthread -lm -lhwloc -lptesmumps -lptscotch > > > -lptscotcherr -lscotch -lscotcherr -lX11 -lm -lmpi_usempif08 > > > -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -lm -lgfortran -lm > > > -lquadmath -lm -lmpi_cxx -lstdc++ -lrt -lm -lpthread -lz > > > -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib > > > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 > > > -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu > 
> > -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu
> > > -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu
> > > -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/usr/lib/openmpi/lib -lmpi
> > > -lgcc_s -lpthread -ldl
> > > -----------------------------------------
> > >
> > > On Wed, Jan 11, 2017 at 4:49 PM, Dave May <dave.mayhe...@gmail.com> wrote:
> > > It looks like the Schur solve is requiring a huge number of iterations to converge (based on the number of MatMult calls). This is killing the performance.
> > >
> > > Are you sure that A11 is a good approximation to S? You might consider trying the selfp option:
> > >
> > > http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/PC/PCFieldSplitSetSchurPre.html#PCFieldSplitSetSchurPre
> > >
> > > Note that the best approximation to S is likely both problem and discretisation dependent, so if selfp is also terrible, you might want to consider coding up your own approximation to S for your specific system.
> > >
> > > Thanks,
> > > Dave
> > >
> > > On Wed, 11 Jan 2017 at 22:34, David Knezevic <david.kneze...@akselos.com> wrote:
> > > I have a definite 2x2 block system and I figured it'd be good to apply the PCFIELDSPLIT functionality with a Schur complement, as described in Section 4.5 of the manual.
> > >
> > > The A00 block of my matrix is very small, so I figured I'd specify a direct solver (i.e. MUMPS) for that block.
> > >
> > > So I did the following (a sketch of this setup appears at the end of the thread):
> > > - PCFieldSplitSetIS to specify the indices of the two splits
> > > - PCFieldSplitGetSubKSP to get the two KSP objects, and to set the solver and PC types for each (MUMPS for A00, ILU+CG for A11)
> > > - I set -pc_fieldsplit_schur_fact_type full
> > >
> > > Below I have pasted the output of "-ksp_view -ksp_monitor -log_view" for a test case. It seems to converge well, but I'm concerned about the speed (about 90 seconds, vs. about 1 second if I use a direct solver for the entire system). I just wanted to check whether I'm setting this up in a good way?
> > >
> > > Many thanks,
> > > David
> > >
> > > -----------------------------------------------------------------------------------
> > >
> > > 0 KSP Residual norm 5.405774214400e+04
> > > 1 KSP Residual norm 1.849649014371e+02
> > > 2 KSP Residual norm 7.462775074989e-02
> > > 3 KSP Residual norm 2.680497175260e-04
> > > KSP Object: 1 MPI processes
> > >   type: cg
> > >   maximum iterations=1000
> > >   tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
> > >   left preconditioning
> > >   using nonzero initial guess
> > >   using PRECONDITIONED norm type for convergence test
> > > PC Object: 1 MPI processes
> > >   type: fieldsplit
> > >     FieldSplit with Schur preconditioner, factorization FULL
> > >     Preconditioner for the Schur complement formed from A11
> > >     Split info:
> > >     Split number 0 Defined by IS
> > >     Split number 1 Defined by IS
> > >     KSP solver for A00 block
> > >       KSP Object: (fieldsplit_RB_split_) 1 MPI processes
> > >         type: preonly
> > >         maximum iterations=10000, initial guess is zero
> > >         tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
> > >         left preconditioning
> > >         using NONE norm type for convergence test
> > >       PC Object: (fieldsplit_RB_split_) 1 MPI processes
> > >         type: cholesky
> > >           Cholesky: out-of-place factorization
> > >           tolerance for zero pivot 2.22045e-14
> > >           matrix ordering: natural
> > >           factor fill ratio given 0., needed 0.
> > > Factored matrix follows: > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=324, cols=324 > > > package used to perform factorization: mumps > > > total: nonzeros=3042, allocated nonzeros=3042 > > > total number of mallocs used during MatSetValues calls =0 > > > MUMPS run parameters: > > > SYM (matrix type): 2 > > > PAR (host participation): 1 > > > ICNTL(1) (output for error): 6 > > > ICNTL(2) (output of diagnostic msg): 0 > > > ICNTL(3) (output for global info): 0 > > > ICNTL(4) (level of printing): 0 > > > ICNTL(5) (input mat struct): 0 > > > ICNTL(6) (matrix prescaling): 7 > > > ICNTL(7) (sequentia matrix ordering):7 > > > ICNTL(8) (scalling strategy): 77 > > > ICNTL(10) (max num of refinements): 0 > > > ICNTL(11) (error analysis): 0 > > > ICNTL(12) (efficiency control): > > > 0 > > > ICNTL(13) (efficiency control): > > > 0 > > > ICNTL(14) (percentage of estimated workspace > > > increase): 20 > > > ICNTL(18) (input mat struct): > > > 0 > > > ICNTL(19) (Shur complement info): > > > 0 > > > ICNTL(20) (rhs sparse pattern): > > > 0 > > > ICNTL(21) (solution struct): > > > 0 > > > ICNTL(22) (in-core/out-of-core facility): > > > 0 > > > ICNTL(23) (max size of memory can be allocated > > > locally):0 > > > ICNTL(24) (detection of null pivot rows): > > > 0 > > > ICNTL(25) (computation of a null space basis): > > > 0 > > > ICNTL(26) (Schur options for rhs or solution): > > > 0 > > > ICNTL(27) (experimental parameter): > > > -24 > > > ICNTL(28) (use parallel or sequential ordering): > > > 1 > > > ICNTL(29) (parallel ordering): > > > 0 > > > ICNTL(30) (user-specified set of entries in inv(A)): > > > 0 > > > ICNTL(31) (factors is discarded in the solve phase): > > > 0 > > > ICNTL(33) (compute determinant): > > > 0 > > > CNTL(1) (relative pivoting threshold): 0.01 > > > CNTL(2) (stopping criterion of refinement): > > > 1.49012e-08 > > > CNTL(3) (absolute pivoting threshold): 0. > > > CNTL(4) (value of static pivoting): -1. > > > CNTL(5) (fixation for null pivots): 0. > > > RINFO(1) (local estimated flops for the elimination > > > after analysis): > > > [0] 29394. > > > RINFO(2) (local estimated flops for the assembly > > > after factorization): > > > [0] 1092. > > > RINFO(3) (local estimated flops for the elimination > > > after factorization): > > > [0] 29394. > > > INFO(15) (estimated size of (in MB) MUMPS internal > > > data for running numerical factorization): > > > [0] 1 > > > INFO(16) (size of (in MB) MUMPS internal data used > > > during numerical factorization): > > > [0] 1 > > > INFO(23) (num of pivots eliminated on this processor > > > after factorization): > > > [0] 324 > > > RINFOG(1) (global estimated flops for the elimination > > > after analysis): 29394. > > > RINFOG(2) (global estimated flops for the assembly > > > after factorization): 1092. > > > RINFOG(3) (global estimated flops for the elimination > > > after factorization): 29394. 
> > > (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): > > > (0.,0.)*(2^0) > > > INFOG(3) (estimated real workspace for factors on all > > > processors after analysis): 3888 > > > INFOG(4) (estimated integer workspace for factors on > > > all processors after analysis): 2067 > > > INFOG(5) (estimated maximum front size in the > > > complete tree): 12 > > > INFOG(6) (number of nodes in the complete tree): 53 > > > INFOG(7) (ordering option effectively use after > > > analysis): 2 > > > INFOG(8) (structural symmetry in percent of the > > > permuted matrix after analysis): 100 > > > INFOG(9) (total real/complex workspace to store the > > > matrix factors after factorization): 3888 > > > INFOG(10) (total integer space store the matrix > > > factors after factorization): 2067 > > > INFOG(11) (order of largest frontal matrix after > > > factorization): 12 > > > INFOG(12) (number of off-diagonal pivots): 0 > > > INFOG(13) (number of delayed pivots after > > > factorization): 0 > > > INFOG(14) (number of memory compress after > > > factorization): 0 > > > INFOG(15) (number of steps of iterative refinement > > > after solution): 0 > > > INFOG(16) (estimated size (in MB) of all MUMPS > > > internal data for factorization after analysis: value on the most memory > > > consuming processor): 1 > > > INFOG(17) (estimated size of all MUMPS internal data > > > for factorization after analysis: sum over all processors): 1 > > > INFOG(18) (size of all MUMPS internal data allocated > > > during factorization: value on the most memory consuming processor): 1 > > > INFOG(19) (size of all MUMPS internal data allocated > > > during factorization: sum over all processors): 1 > > > INFOG(20) (estimated number of entries in the > > > factors): 3042 > > > INFOG(21) (size in MB of memory effectively used > > > during factorization - value on the most memory consuming processor): 1 > > > INFOG(22) (size in MB of memory effectively used > > > during factorization - sum over all processors): 1 > > > INFOG(23) (after analysis: value of ICNTL(6) > > > effectively used): 5 > > > INFOG(24) (after analysis: value of ICNTL(12) > > > effectively used): 1 > > > INFOG(25) (after factorization: number of pivots > > > modified by static pivoting): 0 > > > INFOG(28) (after factorization: number of null pivots > > > encountered): 0 > > > INFOG(29) (after factorization: effective number of > > > entries in the factors (sum over all processors)): 3042 > > > INFOG(30, 31) (after solution: size in Mbytes of > > > memory used during solution phase): 0, 0 > > > INFOG(32) (after analysis: type of analysis done): 1 > > > INFOG(33) (value used for ICNTL(8)): -2 > > > INFOG(34) (exponent of the determinant if determinant > > > is requested): 0 > > > linear system matrix = precond matrix: > > > Mat Object: (fieldsplit_RB_split_) 1 MPI processes > > > type: seqaij > > > rows=324, cols=324 > > > total: nonzeros=5760, allocated nonzeros=5760 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node routines: found 108 nodes, limit used is 5 > > > KSP solver for S = A11 - A10 inv(A00) A01 > > > KSP Object: (fieldsplit_FE_split_) 1 MPI processes > > > type: cg > > > maximum iterations=10000, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. 
> > > left preconditioning > > > using PRECONDITIONED norm type for convergence test > > > PC Object: (fieldsplit_FE_split_) 1 MPI processes > > > type: bjacobi > > > block Jacobi: number of blocks = 1 > > > Local solve is same for all blocks, in the following KSP and PC > > > objects: > > > KSP Object: (fieldsplit_FE_split_sub_) 1 MPI > > > processes > > > type: preonly > > > maximum iterations=10000, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, divergence=10000. > > > left preconditioning > > > using NONE norm type for convergence test > > > PC Object: (fieldsplit_FE_split_sub_) 1 MPI > > > processes > > > type: ilu > > > ILU: out-of-place factorization > > > 0 levels of fill > > > tolerance for zero pivot 2.22045e-14 > > > matrix ordering: natural > > > factor fill ratio given 1., needed 1. > > > Factored matrix follows: > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=28476, cols=28476 > > > package used to perform factorization: petsc > > > total: nonzeros=1017054, allocated nonzeros=1017054 > > > total number of mallocs used during MatSetValues > > > calls =0 > > > using I-node routines: found 9492 nodes, limit used > > > is 5 > > > linear system matrix = precond matrix: > > > Mat Object: (fieldsplit_FE_split_) 1 > > > MPI processes > > > type: seqaij > > > rows=28476, cols=28476 > > > total: nonzeros=1017054, allocated nonzeros=1017054 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node routines: found 9492 nodes, limit used is 5 > > > linear system matrix followed by preconditioner matrix: > > > Mat Object: (fieldsplit_FE_split_) 1 MPI processes > > > type: schurcomplement > > > rows=28476, cols=28476 > > > Schur complement A11 - A10 inv(A00) A01 > > > A11 > > > Mat Object: (fieldsplit_FE_split_) > > > 1 MPI processes > > > type: seqaij > > > rows=28476, cols=28476 > > > total: nonzeros=1017054, allocated nonzeros=1017054 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node routines: found 9492 nodes, limit used is 5 > > > A10 > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=28476, cols=324 > > > total: nonzeros=936, allocated nonzeros=936 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node routines: found 5717 nodes, limit used is 5 > > > KSP of A00 > > > KSP Object: (fieldsplit_RB_split_) > > > 1 MPI processes > > > type: preonly > > > maximum iterations=10000, initial guess is zero > > > tolerances: relative=1e-05, absolute=1e-50, > > > divergence=10000. > > > left preconditioning > > > using NONE norm type for convergence test > > > PC Object: (fieldsplit_RB_split_) > > > 1 MPI processes > > > type: cholesky > > > Cholesky: out-of-place factorization > > > tolerance for zero pivot 2.22045e-14 > > > matrix ordering: natural > > > factor fill ratio given 0., needed 0. 
> > > Factored matrix follows: > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=324, cols=324 > > > package used to perform factorization: mumps > > > total: nonzeros=3042, allocated nonzeros=3042 > > > total number of mallocs used during MatSetValues > > > calls =0 > > > MUMPS run parameters: > > > SYM (matrix type): 2 > > > PAR (host participation): 1 > > > ICNTL(1) (output for error): 6 > > > ICNTL(2) (output of diagnostic msg): 0 > > > ICNTL(3) (output for global info): 0 > > > ICNTL(4) (level of printing): 0 > > > ICNTL(5) (input mat struct): 0 > > > ICNTL(6) (matrix prescaling): 7 > > > ICNTL(7) (sequentia matrix ordering):7 > > > ICNTL(8) (scalling strategy): 77 > > > ICNTL(10) (max num of refinements): 0 > > > ICNTL(11) (error analysis): 0 > > > ICNTL(12) (efficiency control): > > > 0 > > > ICNTL(13) (efficiency control): > > > 0 > > > ICNTL(14) (percentage of estimated workspace > > > increase): 20 > > > ICNTL(18) (input mat struct): > > > 0 > > > ICNTL(19) (Shur complement info): > > > 0 > > > ICNTL(20) (rhs sparse pattern): > > > 0 > > > ICNTL(21) (solution struct): > > > 0 > > > ICNTL(22) (in-core/out-of-core facility): > > > 0 > > > ICNTL(23) (max size of memory can be > > > allocated locally):0 > > > ICNTL(24) (detection of null pivot rows): > > > 0 > > > ICNTL(25) (computation of a null space > > > basis): 0 > > > ICNTL(26) (Schur options for rhs or > > > solution): 0 > > > ICNTL(27) (experimental parameter): > > > -24 > > > ICNTL(28) (use parallel or sequential > > > ordering): 1 > > > ICNTL(29) (parallel ordering): > > > 0 > > > ICNTL(30) (user-specified set of entries in > > > inv(A)): 0 > > > ICNTL(31) (factors is discarded in the solve > > > phase): 0 > > > ICNTL(33) (compute determinant): > > > 0 > > > CNTL(1) (relative pivoting threshold): > > > 0.01 > > > CNTL(2) (stopping criterion of refinement): > > > 1.49012e-08 > > > CNTL(3) (absolute pivoting threshold): 0. > > > CNTL(4) (value of static pivoting): > > > -1. > > > CNTL(5) (fixation for null pivots): 0. > > > RINFO(1) (local estimated flops for the > > > elimination after analysis): > > > [0] 29394. > > > RINFO(2) (local estimated flops for the > > > assembly after factorization): > > > [0] 1092. > > > RINFO(3) (local estimated flops for the > > > elimination after factorization): > > > [0] 29394. > > > INFO(15) (estimated size of (in MB) MUMPS > > > internal data for running numerical factorization): > > > [0] 1 > > > INFO(16) (size of (in MB) MUMPS internal data > > > used during numerical factorization): > > > [0] 1 > > > INFO(23) (num of pivots eliminated on this > > > processor after factorization): > > > [0] 324 > > > RINFOG(1) (global estimated flops for the > > > elimination after analysis): 29394. > > > RINFOG(2) (global estimated flops for the > > > assembly after factorization): 1092. > > > RINFOG(3) (global estimated flops for the > > > elimination after factorization): 29394. 
> > > (RINFOG(12) RINFOG(13))*2^INFOG(34) > > > (determinant): (0.,0.)*(2^0) > > > INFOG(3) (estimated real workspace for > > > factors on all processors after analysis): 3888 > > > INFOG(4) (estimated integer workspace for > > > factors on all processors after analysis): 2067 > > > INFOG(5) (estimated maximum front size in the > > > complete tree): 12 > > > INFOG(6) (number of nodes in the complete > > > tree): 53 > > > INFOG(7) (ordering option effectively use > > > after analysis): 2 > > > INFOG(8) (structural symmetry in percent of > > > the permuted matrix after analysis): 100 > > > INFOG(9) (total real/complex workspace to > > > store the matrix factors after factorization): 3888 > > > INFOG(10) (total integer space store the > > > matrix factors after factorization): 2067 > > > INFOG(11) (order of largest frontal matrix > > > after factorization): 12 > > > INFOG(12) (number of off-diagonal pivots): 0 > > > INFOG(13) (number of delayed pivots after > > > factorization): 0 > > > INFOG(14) (number of memory compress after > > > factorization): 0 > > > INFOG(15) (number of steps of iterative > > > refinement after solution): 0 > > > INFOG(16) (estimated size (in MB) of all > > > MUMPS internal data for factorization after analysis: value on the most > > > memory consuming processor): 1 > > > INFOG(17) (estimated size of all MUMPS > > > internal data for factorization after analysis: sum over all processors): > > > 1 > > > INFOG(18) (size of all MUMPS internal data > > > allocated during factorization: value on the most memory consuming > > > processor): 1 > > > INFOG(19) (size of all MUMPS internal data > > > allocated during factorization: sum over all processors): 1 > > > INFOG(20) (estimated number of entries in the > > > factors): 3042 > > > INFOG(21) (size in MB of memory effectively > > > used during factorization - value on the most memory consuming > > > processor): 1 > > > INFOG(22) (size in MB of memory effectively > > > used during factorization - sum over all processors): 1 > > > INFOG(23) (after analysis: value of ICNTL(6) > > > effectively used): 5 > > > INFOG(24) (after analysis: value of ICNTL(12) > > > effectively used): 1 > > > INFOG(25) (after factorization: number of > > > pivots modified by static pivoting): 0 > > > INFOG(28) (after factorization: number of > > > null pivots encountered): 0 > > > INFOG(29) (after factorization: effective > > > number of entries in the factors (sum over all processors)): 3042 > > > INFOG(30, 31) (after solution: size in Mbytes > > > of memory used during solution phase): 0, 0 > > > INFOG(32) (after analysis: type of analysis > > > done): 1 > > > INFOG(33) (value used for ICNTL(8)): -2 > > > INFOG(34) (exponent of the determinant if > > > determinant is requested): 0 > > > linear system matrix = precond matrix: > > > Mat Object: (fieldsplit_RB_split_) > > > 1 MPI processes > > > type: seqaij > > > rows=324, cols=324 > > > total: nonzeros=5760, allocated nonzeros=5760 > > > total number of mallocs used during MatSetValues calls > > > =0 > > > using I-node routines: found 108 nodes, limit used is > > > 5 > > > A01 > > > Mat Object: 1 MPI processes > > > type: seqaij > > > rows=324, cols=28476 > > > total: nonzeros=936, allocated nonzeros=936 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node routines: found 67 nodes, limit used is 5 > > > Mat Object: (fieldsplit_FE_split_) 1 MPI processes > > > type: seqaij > > > rows=28476, cols=28476 > > > total: nonzeros=1017054, allocated nonzeros=1017054 > > > total 
number of mallocs used during MatSetValues calls =0 > > > using I-node routines: found 9492 nodes, limit used is 5 > > > linear system matrix = precond matrix: > > > Mat Object: () 1 MPI processes > > > type: seqaij > > > rows=28800, cols=28800 > > > total: nonzeros=1024686, allocated nonzeros=1024794 > > > total number of mallocs used during MatSetValues calls =0 > > > using I-node routines: found 9600 nodes, limit used is 5 > > > > > > > > > ---------------------------------------------- PETSc Performance Summary: > > > ---------------------------------------------- > > > > > > /home/dknez/akselos-dev/scrbe/build/bin/fe_solver-opt_real on a > > > arch-linux2-c-opt named david-Lenovo with 1 processor, by dknez Wed Jan > > > 11 16:16:47 2017 > > > Using Petsc Release Version 3.7.3, unknown > > > > > > Max Max/Min Avg Total > > > Time (sec): 9.179e+01 1.00000 9.179e+01 > > > Objects: 1.990e+02 1.00000 1.990e+02 > > > Flops: 1.634e+11 1.00000 1.634e+11 1.634e+11 > > > Flops/sec: 1.780e+09 1.00000 1.780e+09 1.780e+09 > > > MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00 > > > MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00 > > > MPI Reductions: 0.000e+00 0.00000 > > > > > > Flop counting convention: 1 flop = 1 real number operation of type > > > (multiply/divide/add/subtract) > > > e.g., VecAXPY() for real vectors of length N > > > --> 2N flops > > > and VecAXPY() for complex vectors of length N > > > --> 8N flops > > > > > > Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages > > > --- -- Message Lengths -- -- Reductions -- > > > Avg %Total Avg %Total counts > > > %Total Avg %Total counts %Total > > > 0: Main Stage: 9.1787e+01 100.0% 1.6336e+11 100.0% 0.000e+00 > > > 0.0% 0.000e+00 0.0% 0.000e+00 0.0% > > > > > > ------------------------------------------------------------------------------------------------------------------------ > > > See the 'Profiling' chapter of the users' manual for details on > > > interpreting output. > > > Phase summary info: > > > Count: number of times phase was executed > > > Time and Flops: Max - maximum over all processors > > > Ratio - ratio of maximum to minimum over all processors > > > Mess: number of messages sent > > > Avg. len: average message length (bytes) > > > Reduct: number of global reductions > > > Global: entire computation > > > Stage: stages of a computation. Set stages with PetscLogStagePush() > > > and PetscLogStagePop(). 
> > > %T - percent time in this phase %F - percent flops in this > > > phase > > > %M - percent messages in this phase %L - percent message > > > lengths in this phase > > > %R - percent reductions in this phase > > > Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time > > > over all processors) > > > ------------------------------------------------------------------------------------------------------------------------ > > > Event Count Time (sec) Flops > > > --- Global --- --- Stage --- Total > > > Max Ratio Max Ratio Max Ratio Mess Avg len > > > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > > > ------------------------------------------------------------------------------------------------------------------------ > > > > > > --- Event Stage 0: Main Stage > > > > > > VecDot 42 1.0 2.4080e-05 1.0 8.53e+03 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 354 > > > VecTDot 74012 1.0 1.2440e+00 1.0 4.22e+09 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 1 3 0 0 0 1 3 0 0 0 3388 > > > VecNorm 37020 1.0 8.3580e-01 1.0 2.11e+09 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 1 1 0 0 0 1 1 0 0 0 2523 > > > VecScale 37008 1.0 3.5800e-01 1.0 1.05e+09 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 1 0 0 0 0 1 0 0 0 2944 > > > VecCopy 37034 1.0 2.5754e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > VecSet 74137 1.0 3.0537e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > VecAXPY 74029 1.0 1.7233e+00 1.0 4.22e+09 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 2 3 0 0 0 2 3 0 0 0 2446 > > > VecAYPX 37001 1.0 1.2214e+00 1.0 2.11e+09 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 1 1 0 0 0 1 1 0 0 0 1725 > > > VecAssemblyBegin 68 1.0 2.0432e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > VecAssemblyEnd 68 1.0 2.5988e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > VecScatterBegin 48 1.0 4.6921e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatMult 37017 1.0 4.1269e+01 1.0 7.65e+10 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 45 47 0 0 0 45 47 0 0 0 1853 > > > MatMultAdd 37015 1.0 3.3638e+01 1.0 7.53e+10 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 37 46 0 0 0 37 46 0 0 0 2238 > > > MatSolve 74021 1.0 4.6602e+01 1.0 7.42e+10 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 51 45 0 0 0 51 45 0 0 0 1593 > > > MatLUFactorNum 1 1.0 1.7209e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 1420 > > > MatCholFctrSym 1 1.0 8.8310e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatCholFctrNum 1 1.0 3.6907e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatILUFactorSym 1 1.0 3.7372e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatAssemblyBegin 29 1.0 2.1458e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatAssemblyEnd 29 1.0 9.9473e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatGetRow 58026 1.0 2.8155e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatGetRowIJ 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatGetSubMatrice 6 1.0 1.5399e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatGetOrdering 2 1.0 3.0112e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatZeroEntries 6 1.0 2.9490e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > MatView 7 1.0 3.4356e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 
0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > KSPSetUp 4 1.0 9.4891e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > KSPSolve 1 1.0 8.8793e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 97100 0 0 0 97100 0 0 0 1840 > > > PCSetUp 4 1.0 3.8375e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 637 > > > PCSetUpOnBlocks 5 1.0 2.1250e-02 1.0 2.44e+07 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 1150 > > > PCApply 5 1.0 8.8789e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 97100 0 0 0 97100 0 0 0 1840 > > > KSPSolve_FS_0 5 1.0 7.5364e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > KSPSolve_FS_Schu 5 1.0 8.8785e+01 1.0 1.63e+11 1.0 0.0e+00 0.0e+00 > > > 0.0e+00 97100 0 0 0 97100 0 0 0 1840 > > > KSPSolve_FS_Low 5 1.0 2.1019e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 > > > 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 > > > ------------------------------------------------------------------------------------------------------------------------ > > > > > > Memory usage is given in bytes: > > > > > > Object Type Creations Destructions Memory Descendants' > > > Mem. > > > Reports information only for process 0. > > > > > > --- Event Stage 0: Main Stage > > > > > > Vector 91 91 9693912 0. > > > Vector Scatter 24 24 15936 0. > > > Index Set 51 51 537888 0. > > > IS L to G Mapping 3 3 240408 0. > > > Matrix 13 13 64097868 0. > > > Krylov Solver 6 6 7888 0. > > > Preconditioner 6 6 6288 0. > > > Viewer 1 0 0 0. > > > Distributed Mesh 1 1 4624 0. > > > Star Forest Bipartite Graph 2 2 1616 0. > > > Discrete System 1 1 872 0. > > > ======================================================================================================================== > > > Average time to get PetscTime(): 0. > > > #PETSc Option Table entries: > > > -ksp_monitor > > > -ksp_view > > > -log_view > > > #End of PETSc Option Table entries > > > Compiled without FORTRAN kernels > > > Compiled with full precision matrices (default) > > > sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 > > > sizeof(PetscScalar) 8 sizeof(PetscInt) 4 > > > Configure options: --with-shared-libraries=1 --with-debugging=0 > > > --download-suitesparse --download-blacs --download-ptscotch=yes > > > --with-blas-lapack-dir=/opt/intel/system_studio_2015.2.050/mkl > > > --CXXFLAGS=-Wl,--no-as-needed --download-scalapack --download-mumps > > > --download-metis > > > --prefix=/home/dknez/software/libmesh_install/opt_real/petsc > > > --download-hypre --download-ml > > > ----------------------------------------- > > > Libraries compiled on Wed Sep 21 17:38:52 2016 on david-Lenovo > > > Machine characteristics: > > > Linux-4.4.0-38-generic-x86_64-with-Ubuntu-16.04-xenial > > > Using PETSc directory: /home/dknez/software/petsc-src > > > Using PETSc arch: arch-linux2-c-opt > > > ----------------------------------------- > > > > > > Using C compiler: mpicc -fPIC -Wall -Wwrite-strings > > > -Wno-strict-aliasing -Wno-unknown-pragmas -fvisibility=hidden -g -O > > > ${COPTFLAGS} ${CFLAGS} > > > Using Fortran compiler: mpif90 -fPIC -Wall -ffree-line-length-0 > > > -Wno-unused-dummy-argument -g -O ${FOPTFLAGS} ${FFLAGS} > > > ----------------------------------------- > > > > > > Using include paths: > > > -I/home/dknez/software/petsc-src/arch-linux2-c-opt/include > > > -I/home/dknez/software/petsc-src/include > > > -I/home/dknez/software/petsc-src/include > > > -I/home/dknez/software/petsc-src/arch-linux2-c-opt/include > > > -I/home/dknez/software/libmesh_install/opt_real/petsc/include > > > 
-I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent > > > -I/usr/lib/openmpi/include/openmpi/opal/mca/event/libevent2021/libevent/include > > > -I/usr/lib/openmpi/include -I/usr/lib/openmpi/include/openmpi > > > ----------------------------------------- > > > > > > Using C linker: mpicc > > > Using Fortran linker: mpif90 > > > Using libraries: > > > -Wl,-rpath,/home/dknez/software/petsc-src/arch-linux2-c-opt/lib > > > -L/home/dknez/software/petsc-src/arch-linux2-c-opt/lib -lpetsc > > > -Wl,-rpath,/home/dknez/software/libmesh_install/opt_real/petsc/lib > > > -L/home/dknez/software/libmesh_install/opt_real/petsc/lib -lcmumps > > > -ldmumps -lsmumps -lzmumps -lmumps_common -lpord -lmetis -lHYPRE > > > -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib > > > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 > > > -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu > > > -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu > > > -L/lib/x86_64-linux-gnu -lmpi_cxx -lstdc++ -lscalapack -lml -lmpi_cxx > > > -lstdc++ -lumfpack -lklu -lcholmod -lbtf -lccolamd -lcolamd -lcamd -lamd > > > -lsuitesparseconfig > > > -Wl,-rpath,/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 > > > -L/opt/intel/system_studio_2015.2.050/mkl/lib/intel64 -lmkl_intel_lp64 > > > -lmkl_sequential -lmkl_core -lpthread -lm -lhwloc -lptesmumps -lptscotch > > > -lptscotcherr -lscotch -lscotcherr -lX11 -lm -lmpi_usempif08 > > > -lmpi_usempi_ignore_tkr -lmpi_mpifh -lgfortran -lm -lgfortran -lm > > > -lquadmath -lm -lmpi_cxx -lstdc++ -lrt -lm -lpthread -lz > > > -Wl,-rpath,/usr/lib/openmpi/lib -L/usr/lib/openmpi/lib > > > -Wl,-rpath,/usr/lib/gcc/x86_64-linux-gnu/5 > > > -L/usr/lib/gcc/x86_64-linux-gnu/5 -Wl,-rpath,/usr/lib/x86_64-linux-gnu > > > -L/usr/lib/x86_64-linux-gnu -Wl,-rpath,/lib/x86_64-linux-gnu > > > -L/lib/x86_64-linux-gnu -Wl,-rpath,/usr/lib/x86_64-linux-gnu > > > -L/usr/lib/x86_64-linux-gnu -ldl -Wl,-rpath,/usr/lib/openmpi/lib -lmpi > > > -lgcc_s -lpthread -ldl > > > ----------------------------------------- > > > > > > > > > > > > > > > > > > > > > > > > > > > <logfile_1.txt><logfile_2.txt> > > > > > -- > What most experimenters take for granted before they begin their experiments > is infinitely more interesting than any results to which their experiments > lead. > -- Norbert Wiener
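-----------------------------------------

For readers reconstructing the setup discussed above, here is a minimal sketch of the PCFIELDSPLIT configuration David describes (two splits set via PCFieldSplitSetIS, full Schur factorization, MUMPS Cholesky on the small A00 block). It is a reconstruction under stated assumptions, not David's actual code: the matrix A and the index sets isRB and isFE are assumed to exist already.

    /* Sketch of the Schur-complement fieldsplit setup from this thread.
       A, isRB, isFE are assumptions; error checking omitted. */
    KSP       ksp, *subksp;
    PC        pc;
    PetscInt  nsplits;

    KSPCreate(PETSC_COMM_WORLD, &ksp);
    KSPSetOperators(ksp, A, A);
    KSPGetPC(ksp, &pc);
    PCSetType(pc, PCFIELDSPLIT);
    PCFieldSplitSetType(pc, PC_COMPOSITE_SCHUR);
    PCFieldSplitSetSchurFactType(pc, PC_FIELDSPLIT_SCHUR_FACT_FULL);
    PCFieldSplitSetIS(pc, "RB_split", isRB);
    PCFieldSplitSetIS(pc, "FE_split", isFE);
    KSPSetFromOptions(ksp);
    KSPSetUp(ksp);                                /* before grabbing sub-KSPs */
    PCFieldSplitGetSubKSP(pc, &nsplits, &subksp);
    /* subksp[0] -> A00 (RB_split): preonly + Cholesky via MUMPS           */
    /* subksp[1] -> Schur complement (FE_split): CG + the PC under study   */
    PetscFree(subksp);

On the command line, the same factorization choice is -pc_fieldsplit_type schur -pc_fieldsplit_schur_fact_type full. Dave's suggestion corresponds to -pc_fieldsplit_schur_precondition selfp, the run that "helped a lot" used -fieldsplit_FE_split_pc_type cholesky, and -fieldsplit_FE_split_pc_type gamg (with a near null space attached, as sketched earlier in the thread) is the proposed next experiment.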