Re: [petsc-users] block ILU(K) is slower than the point-wise version?

Barry Smith Tue, 07 Mar 2017 13:35:40 -0800

> On Mar 7, 2017, at 3:26 PM, Kong, Fande <[email protected]> wrote:
> 
> 
> 
> On Tue, Mar 7, 2017 at 2:07 PM, Barry Smith <[email protected]> wrote:
> 
>    The matrix is too small. Please post ONE big matrix
> 
> I am using "-ksp_view_pmat  binary" to save the matrix. How can I save the 
> latest one only for a time-dependent problem?


  No easy way. You can send us the first matrix or you can use 
bin/PetscBinaryIO.py to cut out one matrix from the file.
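
  As a rough illustration of the "cut out one matrix" step, here is a minimal
stand-alone sketch that does not use PetscBinaryIO.py itself. It assumes the
file was written by a default build (big-endian records, 32-bit indices, real
double-precision scalars) and contains only Mat and Vec records. The function
names scan_records and extract_last_matrix are hypothetical, chosen only for
this example.

```python
import struct

# Class ids PETSc writes at the start of each record in a binary viewer file.
MAT_FILE_CLASSID = 1211216
VEC_FILE_CLASSID = 1211214


def scan_records(path):
    """Return (classid, offset, nbytes) for every Mat/Vec record in the file.

    Assumes 4-byte big-endian integers and 8-byte real scalars.
    """
    records = []
    with open(path, "rb") as f:
        f.seek(0, 2)          # jump to the end to learn the file size
        end = f.tell()
        offset = 0
        while offset < end:
            f.seek(offset)
            (classid,) = struct.unpack(">i", f.read(4))
            if classid == MAT_FILE_CLASSID:
                rows, cols, nnz = struct.unpack(">3i", f.read(12))
                if nnz < 0:
                    raise ValueError("negative nonzero count is not handled in this sketch")
                # header + row lengths + column indices + values
                nbytes = 16 + 4 * rows + 4 * nnz + 8 * nnz
            elif classid == VEC_FILE_CLASSID:
                (n,) = struct.unpack(">i", f.read(4))
                nbytes = 8 + 8 * n
            else:
                raise ValueError("unexpected class id %d at offset %d" % (classid, offset))
            records.append((classid, offset, nbytes))
            offset += nbytes
    return records


def extract_last_matrix(src, dst):
    """Copy the last Mat record of src into dst; return how many Mats src held."""
    mats = [r for r in scan_records(src) if r[0] == MAT_FILE_CLASSID]
    if not mats:
        raise ValueError("no Mat record found in %s" % src)
    _, offset, remaining = mats[-1]
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fin.seek(offset)
        while remaining:
            chunk = fin.read(min(remaining, 1 << 20))  # copy in 1 MiB pieces
            if not chunk:
                raise EOFError("record is truncated")
            fout.write(chunk)
            remaining -= len(chunk)
    return len(mats)
```

  Calling extract_last_matrix("all_matrices.bin", "last_matrix.bin") (both
file names are placeholders) would leave a file holding a single matrix. For a
build with 64-bit indices or complex scalars the record sizes above would need
adjusting.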

> 
> 
> Fande, 
> 
>  
> 
> > On Mar 7, 2017, at 2:26 PM, Kong, Fande <[email protected]> wrote:
> >
> > Uploaded to google drive, and sent you links in another email. Not sure if 
> > it works or not.
> >
> > Fande,
> >
> > On Tue, Mar 7, 2017 at 12:29 PM, Barry Smith <[email protected]> wrote:
> >
> >    It is too big for email. You can post it somewhere so we can download it.
> >
> >
> > > On Mar 7, 2017, at 12:01 PM, Kong, Fande <[email protected]> wrote:
> > >
> > >
> > >
> > > On Tue, Mar 7, 2017 at 10:23 AM, Hong <[email protected]> wrote:
> > > I checked
> > > MatILUFactorSymbolic_SeqBAIJ() and MatILUFactorSymbolic_SeqAIJ(),
> > > > they are virtually the same. Why is the version for BAIJ so much slower?
> > > I'll investigate it.
> > >
> > > Fande,
> > > How large is your matrix? Is it possible to send us your matrix so I can 
> > > test it?
> > >
> > > Thanks, Hong,
> > >
> > > > It is a 3020875x3020875 matrix, and it is large. I can make a small one 
> > > > if you like, but I am not sure whether it will reproduce this issue.
> > >
> > > Fande,
> > >
> > >
> > >
> > > Hong
> > >
> > >
> > > On Mon, Mar 6, 2017 at 9:08 PM, Barry Smith <[email protected]> wrote:
> > >
> > >   Thanks. Even the symbolic is slower for BAIJ. I don't like that, it 
> > > definitely should not be since it is (at least should be) doing a 
> > > symbolic factorization on a symbolic matrix 1/11th the size!
> > >
> > >    Keep us informed.
> > >
> > >
> > >
> > > > On Mar 6, 2017, at 5:44 PM, Kong, Fande <[email protected]> wrote:
> > > >
> > > > Thanks, Barry,
> > > >
> > > > Log info:
> > > >
> > > > AIJ:
> > > >
> > > > > MatSolve             850 1.0 8.6543e+00 4.2 3.04e+09 1.8 0.0e+00 0.0e+00 0.0e+00  0 41  0  0  0   0 41  0  0  0 49594
> > > > > MatLUFactorNum        25 1.0 1.7622e+00 2.0 2.04e+09 2.1 0.0e+00 0.0e+00 0.0e+00  0 26  0  0  0   0 26  0  0  0 153394
> > > > > MatILUFactorSym       13 1.0 2.8002e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > > >
> > > > BAIJ:
> > > >
> > > > > MatSolve             826 1.0 1.3016e+01 1.7 1.42e+10 1.8 0.0e+00 0.0e+00 0.0e+00  1 29  0  0  0   1 29  0  0  0 154617
> > > > > MatLUFactorNum        25 1.0 1.5503e+01 2.0 3.55e+10 2.1 0.0e+00 0.0e+00 0.0e+00  1 67  0  0  0   1 67  0  0  0 303190
> > > > > MatILUFactorSym       13 1.0 5.7561e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > > >
> > > > It looks like both MatSolve and MatLUFactorNum are slower.
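
The per-call comparison suggested elsewhere in this thread can be sketched
from the two excerpts above. This is a minimal sketch, assuming the usual
-log_view column layout in which the first number after the event name is the
call count and the third is the maximum time in seconds over all processes.
The helper names parse_events and per_call_slowdown are made up for this
example.

```python
# Event lines copied from the AIJ and BAIJ -log_view excerpts in this thread
# (only the leading columns are needed here).
AIJ_LOG = """
MatSolve             850 1.0 8.6543e+00 4.2 3.04e+09 1.8
MatLUFactorNum        25 1.0 1.7622e+00 2.0 2.04e+09 2.1
MatILUFactorSym       13 1.0 2.8002e-01 2.9 0.00e+00 0.0
"""

BAIJ_LOG = """
MatSolve             826 1.0 1.3016e+01 1.7 1.42e+10 1.8
MatLUFactorNum        25 1.0 1.5503e+01 2.0 3.55e+10 2.1
MatILUFactorSym       13 1.0 5.7561e-01 1.8 0.00e+00 0.0
"""


def parse_events(log_text):
    """Map event name -> (call count, max time in seconds)."""
    events = {}
    for line in log_text.strip().splitlines():
        fields = line.split()
        # fields: name, count, count ratio, max time, time ratio, ...
        events[fields[0]] = (int(fields[1]), float(fields[3]))
    return events


def per_call_slowdown(aij_text, baij_text):
    """Map event name -> (AIJ s/call, BAIJ s/call, BAIJ/AIJ ratio)."""
    aij, baij = parse_events(aij_text), parse_events(baij_text)
    result = {}
    for name, (calls_a, time_a) in aij.items():
        calls_b, time_b = baij[name]
        t_a, t_b = time_a / calls_a, time_b / calls_b
        result[name] = (t_a, t_b, t_b / t_a)
    return result


if __name__ == "__main__":
    for name, (t_a, t_b, ratio) in per_call_slowdown(AIJ_LOG, BAIJ_LOG).items():
        print("%-16s AIJ %.4e s/call  BAIJ %.4e s/call  ratio %.2f"
              % (name, t_a, t_b, ratio))
```

With these numbers the per-call ratio is roughly 1.5 for MatSolve, 2.1 for
MatILUFactorSym and 8.8 for MatLUFactorNum. The times are maxima over
processes and the load-imbalance ratios are large, so the figures are only
indicative.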
> > > >
> > > > I will try your suggestions.
> > > >
> > > > Fande
> > > >
> > > > On Mon, Mar 6, 2017 at 4:14 PM, Barry Smith <[email protected]> wrote:
> > > >
> > > >   Note also that if the 11 by 11 blocks are actually sparse (and you 
> > > > > don't store all the zeros in the blocks in the AIJ format) then the 
> > > > > AIJ non-block factorization involves fewer floating point operations and 
> > > > > less memory access, so it can be faster than the BAIJ format, depending on 
> > > > "how sparse" the blocks are. If you actually "fill in" the 11 by 11 
> > > > blocks with AIJ (with zeros maybe in certain locations) then the above 
> > > > is not true.
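
One rough way to check whether this applies is to compare the flop counts per
call reported in the two -log_view excerpts quoted earlier in this message.
The sketch below assumes the fifth number after the event name is the maximum
flop count over processes. The function name flops_per_call_ratio is
hypothetical.

```python
# (call count, max flops) taken from the AIJ and BAIJ -log_view excerpts.
AIJ_FLOPS = {"MatSolve": (850, 3.04e+09), "MatLUFactorNum": (25, 2.04e+09)}
BAIJ_FLOPS = {"MatSolve": (826, 1.42e+10), "MatLUFactorNum": (25, 3.55e+10)}


def flops_per_call_ratio(aij, baij):
    """Map event name -> (BAIJ flops per call) / (AIJ flops per call)."""
    ratios = {}
    for name, (calls_a, flops_a) in aij.items():
        calls_b, flops_b = baij[name]
        ratios[name] = (flops_b / calls_b) / (flops_a / calls_a)
    return ratios


if __name__ == "__main__":
    for name, ratio in flops_per_call_ratio(AIJ_FLOPS, BAIJ_FLOPS).items():
        print("%-16s BAIJ does %.1f times the flops of AIJ per call" % (name, ratio))
```

A ratio well above 1 would be consistent with BAIJ doing arithmetic on stored
zeros inside the 11 by 11 blocks. ILU(k) fill can also differ between the
point and block patterns, so this is a hint rather than a proof.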
> > > >
> > > >
> > > > > On Mar 6, 2017, at 5:10 PM, Barry Smith <[email protected]> wrote:
> > > > >
> > > > >
> > > > >   This is because for block size 11 it is using calls to LAPACK/BLAS 
> > > > > for the block operations instead of custom routines for that block 
> > > > > size.
> > > > >
> > > > >   Here is what you need to do. For a good sized case run both with 
> > > > > -log_view and check the time spent in
> > > > > MatLUFactorNumeric, MatLUFactorSymbolic and in MatSolve for AIJ and 
> > > > > BAIJ. If they have a different number of function calls then divide 
> > > > > by the function call count to determine the time per function call.
> > > > >
> > > > > >   This will tell you which routine needs to be optimized first, either 
> > > > > > MatLUFactorNumeric or MatSolve. My guess is MatSolve.
> > > > >
> > > > >   So edit src/mat/impls/baij/seq/baijsolvnat.c and copy the function 
> > > > > MatSolve_SeqBAIJ_15_NaturalOrdering_ver1() to a new function 
> > > > > MatSolve_SeqBAIJ_11_NaturalOrdering_ver1. Edit the new function for 
> > > > > the block size of 11.
> > > > >
> > > > > >   Now edit MatLUFactorNumeric_SeqBAIJ_N() so that if block size is 11 
> > > > > > it uses the new routine, something like:
> > > > > >
> > > > > > if (both_identity) {
> > > > > >   if (b->bs == 11) {
> > > > > >     C->ops->solve = MatSolve_SeqBAIJ_11_NaturalOrdering_ver1;
> > > > > >   } else {
> > > > > >     C->ops->solve = MatSolve_SeqBAIJ_N_NaturalOrdering;
> > > > > >   }
> > > > > > }
> > > > >
> > > > > >   Rerun and look at the new -log_view. Send all three -log_view outputs to 
> > > > > > us at this point.  If this optimization helps and now
> > > > > > MatLUFactorNumeric is the time sink, you can apply the same process to 
> > > > > > MatLUFactorNumeric_SeqBAIJ_15_NaturalOrdering() to make an 11 size 
> > > > > > block custom version.
> > > > >
> > > > >  Barry
> > > > >
> > > > >> On Mar 6, 2017, at 4:32 PM, Kong, Fande <[email protected]> wrote:
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan 
> > > > >> <[email protected]> wrote:
> > > > >> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande <[email protected]> 
> > > > >> wrote:
> > > > >>> Hi All,
> > > > >>>
> > > > >>> I am solving a nonlinear system whose Jacobian matrix has a block 
> > > > >>> structure.
> > > > >>> More precisely, there is a mesh, and for each vertex there are 11 
> > > > >>> variables
> > > > >>> associated with it. I am using BAIJ.
> > > > >>>
> > > > >>> I thought block ILU(k) should be more efficient than the point-wise 
> > > > >>> ILU(k).
> > > > >>> After some numerical experiments, I found that the block ILU(K) is 
> > > > >>> much
> > > > >>> slower than the point-wise version.
> > > > >> Do you mean that it takes more iterations to converge, or that the
> > > > >> time per iteration is greater, or both?
> > > > >>
> > > > > >> The number of iterations is very similar, but the time per 
> > > > > >> iteration is greater.
> > > > >>
> > > > >>
> > > > >>>
> > > > >>> Any thoughts?
> > > > >>>
> > > > >>> Fande,
> > > > >>
> > > > >
> > > >
> > > >
> > >
> > >
> > >
> >
> >
> 
>
