Thanks. Even the symbolic is slower for BAIJ. I don't like that; it 
definitely should not be, since it is (or at least should be) doing a symbolic 
factorization on a symbolic matrix 1/11th the size! 
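
For reference, the per-call comparison asked for earlier in the thread can be sketched in a few lines of C. This is only an illustration: the struct and function names are made up for this example (they are not PETSc API), and the counts and times are copied from the log output quoted below.

```c
#include <assert.h>
#include <stdio.h>

/* One -log_view event, reduced to the two columns used here.
   LogEvent is a hypothetical type for this sketch, not a PETSc type. */
typedef struct {
  const char *name;
  int         calls;   /* call count column */
  double      seconds; /* time column, in seconds */
} LogEvent;

/* Values copied from the log output quoted in this thread. */
const LogEvent aij_events[] = {
  {"MatSolve",        850, 8.6543e+00},
  {"MatLUFactorNum",   25, 1.7622e+00},
  {"MatILUFactorSym",  13, 2.8002e-01},
};
const LogEvent baij_events[] = {
  {"MatSolve",        826, 1.3016e+01},
  {"MatLUFactorNum",   25, 1.5503e+01},
  {"MatILUFactorSym",  13, 5.7561e-01},
};

/* Average time of a single call of the event. */
double time_per_call(LogEvent e)
{
  assert(e.calls > 0);
  return e.seconds / e.calls;
}

/* BAIJ/AIJ ratio of per-call time; a value above 1 means BAIJ is slower. */
double slowdown(LogEvent aij, LogEvent baij)
{
  return time_per_call(baij) / time_per_call(aij);
}

/* Print one line per event: per-call times and their ratio. */
void print_comparison(void)
{
  const int n = (int)(sizeof(aij_events) / sizeof(aij_events[0]));
  for (int i = 0; i < n; i++) {
    printf("%-16s AIJ %.4e s/call  BAIJ %.4e s/call  ratio %.2f\n",
           aij_events[i].name,
           time_per_call(aij_events[i]),
           time_per_call(baij_events[i]),
           slowdown(aij_events[i], baij_events[i]));
  }
}
```

Calling print_comparison() from a main() prints one line per event. Dividing by the call count matters for MatSolve, where the two runs made a different number of calls (850 versus 826).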
 
   Keep us informed.



> On Mar 6, 2017, at 5:44 PM, Kong, Fande <[email protected]> wrote:
> 
> Thanks, Barry,
> 
> Log info:
> 
> AIJ:
> 
> MatSolve             850 1.0 8.6543e+00 4.2 3.04e+09 1.8 0.0e+00 0.0e+00 0.0e+00  0 41  0  0  0   0 41  0  0  0 49594
> MatLUFactorNum        25 1.0 1.7622e+00 2.0 2.04e+09 2.1 0.0e+00 0.0e+00 0.0e+00  0 26  0  0  0   0 26  0  0  0 153394
> MatILUFactorSym       13 1.0 2.8002e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> 
> BAIJ:
> 
> MatSolve             826 1.0 1.3016e+01 1.7 1.42e+10 1.8 0.0e+00 0.0e+00 0.0e+00  1 29  0  0  0   1 29  0  0  0 154617
> MatLUFactorNum        25 1.0 1.5503e+01 2.0 3.55e+10 2.1 0.0e+00 0.0e+00 0.0e+00  1 67  0  0  0   1 67  0  0  0 303190
> MatILUFactorSym       13 1.0 5.7561e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
> 
> It looks like both MatSolve and MatLUFactorNum are slower.
> 
> I will try your suggestions.
> 
> Fande
> 
> On Mon, Mar 6, 2017 at 4:14 PM, Barry Smith <[email protected]> wrote:
> 
>   Note also that if the 11 by 11 blocks are actually sparse (and you don't 
> store all the zeros in the blocks in the AIJ format) then the AIJ non-block 
> factorization involves fewer floating point operations and less memory access, 
> so it can be faster than the BAIJ format, depending on "how sparse" the blocks 
> are. If you actually "fill in" the 11 by 11 blocks with AIJ (with zeros maybe 
> in certain locations) then the above is not true.
> 
> 
> > On Mar 6, 2017, at 5:10 PM, Barry Smith <[email protected]> wrote:
> >
> >
> >   This is because for block size 11 it is using calls to LAPACK/BLAS for 
> > the block operations instead of custom routines for that block size.
> >
> >   Here is what you need to do. For a good sized case run both with 
> > -log_view and check the time spent in
> > MatLUFactorNumeric, MatLUFactorSymbolic and in MatSolve for AIJ and BAIJ. 
> > If they have a different number of function calls then divide by the 
> > function call count to determine the time per function call.
> >
> >   This will tell you which routine needs to be optimized first: either 
> > MatLUFactorNumeric or MatSolve. My guess is MatSolve.
> >
> >   So edit src/mat/impls/baij/seq/baijsolvnat.c and copy the function 
> > MatSolve_SeqBAIJ_15_NaturalOrdering_ver1() to a new function 
> > MatSolve_SeqBAIJ_11_NaturalOrdering_ver1. Edit the new function for the 
> > block size of 11.
> >
> >   Now edit MatLUFactorNumeric_SeqBAIJ_N() so that if the block size is 11 it 
> > uses the new routine, something like:
> >
> > if (both_identity) {
> >   if (b->bs == 11) {
> >     /* custom solve for block size 11 */
> >     C->ops->solve = MatSolve_SeqBAIJ_11_NaturalOrdering_ver1;
> >   } else {
> >     /* generic solve for any other block size */
> >     C->ops->solve = MatSolve_SeqBAIJ_N_NaturalOrdering;
> >   }
> > }
> >
> >   Rerun and look at the new -log_view. Send all three -log_view outputs to us 
> > at this point.  If this optimization helps and now
> > MatLUFactorNumeric is the time sink, you can apply the same process to 
> > MatLUFactorNumeric_SeqBAIJ_15_NaturalOrdering() to make a custom version 
> > for block size 11.
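
To illustrate why a routine written for one block size can beat a generic one, here is a minimal sketch of the inner operation of a block triangular solve, y -= A*x for one dense block, once with a run-time block size and once with the size fixed at 11. It is not the PETSc code: the function names are hypothetical, and column-major storage of the block is an assumption of this example.

```c
#include <assert.h>

enum { BS = 11 }; /* block size discussed in this thread */

/* Generic kernel: y -= A*x for one dense bs-by-bs block.
   Column-major storage is assumed; the loop bounds are only known at run time. */
void block_sub_matvec_n(int bs, const double *a, const double *x, double *y)
{
  assert(bs > 0);
  for (int j = 0; j < bs; j++) {
    for (int i = 0; i < bs; i++) y[i] -= a[j * bs + i] * x[j];
  }
}

/* Fixed-size kernel: the same operation with compile-time loop bounds,
   which lets the compiler unroll and vectorize the loops for exactly 11. */
void block_sub_matvec_11(const double *a, const double *x, double *y)
{
  for (int j = 0; j < BS; j++) {
    const double xj = x[j];
    for (int i = 0; i < BS; i++) y[i] -= a[j * BS + i] * xj;
  }
}

/* Self-check for this sketch: returns 1 if both kernels give identical
   results on a small integer-valued test block, 0 otherwise. */
int kernels_agree(void)
{
  double a[BS * BS], x[BS], y1[BS], y2[BS];
  for (int k = 0; k < BS * BS; k++) a[k] = (double)(k % 7) - 3.0;
  for (int k = 0; k < BS; k++) {
    x[k]  = (double)k + 1.0;
    y1[k] = 100.0;
    y2[k] = 100.0;
  }
  block_sub_matvec_n(BS, a, x, y1);
  block_sub_matvec_11(a, x, y2);
  for (int k = 0; k < BS; k++) {
    if (y1[k] != y2[k]) return 0;
  }
  return 1;
}
```

Whether the fixed-size version is actually faster depends on the compiler and hardware, so it should be confirmed with -log_view as described above.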
> >
> >  Barry
> >
> >> On Mar 6, 2017, at 4:32 PM, Kong, Fande <[email protected]> wrote:
> >>
> >>
> >>
> >> On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan <[email protected]> 
> >> wrote:
> >> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande <[email protected]> wrote:
> >>> Hi All,
> >>>
> >>> I am solving a nonlinear system whose Jacobian matrix has a block 
> >>> structure.
> >>> More precisely, there is a mesh, and for each vertex there are 11 
> >>> variables
> >>> associated with it. I am using BAIJ.
> >>>
> >>> I thought block ILU(k) should be more efficient than the point-wise 
> >>> ILU(k).
> >>> After some numerical experiments, I found that the block ILU(k) is much
> >>> slower than the point-wise version.
> >> Do you mean that it takes more iterations to converge, or that the
> >> time per iteration is greater, or both?
> >>
> >> The number of iterations is very similar, but the time per iteration is 
> >> greater.
> >>
> >>
> >>>
> >>> Any thoughts?
> >>>
> >>> Fande,
> >>
> >
> 
> 
