This is because, for block size 11, BAIJ calls LAPACK/BLAS for the 
block operations instead of using custom routines written for that block size.

   Here is what you need to do. For a good-sized case, run both AIJ and BAIJ 
with -log_view and compare the time spent in MatLUFactorNumeric, 
MatLUFactorSymbolic, and MatSolve. If the two runs make a different number of 
calls to a function, divide the time by the call count to get the time per 
function call.
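
   A minimal sketch of that per-call arithmetic, in case it helps. It 
assumes each -log_view event line begins with the event name, the call count, 
the count ratio, and the time in seconds, in that order. The helper name 
time_per_call is made up for this illustration and is not part of PETSc.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a PETSc routine): parse the leading columns of one
 * -log_view event line and return seconds per call, or -1.0 if the line does
 * not parse.
 * ASSUMPTION: the line starts with "<EventName> <Count> <CountRatio> <Time>". */
double time_per_call(const char *line)
{
  char   name[64];
  long   count;
  double count_ratio, seconds;

  assert(line != NULL);
  if (sscanf(line, "%63s %ld %lf %lf", name, &count, &count_ratio, &seconds) != 4) return -1.0;
  if (count <= 0) return -1.0;
  return seconds / (double)count;
}
```

   Feed it the MatSolve line from the AIJ run and from the BAIJ run and compare 
the two results.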

   This will tell you which routine needs to be optimized first: 
MatLUFactorNumeric or MatSolve. My guess is MatSolve.

   So edit src/mat/impls/baij/seq/baijsolvnat.c and copy the function 
MatSolve_SeqBAIJ_15_NaturalOrdering_ver1() to a new function 
MatSolve_SeqBAIJ_11_NaturalOrdering_ver1. Edit the new function for the block 
size of 11. 
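
   To illustrate what a block-size-specific routine buys you, here is a 
standalone sketch of the inner operation of a block triangular solve, 
s -= A*x for one dense block. It is not the PETSc source. The function names 
are invented for this example, and it assumes the block is stored column by 
column.

```c
#include <assert.h>
#include <stddef.h>

#define BS 11 /* block size from this thread: 11 unknowns per mesh vertex */

/* Generic kernel: s -= A*x for a bs x bs block stored column by column.
 * Because bs is only known at run time, the compiler cannot fully unroll. */
void block_mult_sub_generic(int bs, const double *v, const double *x, double *s)
{
  assert(v != NULL && x != NULL && s != NULL);
  for (int k = 0; k < bs; k++) {
    for (int i = 0; i < bs; i++) s[i] -= v[k * bs + i] * x[k];
  }
}

/* Specialized kernel: same arithmetic, but the block size is a compile-time
 * constant and the partial sums live in a small local array, so the compiler
 * is free to unroll the loops and keep the sums in registers. */
void block_mult_sub_11(const double *v, const double *x, double *s)
{
  double acc[BS];

  assert(v != NULL && x != NULL && s != NULL);
  for (int i = 0; i < BS; i++) acc[i] = s[i];
  for (int k = 0; k < BS; k++) {
    const double xk = x[k];              /* one entry of x per block column */
    for (int i = 0; i < BS; i++) acc[i] -= v[i] * xk;
    v += BS;                             /* advance to the next column */
  }
  for (int i = 0; i < BS; i++) s[i] = acc[i];
}
```

   Both functions compute the same result. The second is the kind of 
fixed-size loop a hand-written block-size-11 routine would contain in place of 
a call into LAPACK/BLAS.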

   Now edit MatLUFactorNumeric_SeqBAIJ_N() so that it uses the new routine 
when the block size is 11, something like:

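if (both_identity) {
   if (b->bs == 11) {
     /* new custom routine for block size 11 */
     C->ops->solve = MatSolve_SeqBAIJ_11_NaturalOrdering_ver1;
   } else {
     C->ops->solve = MatSolve_SeqBAIJ_N_NaturalOrdering;
   }
}
/* leave whatever the function already does for the non-identity case as it is */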
   
   Rerun and look at the new -log_view. Send all three -log_view outputs to us 
at this point. If this optimization helps and MatLUFactorNumeric is now the 
time sink, you can repeat the process with 
MatLUFactorNumeric_SeqBAIJ_15_NaturalOrdering() to make a custom version for 
block size 11.

  Barry

> On Mar 6, 2017, at 4:32 PM, Kong, Fande <fande.k...@inl.gov> wrote:
> 
> 
> 
> On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan <patrick.sa...@gmail.com> wrote:
> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande <fande.k...@inl.gov> wrote:
> > Hi All,
> >
> > I am solving a nonlinear system whose Jacobian matrix has a block structure.
> > More precisely, there is a mesh, and for each vertex there are 11 variables
> > associated with it. I am using BAIJ.
> >
> > I thought block ILU(k) should be more efficient than the point-wise ILU(k).
> > After some numerical experiments, I found that the block ILU(k) is much
> > slower than the point-wise version.
> Do you mean that it takes more iterations to converge, or that the
> time per iteration is greater, or both?
> 
> The number of iterations is very similar, but the time per iteration is 
> greater.
> 
>  
> >
> > Any thoughts?
> >
> > Fande,
> 
