On Tue, Mar 7, 2017 at 10:23 AM, Hong <hzh...@mcs.anl.gov> wrote:

> I checked
> MatILUFactorSymbolic_SeqBAIJ() and MatILUFactorSymbolic_SeqAIJ(),
> they are virtually the same. Why is the version for BAIJ so much slower?
> I'll investigate it.
>
> Fande,
> How large is your matrix? Is it possible to send us your matrix so I can
> test it?
>

Thanks, Hong,

It is a 3020875x3020875 matrix, and it is large. I can make a small one if
you like, but I am not sure whether it will reproduce this issue.

Fande,

>
> Hong
>
>
> On Mon, Mar 6, 2017 at 9:08 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>
>>
>> Thanks. Even the symbolic is slower for BAIJ. I don't like that, it
>> definitely should not be since it is (at least should be) doing a symbolic
>> factorization on a symbolic matrix 1/11th the size!
>>
>> Keep us informed.
>>
>>
>>
>> > On Mar 6, 2017, at 5:44 PM, Kong, Fande <fande.k...@inl.gov> wrote:
>> >
>> > Thanks, Barry,
>> >
>> > Log info:
>> >
>> > AIJ:
>> >
>> > MatSolve          850 1.0 8.6543e+00 4.2 3.04e+09 1.8 0.0e+00 0.0e+00 0.0e+00  0 41  0  0  0   0 41  0  0  0  49594
>> > MatLUFactorNum     25 1.0 1.7622e+00 2.0 2.04e+09 2.1 0.0e+00 0.0e+00 0.0e+00  0 26  0  0  0   0 26  0  0  0 153394
>> > MatILUFactorSym    13 1.0 2.8002e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
>> >
>> > BAIJ:
>> >
>> > MatSolve          826 1.0 1.3016e+01 1.7 1.42e+10 1.8 0.0e+00 0.0e+00 0.0e+00  1 29  0  0  0   1 29  0  0  0 154617
>> > MatLUFactorNum     25 1.0 1.5503e+01 2.0 3.55e+10 2.1 0.0e+00 0.0e+00 0.0e+00  1 67  0  0  0   1 67  0  0  0 303190
>> > MatILUFactorSym    13 1.0 5.7561e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
>> >
>> > It looks like both MatSolve and MatLUFactorNum are slower.
>> >
>> > I will try your suggestions.
>> >
>> > Fande
>> >
>> > On Mon, Mar 6, 2017 at 4:14 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>> >
>> > Note also that if the 11 by 11 blocks are actually sparse (and you don't
>> > store all the zeros in the blocks in the AIJ format) then the AIJ
>> > non-block factorization involves fewer floating point operations and less
>> > memory access so can be faster than the BAIJ format, depending on "how
>> > sparse" the blocks are.
>> > If you actually "fill in" the 11 by 11 blocks with AIJ (with zeros maybe
>> > in certain locations) then the above is not true.
>> >
>> >
>> > > On Mar 6, 2017, at 5:10 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>> > >
>> > >
>> > > This is because for block size 11 it is using calls to LAPACK/BLAS for
>> > > the block operations instead of custom routines for that block size.
>> > >
>> > > Here is what you need to do. For a good sized case run both with
>> > > -log_view and check the time spent in
>> > > MatLUFactorNumeric, MatLUFactorSymbolic and in MatSolve for AIJ and
>> > > BAIJ. If they have a different number of function calls then divide by
>> > > the function call count to determine the time per function call.
>> > >
>> > > This will tell you which routine needs to be optimized first, either
>> > > MatLUFactorNumeric or MatSolve. My guess is MatSolve.
>> > >
>> > > So edit src/mat/impls/baij/seq/baijsolvnat.c and copy the function
>> > > MatSolve_SeqBAIJ_15_NaturalOrdering_ver1() to a new function
>> > > MatSolve_SeqBAIJ_11_NaturalOrdering_ver1. Edit the new function for the
>> > > block size of 11.
>> > >
>> > > Now edit MatLUFactorNumeric_SeqBAIJ_N() so that if block size is 11 it
>> > > uses the new routine, something like:
>> > >
>> > > if (both_identity) {
>> > >   if (b->bs == 11) {
>> > >     C->ops->solve = MatSolve_SeqBAIJ_11_NaturalOrdering_ver1;
>> > >   } else {
>> > >     C->ops->solve = MatSolve_SeqBAIJ_N_NaturalOrdering;
>> > >   }
>> > > }
>> > >
>> > > Rerun and look at the new -log_view. Send all three -log_view to us at
>> > > this point. If this optimization helps and now
>> > > MatLUFactorNumeric is the time sink you can apply the same process to
>> > > MatLUFactorNumeric_SeqBAIJ_15_NaturalOrdering() to make an 11 size block
>> > > custom version.
>> > >
>> > > Barry
>> > >
>> > >> On Mar 6, 2017, at 4:32 PM, Kong, Fande <fande.k...@inl.gov> wrote:
>> > >>
>> > >>
>> > >>
>> > >> On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan <patrick.sa...@gmail.com> wrote:
>> > >> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande <fande.k...@inl.gov> wrote:
>> > >>> Hi All,
>> > >>>
>> > >>> I am solving a nonlinear system whose Jacobian matrix has a block structure.
>> > >>> More precisely, there is a mesh, and for each vertex there are 11 variables
>> > >>> associated with it. I am using BAIJ.
>> > >>>
>> > >>> I thought block ILU(k) should be more efficient than the point-wise ILU(k).
>> > >>> After some numerical experiments, I found that the block ILU(k) is much
>> > >>> slower than the point-wise version.
>> > >> Do you mean that it takes more iterations to converge, or that the
>> > >> time per iteration is greater, or both?
>> > >>
>> > >> The number of iterations is very similar, but the time per iteration is greater.
>> > >>
>> > >>
>> > >>>
>> > >>> Any thoughts?
>> > >>>
>> > >>> Fande,
>> > >>
>> > >
>> >
>> >
>>
>>
>
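
The per-call comparison suggested in the thread (divide the time by the function call count) can be sketched in a few lines of C. This is only an illustration, not PETSc code: the LogRow type and the helper names time_per_call, per_call_ratio and print_comparison are hypothetical, and the numbers are copied from the AIJ and BAIJ -log_view excerpts quoted above, assuming the second and fourth columns are the call count and the maximum time in seconds.

```c
#include <assert.h>
#include <stdio.h>

/* One row of a -log_view excerpt: event name, call count, max time (s).
   LogRow is a hypothetical type for this sketch, not a PETSc type. */
typedef struct {
  const char *event;
  int         count;
  double      seconds;
} LogRow;

/* Values copied from the AIJ and BAIJ excerpts quoted in the thread. */
static const LogRow AIJ[] = {
  {"MatSolve",        850, 8.6543e+00},
  {"MatLUFactorNum",   25, 1.7622e+00},
  {"MatILUFactorSym",  13, 2.8002e-01},
};
static const LogRow BAIJ[] = {
  {"MatSolve",        826, 1.3016e+01},
  {"MatLUFactorNum",   25, 1.5503e+01},
  {"MatILUFactorSym",  13, 5.7561e-01},
};

/* Average time per call for one event. */
double time_per_call(LogRow r)
{
  assert(r.count > 0);
  return r.seconds / (double)r.count;
}

/* How many times slower one BAIJ call is than one AIJ call. */
double per_call_ratio(LogRow aij, LogRow baij)
{
  return time_per_call(baij) / time_per_call(aij);
}

/* Print one comparison line per event. */
void print_comparison(FILE *out)
{
  size_t n = sizeof(AIJ) / sizeof(AIJ[0]);
  for (size_t i = 0; i < n; i++) {
    fprintf(out, "%-16s AIJ %.4e s/call  BAIJ %.4e s/call  BAIJ/AIJ %.2f\n",
            AIJ[i].event, time_per_call(AIJ[i]), time_per_call(BAIJ[i]),
            per_call_ratio(AIJ[i], BAIJ[i]));
  }
}
```

Calling print_comparison(stdout) from a main() prints the table. With the numbers above, the per-call BAIJ/AIJ ratio comes out at about 1.5 for MatSolve, about 8.8 for MatLUFactorNum and about 2.1 for MatILUFactorSym. These are maximum times over all ranks, and the ratio column next to them shows noticeable imbalance, so the figures are only a rough guide.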