Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Barry Smith

> On Mar 7, 2017, at 9:41 PM, Fande Kong  wrote:
> 
> 
> 
> On Tue, Mar 7, 2017 at 7:37 PM, Barry Smith  wrote:
> 
> > On Mar 7, 2017, at 4:35 PM, Kong, Fande  wrote:
> >
> > I found one issue on my side. The preallocation is not right for the BAIJ 
> > matrix.  Will this slow down MatLUFactor and MatSolve?
> 
>   No, but you should still fix it.
> 
> >
> > How can I convert AIJ to BAIJ using a command-line option?
> 
>So that the type can be chosen at the command line, instead of using MatCreateSeq/MPIAIJ() you would use
> 
>MatCreate()
>MatSetSizes()
>MatSetBlockSize()
>MatSetFromOptions()
> 
>  MatSetFromOptions() has to be called before "MatSetPreallocation"?  What 
> happens if I call MatSetFromOptions() right after "MatSetPreallocation"?

   Too late! The type has to be set before the preallocation; otherwise the 
preallocation is ignored.

   Note there is a MatXAIJSetPreallocation() that works for both AIJ and BAIJ 
matrices in one line.
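MatXAIJSetPreallocation() takes its counts per block row (number of nonzero blocks), so a point-wise sparsity pattern first has to be condensed to the block graph. A minimal sketch in plain C, assuming a square, sequential matrix whose point CSR pattern (rowptr, colidx) has a row count that is a multiple of the block size; the helper name count_block_nnz is hypothetical and not part of PETSc, and off-diagonal (parallel) counts are not handled.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: condense a point CSR pattern with n rows (square matrix)
 * into per-block-row counts of distinct block columns for block size bs.
 * bnnz must have room for n/bs entries. Returns 0 on success, -1 on failure. */
static int count_block_nnz(int n, int bs, const int *rowptr, const int *colidx,
                           int *bnnz) {
  assert(bs > 0 && n % bs == 0);
  int nb = n / bs;
  /* marker[bc] == br + 1 means block column bc is already counted for block row br */
  int *marker = calloc((size_t)nb, sizeof *marker);
  if (!marker) return -1;
  for (int br = 0; br < nb; br++) {
    bnnz[br] = 0;
    for (int r = br * bs; r < (br + 1) * bs; r++) {      /* point rows of this block row */
      for (int k = rowptr[r]; k < rowptr[r + 1]; k++) {
        int bc = colidx[k] / bs;                          /* block column of this entry */
        if (marker[bc] != br + 1) { marker[bc] = br + 1; bnnz[br]++; }
      }
    }
  }
  free(marker);
  return 0;
}
```

For the 3020875-row matrix in this thread with bs = 11, the block graph has 274625 block rows, which is the 1/11th-size structure that the BAIJ symbolic factorization works on.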


>  
>MatMPIAIJSetPreallocation()
>MatMPIBAIJSetPreallocation() and any other preallocations you want
>MatSetValues(), MatAssemblyBegin/End()
> 
>Then you can use -mat_type baij or aij to set the type.
> 
>Barry
> 
> >
> > Fande,
> >
> > On Tue, Mar 7, 2017 at 3:26 PM, Jed Brown  wrote:
> > "Kong, Fande"  writes:
> >
> > > On Tue, Mar 7, 2017 at 3:16 PM, Jed Brown  wrote:
> > >
> > >> Hong  writes:
> > >>
> > >> > Fande,
> > >> > Got it. Below are what I get:
> > >>
> > >> Is Fande using ILU(0) or ILU(k)?  (And I think it should be possible to
> > >> get a somewhat larger benefit.)
> > >>
> > >
> > >
> > > I am using ILU(0). Will it be much better to use ILU(k>0)?
> >
> > It'll be slower, but might converge faster.  You asked about ILU(k) so I
> > assumed you were interested in k>0.
> >
> 
> 



Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Fande Kong
On Tue, Mar 7, 2017 at 7:55 PM, Barry Smith  wrote:

>
>I have run your larger matrix on my laptop with "default" optimization
> (so --with-debugging=0); this is what I get:
>
> 
> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
>
> AIJ
>
> MatMult                5 1.0 7.7636e-02 1.0 1.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00 12 16  0  0  0  16 16  0  0  0  1830
> MatSolve               5 1.0 7.8164e-02 1.0 1.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00 12 16  0  0  0  16 16  0  0  0  1818
> MatLUFactorNum         1 1.0 2.3056e-01 1.0 5.95e+08 1.0 0.0e+00 0.0e+00 0.0e+00 35 67  0  0  0  46 67  0  0  0  2580
> MatILUFactorSym        1 1.0 8.3201e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 13  0  0  0  0  17  0  0  0  0     0
>
> BAIJ
>
> MatMult                5 1.0 5.3482e-02 1.0 1.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00  6  6  0  0  0   9  6  0  0  0  2657
> MatSolve               5 1.0 6.2669e-02 1.0 1.39e+08 1.0 0.0e+00 0.0e+00 0.0e+00  7  6  0  0  0  11  6  0  0  0  2224
> MatLUFactorNum         1 1.0 3.7688e-01 1.0 2.12e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 88  0  0  0  66 88  0  0  0  5635
> MatILUFactorSym        1 1.0 4.4828e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   8  0  0  0  0     0
>
> So BAIJ symbolic is faster (which it definitely should be). BAIJ MatMult and
> MatSolve are also faster; the numerical BAIJ factorization is slower.
>
> Providing custom code for block size 11 should definitely improve the
> performance of all three of these.
>
> I note that the number of iterations (5) is much less than in the case you
> emailed originally. Is this really the matrix of interest?
>

The matrix I gave you is the matrix for the first nonlinear iteration of
the first time step. The number of iterations in the original email is for
all nonlinear iterations and all time steps.
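Because those original logs aggregate every nonlinear iteration and time step, the two runs made different numbers of calls (850 MatSolve calls for AIJ versus 826 for BAIJ), so the fair comparison is time per call, as Barry recommended in his -log_view instructions. A small C sketch of that arithmetic, using the maximum times from Fande's original AIJ and BAIJ logs; the struct and function names are illustrative only.

```c
#include <assert.h>

/* One -log_view event: call count and maximum time over MPI ranks (seconds). */
struct event {
  const char *name;
  int count;
  double max_time;
};

/* Numbers copied from Fande's aggregate AIJ and BAIJ logs in this thread. */
static const struct event aij_solve   = {"MatSolve (AIJ)", 850, 8.6543e+00};
static const struct event baij_solve  = {"MatSolve (BAIJ)", 826, 1.3016e+01};
static const struct event aij_factor  = {"MatLUFactorNum (AIJ)", 25, 1.7622e+00};
static const struct event baij_factor = {"MatLUFactorNum (BAIJ)", 25, 1.5503e+01};

/* Average time per call; needed because the runs made different numbers of calls. */
static double time_per_call(struct event e) {
  assert(e.count > 0);
  return e.max_time / e.count;
}

/* Per-call slowdown of b relative to a (> 1 means b is slower per call). */
static double per_call_ratio(struct event a, struct event b) {
  return time_per_call(b) / time_per_call(a);
}
```

With these numbers BAIJ is about 1.55 times slower per MatSolve call and about 8.8 times slower per numeric factorization. These are maximum-over-ranks times with noticeable load imbalance (time ratios up to 4.2), so the ratios are only indicative.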



Fande,



>
>   Barry

Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Barry Smith

  Just for kicks I added MatMult_SeqBAIJ_11 to master and obtained a new MatMult timing:

MatMult                5 1.0 4.4513e-02 1.0 1.94e+08 1.0 0.0e+00 0.0e+00 0.0e+00  5  8  0  0  0   8  8  0  0  0  2918

which demonstrates how the custom routines for different sizes can improve the 
performance. Note that better prefetching hints and use of SIMD instructions 
for KNL could potentially improve the performance (a great deal) more.
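A rough illustration of why a size-specific routine such as MatMult_SeqBAIJ_11 helps: when the block size is a compile-time constant, the inner loops have fixed trip counts that a compiler can unroll and vectorize, instead of dispatching to a general routine. The sketch below is plain C and is not PETSc's implementation; the function name bcsr_matmult_bs11 is hypothetical, and the layout (block-row pointers bi, block-column indices bj, each 11 x 11 block stored contiguously in column-major order) is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define BS 11 /* block size fixed at compile time */

/* y = A*x for a block-CSR matrix with mb block rows.
 * bi[mb+1]: block-row pointers; bj[]: block-column indices;
 * a[]: blocks stored contiguously, each BS*BS block in column-major order. */
static void bcsr_matmult_bs11(int mb, const int *bi, const int *bj,
                              const double *a, const double *x, double *y) {
  assert(mb >= 0);
  for (int br = 0; br < mb; br++) {
    double sum[BS] = {0};                          /* accumulator for this block row */
    for (int k = bi[br]; k < bi[br + 1]; k++) {
      const double *blk = a + (size_t)k * BS * BS;  /* k-th stored block */
      const double *xb = x + (size_t)bj[k] * BS;    /* matching slice of x */
      for (int c = 0; c < BS; c++)                  /* fixed trip counts: unrollable */
        for (int r = 0; r < BS; r++)
          sum[r] += blk[c * BS + r] * xb[c];
    }
    for (int r = 0; r < BS; r++) y[(size_t)br * BS + r] = sum[r];
  }
}
```

A general-block-size version would take bs as a runtime argument; the point of the fixed BS is only that the compiler sees constant loop bounds.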

  What hardware are you running on?






Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Fande Kong
On Tue, Mar 7, 2017 at 7:37 PM, Barry Smith  wrote:

>
> > On Mar 7, 2017, at 4:35 PM, Kong, Fande  wrote:
> >
> > I found one issue on my side. The preallocation is not right for the
> BAIJ matrix.  Will this slow down MatLUFactor and MatSolve?
>
>   No, but you should still fix it.
>
> >
> > How can I convert AIJ to BAIJ using a command-line option?
>
>So that the type can be chosen at the command line, instead of using MatCreateSeq/MPIAIJ() you would use
>
>MatCreate()
>MatSetSizes()
>MatSetBlockSize()
>MatSetFromOptions()
>

 MatSetFromOptions() has to be called before "MatSetPreallocation"?
What happens if I call MatSetFromOptions() right after
"MatSetPreallocation"?


>MatMPIAIJSetPreallocation()
>MatMPIBAIJSetPreallocation() and any other preallocations you want
>MatSetValues(), MatAssemblyBegin/End()
>
>Then you can use -mat_type baij or aij to set the type.
>
>Barry
>
> >
> > Fande,
> >
> > On Tue, Mar 7, 2017 at 3:26 PM, Jed Brown  wrote:
> > "Kong, Fande"  writes:
> >
> > > On Tue, Mar 7, 2017 at 3:16 PM, Jed Brown  wrote:
> > >
> > >> Hong  writes:
> > >>
> > >> > Fande,
> > >> > Got it. Below are what I get:
> > >>
> > >> Is Fande using ILU(0) or ILU(k)?  (And I think it should be possible
> to
> > >> get a somewhat larger benefit.)
> > >>
> > >
> > >
> > > I am using ILU(0). Will it be much better to use ILU(k>0)?
> >
> > It'll be slower, but might converge faster.  You asked about ILU(k) so I
> > assumed you were interested in k>0.
> >
>
>


Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Barry Smith

   I have run your larger matrix on my laptop with "default" optimization (so 
--with-debugging=0); this is what I get:


Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s

AIJ

MatMult                5 1.0 7.7636e-02 1.0 1.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00 12 16  0  0  0  16 16  0  0  0  1830
MatSolve               5 1.0 7.8164e-02 1.0 1.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00 12 16  0  0  0  16 16  0  0  0  1818
MatLUFactorNum         1 1.0 2.3056e-01 1.0 5.95e+08 1.0 0.0e+00 0.0e+00 0.0e+00 35 67  0  0  0  46 67  0  0  0  2580
MatILUFactorSym        1 1.0 8.3201e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 13  0  0  0  0  17  0  0  0  0     0

BAIJ

MatMult                5 1.0 5.3482e-02 1.0 1.42e+08 1.0 0.0e+00 0.0e+00 0.0e+00  6  6  0  0  0   9  6  0  0  0  2657
MatSolve               5 1.0 6.2669e-02 1.0 1.39e+08 1.0 0.0e+00 0.0e+00 0.0e+00  7  6  0  0  0  11  6  0  0  0  2224
MatLUFactorNum         1 1.0 3.7688e-01 1.0 2.12e+09 1.0 0.0e+00 0.0e+00 0.0e+00 40 88  0  0  0  66 88  0  0  0  5635
MatILUFactorSym        1 1.0 4.4828e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   8  0  0  0  0     0

So BAIJ symbolic is faster (which it definitely should be). BAIJ MatMult and 
MatSolve are also faster; the numerical BAIJ factorization is slower.
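One way to read the numeric-factorization lines in the table: elapsed time is total flops divided by the achieved rate. The BAIJ factorization runs at a higher rate (5635 vs 2580 Mflop/s) but performs about 3.6 times as many flops (2.12e+09 vs 5.95e+08), so it still takes longer. A small C check of that arithmetic, using only numbers from the table above; the helper names are illustrative.

```c
#include <assert.h>

/* -log_view reports Mflop/s = flops / time / 1e6, so time = flops / (Mflop/s * 1e6). */
static double seconds_from_rate(double flops, double mflops) {
  assert(mflops > 0.0);
  return flops / (mflops * 1e6);
}

/* Time ratio of run b over run a, written as (extra work) / (extra speed). */
static double time_ratio(double flops_a, double mflops_a,
                         double flops_b, double mflops_b) {
  double work_ratio = flops_b / flops_a;    /* how many more flops b performs */
  double speed_ratio = mflops_b / mflops_a; /* how much faster b executes them */
  return work_ratio / speed_ratio;
}
```

Plugging in the MatLUFactorNum lines gives about 0.231 s for AIJ and 0.376 s for BAIJ, a ratio of about 1.63, which agrees with the measured 2.3056e-01 and 3.7688e-01 seconds.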

Providing custom code for block size 11 should definitely improve the 
performance of all three of these.

I note that the number of iterations (5) is much less than in the case you 
emailed originally. Is this really the matrix of interest?

  Barry


Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Barry Smith

> On Mar 7, 2017, at 4:35 PM, Kong, Fande  wrote:
> 
> I found one issue on my side. The preallocation is not right for the BAIJ 
> matrix.  Will this slow down MatLUFactor and MatSolve?

  No, but you should still fix it.

> 
> How can I convert AIJ to BAIJ using a command-line option?

   So that the type can be chosen at the command line, instead of using MatCreateSeq/MPIAIJ() you would use 

   MatCreate()
   MatSetSizes()
   MatSetBlockSize()
   MatSetFromOptions()
   MatMPIAIJSetPreallocation()
   MatMPIBAIJSetPreallocation() and any other preallocations you want
   MatSetValues(), MatAssemblyBegin/End()

   Then you can use -mat_type baij or aij to set the type.

   Barry

> 
> Fande,
> 
> On Tue, Mar 7, 2017 at 3:26 PM, Jed Brown  wrote:
> "Kong, Fande"  writes:
> 
> > On Tue, Mar 7, 2017 at 3:16 PM, Jed Brown  wrote:
> >
> >> Hong  writes:
> >>
> >> > Fande,
> >> > Got it. Below are what I get:
> >>
> >> Is Fande using ILU(0) or ILU(k)?  (And I think it should be possible to
> >> get a somewhat larger benefit.)
> >>
> >
> >
> > I am using ILU(0). Will it be much better to use ILU(k>0)?
> 
> It'll be slower, but might converge faster.  You asked about ILU(k) so I
> assumed you were interested in k>0.
> 



Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Hong
Fande :

> I found one issue on my side. The preallocation is not right for the BAIJ
> matrix.  Will this slow down MatLUFactor and MatSolve?
>

Preallocation should not affect ILU(0).

>
> How can I convert AIJ to BAIJ using a command-line option?
>
-mat_type aij or -mat_type baij

Hong

>
>
> Fande,
>
> On Tue, Mar 7, 2017 at 3:26 PM, Jed Brown  wrote:
>
>> "Kong, Fande"  writes:
>>
>> > On Tue, Mar 7, 2017 at 3:16 PM, Jed Brown  wrote:
>> >
>> >> Hong  writes:
>> >>
>> >> > Fande,
>> >> > Got it. Below are what I get:
>> >>
>> >> Is Fande using ILU(0) or ILU(k)?  (And I think it should be possible to
>> >> get a somewhat larger benefit.)
>> >>
>> >
>> >
>> > I am using ILU(0). Will it be much better to use ILU(k>0)?
>>
>> It'll be slower, but might converge faster.  You asked about ILU(k) so I
>> assumed you were interested in k>0.
>>
>
>


Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Kong, Fande
I found one issue on my side. The preallocation is not right for the BAIJ
matrix.  Will this slow down MatLUFactor and MatSolve?

How can I convert AIJ to BAIJ using a command-line option?

Fande,

On Tue, Mar 7, 2017 at 3:26 PM, Jed Brown  wrote:

> "Kong, Fande"  writes:
>
> > On Tue, Mar 7, 2017 at 3:16 PM, Jed Brown  wrote:
> >
> >> Hong  writes:
> >>
> >> > Fande,
> >> > Got it. Below are what I get:
> >>
> >> Is Fande using ILU(0) or ILU(k)?  (And I think it should be possible to
> >> get a somewhat larger benefit.)
> >>
> >
> >
> > I am using ILU(0). Will it be much better to use ILU(k>0)?
>
> It'll be slower, but might converge faster.  You asked about ILU(k) so I
> assumed you were interested in k>0.
>


Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Jed Brown
"Kong, Fande"  writes:

> On Tue, Mar 7, 2017 at 3:16 PM, Jed Brown  wrote:
>
>> Hong  writes:
>>
>> > Fande,
>> > Got it. Below are what I get:
>>
>> Is Fande using ILU(0) or ILU(k)?  (And I think it should be possible to
>> get a somewhat larger benefit.)
>>
>
>
> I am using ILU(0). Will it be much better to use ILU(k>0)?

It'll be slower, but might converge faster.  You asked about ILU(k) so I
assumed you were interested in k>0.
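For context on Jed's answer: ILU(0) keeps exactly the sparsity pattern of A and discards any fill outside it, while ILU(k) with k > 0 admits fill up to level k (in PETSc, -pc_factor_levels <k>), which makes the factorization and each solve more expensive but can give a better preconditioner. A minimal dense-storage sketch of ILU(0) in C (IKJ ordering), for illustration only; PETSc's AIJ and BAIJ implementations work on compressed storage and are organized differently.

```c
#include <assert.h>
#include <stdbool.h>

#define DIM 4 /* small fixed size for illustration */

/* In-place ILU(0). Afterwards the strict lower part of a holds L (unit diagonal
 * implied) and the upper part holds U. Updates happen only where pat[i][j] is
 * true (pat is not modified), so no fill is created outside the pattern.
 * Entries of a outside the pattern must be zero on entry. */
static void ilu0_dense(double a[DIM][DIM], bool pat[DIM][DIM]) {
  for (int i = 1; i < DIM; i++) {
    for (int k = 0; k < i; k++) {
      if (!pat[i][k]) continue;
      assert(a[k][k] != 0.0);                         /* zero pivot: breakdown */
      a[i][k] /= a[k][k];                             /* multiplier l_ik */
      for (int j = k + 1; j < DIM; j++)
        if (pat[i][j]) a[i][j] -= a[i][k] * a[k][j];  /* fill outside pattern dropped */
    }
  }
}

/* Entry (i, j) of the product L*U, reconstructed from the packed factors. */
static double lu_entry(double f[DIM][DIM], int i, int j) {
  double s = 0.0;
  for (int k = 0; k <= (i < j ? i : j); k++) {
    double l = (k == i) ? 1.0 : f[i][k];  /* unit lower triangle */
    s += l * f[k][j];
  }
  return s;
}
```

For a matrix whose exact LU would create fill, L*U reproduces A on the pattern but not at the dropped positions; the difference there is the approximation that ILU(k) with larger k reduces.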




Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Kong, Fande
On Tue, Mar 7, 2017 at 3:16 PM, Jed Brown  wrote:

> Hong  writes:
>
> > Fande,
> > Got it. Below are what I get:
>
> Is Fande using ILU(0) or ILU(k)?  (And I think it should be possible to
> get a somewhat larger benefit.)
>


I am using ILU(0). Will it be much better to use ILU(k>0)?

Fande,



>
> > petsc/src/ksp/ksp/examples/tutorials (master)
> > $ ./ex10 -f0 binaryoutput -rhs 0 -mat_view ascii::ascii_info
> > Mat Object: 1 MPI processes
> >   type: seqaij
> >   rows=8019, cols=8019, bs=11
> >   total: nonzeros=1890625, allocated nonzeros=1890625
> >   total number of mallocs used during MatSetValues calls =0
> > using I-node routines: found 2187 nodes, limit used is 5
> > Number of iterations =   3
> > Residual norm 0.00200589
> >
> > -mat_type aij
> > MatMult                4 1.0 8.3621e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00 0.0e+00  6  7  0  0  0   7  7  0  0  0  1805
> > MatSolve               4 1.0 8.3971e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00 0.0e+00  6  7  0  0  0   7  7  0  0  0  1797
> > MatLUFactorNum         1 1.0 8.6171e-02 1.0 1.80e+08 1.0 0.0e+00 0.0e+00 0.0e+00 57 85  0  0  0  70 85  0  0  0  2086
> > MatILUFactorSym        1 1.0 1.4951e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10  0  0  0  0  12  0  0  0  0     0
> >
> > -mat_type baij
> > MatMult                4 1.0 5.5540e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00 0.0e+00  4  5  0  0  0   7  5  0  0  0  2718
> > MatSolve               4 1.0 7.0803e-03 1.0 1.48e+07 1.0 0.0e+00 0.0e+00 0.0e+00  5  5  0  0  0   8  5  0  0  0  2086
> > MatLUFactorNum         1 1.0 6.0118e-02 1.0 2.55e+08 1.0 0.0e+00 0.0e+00 0.0e+00 42 89  0  0  0  72 89  0  0  0  4241
> > MatILUFactorSym        1 1.0 6.7251e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   8  0  0  0  0     0
> >
> > I ran it on my Mac Pro. BAIJ is faster than AIJ in all routines.
> >
> > Hong

Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Jed Brown
Hong  writes:

> Fande,
> Got it. Below are what I get:

Is Fande using ILU(0) or ILU(k)?  (And I think it should be possible to
get a somewhat larger benefit.)
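A quick consistency check on the ex10 output Hong posted (quoted below): with bs = 11, the 8019 rows form 729 block rows, and nonzeros=1890625 is exactly 15625 x 121, which is consistent with every 11 x 11 block being stored in full in the AIJ matrix. Barry's point elsewhere in the thread is that if the blocks were genuinely sparse and AIJ stored only the true nonzeros, AIJ would do less arithmetic. A rough model in C, assuming about 2 flops per stored entry for a matrix-vector product (this ignores PETSc's exact flop-counting conventions); the function names are illustrative.

```c
#include <assert.h>

/* Approximate flops for one y = A*x in AIJ: about 2 per stored entry. */
static double matmult_flops_aij(long stored_nnz) { return 2.0 * (double)stored_nnz; }

/* BAIJ always stores full bs x bs blocks, explicit zeros included. */
static double matmult_flops_baij(long nblocks, int bs) {
  assert(bs > 0);
  return 2.0 * (double)nblocks * bs * bs;
}

/* Returns 1 and sets *nblocks if nnz is an exact multiple of bs*bs
 * (necessary, but not sufficient, for every block being stored densely). */
static int blocks_if_dense(long nnz, int bs, long *nblocks) {
  long b2 = (long)bs * bs;
  if (bs <= 0 || nnz % b2 != 0) return 0;
  *nblocks = nnz / b2;
  return 1;
}
```

Four products at about 3781250 flops each give roughly 1.51e+07, matching the MatMult flop column in Hong's output, which supports (but does not prove) the dense-block reading.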

> petsc/src/ksp/ksp/examples/tutorials (master)
> $ ./ex10 -f0 binaryoutput -rhs 0 -mat_view ascii::ascii_info
> Mat Object: 1 MPI processes
>   type: seqaij
>   rows=8019, cols=8019, bs=11
>   total: nonzeros=1890625, allocated nonzeros=1890625
>   total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 2187 nodes, limit used is 5
> Number of iterations =   3
> Residual norm 0.00200589
>
> -mat_type aij
> MatMult                4 1.0 8.3621e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00 0.0e+00  6  7  0  0  0   7  7  0  0  0  1805
> MatSolve               4 1.0 8.3971e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00 0.0e+00  6  7  0  0  0   7  7  0  0  0  1797
> MatLUFactorNum         1 1.0 8.6171e-02 1.0 1.80e+08 1.0 0.0e+00 0.0e+00 0.0e+00 57 85  0  0  0  70 85  0  0  0  2086
> MatILUFactorSym        1 1.0 1.4951e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10  0  0  0  0  12  0  0  0  0     0
>
> -mat_type baij
> MatMult                4 1.0 5.5540e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00 0.0e+00  4  5  0  0  0   7  5  0  0  0  2718
> MatSolve               4 1.0 7.0803e-03 1.0 1.48e+07 1.0 0.0e+00 0.0e+00 0.0e+00  5  5  0  0  0   8  5  0  0  0  2086
> MatLUFactorNum         1 1.0 6.0118e-02 1.0 2.55e+08 1.0 0.0e+00 0.0e+00 0.0e+00 42 89  0  0  0  72 89  0  0  0  4241
> MatILUFactorSym        1 1.0 6.7251e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   8  0  0  0  0     0
>
> I ran it on my Mac Pro. BAIJ is faster than AIJ in all routines.
>
> Hong

Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Barry Smith

> On Mar 7, 2017, at 3:26 PM, Kong, Fande  wrote:
> 
> 
> 
> On Tue, Mar 7, 2017 at 2:07 PM, Barry Smith  wrote:
> 
>The matrix is too small. Please post ONE big matrix
> 
> I am using "-ksp_view_pmat binary" to save the matrix. How can I save only the 
> latest one for a time-dependent problem?

  No easy way. You can send us the first matrix or you can use 
bin/PetscBinaryIO.py to cut out one matrix from the file.
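If cutting the matrix out with bin/PetscBinaryIO.py is inconvenient, the same idea can be sketched in C, since a file written by repeated matrix views is, under the assumptions below, just the matrices concatenated. This sketch assumes PETSc's binary layout for a real, double-precision build with 32-bit indices: a big-endian header of four 32-bit integers (the matrix class id 1211216, rows, columns, total nonzeros), then the row lengths and column indices as 32-bit integers and the values as big-endian doubles. It also assumes the file contains only matrices. It only locates where the last matrix starts; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAT_FILE_CLASSID 1211216 /* class id PETSc writes before each matrix */

/* Decode / encode a big-endian 32-bit integer (PETSc binary files are big-endian). */
static int32_t read_be_i32(const unsigned char *p) {
  return (int32_t)(((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                   ((uint32_t)p[2] << 8) | (uint32_t)p[3]);
}

static void write_be_i32(unsigned char *p, int32_t v) {
  uint32_t u = (uint32_t)v;
  p[0] = (unsigned char)(u >> 24); p[1] = (unsigned char)(u >> 16);
  p[2] = (unsigned char)(u >> 8);  p[3] = (unsigned char)u;
}

/* Byte offset of the last matrix in a buffer of concatenated matrices,
 * or -1 if the buffer does not parse under the layout assumed above. */
static long last_matrix_offset(const unsigned char *buf, size_t len) {
  assert(buf != NULL || len == 0);
  size_t off = 0;
  long last = -1;
  while (off + 16 <= len) {
    int32_t classid = read_be_i32(buf + off);
    int32_t m = read_be_i32(buf + off + 4);    /* rows */
    int32_t nz = read_be_i32(buf + off + 12);  /* total nonzeros */
    if (classid != MAT_FILE_CLASSID || m < 0 || nz < 0) return -1;
    /* header + row lengths + column indices (4 bytes each) + values (8 bytes each) */
    size_t size = 16 + 4 * (size_t)m + 4 * (size_t)nz + 8 * (size_t)nz;
    if (size > len - off) return -1;           /* truncated matrix */
    last = (long)off;
    off += size;
  }
  return off == len ? last : -1;               /* trailing bytes are an error */
}
```

In practice one would read the file into memory, call last_matrix_offset, and write the bytes from that offset to the end into a new file that MatLoad() can read.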

> 
> 
> Fande, 
> 
>  
> 
> > On Mar 7, 2017, at 2:26 PM, Kong, Fande  wrote:
> >
> > Uploaded to google drive, and sent you links in another email. Not sure if 
> > it works or not.
> >
> > Fande,
> >
> > On Tue, Mar 7, 2017 at 12:29 PM, Barry Smith  wrote:
> >
> >It is too big for email; you can post it somewhere so we can download it.
> >
> >
> > > On Mar 7, 2017, at 12:01 PM, Kong, Fande  wrote:
> > >
> > >
> > >
> > > On Tue, Mar 7, 2017 at 10:23 AM, Hong  wrote:
> > > I checked
> > > MatILUFactorSymbolic_SeqBAIJ() and MatILUFactorSymbolic_SeqAIJ(),
> > > they are virtually the same. Why is the version for BAIJ so much slower?
> > > I'll investigate it.
> > >
> > > Fande,
> > > How large is your matrix? Is it possible to send us your matrix so I can 
> > > test it?
> > >
> > > Thanks, Hong,
> > >
> > > It is a 3020875x3020875 matrix, and it is large. I can make a small one 
> > > if you like, but I am not sure whether it will reproduce this issue.
> > >
> > > Fande,
> > >
> > >
> > >
> > > Hong
> > >
> > >
> > > On Mon, Mar 6, 2017 at 9:08 PM, Barry Smith  wrote:
> > >
> > >   Thanks. Even the symbolic is slower for BAIJ. I don't like that; it 
> > > definitely should not be, since it is (at least should be) doing a 
> > > symbolic factorization on a symbolic matrix 1/11th the size!
> > >
> > >Keep us informed.
> > >
> > >
> > >
> > > > On Mar 6, 2017, at 5:44 PM, Kong, Fande  wrote:
> > > >
> > > > Thanks, Barry,
> > > >
> > > > Log info:
> > > >
> > > > AIJ:
> > > >
> > > > MatSolve          850 1.0 8.6543e+00 4.2 3.04e+09 1.8 0.0e+00 0.0e+00 0.0e+00  0 41  0  0  0   0 41  0  0  0  49594
> > > > MatLUFactorNum     25 1.0 1.7622e+00 2.0 2.04e+09 2.1 0.0e+00 0.0e+00 0.0e+00  0 26  0  0  0   0 26  0  0  0 153394
> > > > MatILUFactorSym    13 1.0 2.8002e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> > > >
> > > > BAIJ:
> > > >
> > > > MatSolve          826 1.0 1.3016e+01 1.7 1.42e+10 1.8 0.0e+00 0.0e+00 0.0e+00  1 29  0  0  0   1 29  0  0  0 154617
> > > > MatLUFactorNum     25 1.0 1.5503e+01 2.0 3.55e+10 2.1 0.0e+00 0.0e+00 0.0e+00  1 67  0  0  0   1 67  0  0  0 303190
> > > > MatILUFactorSym    13 1.0 5.7561e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> > > >
> > > > It looks like both MatSolve and MatLUFactorNum are slower.
> > > >
> > > > I will try your suggestions.
> > > >
> > > > Fande
> > > >
> > > > On Mon, Mar 6, 2017 at 4:14 PM, Barry Smith  wrote:
> > > >
> > > >   Note also that if the 11 by 11 blocks are actually sparse (and you
> > > > don't store all the zeros in the blocks in the AIJ format), then the
> > > > AIJ non-block factorization involves fewer floating-point operations and
> > > > less memory access, so it can be faster than the BAIJ format, depending on
> > > > "how sparse" the blocks are. If you actually "fill in" the 11 by 11
> > > > blocks with AIJ (with zeros maybe in certain locations), then the above
> > > > is not true.
> > > >
> > > >
> > > > > On Mar 6, 2017, at 5:10 PM, Barry Smith  wrote:
> > > > >
> > > > >
> > > > >   This is because for block size 11 it is using calls to LAPACK/BLAS 
> > > > > for the block operations instead of custom routines for that block 
> > > > > size.
> > > > >
> > > > >   Here is what you need to do. For a good-sized case, run both with
> > > > > -log_view and check the time spent in
> > > > > MatLUFactorNumeric, MatLUFactorSymbolic, and MatSolve for AIJ and
> > > > > BAIJ. If they have different numbers of function calls, divide
> > > > > by the call count to determine the time per function call.
> > > > >
> > > > >   This will tell you which routine needs to be optimized first, either
> > > > > MatLUFactorNumeric or MatSolve. My guess is MatSolve.
> > > > >
> > > > >   So edit src/mat/impls/baij/seq/baijsolvnat.c and copy the function
> > > > > MatSolve_SeqBAIJ_15_NaturalOrdering_ver1() to a new function
> > > > > MatSolve_SeqBAIJ_11_NaturalOrdering_ver1(), then edit the new function for
> > > > > a block size of 11.
> > > > >
> > > > >   Now edit MatLUFactorNumeric_SeqBAIJ_N() so that if the block size is 11
> > > > > it uses the new routine, something like:
> > > > >
> > > > > if (both_identity) {
> > > > >   if (b->bs == 11) {
> > > > >C->ops->solve = 

Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Kong, Fande
On Tue, Mar 7, 2017 at 2:07 PM, Barry Smith  wrote:

>
>The matrix is too small. Please post ONE big matrix.
>

I am using "-ksp_view_pmat binary" to save the matrix. How can I save only the
latest one for a time-dependent problem?


Fande,




Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Hong
Fande,
Got it. Below is what I get:

petsc/src/ksp/ksp/examples/tutorials (master)
$ ./ex10 -f0 binaryoutput -rhs 0 -mat_view ascii::ascii_info
Mat Object: 1 MPI processes
  type: seqaij
  rows=8019, cols=8019, bs=11
  total: nonzeros=1890625, allocated nonzeros=1890625
  total number of mallocs used during MatSetValues calls =0
using I-node routines: found 2187 nodes, limit used is 5
Number of iterations =   3
Residual norm 0.00200589

-mat_type aij
MatMult             4 1.0 8.3621e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00 0.0e+00  6  7  0  0  0   7  7  0  0  0  1805
MatSolve            4 1.0 8.3971e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00 0.0e+00  6  7  0  0  0   7  7  0  0  0  1797
MatLUFactorNum      1 1.0 8.6171e-02 1.0 1.80e+08 1.0 0.0e+00 0.0e+00 0.0e+00 57 85  0  0  0  70 85  0  0  0  2086
MatILUFactorSym     1 1.0 1.4951e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 10  0  0  0  0  12  0  0  0  0     0

-mat_type baij
MatMult             4 1.0 5.5540e-03 1.0 1.51e+07 1.0 0.0e+00 0.0e+00 0.0e+00  4  5  0  0  0   7  5  0  0  0  2718
MatSolve            4 1.0 7.0803e-03 1.0 1.48e+07 1.0 0.0e+00 0.0e+00 0.0e+00  5  5  0  0  0   8  5  0  0  0  2086
MatLUFactorNum      1 1.0 6.0118e-02 1.0 2.55e+08 1.0 0.0e+00 0.0e+00 0.0e+00 42 89  0  0  0  72 89  0  0  0  4241
MatILUFactorSym     1 1.0 6.7251e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  5  0  0  0  0   8  0  0  0  0     0

I ran it on my Mac Pro. BAIJ is faster than AIJ in all routines.

Hong
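The ex10 output above (rows=8019, bs=11, nonzeros=1890625, 2187 I-nodes with limit 5) can be cross-checked with a little block arithmetic. Below is a minimal C sketch; the struct and function names are hypothetical. Two points are assumptions on our part rather than anything stated in the thread:

- The I-node estimate assumes each 11-row block row is split greedily into groups of at most 5 rows.
- A nonzero count that is an exact multiple of 121 is consistent with dense 11 x 11 blocks, but does not prove it.

```c
#include <assert.h>

/* Block-level view of a point-wise sparse matrix with block size bs. */
struct block_summary {
  long block_rows;         /* rows / bs */
  long stored_blocks;      /* nnz / (bs*bs), meaningful only if exact */
  int nnz_is_whole_blocks; /* 1 if nnz is an exact multiple of bs*bs */
  long inodes;             /* I-node count if every block row splits into
                              ceil(bs / inode_limit) groups (assumption) */
};

static struct block_summary summarize(long rows, long nnz, long bs, long inode_limit) {
  assert(bs > 0 && inode_limit > 0 && rows % bs == 0);
  struct block_summary s;
  s.block_rows = rows / bs;
  s.nnz_is_whole_blocks = (nnz % (bs * bs) == 0);
  s.stored_blocks = nnz / (bs * bs);
  s.inodes = s.block_rows * ((bs + inode_limit - 1) / inode_limit);
  return s;
}
```

For this matrix the sketch gives:

- 729 block rows;
- 15625 stored blocks, about 21.4 per block row;
- 3 x 729 = 2187 I-nodes, matching the "found 2187 nodes" line.

If the blocks really are stored densely, this is the case in which Barry notes AIJ loses its flop advantage over BAIJ.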

On Tue, Mar 7, 2017 at 2:26 PM, Kong, Fande  wrote:

> I uploaded it to Google Drive and sent you the links in another email. I'm not
> sure whether they work.
>
> Fande,
>

Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Barry Smith

   The matrix is too small. Please post ONE big matrix.

> On Mar 7, 2017, at 2:26 PM, Kong, Fande  wrote:
> 
> I uploaded it to Google Drive and sent you the links in another email. I'm not
> sure whether they work.
> 
> Fande,
> 

Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Kong, Fande
I uploaded it to Google Drive and sent you the links in another email. I'm not
sure whether they work.

Fande,
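Barry's Mar 6 advice in this thread is to replace the generic LAPACK/BLAS-based block path with a routine specialized for block size 11. The C sketch below illustrates the general idea on a block forward substitution. With the block size as a compile-time constant, the inner loops have fixed trip counts, which lets the compiler unroll them. This is a hedged illustration only: the block-CSR layout (unit diagonal blocks, strictly lower blocks stored row-major) and all function names are hypothetical. It is not PETSc's MatSolve_SeqBAIJ_* code, and PETSc's factored-matrix storage is not reproduced here.

```c
#include <assert.h>

#define BS 11 /* block size fixed at compile time */

/* y -= A * x for one dense BS x BS block stored row-major. With BS a
   compile-time constant the compiler can fully unroll both loops. */
static inline void block_gemv_sub_11(const double *A, const double *x, double *y) {
  for (int r = 0; r < BS; r++) {
    double s = 0.0;
    for (int c = 0; c < BS; c++) s += A[r * BS + c] * x[c];
    y[r] -= s;
  }
}

/* The same operation with a run-time block size (the generic path). */
static void block_gemv_sub_n(int bs, const double *A, const double *x, double *y) {
  for (int r = 0; r < bs; r++) {
    double s = 0.0;
    for (int c = 0; c < bs; c++) s += A[r * bs + c] * x[c];
    y[r] -= s;
  }
}

/* Forward substitution x = L^{-1} b for a block unit-lower-triangular L
   (hypothetical layout): blocks rowptr[i]..rowptr[i+1]-1 belong to block row i,
   colidx gives each block's column (< i), vals holds BS*BS entries per block. */
static void block_lower_solve_11(int nblk, const int *rowptr, const int *colidx,
                                 const double *vals, const double *b, double *x) {
  for (int i = 0; i < nblk; i++) {
    double *xi = x + (long)i * BS;
    for (int k = 0; k < BS; k++) xi[k] = b[(long)i * BS + k];
    for (int p = rowptr[i]; p < rowptr[i + 1]; p++) {
      assert(colidx[p] < i); /* only strictly lower blocks are stored */
      block_gemv_sub_11(vals + (long)p * BS * BS, x + (long)colidx[p] * BS, xi);
    }
  }
}
```

The fixed-size and generic kernels compute the same result. Any speedup from the specialized one depends on the compiler and hardware, which is why Barry suggests measuring with -log_view before and after.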

On Tue, Mar 7, 2017 at 12:29 PM, Barry Smith  wrote:

>
>It is too big for email; you can post it somewhere so we can download it.
>

Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Barry Smith

   It is too big for email; you can post it somewhere so we can download it.


> On Mar 7, 2017, at 12:01 PM, Kong, Fande  wrote:
> 
> 
> 
> On Tue, Mar 7, 2017 at 10:23 AM, Hong  wrote:
> I checked 
> MatILUFactorSymbolic_SeqBAIJ() and MatILUFactorSymbolic_SeqAIJ(),
> they are virtually the same. Why is the BAIJ version so much slower?
> I'll investigate it. 
> 
> Fande,
> How large is your matrix? Is it possible to send us your matrix so I can test 
> it?
> 
> Thanks, Hong,
> 
> It is a 3020875x3020875 matrix, so it is large. I can make a small one if
> you like, but I am not sure whether it will reproduce this issue.
> 
> Fande,
> 
>  
> 
> Hong
> 
> 
> On Mon, Mar 6, 2017 at 9:08 PM, Barry Smith  wrote:
> 
>   Thanks. Even the symbolic factorization is slower for BAIJ. I don't like
> that; it definitely should not be, since it is (at least should be) doing a
> symbolic factorization on a symbolic matrix 1/11th the size!
> 
>Keep us informed.
> 
> 
> 
> > On Mar 6, 2017, at 5:44 PM, Kong, Fande  wrote:
> >
> > Thanks, Barry,
> >
> > Log info:
> >
> > AIJ:
> >
> > MatSolve          850 1.0 8.6543e+00 4.2 3.04e+09 1.8 0.0e+00 0.0e+00 0.0e+00  0 41  0  0  0   0 41  0  0  0  49594
> > MatLUFactorNum     25 1.0 1.7622e+00 2.0 2.04e+09 2.1 0.0e+00 0.0e+00 0.0e+00  0 26  0  0  0   0 26  0  0  0 153394
> > MatILUFactorSym    13 1.0 2.8002e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> >
> > BAIJ:
> >
> > MatSolve          826 1.0 1.3016e+01 1.7 1.42e+10 1.8 0.0e+00 0.0e+00 0.0e+00  1 29  0  0  0   1 29  0  0  0 154617
> > MatLUFactorNum     25 1.0 1.5503e+01 2.0 3.55e+10 2.1 0.0e+00 0.0e+00 0.0e+00  1 67  0  0  0   1 67  0  0  0 303190
> > MatILUFactorSym    13 1.0 5.7561e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
> >
> > It looks like both MatSolve and MatLUFactorNum are slower.
> >
> > I will try your suggestions.
> >
> > Fande
> >
> > On Mon, Mar 6, 2017 at 4:14 PM, Barry Smith  wrote:
> >
> >   Note also that if the 11 by 11 blocks are actually sparse (and you don't
> > store all the zeros in the blocks in the AIJ format), then the AIJ
> > non-block factorization involves fewer floating-point operations and less
> > memory access, so it can be faster than the BAIJ format, depending on "how
> > sparse" the blocks are. If you actually "fill in" the 11 by 11 blocks with
> > AIJ (with zeros maybe in certain locations), then the above is not true.
> >
> >
> > > On Mar 6, 2017, at 5:10 PM, Barry Smith  wrote:
> > >
> > >
> > >   This is because for block size 11 it is using calls to LAPACK/BLAS for 
> > > the block operations instead of custom routines for that block size.
> > >
> > >   Here is what you need to do. For a good-sized case, run both with
> > > -log_view and check the time spent in
> > > MatLUFactorNumeric, MatLUFactorSymbolic, and MatSolve for AIJ and BAIJ.
> > > If they have different numbers of function calls, divide by the
> > > call count to determine the time per function call.
> > >
> > >   This will tell you which routine needs to be optimized first, either
> > > MatLUFactorNumeric or MatSolve. My guess is MatSolve.
> > >
> > >   So edit src/mat/impls/baij/seq/baijsolvnat.c and copy the function
> > > MatSolve_SeqBAIJ_15_NaturalOrdering_ver1() to a new function
> > > MatSolve_SeqBAIJ_11_NaturalOrdering_ver1(), then edit the new function for
> > > a block size of 11.
> > >
> > >   Now edit MatLUFactorNumeric_SeqBAIJ_N() so that if the block size is 11 it
> > > uses the new routine, something like:
> > >
> > > if (both_identity) {
> > >   if (b->bs == 11) {
> > >     C->ops->solve = MatSolve_SeqBAIJ_11_NaturalOrdering_ver1;
> > >   } else {
> > >     C->ops->solve = MatSolve_SeqBAIJ_N_NaturalOrdering;
> > >   }
> > > }
> > >
> > >   Rerun and look at the new -log_view. Send all three -log_view outputs to us
> > > at this point. If this optimization helps and MatLUFactorNumeric is now the
> > > time sink, you can apply the same process to
> > > MatLUFactorNumeric_SeqBAIJ_15_NaturalOrdering() to make a custom version for
> > > block size 11.
> > >
> > >  Barry
> > >
> > >> On Mar 6, 2017, at 4:32 PM, Kong, Fande  wrote:
> > >>
> > >>
> > >>
> > >> On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan  
> > >> wrote:
> > >> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande  wrote:
> > >>> Hi All,
> > >>>
> > >>> I am solving a nonlinear system whose Jacobian matrix has a block 
> > >>> structure.
> > >>> More precisely, there is a mesh, and for each vertex there are 11 
> > >>> variables
> > >>> associated with it. I am using BAIJ.
> > >>>
> > >>> I thought block ILU(k) should be more efficient than the point-wise 
> > >>> ILU(k).
> > >>> After some numerical experiments, I found that the block ILU(K) is much
> > >>> slower than the 

Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Hong
Fande:
A small one, e.g., the size of the sequential diagonal block used by the ILU
preconditioner, would work.

Thanks,
Hong
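How big that sequential diagonal block is depends on how the rows are distributed across processes. If ILU is applied per process, as in block Jacobi, each process factors only its own diagonal block. The C sketch below computes that block's size under two assumptions:

- Rows are split as evenly as possible in whole 11-row blocks, with the first few ranks taking one extra block. This is how we understand PETSC_DECIDE to behave when a block size is set; it is not confirmed in this thread.
- The 64-process count used in the usage note is hypothetical.

```c
#include <assert.h>

/* Rows owned by `rank` when n_global rows are split as evenly as possible
   across `size` processes in whole blocks of bs rows; the first
   (nblocks % size) ranks get one extra block. */
static long local_rows(long n_global, long bs, int size, int rank) {
  assert(bs > 0 && n_global % bs == 0);
  assert(size > 0 && rank >= 0 && rank < size);
  long nblocks = n_global / bs;
  long extra = (nblocks % size > rank) ? 1 : 0;
  return bs * (nblocks / size + extra);
}
```

Take Fande's 3020875 rows on a hypothetical 64 processes: rank 0 would own 47212 rows and every other rank 47201. Each diagonal block would then be roughly 47201 x 47201, orders of magnitude smaller than the full matrix.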

>
>
> On Tue, Mar 7, 2017 at 10:23 AM, Hong  wrote:
>
>> I checked
>> MatILUFactorSymbolic_SeqBAIJ() and MatILUFactorSymbolic_SeqAIJ(),
>> they are virtually the same. Why is the BAIJ version so much slower?
>> I'll investigate it.
>>
>
>> Fande,
>> How large is your matrix? Is it possible to send us your matrix so I can
>> test it?
>>
>
> Thanks, Hong,
>
> It is a 3020875x3020875 matrix, so it is large. I can make a small one if
> you like, but I am not sure whether it will reproduce this issue.
>
> Fande,
>

Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Kong, Fande
On Tue, Mar 7, 2017 at 10:23 AM, Hong  wrote:

> I checked
> MatILUFactorSymbolic_SeqBAIJ() and MatILUFactorSymbolic_SeqAIJ(),
> they are virtually the same. Why is the BAIJ version so much slower?
> I'll investigate it.
>

> Fande,
> How large is your matrix? Is it possible to send us your matrix so I can
> test it?
>

Thanks, Hong,

It is a 3020875x3020875 matrix, so it is large. I can make a small one if
you like, but I am not sure whether it will reproduce this issue.

Fande,



>
> Hong
>
>
> On Mon, Mar 6, 2017 at 9:08 PM, Barry Smith  wrote:
>
>>
>>   Thanks. Even the symbolic factorization is slower for BAIJ. I don't like
>> that; it definitely should not be, since it is (or at least should be)
>> doing a symbolic factorization on a symbolic matrix 1/11th the size!
>>
>>Keep us informed.
>>
>>
>>
>> > On Mar 6, 2017, at 5:44 PM, Kong, Fande  wrote:
>> >
>> > Thanks, Barry,
>> >
>> > Log info:
>> >
>> > AIJ:
>> >
>> > MatSolve         850 1.0 8.6543e+00 4.2 3.04e+09 1.8 0.0e+00 0.0e+00 0.0e+00  0 41  0  0  0   0 41  0  0  0 49594
>> > MatLUFactorNum    25 1.0 1.7622e+00 2.0 2.04e+09 2.1 0.0e+00 0.0e+00 0.0e+00  0 26  0  0  0   0 26  0  0  0 153394
>> > MatILUFactorSym   13 1.0 2.8002e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
>> >
>> > BAIJ:
>> >
>> > MatSolve         826 1.0 1.3016e+01 1.7 1.42e+10 1.8 0.0e+00 0.0e+00 0.0e+00  1 29  0  0  0   1 29  0  0  0 154617
>> > MatLUFactorNum    25 1.0 1.5503e+01 2.0 3.55e+10 2.1 0.0e+00 0.0e+00 0.0e+00  1 67  0  0  0   1 67  0  0  0 303190
>> > MatILUFactorSym   13 1.0 5.7561e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
>> >
>> > It looks like both MatSolve and MatLUFactorNum are slower.
>> >
>> > I will try your suggestions.
>> >
>> > Fande
>> >
>> > On Mon, Mar 6, 2017 at 4:14 PM, Barry Smith  wrote:
>> >
>> >   Note also that if the 11 by 11 blocks are actually sparse (and you
>> don't store all the zeros in the blocks in the AIJ format), then the AIJ
>> non-block factorization involves fewer floating point operations and less
>> memory access, so it can be faster than the BAIJ format, depending on "how
>> sparse" the blocks are. If you actually "fill in" the 11 by 11 blocks with
>> AIJ (storing explicit zeros in some locations), then the above is not true.
>> >
>> >
>> > > On Mar 6, 2017, at 5:10 PM, Barry Smith  wrote:
>> > >
>> > >
>> > >   This is because for block size 11 it is using calls to LAPACK/BLAS
>> for the block operations instead of custom routines for that block size.
>> > >
>> > >   Here is what you need to do. For a good-sized case, run both with
>> -log_view and check the time spent in
>> > > MatLUFactorNumeric, MatLUFactorSymbolic and MatSolve for AIJ and
>> BAIJ. If they have a different number of function calls, divide by the
>> function call count to determine the time per function call.
>> > >
>> > >   This will tell you which routine needs to be optimized first, either
>> MatLUFactorNumeric or MatSolve. My guess is MatSolve.
>> > >
>> > >   So edit src/mat/impls/baij/seq/baijsolvnat.c and copy the function
>> MatSolve_SeqBAIJ_15_NaturalOrdering_ver1() to a new function
>> MatSolve_SeqBAIJ_11_NaturalOrdering_ver1(). Edit the new function for a
>> block size of 11.
>> > >
>> > >   Now edit MatLUFactorNumeric_SeqBAIJ_N() so that if the block size is 11
>> it uses the new routine, something like:
>> > >
>> > > if (both_identity) {
>> > >   if (b->bs == 11) {
>> > >     C->ops->solve = MatSolve_SeqBAIJ_11_NaturalOrdering_ver1;
>> > >   } else {
>> > >     C->ops->solve = MatSolve_SeqBAIJ_N_NaturalOrdering;
>> > >   }
>> > > }
>> > >
>> > >   Rerun and look at the new -log_view. Send all three -log_view outputs
>> to us at this point. If this optimization helps and
>> > > MatLUFactorNumeric is now the time sink, you can apply the same process
>> to MatLUFactorNumeric_SeqBAIJ_15_NaturalOrdering() to make a custom
>> version for block size 11.
>> > >
>> > >  Barry
>> > >
>> > >> On Mar 6, 2017, at 4:32 PM, Kong, Fande  wrote:
>> > >>
>> > >>
>> > >>
>> > >> On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan <
>> patrick.sa...@gmail.com> wrote:
>> > >> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande 
>> wrote:
>> > >>> Hi All,
>> > >>>
>> > >>> I am solving a nonlinear system whose Jacobian matrix has a block
>> structure.
>> > >>> More precisely, there is a mesh, and for each vertex there are 11
>> variables
>> > >>> associated with it. I am using BAIJ.
>> > >>>
>> > >>> I thought block ILU(k) should be more efficient than the point-wise
>> ILU(k).
>> > >>> After some numerical experiments, I found that the block ILU(K) is
>> much
>> > >>> slower than the point-wise version.
>> > >> Do you mean that it takes more iterations to converge, or that the
>> > >> time per iteration is greater, or both?
>> > >>
>> > >> The number of iterations is very 

Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-07 Thread Hong
I checked MatILUFactorSymbolic_SeqBAIJ() and MatILUFactorSymbolic_SeqAIJ();
they are virtually the same. Why is the BAIJ version so much slower?
I'll investigate it.

Fande,
How large is your matrix? Is it possible to send us your matrix so I can
test it?

Hong


On Mon, Mar 6, 2017 at 9:08 PM, Barry Smith  wrote:

>
>   Thanks. Even the symbolic factorization is slower for BAIJ. I don't like
> that; it definitely should not be, since it is (or at least should be)
> doing a symbolic factorization on a symbolic matrix 1/11th the size!
>
>Keep us informed.
>
>
>
> > On Mar 6, 2017, at 5:44 PM, Kong, Fande  wrote:
> >
> > Thanks, Barry,
> >
> > Log info:
> >
> > AIJ:
> >
> > MatSolve         850 1.0 8.6543e+00 4.2 3.04e+09 1.8 0.0e+00 0.0e+00 0.0e+00  0 41  0  0  0   0 41  0  0  0 49594
> > MatLUFactorNum    25 1.0 1.7622e+00 2.0 2.04e+09 2.1 0.0e+00 0.0e+00 0.0e+00  0 26  0  0  0   0 26  0  0  0 153394
> > MatILUFactorSym   13 1.0 2.8002e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
> >
> > BAIJ:
> >
> > MatSolve         826 1.0 1.3016e+01 1.7 1.42e+10 1.8 0.0e+00 0.0e+00 0.0e+00  1 29  0  0  0   1 29  0  0  0 154617
> > MatLUFactorNum    25 1.0 1.5503e+01 2.0 3.55e+10 2.1 0.0e+00 0.0e+00 0.0e+00  1 67  0  0  0   1 67  0  0  0 303190
> > MatILUFactorSym   13 1.0 5.7561e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
> >
> > It looks like both MatSolve and MatLUFactorNum are slower.
> >
> > I will try your suggestions.
> >
> > Fande
> >
> > On Mon, Mar 6, 2017 at 4:14 PM, Barry Smith  wrote:
> >
> >   Note also that if the 11 by 11 blocks are actually sparse (and you
> don't store all the zeros in the blocks in the AIJ format), then the AIJ
> non-block factorization involves fewer floating point operations and less
> memory access, so it can be faster than the BAIJ format, depending on "how
> sparse" the blocks are. If you actually "fill in" the 11 by 11 blocks with
> AIJ (storing explicit zeros in some locations), then the above is not true.
> >
> >
> > > On Mar 6, 2017, at 5:10 PM, Barry Smith  wrote:
> > >
> > >
> > >   This is because for block size 11 it is using calls to LAPACK/BLAS
> for the block operations instead of custom routines for that block size.
> > >
> > >   Here is what you need to do. For a good-sized case, run both with
> -log_view and check the time spent in
> > > MatLUFactorNumeric, MatLUFactorSymbolic and MatSolve for AIJ and
> BAIJ. If they have a different number of function calls, divide by the
> function call count to determine the time per function call.
> > >
> > >   This will tell you which routine needs to be optimized first, either
> MatLUFactorNumeric or MatSolve. My guess is MatSolve.
> > >
> > >   So edit src/mat/impls/baij/seq/baijsolvnat.c and copy the function
> MatSolve_SeqBAIJ_15_NaturalOrdering_ver1() to a new function
> MatSolve_SeqBAIJ_11_NaturalOrdering_ver1(). Edit the new function for a
> block size of 11.
> > >
> > >   Now edit MatLUFactorNumeric_SeqBAIJ_N() so that if the block size is 11
> it uses the new routine, something like:
> > >
> > > if (both_identity) {
> > >   if (b->bs == 11) {
> > >     C->ops->solve = MatSolve_SeqBAIJ_11_NaturalOrdering_ver1;
> > >   } else {
> > >     C->ops->solve = MatSolve_SeqBAIJ_N_NaturalOrdering;
> > >   }
> > > }
> > >
> > >   Rerun and look at the new -log_view. Send all three -log_view outputs
> to us at this point. If this optimization helps and
> > > MatLUFactorNumeric is now the time sink, you can apply the same process
> to MatLUFactorNumeric_SeqBAIJ_15_NaturalOrdering() to make a custom
> version for block size 11.
> > >
> > >  Barry
> > >
> > >> On Mar 6, 2017, at 4:32 PM, Kong, Fande  wrote:
> > >>
> > >>
> > >>
> > >> On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan <
> patrick.sa...@gmail.com> wrote:
> > >> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande 
> wrote:
> > >>> Hi All,
> > >>>
> > >>> I am solving a nonlinear system whose Jacobian matrix has a block
> structure.
> > >>> More precisely, there is a mesh, and for each vertex there are 11
> variables
> > >>> associated with it. I am using BAIJ.
> > >>>
> > >>> I thought block ILU(k) should be more efficient than the point-wise
> ILU(k).
> > >>> After some numerical experiments, I found that the block ILU(K) is
> much
> > >>> slower than the point-wise version.
> > >> Do you mean that it takes more iterations to converge, or that the
> > >> time per iteration is greater, or both?
> > >>
> > >> The number of iterations is very similar, but the time per iteration
> is greater.
> > >>
> > >>
> > >>>
> > >>> Any thoughts?
> > >>>
> > >>> Fande,
> > >>
> > >
> >
> >
>
>


Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-06 Thread Barry Smith

  Thanks. Even the symbolic factorization is slower for BAIJ. I don't like
that; it definitely should not be, since it is (or at least should be) doing
a symbolic factorization on a symbolic matrix 1/11th the size!
 
   Keep us informed.
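To put numbers on the "1/11th the size" remark: with the 3020875 x 3020875 matrix size reported elsewhere in this thread and 11 unknowns per vertex, the block graph a BAIJ symbolic factorization walks has 274625 block rows. The small C sketch below does that arithmetic. It assumes the dimension is an exact multiple of the block size and, for the index count, that every stored block is dense; the helper names are hypothetical, not PETSc API.

```c
#include <assert.h>

/* Hypothetical helpers (not PETSc API) for sizing the block graph. */

/* Block rows seen by a block-based symbolic factorization:
   one block row per vertex when bs unknowns live on each vertex. */
long block_rows(long point_rows, int bs)
{
  assert(bs > 0 && point_rows % bs == 0); /* assumes an exact multiple of bs */
  return point_rows / bs;
}

/* Column indices stored for n_blocks dense bs x bs blocks:
   a point-wise (AIJ-like) format keeps one index per scalar entry ... */
long pointwise_col_indices(long n_blocks, int bs)
{
  return n_blocks * (long)bs * bs;
}

/* ... while a block (BAIJ-like) format keeps one index per block. */
long block_col_indices(long n_blocks)
{
  return n_blocks;
}
```

So with dense blocks the symbolic phase should visit 1/11 as many rows and about 1/121 as many column indices as the point-wise one, which is why a slower BAIJ symbolic factorization is surprising.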



> On Mar 6, 2017, at 5:44 PM, Kong, Fande  wrote:
> 
> Thanks, Barry,
> 
> Log info:
> 
> AIJ:
> 
> MatSolve         850 1.0 8.6543e+00 4.2 3.04e+09 1.8 0.0e+00 0.0e+00 0.0e+00  0 41  0  0  0   0 41  0  0  0 49594
> MatLUFactorNum    25 1.0 1.7622e+00 2.0 2.04e+09 2.1 0.0e+00 0.0e+00 0.0e+00  0 26  0  0  0   0 26  0  0  0 153394
> MatILUFactorSym   13 1.0 2.8002e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
> 
> BAIJ:
> 
> MatSolve         826 1.0 1.3016e+01 1.7 1.42e+10 1.8 0.0e+00 0.0e+00 0.0e+00  1 29  0  0  0   1 29  0  0  0 154617
> MatLUFactorNum    25 1.0 1.5503e+01 2.0 3.55e+10 2.1 0.0e+00 0.0e+00 0.0e+00  1 67  0  0  0   1 67  0  0  0 303190
> MatILUFactorSym   13 1.0 5.7561e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
> 
> It looks like both MatSolve and MatLUFactorNum are slower.
> 
> I will try your suggestions.
> 
> Fande
> 
> On Mon, Mar 6, 2017 at 4:14 PM, Barry Smith  wrote:
> 
>   Note also that if the 11 by 11 blocks are actually sparse (and you don't
> store all the zeros in the blocks in the AIJ format), then the AIJ non-block
> factorization involves fewer floating point operations and less memory
> access, so it can be faster than the BAIJ format, depending on "how sparse"
> the blocks are. If you actually "fill in" the 11 by 11 blocks with AIJ
> (storing explicit zeros in some locations), then the above is not true.
> 
> 
> > On Mar 6, 2017, at 5:10 PM, Barry Smith  wrote:
> >
> >
> >   This is because for block size 11 it is using calls to LAPACK/BLAS for 
> > the block operations instead of custom routines for that block size.
> >
> >   Here is what you need to do. For a good-sized case, run both with
> > -log_view and check the time spent in
> > MatLUFactorNumeric, MatLUFactorSymbolic and MatSolve for AIJ and BAIJ.
> > If they have a different number of function calls, divide by the
> > function call count to determine the time per function call.
> >
> >   This will tell you which routine needs to be optimized first, either
> > MatLUFactorNumeric or MatSolve. My guess is MatSolve.
> >
> >   So edit src/mat/impls/baij/seq/baijsolvnat.c and copy the function
> > MatSolve_SeqBAIJ_15_NaturalOrdering_ver1() to a new function
> > MatSolve_SeqBAIJ_11_NaturalOrdering_ver1(). Edit the new function for a
> > block size of 11.
> >
> >   Now edit MatLUFactorNumeric_SeqBAIJ_N() so that if the block size is 11
> > it uses the new routine, something like:
> >
> > if (both_identity) {
> >   if (b->bs == 11) {
> >     C->ops->solve = MatSolve_SeqBAIJ_11_NaturalOrdering_ver1;
> >   } else {
> >     C->ops->solve = MatSolve_SeqBAIJ_N_NaturalOrdering;
> >   }
> > }
> >
> >   Rerun and look at the new -log_view. Send all three -log_view outputs
> > to us at this point. If this optimization helps and
> > MatLUFactorNumeric is now the time sink, you can apply the same process
> > to MatLUFactorNumeric_SeqBAIJ_15_NaturalOrdering() to make a custom
> > version for block size 11.
> >
> >  Barry
> >
> >> On Mar 6, 2017, at 4:32 PM, Kong, Fande  wrote:
> >>
> >>
> >>
> >> On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan  
> >> wrote:
> >> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande  wrote:
> >>> Hi All,
> >>>
> >>> I am solving a nonlinear system whose Jacobian matrix has a block 
> >>> structure.
> >>> More precisely, there is a mesh, and for each vertex there are 11 
> >>> variables
> >>> associated with it. I am using BAIJ.
> >>>
> >>> I thought block ILU(k) should be more efficient than the point-wise 
> >>> ILU(k).
> >>> After some numerical experiments, I found that the block ILU(K) is much
> >>> slower than the point-wise version.
> >> Do you mean that it takes more iterations to converge, or that the
> >> time per iteration is greater, or both?
> >>
> >> The number of iterations is very similar, but the time per iteration is
> >> greater.
> >>
> >>
> >>>
> >>> Any thoughts?
> >>>
> >>> Fande,
> >>
> >
> 
> 



Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-06 Thread Kong, Fande
Thanks, Barry,

Log info:

AIJ:

MatSolve         850 1.0 8.6543e+00 4.2 3.04e+09 1.8 0.0e+00 0.0e+00 0.0e+00  0 41  0  0  0   0 41  0  0  0 49594
MatLUFactorNum    25 1.0 1.7622e+00 2.0 2.04e+09 2.1 0.0e+00 0.0e+00 0.0e+00  0 26  0  0  0   0 26  0  0  0 153394
MatILUFactorSym   13 1.0 2.8002e-01 2.9 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 0

BAIJ:

MatSolve         826 1.0 1.3016e+01 1.7 1.42e+10 1.8 0.0e+00 0.0e+00 0.0e+00  1 29  0  0  0   1 29  0  0  0 154617
MatLUFactorNum    25 1.0 1.5503e+01 2.0 3.55e+10 2.1 0.0e+00 0.0e+00 0.0e+00  1 67  0  0  0   1 67  0  0  0 303190
MatILUFactorSym   13 1.0 5.7561e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0 0

It looks like both MatSolve and MatLUFactorNum are slower.
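Following Barry's advice to normalize by call count, here is a small C sketch that divides the times and flop counts above by the number of calls. It assumes the Time and Flop columns are the maximum over processes, as -log_view reports them; the LogEvent struct and the helper names are hypothetical, not PETSc types.

```c
#include <assert.h>

/* Hypothetical record for one -log_view event line (not a PETSc type). */
typedef struct {
  const char *name;
  int count;      /* number of calls */
  double time_s;  /* max time over processes, seconds */
  double flops;   /* max flop count over processes */
} LogEvent;

double time_per_call(LogEvent e)
{
  assert(e.count > 0);
  return e.time_s / e.count;
}

double flops_per_call(LogEvent e)
{
  assert(e.count > 0);
  return e.flops / e.count;
}

/* Per-call ratios b / a: values above 1 mean b costs more per call. */
double time_ratio(LogEvent a, LogEvent b) { return time_per_call(b) / time_per_call(a); }
double flop_ratio(LogEvent a, LogEvent b) { return flops_per_call(b) / flops_per_call(a); }
```

With the numbers above, a BAIJ MatSolve call takes about 1.5 times as long as an AIJ one and a BAIJ MatLUFactorNum call about 8.8 times as long, while doing roughly 4.8 and 17 times as many flops respectively. One plausible reading, not verified here, is that the BAIJ factor does arithmetic on zeros inside the 11 by 11 blocks that the AIJ factor never stores.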

I will try your suggestions.

Fande

On Mon, Mar 6, 2017 at 4:14 PM, Barry Smith  wrote:

>
>   Note also that if the 11 by 11 blocks are actually sparse (and you don't
> store all the zeros in the blocks in the AIJ format), then the AIJ
> non-block factorization involves fewer floating point operations and less
> memory access, so it can be faster than the BAIJ format, depending on "how
> sparse" the blocks are. If you actually "fill in" the 11 by 11 blocks with
> AIJ (storing explicit zeros in some locations), then the above is not true.
>
>
> > On Mar 6, 2017, at 5:10 PM, Barry Smith  wrote:
> >
> >
> >   This is because for block size 11 it is using calls to LAPACK/BLAS for
> the block operations instead of custom routines for that block size.
> >
> >   Here is what you need to do. For a good-sized case, run both with
> -log_view and check the time spent in
> > MatLUFactorNumeric, MatLUFactorSymbolic and MatSolve for AIJ and
> BAIJ. If they have a different number of function calls, divide by the
> function call count to determine the time per function call.
> >
> >   This will tell you which routine needs to be optimized first, either
> MatLUFactorNumeric or MatSolve. My guess is MatSolve.
> >
> >   So edit src/mat/impls/baij/seq/baijsolvnat.c and copy the function
> MatSolve_SeqBAIJ_15_NaturalOrdering_ver1() to a new function
> MatSolve_SeqBAIJ_11_NaturalOrdering_ver1(). Edit the new function for a
> block size of 11.
> >
> >   Now edit MatLUFactorNumeric_SeqBAIJ_N() so that if the block size is 11
> it uses the new routine, something like:
> >
> > if (both_identity) {
> >   if (b->bs == 11) {
> >     C->ops->solve = MatSolve_SeqBAIJ_11_NaturalOrdering_ver1;
> >   } else {
> >     C->ops->solve = MatSolve_SeqBAIJ_N_NaturalOrdering;
> >   }
> > }
> >
> >   Rerun and look at the new -log_view. Send all three -log_view outputs
> to us at this point. If this optimization helps and
> > MatLUFactorNumeric is now the time sink, you can apply the same process
> to MatLUFactorNumeric_SeqBAIJ_15_NaturalOrdering() to make a custom
> version for block size 11.
> >
> >  Barry
> >
> >> On Mar 6, 2017, at 4:32 PM, Kong, Fande  wrote:
> >>
> >>
> >>
> >> On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan 
> wrote:
> >> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande  wrote:
> >>> Hi All,
> >>>
> >>> I am solving a nonlinear system whose Jacobian matrix has a block
> structure.
> >>> More precisely, there is a mesh, and for each vertex there are 11
> variables
> >>> associated with it. I am using BAIJ.
> >>>
> >>> I thought block ILU(k) should be more efficient than the point-wise
> ILU(k).
> >>> After some numerical experiments, I found that the block ILU(K) is much
> >>> slower than the point-wise version.
> >> Do you mean that it takes more iterations to converge, or that the
> >> time per iteration is greater, or both?
> >>
> >> The number of iterations is very similar, but the time per iteration
> is greater.
> >>
> >>
> >>>
> >>> Any thoughts?
> >>>
> >>> Fande,
> >>
> >
>
>


Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-06 Thread Barry Smith

  Note also that if the 11 by 11 blocks are actually sparse (and you don't
store all the zeros in the blocks in the AIJ format), then the AIJ non-block
factorization involves fewer floating point operations and less memory access,
so it can be faster than the BAIJ format, depending on "how sparse" the blocks
are. If you actually "fill in" the 11 by 11 blocks with AIJ (storing explicit
zeros in some locations), then the above is not true.
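A rough operation count makes this trade-off concrete. The C sketch below, with hypothetical helper names, counts flops for one off-diagonal block in a triangular solve (y -= A_ij x_j) and for one dense block update in the numeric factorization. It assumes a dense block is processed in full (2 bs^2 and 2 bs^3 flops) while a point-wise format only touches its stored entries, and it ignores differences in fill and indexing overhead between the two factorizations.

```c
#include <assert.h>

/* Hypothetical flop-count helpers; fill-in and indexing overhead are ignored. */

/* Dense bs x bs block in y -= A_ij * x_j: one multiply and one add per entry. */
long solve_flops_dense_block(int bs) { return 2L * bs * bs; }

/* Point-wise storage of the same block: only the nnz stored entries are used. */
long solve_flops_sparse_block(long nnz) { return 2L * nnz; }

/* Dense block update A_ik -= A_ij * A_jk in the numeric factorization. */
long factor_flops_dense_block(int bs) { return 2L * bs * bs * bs; }

/* Fraction of a bs x bs block that is actually nonzero. */
double block_density(long nnz, int bs)
{
  assert(bs > 0 && nnz >= 0 && nnz <= (long)bs * bs);
  return (double)nnz / ((double)bs * bs);
}
```

For a hypothetical block with only 30 of its 121 entries nonzero, the point-wise solve does 60 flops for that block against 242 for the dense block. Whether that outweighs the better memory access and specialized kernels of the block format depends, as noted above, on how sparse the blocks really are.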


> On Mar 6, 2017, at 5:10 PM, Barry Smith  wrote:
> 
> 
>   This is because for block size 11 it is using calls to LAPACK/BLAS for the 
> block operations instead of custom routines for that block size.
> 
>   Here is what you need to do. For a good-sized case, run both with -log_view
> and check the time spent in
> MatLUFactorNumeric, MatLUFactorSymbolic and MatSolve for AIJ and BAIJ. If
> they have a different number of function calls, divide by the function
> call count to determine the time per function call.
> 
>   This will tell you which routine needs to be optimized first, either
> MatLUFactorNumeric or MatSolve. My guess is MatSolve.
> 
>   So edit src/mat/impls/baij/seq/baijsolvnat.c and copy the function
> MatSolve_SeqBAIJ_15_NaturalOrdering_ver1() to a new function
> MatSolve_SeqBAIJ_11_NaturalOrdering_ver1(). Edit the new function for a block
> size of 11.
> 
>   Now edit MatLUFactorNumeric_SeqBAIJ_N() so that if the block size is 11 it
> uses the new routine, something like:
> 
> if (both_identity) {
>   if (b->bs == 11) {
>     C->ops->solve = MatSolve_SeqBAIJ_11_NaturalOrdering_ver1;
>   } else {
>     C->ops->solve = MatSolve_SeqBAIJ_N_NaturalOrdering;
>   }
> }
> 
>   Rerun and look at the new -log_view. Send all three -log_view outputs to us
> at this point. If this optimization helps and
> MatLUFactorNumeric is now the time sink, you can apply the same process to
> MatLUFactorNumeric_SeqBAIJ_15_NaturalOrdering() to make a custom version for
> block size 11.
> 
>  Barry
> 
>> On Mar 6, 2017, at 4:32 PM, Kong, Fande  wrote:
>> 
>> 
>> 
>> On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan  
>> wrote:
>> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande  wrote:
>>> Hi All,
>>> 
>>> I am solving a nonlinear system whose Jacobian matrix has a block structure.
>>> More precisely, there is a mesh, and for each vertex there are 11 variables
>>> associated with it. I am using BAIJ.
>>> 
>>> I thought block ILU(k) should be more efficient than the point-wise ILU(k).
>>> After some numerical experiments, I found that the block ILU(K) is much
>>> slower than the point-wise version.
>> Do you mean that it takes more iterations to converge, or that the
>> time per iteration is greater, or both?
>> 
>> The number of iterations is very similar, but the time per iteration is
>> greater.
>> 
>> 
>>> 
>>> Any thoughts?
>>> 
>>> Fande,
>> 
> 



Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-06 Thread Barry Smith

   This is because for block size 11 it is using calls to LAPACK/BLAS for the 
block operations instead of custom routines for that block size.
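To illustrate what "custom routines for that block size" buys, here is a simplified C sketch of the core operation of a block triangular solve, y -= A x for one bs x bs block, written once with a run-time block size and once with the size fixed at compile time. It is not PETSc's code and does not reproduce the LAPACK/BLAS path mentioned above: the row-major block layout and the function names are assumptions, and how much the fixed-size version gains depends on the compiler and optimization flags.

```c
#include <assert.h>

/* Generic kernel: y -= A * x for one bs x bs block (row-major, an assumption).
   The loop bounds are run-time values, so the compiler cannot fully unroll. */
void block_gemv_sub_n(int bs, const double *A, const double *x, double *y)
{
  assert(bs > 0);
  for (int i = 0; i < bs; i++) {
    double s = 0.0;
    for (int j = 0; j < bs; j++) s += A[i * bs + j] * x[j];
    y[i] -= s;
  }
}

#define BS11 11 /* block size fixed at compile time */

/* Fixed-size kernel: same arithmetic, but with constant loop bounds the
   compiler is free to unroll and vectorize both loops. */
void block_gemv_sub_11(const double *A, const double *x, double *y)
{
  for (int i = 0; i < BS11; i++) {
    double s = 0.0;
    for (int j = 0; j < BS11; j++) s += A[i * BS11 + j] * x[j];
    y[i] -= s;
  }
}
```

Both functions compute the same result; the difference is only in what the compiler can do with them. Copying the size-15 routine, as suggested above, is a way of obtaining such a fixed-size kernel for size 11.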

   Here is what you need to do. For a good-sized case, run both with -log_view
and check the time spent in MatLUFactorNumeric, MatLUFactorSymbolic and
MatSolve for AIJ and BAIJ. If they have a different number of function calls,
divide by the function call count to determine the time per function call.

   This will tell you which routine needs to be optimized first, either
MatLUFactorNumeric or MatSolve. My guess is MatSolve.

   So edit src/mat/impls/baij/seq/baijsolvnat.c and copy the function
MatSolve_SeqBAIJ_15_NaturalOrdering_ver1() to a new function
MatSolve_SeqBAIJ_11_NaturalOrdering_ver1(). Edit the new function for a block
size of 11.

   Now edit MatLUFactorNumeric_SeqBAIJ_N() so that if the block size is 11 it
uses the new routine, something like:

if (both_identity) {
  if (b->bs == 11) {
    C->ops->solve = MatSolve_SeqBAIJ_11_NaturalOrdering_ver1;
  } else {
    C->ops->solve = MatSolve_SeqBAIJ_N_NaturalOrdering;
  }
}
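The selection in the snippet above is ordinary function-pointer dispatch. Below is a self-contained C sketch of the same pattern. FactoredMat, the solve functions and select_solve are hypothetical stand-ins for the PETSc objects, and the placeholder solves only return a tag so the choice can be checked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a factored matrix carrying a solve function
   pointer, loosely modeled on the C->ops->solve slot (not PETSc's types). */
typedef struct FactoredMat FactoredMat;
typedef int (*SolveFn)(const FactoredMat *F);

struct FactoredMat {
  int bs;        /* block size */
  SolveFn solve; /* solve routine chosen after factorization */
};

/* Placeholder solves: each returns a tag identifying which one ran. */
static int solve_generic_natural(const FactoredMat *F) { (void)F; return 0; }
static int solve_bs11_natural(const FactoredMat *F) { (void)F; return 11; }
static int solve_permuted(const FactoredMat *F) { (void)F; return -1; }

/* Pick the specialized kernel only for natural ordering and bs == 11. */
void select_solve(FactoredMat *F, int both_identity)
{
  assert(F != NULL);
  if (both_identity) {
    if (F->bs == 11) {
      F->solve = solve_bs11_natural;
    } else {
      F->solve = solve_generic_natural;
    }
  } else {
    F->solve = solve_permuted; /* permuted orderings are outside this sketch */
  }
}
```

Keeping the block-size check inside the existing both_identity branch means other block sizes and permuted orderings keep their current code paths.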
   
   Rerun and look at the new -log_view. Send all three -log_view outputs to us
at this point. If this optimization helps and MatLUFactorNumeric is now the
time sink, you can apply the same process to
MatLUFactorNumeric_SeqBAIJ_15_NaturalOrdering() to make a custom version for
block size 11.
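For the numeric factorization, the work a fixed-size kernel specializes is dense arithmetic on 11 by 11 blocks. The C sketch below factors one block in place (LU without pivoting) and solves with it, with the block size as a compile-time constant. It is only an illustration under those assumptions (row-major storage, no pivoting, hypothetical names); it is not MatLUFactorNumeric_SeqBAIJ_15_NaturalOrdering() or a port of it.

```c
#include <assert.h>

#define BS 11 /* block size fixed at compile time */

/* In-place LU of one BS x BS block stored row-major, without pivoting.
   Afterwards the strict lower triangle holds L (unit diagonal implied)
   and the upper triangle holds U. Returns 0 on success, -1 on a zero pivot. */
int block_lu_11(double A[BS * BS])
{
  assert(A != 0);
  for (int k = 0; k < BS; k++) {
    double piv = A[k * BS + k];
    if (piv == 0.0) return -1;
    for (int i = k + 1; i < BS; i++) {
      double lik = A[i * BS + k] / piv; /* multiplier l_ik */
      A[i * BS + k] = lik;
      for (int j = k + 1; j < BS; j++) A[i * BS + j] -= lik * A[k * BS + j];
    }
  }
  return 0;
}

/* Solve (L U) x = b using the factors produced by block_lu_11. */
void block_lu_solve_11(const double LU[BS * BS], const double b[BS], double x[BS])
{
  for (int i = 0; i < BS; i++) x[i] = b[i];
  /* forward substitution with unit lower triangular L */
  for (int i = 1; i < BS; i++)
    for (int j = 0; j < i; j++) x[i] -= LU[i * BS + j] * x[j];
  /* back substitution with upper triangular U */
  for (int i = BS - 1; i >= 0; i--) {
    for (int j = i + 1; j < BS; j++) x[i] -= LU[i * BS + j] * x[j];
    x[i] /= LU[i * BS + i];
  }
}
```

In a block ILU this kind of fixed-size arithmetic (block products and block solves) is repeated for every block pair in the factor, which is why a specialized size-11 version can matter once MatLUFactorNumeric dominates the log.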

  Barry

> On Mar 6, 2017, at 4:32 PM, Kong, Fande  wrote:
> 
> 
> 
> On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan  wrote:
> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande  wrote:
> > Hi All,
> >
> > I am solving a nonlinear system whose Jacobian matrix has a block structure.
> > More precisely, there is a mesh, and for each vertex there are 11 variables
> > associated with it. I am using BAIJ.
> >
> > I thought block ILU(k) should be more efficient than the point-wise ILU(k).
> > After some numerical experiments, I found that the block ILU(K) is much
> > slower than the point-wise version.
> Do you mean that it takes more iterations to converge, or that the
> time per iteration is greater, or both?
> 
> The number of iterations is very similar, but the time per iteration is
> greater.
> 
>  
> >
> > Any thoughts?
> >
> > Fande,
> 



Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-06 Thread Kong, Fande
On Mon, Mar 6, 2017 at 3:27 PM, Patrick Sanan 
wrote:

> On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande  wrote:
> > Hi All,
> >
> > I am solving a nonlinear system whose Jacobian matrix has a block
> structure.
> > More precisely, there is a mesh, and for each vertex there are 11
> variables
> > associated with it. I am using BAIJ.
> >
> > I thought block ILU(k) should be more efficient than the point-wise
> ILU(k).
> > After some numerical experiments, I found that the block ILU(K) is much
> > slower than the point-wise version.
> Do you mean that it takes more iterations to converge, or that the
> time per iteration is greater, or both?
>

The number of iterations is very similar, but the time per iteration is
greater.



> >
> > Any thoughts?
> >
> > Fande,
>


Re: [petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-06 Thread Patrick Sanan
On Mon, Mar 6, 2017 at 1:48 PM, Kong, Fande  wrote:
> Hi All,
>
> I am solving a nonlinear system whose Jacobian matrix has a block structure.
> More precisely, there is a mesh, and for each vertex there are 11 variables
> associated with it. I am using BAIJ.
>
> I thought block ILU(k) should be more efficient than the point-wise ILU(k).
> After some numerical experiments, I found that the block ILU(K) is much
> slower than the point-wise version.
Do you mean that it takes more iterations to converge, or that the
time per iteration is greater, or both?
>
> Any thoughts?
>
> Fande,


[petsc-users] block ILU(K) is slower than the point-wise version?

2017-03-06 Thread Kong, Fande
Hi All,

I am solving a nonlinear system whose Jacobian matrix has a block
structure. More precisely, there is a mesh, and for each vertex there are
11 variables associated with it. I am using BAIJ.

I thought block ILU(k) should be more efficient than the point-wise ILU(k).
After some numerical experiments, I found that the block ILU(K) is much
slower than the point-wise version.
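For readers new to block formats: with 11 unknowns per vertex, a block format with block size 11 expects the unknowns of each vertex to be numbered consecutively (interlaced), so that each 11 by 11 block couples two vertices. A minimal C sketch of that numbering follows; the helper names are hypothetical.

```c
#include <assert.h>

#define NVAR 11 /* unknowns per mesh vertex, i.e. the block size */

/* Global row of unknown `var` on vertex `vertex` in interlaced numbering. */
long point_index(long vertex, int var)
{
  assert(vertex >= 0 && var >= 0 && var < NVAR);
  return vertex * NVAR + var;
}

/* Inverse mapping: which vertex (block row) and which unknown a row belongs to. */
long vertex_of(long row) { return row / NVAR; }
int var_of(long row) { return (int)(row % NVAR); }
```

If the unknowns were instead numbered field by field (all of variable 0 first, then variable 1, and so on), consecutive groups of 11 rows would not correspond to vertices, and a block size of 11 would not match the coupling structure.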

Any thoughts?

Fande,