Dear all,

I have a few questions about the implementation of diagonal ILU PC in PETSc. I 
want to solve a very simple system with KSP (in parallel), the nature of the 
system (finite difference time-harmonic Maxwell) is probably not important to 
the question itself. Long story short, I just need to solve a set of equations 
of Ax = b with a block diagonal system matrix, like (not sure if the mono font 
works):

   |X    |
A =|  Y  |
   |    Z|

Note that A is not really block-diagonal, it’s just a multi-diagonal matrix 
determined by the FD mesh, where most elements are close to diagonal. So 
instead of a full ILU decomposition, a D-ILU is a good approximation as a 
preconditioner, and the number of blocks should not matter too much:

    |Lx      |         |Ux      |
L = |   Ly   | and U = |   Uy   |
    |      Lz|         |      Uz|

Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N 
blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in 
my sequential Fortran/Matlab code.

So like most users, I am looking for a parallel implement with this problem in 
PETSc. After looking through the manual, it seems that the most straightforward 
way to do it is through PCBJACOBI. Not sure I understand it right, I just setup 
a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this 
supposed to be equivalent to my D-ILU preconditioner? My little block of 
fortran code would look like:
...
      call PCBJacobiSetTotalBlocks(pc_local,Ntotal,                   &
     &     isubs,ierr)
      call PCBJacobiSetLocalBlocks(pc_local, Nsub,                    &
     &    isubs(istart:iend),ierr)
      ! set up the block jacobi structure
      call KSPSetup(ksp_local,ierr)
      ! allocate sub ksps
      allocate(ksp_sub(Nsub))
      call PCBJacobiGetSubKSP(pc_local,Nsub,istart,                   &
     &     ksp_sub,ierr)
      do i=1,Nsub
          call KSPGetPC(ksp_sub(i),pc_sub,ierr)
          !ILU preconditioner
          call PCSetType(pc_sub,ptype,ierr)
          call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here
          call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)]
      end do
      call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, &
     &     PETSC_DEFAULT_REAL,KSSiter%maxit,ierr)
…

I understand that the parallel performance may not be comparable, so I first 
set up a one-process test (with MPIAij, but all the rows are local since there 
is only one process). The system is solved without any problem (identical 
results within error). But the performance is actually a lot worse (code built 
without debugging flags in performance tests) than my own home-brew 
implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), 
which is hard to believe. I suspect the difference is from the PC as the PETSc 
version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to 
the same relative tol.

This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks 
instead of 3). While my Fortran/Matlab codes see minimal performance difference 
(<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks 
from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance 
decrease of more than 50% in sequential test. So my implementation IS somewhat 
different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have 
some fundamental misunderstanding of how PCBJACOBI works in PETSc?

If this is not the right way to implement a block diagonal ILU as (parallel) 
PC, please kindly point me to the right direction. I searched through the mail 
list to find some answers, only to find a couple of similar questions... An 
example would be nice.

On the other hand, does PETSc support a simple way to use explicit L/U matrix 
as a preconditioner? I can import the  D-ILU matrix (I already converted my A 
matrix into Mat) constructed in my Fortran code to make a better comparison. Or 
do I have to construct my own PC using PCshell? If so, is there a good 
tutorial/example to learn about how to use PCSHELL (in a more advanced sense, 
like how to setup pc side and transpose)?

Thanks in advance,

Hao

Reply via email to