Hao: I would suggest to use a parallel sparse direct solver, e.g., superlu_dist or mumps. These solvers can take advantage of your sparse data structure. Once it works, then you may play with other preconditioners, such as bjacobi + lu/ilu. See https://www.mcs.anl.gov/petsc/miscellaneous/external.html Hong
Dear all, > > > I have a few questions about the implementation of diagonal ILU PC in > PETSc. I want to solve a very simple system with KSP (in parallel), the > nature of the system (finite difference time-harmonic Maxwell) is probably > not important to the question itself. Long story short, I just need to > solve a set of equations of Ax = b with a block diagonal system matrix, > like (not sure if the mono font works): > > |X | > A =| Y | > | Z| > > Note that A is not really block-diagonal, it’s just a multi-diagonal > matrix determined by the FD mesh, where most elements are close to > diagonal. So instead of a full ILU decomposition, a D-ILU is a good > approximation as a preconditioner, and the number of blocks should not > matter too much: > > |Lx | |Ux | > L = | Ly | and U = | Uy | > | Lz| | Uz| > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N > blocks) is quite efficient with Krylov subspace methods like BiCGstab or > QMR in my sequential Fortran/Matlab code. > > So like most users, I am looking for a parallel implement with this > problem in PETSc. After looking through the manual, it seems that the > most straightforward way to do it is through PCBJACOBI. Not sure I > understand it right, I just setup a 3-block PCJACOBI and give each of the > block a KSP with PCILU. Is this supposed to be equivalent to my > D-ILU preconditioner? My little block of fortran code would look like: > ... > * call* PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > & isubs,ierr) > *call* PCBJacobiSetLocalBlocks(pc_local, Nsub, & > & isubs(istart:iend),ierr) > ! set up the block jacobi structure > *call* KSPSetup(ksp_local,ierr) > ! allocate sub ksps > allocate(ksp_sub(Nsub)) > *call* PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > & ksp_sub,ierr) > do i=1,Nsub > *call* KSPGetPC(ksp_sub(i),pc_sub,ierr) > !ILU preconditioner > *call* PCSetType(pc_sub,ptype,ierr) > *call* PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > *call* KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > end do > *call* KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > … > > I understand that the parallel performance may not be comparable, so I > first set up a one-process test (with MPIAij, but all the rows are local > since there is only one process). The system is solved without any > problem (identical results within error). But the performance is actually a > lot worse (code built without debugging flags in performance tests) than my > own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse > matrix format), which is hard to believe. I suspect the difference is from > the PC as the PETSc version took much more BiCGstab iterations (60-ish vs > 100-ish) to converge to the same relative tol. > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 > blocks instead of 3). While my Fortran/Matlab codes see minimal performance > difference (<5%) when I play with the D-ILU setup, increasing the number of > D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a > performance decrease of more than 50% in sequential test. So my > implementation IS somewhat different in PETSc. Do I miss something in the > PCBJACOBI setup? Or do I have some fundamental misunderstanding of how > PCBJACOBI works in PETSc? > > If this is not the right way to implement a block diagonal ILU as > (parallel) PC, please kindly point me to the right direction. I searched > through the mail list to find some answers, only to find a couple of > similar questions... An example would be nice. > > On the other hand, does PETSc support a simple way to use explicit L/U > matrix as a preconditioner? I can import the D-ILU matrix (I already > converted my A matrix into Mat) constructed in my Fortran code to make a > better comparison. Or do I have to construct my own PC using PCshell? If > so, is there a good tutorial/example to learn about how to use PCSHELL (in > a more advanced sense, like how to setup pc side and transpose)? > > Thanks in advance, > > Hao > >
