Re: [petsc-users] Implementing the Sherman Morrison formula (low rank update) in petsc4py and FEniCS?

2020-02-04 Thread Smith, Barry F. via petsc-users

   I am not sure of everything in your email, but it sounds like you want to use 
a "Picard" iteration to solve [K(u) - k a a^T] Δu = -F(u). That is, solve

  A(u^n) (u^{n+1} - u^n) = F(u^n) - A(u^n) u^n,   where A(u) = K(u) - k a a^T

  PETSc provides code to do this with SNESSetPicard() (see the manual page); I 
don't know whether petsc4py has bindings for it.

  Adding missing Python bindings is not terribly difficult, and you may be able 
to do it yourself if this is the approach you want to take.
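
  For reference, a minimal petsc4py sketch of the SNES route (not a drop-in
solution): if SNESSetPicard has no Python binding in your installed petsc4py, a
plain SNES with user callbacks already gives Newton plus line search. The
callback bodies, the size n, and the option named in the comment are
placeholders, not part of the original discussion.

from petsc4py import PETSc

n = 10                               # placeholder problem size

def form_function(snes, x, f):
    # assemble the residual F(x) into f (e.g. from a FEniCS form); placeholder
    pass

def form_jacobian(snes, x, J, P):
    # assemble K(x) - k a a^T (or just K(x) for a Picard-type linearization)
    # into J and P; placeholder
    J.assemble()
    P.assemble()

x = PETSc.Vec().createSeq(n)         # solution vector
r = PETSc.Vec().createSeq(n)         # residual work vector
J = PETSc.Mat().createAIJ([n, n])    # Jacobian (could be a MATPYTHON shell)

snes = PETSc.SNES().create()
snes.setFunction(form_function, r)
snes.setJacobian(form_jacobian, J)
snes.setFromOptions()                # e.g. -snes_linesearch_type bt
snes.solve(None, x)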

   Barry

   

> On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski  wrote:
> 
> Hello, 
> I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver 
> through the SNES object to implement the Sherman-Morrison formula(e.g.  
> http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a 
> nonlinear system of the form [K(u)−kaaT]Δu=−F(u). Here the jacobian matrix K 
> is modified by the term kaaT, where k is a scalar.  Notably, K is sparse, 
> while the term kaaT results in a full matrix. This problem can be solved 
> efficiently using the Sherman-Morrison formula : 
> 
> [K−kaaT]-1 = K-1  - (kK-1 aaTK-1)/(1+kaTK-1a)
> I have managed to successfully implement this at the linear solve level (by 
> modifying the KSP solver) inside a custom Newton solver in python by 
> following an incomplete tutorial at 
> https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner
>  :
> while (norm(delU) > alpha):  # while not converged
> 
>     self.update_F()  # call to method to update r.h.s form
>     self.update_K()  # call to update the jacobian form
>     K = assemble(self.K)  # assemble the jacobian matrix
>     F = assemble(self.F)  # assemble the r.h.s vector
>     a = assemble(self.a_form)  # assemble the a_form (see Sherman-Morrison formula)
> 
>     for bc in self.mem.bc:  # apply boundary conditions
>         bc.apply(K, F)
>         bc.apply(K, a)
> 
>     B = PETSc.Mat().create()
> 
>     # Assemble the bilinear form that defines A and get the concrete
>     # PETSc matrix
>     A = as_backend_type(K).mat()  # get the PETSc objects for K and a
>     u = as_backend_type(a).vec()
> 
>     # Build the matrix "context"  # see firedrake docs
>     Bctx = MatrixFreeB(A, u, u, self.k)
> 
>     # Set up B
>     # B is the same size as A
>     B.setSizes(*A.getSizes())
>     B.setType(B.Type.PYTHON)
>     B.setPythonContext(Bctx)
>     B.setUp()
> 
>     ksp = PETSc.KSP().create()  # create the KSP linear solver object
>     ksp.setOperators(B)
>     ksp.setUp()
>     pc = ksp.pc
>     pc.setType(pc.Type.PYTHON)
>     pc.setPythonContext(MatrixFreePC())
>     ksp.setFromOptions()
> 
>     solution = delU  # the incremental displacement at this iteration
> 
>     b = as_backend_type(-F).vec()
>     delu = solution.vector().vec()
> 
>     ksp.solve(b, delu)
> 
>     self.mem.u.vector().axpy(0.25, self.delU.vector())  # poor man's linesearch
>     counter += 1
> Here is the corresponding petsc4py code adapted from the firedrake docs:
> 
> class MatrixFreePC(object):
> 
>     def setUp(self, pc):
>         B, P = pc.getOperators()
>         # extract the MatrixFreeB object from B
>         ctx = B.getPythonContext()
>         self.A = ctx.A
>         self.u = ctx.u
>         self.v = ctx.v
>         self.k = ctx.k
>         # Here we build the PC object that uses the concrete,
>         # assembled matrix A.  We will use this to apply the action
>         # of A^{-1}
>         self.pc = PETSc.PC().create()
>         self.pc.setOptionsPrefix("mf_")
>         self.pc.setOperators(self.A)
>         self.pc.setFromOptions()
>         # Since u and v do not change, we can build the denominator
>         # and the action of A^{-1} on u only once, in the setup
>         # phase.
>         tmp = self.A.createVecLeft()
>         self.pc.apply(self.u, tmp)
>         self._Ainvu = tmp
>         self._denom = 1 + self.k*self.v.dot(self._Ainvu)
> 
>     def apply(self, pc, x, y):
>         # y <- A^{-1}x
>         self.pc.apply(x, y)
>         # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u)
>         alpha =

Re: [petsc-users] Required structure and attrs for MatLoad from hdf5

2020-02-04 Thread Smith, Barry F. via petsc-users

  I think this is a Python-MATLAB question, not specifically related to PETSc 
in any way. Googling "python matrix hdf5 matlab" turns up mentions of the h5py 
library, which can be used to write out sparse matrices in MATLAB's HDF5 
format; those could presumably be read by PETSc. PETSc can also read the 
non-transposed version, which presumably can also be written out with h5py.

  https://www.loc.gov/preservation/digital/formats/fdd/fdd000440.shtml suggests 
that a flag indicating whether the transpose is stored may exist in the file; 
if so, and if MATLAB uses it, we wouldn't need the special MATLAB format flag.
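
  For what it's worth, a rough h5py sketch of what such a file might look like. 
Only the dataset name 'jc' is confirmed by the error message in this thread; the 
group layout, the 'ir'/'data' names, the 'MATLAB_sparse' attribute, and the 
integer widths are assumptions based on MATLAB's v7.3 (HDF5) sparse format and 
should be checked against what ex10/MatLoad's HDF5 reader actually expects.

import h5py
import numpy as np
import scipy.sparse as sp

# a random sparse matrix in CSC form (MATLAB stores the transposed/CSC arrays)
A = sp.random(20, 10, density=0.2, format="csc", dtype=np.float64)

with h5py.File("matrix.h5", "w") as f:
    g = f.create_group("A")   # the group/object name is an assumption
    g.create_dataset("jc", data=A.indptr.astype(np.int64))    # column pointers
    g.create_dataset("ir", data=A.indices.astype(np.int64))   # row indices
    g.create_dataset("data", data=A.data)                     # nonzero values
    # MATLAB marks sparse variables with this attribute (value = number of
    # rows); whether PETSc's reader uses it is an assumption
    g.attrs["MATLAB_sparse"] = np.int64(A.shape[0])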

   Barry


> On Feb 4, 2020, at 6:30 PM, Sajid Ali  
> wrote:
> 
> Hi PETSc-developers,
> 
> The example src/mat/examples/tutorials/ex10.c shows how one would read a 
> matrix from a hdf5 file. Since MatView isn’t implemented for hdf5_mat format, 
> how is the hdf5 file (to be used to run ex10) generated ?
> 
> I tried reading from an hdf5 file but I saw an error stating object 'jc' 
> doesn't exist and thus would like to know how I should store a sparse matrix 
> in an hdf5 file so that MatLoad works.
> 
> PS: I’m guessing that MATLAB stores the matrix in the format that PETSc 
> expects (group/dset/attrs) but I’m creating this from Python. If the 
> recommended approach is to transfer numpy arrays to PETSc matrices via 
> petsc4py, I’d switch to that instead of directly creating hdf5 files.
> 
> Thank You,
> Sajid Ali | PhD Candidate
> Applied Physics
> Northwestern University
> s-sajid-ali.github.io
> 



[petsc-users] Required structure and attrs for MatLoad from hdf5

2020-02-04 Thread Sajid Ali
Hi PETSc-developers,

The example src/mat/examples/tutorials/ex10.c shows how one would read a
matrix from an HDF5 file. Since MatView isn't implemented for the hdf5_mat
format, how is the HDF5 file (to be used to run ex10) generated?

I tried reading from an HDF5 file but saw an error stating that object 'jc'
doesn't exist, and thus would like to know how I should store a sparse
matrix in an HDF5 file so that MatLoad works.

PS: I’m guessing that MATLAB stores the matrix in the format that PETSc
expects (group/dset/attrs) but I’m creating this from Python. If the
recommended approach is to transfer numpy arrays to PETSc matrices via
petsc4py, I’d switch to that instead of directly creating hdf5 files.

Thank You,
Sajid Ali | PhD Candidate
Applied Physics
Northwestern University
s-sajid-ali.github.io


[petsc-users] Implementing the Sherman Morrison formula (low rank update) in petsc4py and FEniCS?

2020-02-04 Thread Olek Niewiarowski
Hello,
I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver 
through the SNES object to implement the Sherman-Morrison formula (e.g. 
http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a 
nonlinear system of the form [K(u) - k a a^T] Δu = -F(u). Here the Jacobian 
matrix K is modified by the term k a a^T, where k is a scalar. Notably, K is 
sparse, while the term k a a^T results in a dense matrix. This problem can be 
solved efficiently using the Sherman-Morrison formula:

[K - k a a^T]^{-1} = K^{-1} - (k K^{-1} a a^T K^{-1}) / (1 + k a^T K^{-1} a)

I have managed to successfully implement this at the linear solve level (by 
modifying the KSP solver) inside a custom Newton solver in Python, following 
an incomplete tutorial at 
https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner :
while (norm(delU) > alpha):  # while not converged

    self.update_F()  # call to method to update r.h.s form
    self.update_K()  # call to update the jacobian form
    K = assemble(self.K)  # assemble the jacobian matrix
    F = assemble(self.F)  # assemble the r.h.s vector
    a = assemble(self.a_form)  # assemble the a_form (see Sherman-Morrison formula)

    for bc in self.mem.bc:  # apply boundary conditions
        bc.apply(K, F)
        bc.apply(K, a)

    B = PETSc.Mat().create()

    # Assemble the bilinear form that defines A and get the concrete
    # PETSc matrix
    A = as_backend_type(K).mat()  # get the PETSc objects for K and a
    u = as_backend_type(a).vec()

    # Build the matrix "context"  # see firedrake docs
    Bctx = MatrixFreeB(A, u, u, self.k)

    # Set up B
    # B is the same size as A
    B.setSizes(*A.getSizes())
    B.setType(B.Type.PYTHON)
    B.setPythonContext(Bctx)
    B.setUp()

    ksp = PETSc.KSP().create()  # create the KSP linear solver object
    ksp.setOperators(B)
    ksp.setUp()
    pc = ksp.pc
    pc.setType(pc.Type.PYTHON)
    pc.setPythonContext(MatrixFreePC())
    ksp.setFromOptions()

    solution = delU  # the incremental displacement at this iteration

    b = as_backend_type(-F).vec()
    delu = solution.vector().vec()

    ksp.solve(b, delu)

    self.mem.u.vector().axpy(0.25, self.delU.vector())  # poor man's linesearch
    counter += 1
Here is the corresponding petsc4py code adapted from the firedrake docs:


class MatrixFreePC(object):

    def setUp(self, pc):
        B, P = pc.getOperators()
        # extract the MatrixFreeB object from B
        ctx = B.getPythonContext()
        self.A = ctx.A
        self.u = ctx.u
        self.v = ctx.v
        self.k = ctx.k
        # Here we build the PC object that uses the concrete,
        # assembled matrix A.  We will use this to apply the action
        # of A^{-1}
        self.pc = PETSc.PC().create()
        self.pc.setOptionsPrefix("mf_")
        self.pc.setOperators(self.A)
        self.pc.setFromOptions()
        # Since u and v do not change, we can build the denominator
        # and the action of A^{-1} on u only once, in the setup
        # phase.
        tmp = self.A.createVecLeft()
        self.pc.apply(self.u, tmp)
        self._Ainvu = tmp
        self._denom = 1 + self.k*self.v.dot(self._Ainvu)

    def apply(self, pc, x, y):
        # y <- A^{-1}x
        self.pc.apply(x, y)
        # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u)
        alpha = (self.k*self.v.dot(y)) / self._denom
        # y <- y - alpha * A^{-1}u
        y.axpy(-alpha, self._Ainvu)


class MatrixFreeB(object):

    def __init__(self, A, u, v, k):
        self.A = A
        self.u = u
        self.v = v
        self.k = k

    def mult(self, mat, x, y):
        # y <- A x
        self.A.mult(x, y)

        # alpha <- v^T x
        alpha = self.v.dot(x)

        # y <- y + alpha*u
        y.axpy(alpha, self.u)

However, this approach is not efficient as it requires many iterations due to 
the Newton step being fixed, so I would like to implement it using SNES and use 
line search. Unfortunately, I have not been able to find any 
documentation/tutorial 
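
A minimal sketch of how the pieces above might be attached to a SNES (so that 
the SNES supplies the Newton iteration and line search) instead of a hand-rolled 
loop. It assumes the MatrixFreeB/MatrixFreePC classes above, uses a tiny 
placeholder operator in place of the assembled FEniCS matrix, and omits the 
residual/Jacobian callbacks:

from petsc4py import PETSc

n = 8                                     # placeholder size
A = PETSc.Mat().createAIJ([n, n], nnz=1)  # placeholder for the assembled K
for i in range(n):
    A.setValue(i, i, 2.0)
A.assemble()
u = A.createVecRight(); u.set(1.0)        # placeholder for the assembled a
k = 1.0

B = PETSc.Mat().create()
B.setSizes(A.getSizes())
B.setType(PETSc.Mat.Type.PYTHON)
B.setPythonContext(MatrixFreeB(A, u, u, k))
B.setUp()

snes = PETSc.SNES().create()
# snes.setFunction(...) and snes.setJacobian(..., J=B) go here
ksp = snes.getKSP()                       # the KSP used inside each Newton step
ksp.setOperators(B)
pc = ksp.getPC()
pc.setType(PETSc.PC.Type.PYTHON)
pc.setPythonContext(MatrixFreePC())
snes.setFromOptions()                     # e.g. -snes_linesearch_type l2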

Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC?

2020-02-04 Thread Smith, Barry F. via petsc-users


> On Feb 4, 2020, at 12:41 PM, Hao DONG  wrote:
> 
> Dear all, 
> 
> 
> I have a few questions about the implementation of diagonal ILU PC in PETSc. 
> I want to solve a very simple system with KSP (in parallel), the nature of 
> the system (finite difference time-harmonic Maxwell) is probably not 
> important to the question itself. Long story short, I just need to solve a 
> set of equations of Ax = b with a block diagonal system matrix, like (not 
> sure if the mono font works): 
> 
>     | X       |
> A = |    Y    |
>     |       Z |
> 
> Note that A is not really block-diagonal, it’s just a multi-diagonal matrix 
> determined by the FD mesh, where most elements are close to diagonal. So 
> instead of a full ILU decomposition, a D-ILU is a good approximation as a 
> preconditioner, and the number of blocks should not matter too much: 
> 
>     | Lx      |               | Ux      |
> L = |    Ly   |    and    U = |    Uy   |
>     |      Lz |               |      Uz |
> 
> Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N 
> blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR 
> in my sequential Fortran/Matlab code. 
> 
> So like most users, I am looking for a parallel implement with this problem 
> in PETSc. After looking through the manual, it seems that the most 
> straightforward way to do it is through PCBJACOBI. Not sure I understand it 
> right, I just setup a 3-block PCJACOBI and give each of the block a KSP with 
> PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My 
> little block of fortran code would look like: 
> ...
>   call PCBJacobiSetTotalBlocks(pc_local,Ntotal,   &
>  & isubs,ierr)
>   call PCBJacobiSetLocalBlocks(pc_local, Nsub,&
>  &isubs(istart:iend),ierr)
>   ! set up the block jacobi structure
>   call KSPSetup(ksp_local,ierr)
>   ! allocate sub ksps
>   allocate(ksp_sub(Nsub))
>   call PCBJacobiGetSubKSP(pc_local,Nsub,istart,   &
>  & ksp_sub,ierr)
>   do i=1,Nsub
>   call KSPGetPC(ksp_sub(i),pc_sub,ierr)
>   !ILU preconditioner
>   call PCSetType(pc_sub,ptype,ierr)
>   call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here
>   call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)
>   end do
>   call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, &
>  & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr)
> … 

 This code looks essentially right. You should run with -ksp_view to see 
exactly what PETSc is using for a solver. 

> 
> I understand that the parallel performance may not be comparable, so I first 
> set up a one-process test (with MPIAij, but all the rows are local since 
> there is only one process). The system is solved without any problem 
> (identical results within error). But the performance is actually a lot worse 
> (code built without debugging flags in performance tests) than my own 
> home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix 
> format), which is hard to believe. I suspect the difference is from the PC as 
> the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to 
> converge to the same relative tol. 

   PETSc uses GMRES by default with a restart of 30 and left preconditioning. 
Different exact choices in the solver (which is why -ksp_view is so useful) 
could explain the differences in the runs between your code and PETSc's.
> 
> This is further confirmed when I change the setup of D-ILU (using 6 or 9 
> blocks instead of 3). While my Fortran/Matlab codes see minimal performance 
> difference (<5%) when I play with the D-ILU setup, increasing the number of 
> D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a 
> performance decrease of more than 50% in sequential test.

   This is odd; the more blocks, the smaller each block, so the quicker the ILU 
setup should be. You can run various cases with -log_view and send us the 
output to see where the time is going in each part of the computation.
 
> So my implementation IS somewhat different in PETSc. Do I miss something in 
> the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how 
> PCBJACOBI works in PETSc? 

   Probably not.
> 
> If this is not the right way to implement a block diagonal ILU as (parallel) 
> PC, please kindly point me to the right direction. I searched through the 
> mail list to find some answers, only to find a couple of similar questions... 
> An example would be nice.

   Your approach seems fundamentally right, but I cannot be sure of possible bugs.
> 
> On the other hand, does PETSc support a simple way to use explicit L/U matrix 
> as a preconditioner? I can import the  D-ILU matrix (I already converted my A 
> matrix into Mat) constructed in my Fortran code to make a better comparison. 
> Or do I have to construct my own PC using PCshell? If so, is there a good 
> tutorial/example to 

Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC?

2020-02-04 Thread hzhang--- via petsc-users
Hao:
I would suggest using a parallel sparse direct solver, e.g., superlu_dist
or mumps. These solvers can take advantage of your sparse data structure.
Once that works, you may then play with other preconditioners, such as
bjacobi + lu/ilu. See
https://www.mcs.anl.gov/petsc/miscellaneous/external.html
Hong

Dear all,
>
>
> I have a few questions about the implementation of diagonal ILU PC in
> PETSc. I want to solve a very simple system with KSP (in parallel), the
> nature of the system (finite difference time-harmonic Maxwell) is probably
> not important to the question itself. Long story short, I just need to
> solve a set of equations of Ax = b with a block diagonal system matrix,
> like (not sure if the mono font works):
>
>     | X       |
> A = |    Y    |
>     |       Z |
>
> Note that A is not really block-diagonal, it’s just a multi-diagonal
> matrix determined by the FD mesh, where most elements are close to
> diagonal. So instead of a full ILU decomposition, a D-ILU is a good
> approximation as a preconditioner, and the number of blocks should not
> matter too much:
>
>     | Lx      |               | Ux      |
> L = |    Ly   |    and    U = |    Uy   |
>     |      Lz |               |      Uz |
>
> Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N
> blocks) is quite efficient with Krylov subspace methods like BiCGstab or
> QMR in my sequential Fortran/Matlab code.
>
> So like most users, I am looking for a parallel implement with this
> problem in PETSc. After looking through the manual, it seems that the
> most straightforward way to do it is through PCBJACOBI. Not sure I
> understand it right, I just setup a 3-block PCJACOBI and give each of the
> block a KSP with PCILU. Is this supposed to be equivalent to my
> D-ILU preconditioner? My little block of fortran code would look like:
> ...
>   call PCBJacobiSetTotalBlocks(pc_local,Ntotal,   &
>  & isubs,ierr)
>   call PCBJacobiSetLocalBlocks(pc_local, Nsub,&
>  &isubs(istart:iend),ierr)
>   ! set up the block jacobi structure
>   call KSPSetup(ksp_local,ierr)
>   ! allocate sub ksps
>   allocate(ksp_sub(Nsub))
>   call PCBJacobiGetSubKSP(pc_local,Nsub,istart,   &
>  & ksp_sub,ierr)
>   do i=1,Nsub
>   call KSPGetPC(ksp_sub(i),pc_sub,ierr)
>   !ILU preconditioner
>   call PCSetType(pc_sub,ptype,ierr)
>   call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here
>   call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)
>   end do
>   call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, &
>  & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr)
> …
>
> I understand that the parallel performance may not be comparable, so I
> first set up a one-process test (with MPIAij, but all the rows are local
> since there is only one process). The system is solved without any
> problem (identical results within error). But the performance is actually a
> lot worse (code built without debugging flags in performance tests) than my
> own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse
> matrix format), which is hard to believe. I suspect the difference is from
> the PC as the PETSc version took much more BiCGstab iterations (60-ish vs
> 100-ish) to converge to the same relative tol.
>
> This is further confirmed when I change the setup of D-ILU (using 6 or 9
> blocks instead of 3). While my Fortran/Matlab codes see minimal performance
> difference (<5%) when I play with the D-ILU setup, increasing the number of
> D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a
> performance decrease of more than 50% in sequential test. So my
> implementation IS somewhat different in PETSc. Do I miss something in the
> PCBJACOBI setup? Or do I have some fundamental misunderstanding of how
> PCBJACOBI works in PETSc?
>
> If this is not the right way to implement a block diagonal ILU as
> (parallel) PC, please kindly point me to the right direction. I searched
> through the mail list to find some answers, only to find a couple of
> similar questions... An example would be nice.
>
> On the other hand, does PETSc support a simple way to use explicit L/U
> matrix as a preconditioner? I can import the  D-ILU matrix (I already
> converted my A matrix into Mat) constructed in my Fortran code to make a
> better comparison. Or do I have to construct my own PC using PCshell? If
> so, is there a good tutorial/example to learn about how to use PCSHELL (in
> a more advanced sense, like how to setup pc side and transpose)?
>
> Thanks in advance,
>
> Hao
>
>


[petsc-users] What is the right way to implement a (block) Diagonal ILU as PC?

2020-02-04 Thread Hao DONG
Dear all,


I have a few questions about the implementation of diagonal ILU PC in PETSc. I 
want to solve a very simple system with KSP (in parallel), the nature of the 
system (finite difference time-harmonic Maxwell) is probably not important to 
the question itself. Long story short, I just need to solve a set of equations 
of Ax = b with a block diagonal system matrix, like (not sure if the mono font 
works):

    | X       |
A = |    Y    |
    |       Z |

Note that A is not really block-diagonal, it’s just a multi-diagonal matrix 
determined by the FD mesh, where most elements are close to diagonal. So 
instead of a full ILU decomposition, a D-ILU is a good approximation as a 
preconditioner, and the number of blocks should not matter too much:

    | Lx      |               | Ux      |
L = |    Ly   |    and    U = |    Uy   |
    |      Lz |               |      Uz |

Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N 
blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in 
my sequential Fortran/Matlab code.

So, like most users, I am looking for a parallel implementation of this approach 
in PETSc. After looking through the manual, it seems that the most 
straightforward way to do it is through PCBJACOBI. Not sure I understand it 
right, I just set up a 3-block PCBJACOBI and gave each of the blocks a KSP with 
PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little 
block of Fortran code would look like:
...
  call PCBJacobiSetTotalBlocks(pc_local, Ntotal, isubs, ierr)
  call PCBJacobiSetLocalBlocks(pc_local, Nsub, isubs(istart:iend), ierr)
  ! set up the block jacobi structure
  call KSPSetUp(ksp_local, ierr)
  ! allocate sub ksps
  allocate(ksp_sub(Nsub))
  call PCBJacobiGetSubKSP(pc_local, Nsub, istart, ksp_sub, ierr)
  do i = 1, Nsub
     call KSPGetPC(ksp_sub(i), pc_sub, ierr)
     ! ILU preconditioner
     call PCSetType(pc_sub, ptype, ierr)
     call PCFactorSetLevels(pc_sub, 1, ierr)  ! use ILU(1) here
     call KSPSetType(ksp_sub(i), KSPPREONLY, ierr)
  end do
  call KSPSetTolerances(ksp_local, KSSiter%tol, PETSC_DEFAULT_REAL, &
                        PETSC_DEFAULT_REAL, KSSiter%maxit, ierr)
…

I understand that the parallel performance may not be comparable, so I first 
set up a one-process test (with MPIAIJ, but all the rows are local since there 
is only one process). The system is solved without any problem (identical 
results within error). But the performance is actually a lot worse (code built 
without debugging flags in performance tests) than my own home-brew 
implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), 
which is hard to believe. I suspect the difference comes from the PC, as the 
PETSc version took many more BiCGstab iterations (60-ish vs 100-ish) to 
converge to the same relative tolerance.

This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks 
instead of 3). While my Fortran/Matlab codes see minimal performance difference 
(<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks 
from 3 to 9 caused the KSP setup with PCBJACOBI to suffer a performance 
decrease of more than 50% in a sequential test. So my implementation IS somewhat 
different in PETSc. Am I missing something in the PCBJACOBI setup? Or do I have 
some fundamental misunderstanding of how PCBJACOBI works in PETSc?

If this is not the right way to implement a block diagonal ILU as a (parallel) 
PC, please kindly point me in the right direction. I searched through the 
mailing list to find some answers, only to find a couple of similar 
questions... An example would be nice.

On the other hand, does PETSc support a simple way to use an explicit L/U matrix 
as a preconditioner? I can import the D-ILU matrix (I already converted my A 
matrix into a Mat) constructed in my Fortran code to make a better comparison. 
Or do I have to construct my own PC using PCSHELL? If so, is there a good 
tutorial/example to learn about how to use PCSHELL (in a more advanced sense, 
like how to set up the PC side and transpose)?

Thanks in advance,

Hao
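
As a rough illustration (not part of the original message), the same 
D-ILU-style setup can be driven entirely through the options database, which is 
language-independent; it is shown here with petsc4py for brevity, but the 
identical option strings can be passed to the Fortran code on the command line 
or via PetscOptionsInsertString. The toy matrix below is a placeholder.

from petsc4py import PETSc

opts = PETSc.Options()
opts["ksp_type"] = "bcgs"            # BiCGstab, as in the Fortran/Matlab code
opts["pc_type"] = "bjacobi"
opts["pc_bjacobi_blocks"] = 3        # number of D-ILU blocks
opts["sub_ksp_type"] = "preonly"
opts["sub_pc_type"] = "ilu"
opts["sub_pc_factor_levels"] = 1     # ILU(1) on each block
opts["ksp_view"] = None              # print exactly what solver PETSc built

n = 12                               # placeholder problem
A = PETSc.Mat().createAIJ([n, n], nnz=2)
for i in range(n):
    A.setValue(i, i, 2.0)
    if i > 0:
        A.setValue(i, i - 1, -1.0)
A.assemble()
b = A.createVecRight(); b.set(1.0)
x = A.createVecRight()

ksp = PETSc.KSP().create()
ksp.setOperators(A)
ksp.setFromOptions()                 # picks up all of the options above
ksp.solve(b, x)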



[petsc-users] petsc-3.12.4.tar.gz now available

2020-02-04 Thread Satish Balay via petsc-users
Dear PETSc users,

The patch release petsc-3.12.4 is now available for download,
with change list at 'PETSc-3.12 Changelog'

http://www.mcs.anl.gov/petsc/download/index.html

Satish




Re: [petsc-users] Flagging the solver to restart

2020-02-04 Thread Zhang, Hong via petsc-users



> On Feb 2, 2020, at 11:24 AM, Mohammed Ashour  wrote:
> 
> Dear All,
> I'm solving a constraint phase-field problem using PetIGA. This question i'm 
> having is more relevant to PETSc, so I'm posting here.
> 
> I have an algorithm involving iterating on the solution vector until certain 
> criteria are met before moving forward to the next time step. The sequence 
> inside the TSSolve is to call TSMonitor first, to print a user-defined set of 
> values, then move to the solve in TSStep, and then call TSPostEvaluate.
> 
> So I'm using the TSMonitor to update some variables at time n; those 
> variables are used in the residual and jacobian calculations at time n+1. I 
> then solve and check whether those criteria are met or not in a function 
> assigned to TS via TSSetPostEvaluate. If the criteria are met, it'll move 
> forward; if not, it'll engage the routine TSRollBack(), which based on my 
> understanding is the proper way to flag the solver to recalculate n+1. My 
> question is, is this the proper way to do it? What is the difference between 
> TSRollBack and TSRestart?

You are right that TSRollBack() recalculates the current time step. But I would 
not suggest using TSPostEvaluate in your case. Presumably you are not using 
the PETSc adaptor (e.g. via -ts_adapt_type none) and want to control the 
stepsize yourself. You can check the criteria in TSPostStep, call TSRollBack() 
if the criteria are not met, and update the variables accordingly. The 
variables can also be updated in TSPreStep(), but TSMonitor should not be used 
since it is designed for read-only operations.

TSRestart may be needed when you are using non-self-starting integration 
methods such as multistep methods and FSAL RK methods (-ts_rk_type 
<3bs,5dp,5bs,6vr,7vr,8vr>). These methods rely on solutions or residuals from 
previous time steps and thus need a flag to hard-restart the time integration 
whenever a discontinuity is introduced (e.g. a parameter in the RHS function is 
changed). So TSRestart sets a flag to tell the integrator to treat the next 
time step like the first time step in a time integration.
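
A short petsc4py sketch of that pattern, purely illustrative: it assumes your 
petsc4py build exposes TS.setPostStep and TS.rollBack (wrappers for 
TSSetPostStep/TSRollBack; check your version), and criteria_met() stands in for 
the user's acceptance test.

from petsc4py import PETSc

def criteria_met(ts):
    # placeholder: inspect ts.getSolution() / auxiliary data and decide
    return True

def poststep(ts):
    if not criteria_met(ts):
        # update the variables used by the residual/Jacobian first, then
        # redo the current step (rollBack is assumed to wrap TSRollBack)
        ts.rollBack()

ts = PETSc.TS().create()
# ... set the IFunction/RHSFunction, initial condition, -ts_adapt_type none ...
ts.setPostStep(poststep)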

Hong (Mr.) 

> Thanks a lot
> 
> -- 



Re: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM)

2020-02-04 Thread Smith, Barry F. via petsc-users

   Please run with the option -ksp_view so we know the exact solver options you 
are using. 

   From the lines 

MatCreateSubMats   1 1.0 1.9397e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 0
MatGetOrdering 1 1.0 1.1066e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 0
MatIncreaseOvrlp   1 1.0 3.0324e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0 0

   and the fact that you have three matrices, I would guess you are using the 
additive Schwarz preconditioner (ASM) with ILU(0) on the blocks (which 
converges the same as ILU on one process but uses much more memory).

   Note: your code is still built with 32 bit integers. 

   I would guess the basic matrix formed plus the vectors in this example could 
take ~200 MB. It is the two matrices in the additive Schwarz that are taking 
the additional memory.

   What kind of PDEs are you solving and what kind of formulation?  

   ASM plus ILU is the "workman's" preconditioner: relatively robust but 
not particularly fast to converge. Depending on your problem you might be able 
to do much better convergence-wise by using a PCFIELDSPLIT and a PCGAMG on 
one of the splits. In your own run you can see the ILU is chugging along rather 
slowly to the solution. 

   With your current solvers you can use the option -sub_pc_factor_in_place, 
which will shave off one of the matrices' memory. Please try that.

   By avoiding the ASM you can avoid both extra matrices, but at the cost of 
even slower convergence: use, for example, -pc_type sor.
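
   For concreteness, the memory-related options mentioned above collected in 
one place (a sketch, shown via petsc4py's options database; the same strings 
work on the command line for the Fortran code). These are alternatives to weigh 
against convergence, not a single recipe.

from petsc4py import PETSc

opts = PETSc.Options()
opts["sub_pc_factor_in_place"] = None  # ASM + ILU: drop one copy of the factored blocks
opts["ksp_gmres_restart"] = 15         # fewer GMRES work vectors (may hurt convergence)
# opts["pc_type"] = "sor"              # no extra factored matrices at all, slower convergence
opts["log_view"] = None
opts["log_view_memory"] = None         # report per-operation memory in the log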

   
The petroleum industry also has a variety of "custom" preconditioners/solvers 
for particular models and formulations that can beat the convergence of 
general-purpose solvers and require less memory. Some of these can be 
implemented or simulated with PETSc. Some of these are implemented in the 
commercial petroleum simulation codes, and it can be difficult to get a handle 
on exactly what they do because of proprietary issues. I think I have an old 
text on these approaches in my office; there may be modern books that discuss 
these.


   Barry




   



> On Feb 4, 2020, at 6:04 AM, Дмитрий Мельничук 
>  wrote:
> 
> Hello again!
> Thank you very much for your replies!
> Log is attached.
>  
> 1. The main problem now is following. To solve the matrix that is attached to 
> my previous e-mail PETSc consumes ~550 MB.
> I know definitely that there are commercial softwares in petroleum industry 
> (e.g., Schlumberger Petrel) that solve the same initial problem consuming 
> only ~200 MB.
> Moreover, I am sure that when I used 32-bit PETSc (GMRES + ASM) a year ago, 
> it also consumed ~200 MB for this matrix.
>  
> So, my question is: do you have any advice on how to decrease the amount of 
> RAM consumed for such matrix from 550 MB to 200 MB? Maybe some specific 
> preconditioner or other ways?
>  
> I will be very grateful for any thoughts!
>  
> 2. The second problem is more particular.
> According to resource manager in Windows 10, Fortran solver based on PETSc 
> consumes 548 MB RAM  while solving the system of linear equations.
> As I understand it form logs, it is required 459 MB and 52 MB for matrix and 
> vector storage respectively. After summing of all objects for which memory is 
>  allocated we get only 517 MB.
>  
> Thank you again for your time! Have a nice day.
>  
> Kind regards,
> Dmitry
> 
> 
> 03.02.2020, 19:55, "Smith, Barry F." :
> 
>GMRES also can by default require about 35 work vectors if it reaches the 
> full restart. You can set a smaller restart with -ksp_gmres_restart 15 for 
> example but this can also hurt the convergence of GMRES dramatically. People 
> sometimes use the KSPBCGS algorithm since it does not require all the restart 
> vectors but it can also converge more slowly.
> 
> Depending on how much memory the sparse matrices use relative to the 
> vectors the vector memory may matter or not.
> 
>If you are using a recent version of PETSc you can run with -log_view 
> -log_view_memory and it will show on the right side of the columns how much 
> memory is being allocated for each of the operations in various ways. 
> 
>Barry
> 
> 
> 
>  On Feb 3, 2020, at 10:34 AM, Matthew Knepley  wrote:
> 
>  On Mon, Feb 3, 2020 at 10:38 AM Дмитрий Мельничук 
>  wrote:
>  Hello all!
> 
>  Now I am faced with a problem associated with the memory allocation when 
> calling of KSPSolve .
> 
>  GMRES preconditioned by ASM for solving linear algebra system (obtained by 
> the finite element spatial discretisation of Biot poroelasticity model) was 
> chosen.
>  According to the output value of PetscMallocGetCurrentUsage subroutine 176 
> MB for matrix and RHS vector storage is required (before KSPSolve calling).
>  But during solving linear algebra system 543 MB of RAM is required (during 
> KSPSolve calling).
>  Thus, the amount of allocated memory after preconditioning stage 

Re: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM)

2020-02-04 Thread Дмитрий Мельничук
Hello again!
Thank you very much for your replies! Log is attached.

1. The main problem now is following. To solve the matrix that is attached to 
my previous e-mail PETSc consumes ~550 MB.
I know definitely that there are commercial softwares in petroleum industry 
(e.g., Schlumberger Petrel) that solve the same initial problem consuming only 
~200 MB.
Moreover, I am sure that when I used 32-bit PETSc (GMRES + ASM) a year ago, it 
also consumed ~200 MB for this matrix.

So, my question is: do you have any advice on how to decrease the amount of RAM 
consumed for such matrix from 550 MB to 200 MB? Maybe some specific 
preconditioner or other ways?

I will be very grateful for any thoughts!

2. The second problem is more particular.
According to resource manager in Windows 10, Fortran solver based on PETSc 
consumes 548 MB RAM while solving the system of linear equations.
As I understand it form logs, it is required 459 MB and 52 MB for matrix and 
vector storage respectively. After summing of all objects for which memory is 
allocated we get only 517 MB.

Thank you again for your time! Have a nice day.

Kind regards,
Dmitry

03.02.2020, 19:55, "Smith, Barry F." :

   GMRES also can by default require about 35 work vectors if it reaches the 
full restart. You can set a smaller restart with -ksp_gmres_restart 15 for 
example but this can also hurt the convergence of GMRES dramatically. People 
sometimes use the KSPBCGS algorithm since it does not require all the restart 
vectors but it can also converge more slowly.

   Depending on how much memory the sparse matrices use relative to the vectors 
the vector memory may matter or not.

   If you are using a recent version of PETSc you can run with -log_view 
-log_view_memory and it will show on the right side of the columns how much 
memory is being allocated for each of the operations in various ways.

   Barry

 On Feb 3, 2020, at 10:34 AM, Matthew Knepley  wrote:

 On Mon, Feb 3, 2020 at 10:38 AM Дмитрий Мельничук  wrote:
 Hello all!

 Now I am faced with a problem associated with the memory allocation when 
calling of KSPSolve.

 GMRES preconditioned by ASM for solving linear algebra system (obtained by the 
finite element spatial discretisation of Biot poroelasticity model) was chosen.
 According to the output value of PetscMallocGetCurrentUsage subroutine 176 MB 
for matrix and RHS vector storage is required (before KSPSolve calling).
 But during solving linear algebra system 543 MB of RAM is required (during 
KSPSolve calling).
 Thus, the amount of allocated memory after preconditioning stage increased 
three times. This kind of behaviour is critically for 3D models with several 
millions of cells.

 1) In order to know anything, we have to see the output of -ksp_view, although 
I see you used an overlap of 2

 2) The overlap increases the size of submatrices beyond that of the original 
matrix. Suppose that you used LU for the sub-preconditioner. You would need at 
least 2x memory (with ILU(0)) since the matrix dominates memory usage. 
Moreover, you have overlap and you might have fill-in depending on the solver.

 3) The massif tool from valgrind is a good fine-grained way to look at memory 
allocation

   Thanks,
     Matt

 Is there a way to decrease amout of allocated memory? Is that an expected 
behaviour for GMRES-ASM combination?
 As I remember, using previous version of PETSc didn't demonstrate so 
significante memory increasing.

 ...
 Vec :: Vec_F, Vec_U
 Mat :: Mat_K
 ...
 ...
 call MatAssemblyBegin(Mat_M,Mat_Final_Assembly,ierr)
 call MatAssemblyEnd(Mat_M,Mat_Final_Assembly,ierr)
 call VecAssemblyBegin(Vec_F_mod,ierr)
 call VecAssemblyEnd(Vec_F_mod,ierr)
 ...
 ...
 call PetscMallocGetCurrentUsage(mem, ierr)
 print *,"Memory used: ",mem
 ...
 ...
 call KSPSetType(Krylov,KSPGMRES,ierr)
 call KSPGetPC(Krylov,PreCon,ierr)
 call PCSetType(PreCon,PCASM,ierr)
 call KSPSetFromOptions(Krylov,ierr)
 ...
 call KSPSolve(Krylov,Vec_F,Vec_U,ierr)
 ...
 ...
 options = "-pc_asm_overlap 2 -pc_asm_type basic -ksp_monitor 
-ksp_converged_reason"

 Kind regards,
 Dmitry Melnichuk
 Matrix.dat (265288024)

 --
 What most experimenters take for granted before they begin their experiments 
is infinitely more interesting than any results to which their experiments 
lead.
 -- Norbert Wiener

 https://www.cse.buffalo.edu/~knepley/

Logs_26K_GMRES-ASM-log_view-log_view_memory-malloc_dump_32bit
Description: Binary data