Re: [petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS?
I am not sure of everything in your email but it sounds like you want to use a "Picard" iteration to solve [K(u)−kaaT]Δu=−F(u). That is solve A(u^{n}) (u^{n+1} - u^{n}) = F(u^{n}) - A(u^{n})u^{n} where A(u) = K(u) - kaaT PETSc provides code to this with SNESSetPicard() (see the manual pages) I don't know if Petsc4py has bindings for this. Adding missing python bindings is not terribly difficult and you may be able to do it yourself if this is the approach you want. Barry > On Feb 4, 2020, at 5:07 PM, Olek Niewiarowski wrote: > > Hello, > I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver > through the SNES object to implement the Sherman-Morrison formula(e.g. > http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a > nonlinear system of the form [K(u)−kaaT]Δu=−F(u). Here the jacobian matrix K > is modified by the term kaaT, where k is a scalar. Notably, K is sparse, > while the term kaaT results in a full matrix. This problem can be solved > efficiently using the Sherman-Morrison formula : > > [K−kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) > I have managed to successfully implement this at the linear solve level (by > modifying the KSP solver) inside a custom Newton solver in python by > following an incomplete tutorial at > https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner > : > • while (norm(delU) > alpha): # while not converged > • > • self.update_F() # call to method to update r.h.s form > • self.update_K() # call to update the jacobian form > • K = assemble(self.K) # assemble the jacobian matrix > • F = assemble(self.F) # assemble the r.h.s vector > • a = assemble(self.a_form) # assemble the a_form (see > Sherman Morrison formula) > • > • for bc in self.mem.bc: # apply boundary conditions > • bc.apply(K, F) > • bc.apply(K, a) > • > • B = PETSc.Mat().create() > • > • # Assemble the bilinear form that defines A and get the > concrete > • # PETSc matrix > • A = as_backend_type(K).mat() # get the PETSc objects for K > and a > • u = as_backend_type(a).vec() > • > • # Build the matrix "context" # see firedrake docs > • Bctx = MatrixFreeB(A, u, u, self.k) > • > • # Set up B > • # B is the same size as A > • B.setSizes(*A.getSizes()) > • B.setType(B.Type.PYTHON) > • B.setPythonContext(Bctx) > • B.setUp() > • > • > • ksp = PETSc.KSP().create() # create the KSP linear solver > object > • ksp.setOperators(B) > • ksp.setUp() > • pc = ksp.pc > • pc.setType(pc.Type.PYTHON) > • pc.setPythonContext(MatrixFreePC()) > • ksp.setFromOptions() > • > • solution = delU# the incremental displacement at this > iteration > • > • b = as_backend_type(-F).vec() > • delu = solution.vector().vec() > • > • ksp.solve(b, delu) > > • self.mem.u.vector().axpy(0.25, self.delU.vector()) # poor > man's linesearch > • counter += 1 > Here is the corresponding petsc4py code adapted from the firedrake docs: > > • class MatrixFreePC(object): > • > • def setUp(self, pc): > • B, P = pc.getOperators() > • # extract the MatrixFreeB object from B > • ctx = B.getPythonContext() > • self.A = ctx.A > • self.u = ctx.u > • self.v = ctx.v > • self.k = ctx.k > • # Here we build the PC object that uses the concrete, > • # assembled matrix A. We will use this to apply the action > • # of A^{-1} > • self.pc = PETSc.PC().create() > • self.pc.setOptionsPrefix("mf_") > • self.pc.setOperators(self.A) > • self.pc.setFromOptions() > • # Since u and v do not change, we can build the denominator > • # and the action of A^{-1} on u only once, in the setup > • # phase. > • tmp = self.A.createVecLeft() > • self.pc.apply(self.u, tmp) > • self._Ainvu = tmp > • self._denom = 1 + self.k*self.v.dot(self._Ainvu) > • > • def apply(self, pc, x, y): > • # y <- A^{-1}x > • self.pc.apply(x, y) > • # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) > • alpha =
Re: [petsc-users] Required structure and attrs for MatLoad from hdf5
I think this is a Python-Matlab question, not specifically related to PETSc in any way. Googling python matrix hdf5 matlab there are mentions of h5py library that can be used to write out sparse matrices in Matlab HDF5 format. Which could presumably be read by PETSc. PETSc can also read in the non-transposed version which presumably can also be written out with h5py https://www.loc.gov/preservation/digital/formats/fdd/fdd000440.shtml gives some indication that an indication of whether the transpose is stored in the file might exist, if so and it is used by Matlab we wouldn't need the special matlab format flag. Barry > On Feb 4, 2020, at 6:30 PM, Sajid Ali > wrote: > > Hi PETSc-developers, > > The example src/mat/examples/tutorials/ex10.c shows how one would read a > matrix from a hdf5 file. Since MatView isn’t implemented for hdf5_mat format, > how is the hdf5 file (to be used to run ex10) generated ? > > I tried reading from an hdf5 file but I saw an error stating object 'jc' > doesn't exist and thus would like to know how I should store a sparse matrix > in an hdf5 file so that MatLoad works. > > PS: I’m guessing that MATLAB stores the matrix in the format that PETSc > expects (group/dset/attrs) but I’m creating this from Python. If the > recommended approach is to transfer numpy arrays to PETSc matrices via > petsc4py, I’d switch to that instead of directly creating hdf5 files. > > Thank You, > Sajid Ali | PhD Candidate > Applied Physics > Northwestern University > s-sajid-ali.github.io >
[petsc-users] Required structure and attrs for MatLoad from hdf5
Hi PETSc-developers, The example src/mat/examples/tutorials/ex10.c shows how one would read a matrix from a hdf5 file. Since MatView isn’t implemented for hdf5_mat format, how is the hdf5 file (to be used to run ex10) generated ? I tried reading from an hdf5 file but I saw an error stating object 'jc' doesn't exist and thus would like to know how I should store a sparse matrix in an hdf5 file so that MatLoad works. PS: I’m guessing that MATLAB stores the matrix in the format that PETSc expects (group/dset/attrs) but I’m creating this from Python. If the recommended approach is to transfer numpy arrays to PETSc matrices via petsc4py, I’d switch to that instead of directly creating hdf5 files. Thank You, Sajid Ali | PhD Candidate Applied Physics Northwestern University s-sajid-ali.github.io
[petsc-users] Implementing the Sherman Morisson formula (low rank update) in petsc4py and FEniCS?
Hello, I am a FEniCS user but new to petsc4py. I am trying to modify the KSP solver through the SNES object to implement the Sherman-Morrison formula (e.g. http://fourier.eng.hmc.edu/e176/lectures/algebra/node6.html ). I am solving a nonlinear system of the form [K(u)−kaaT]Δu=−F(u). Here the jacobian matrix K is modified by the term kaaT, where k is a scalar. Notably, K is sparse, while the term kaaT results in a full matrix. This problem can be solved efficiently using the Sherman-Morrison formula : [K−kaaT]-1 = K-1 - (kK-1 aaTK-1)/(1+kaTK-1a) I have managed to successfully implement this at the linear solve level (by modifying the KSP solver) inside a custom Newton solver in python by following an incomplete tutorial at https://www.firedrakeproject.org/petsc-interface.html#defining-a-preconditioner : * while (norm(delU) > alpha): # while not converged * * self.update_F() # call to method to update r.h.s form * self.update_K() # call to update the jacobian form * K = assemble(self.K) # assemble the jacobian matrix * F = assemble(self.F) # assemble the r.h.s vector * a = assemble(self.a_form) # assemble the a_form (see Sherman Morrison formula) * * for bc in self.mem.bc: # apply boundary conditions * bc.apply(K, F) * bc.apply(K, a) * * B = PETSc.Mat().create() * * # Assemble the bilinear form that defines A and get the concrete * # PETSc matrix * A = as_backend_type(K).mat() # get the PETSc objects for K and a * u = as_backend_type(a).vec() * * # Build the matrix "context" # see firedrake docs * Bctx = MatrixFreeB(A, u, u, self.k) * * # Set up B * # B is the same size as A * B.setSizes(*A.getSizes()) * B.setType(B.Type.PYTHON) * B.setPythonContext(Bctx) * B.setUp() * * * ksp = PETSc.KSP().create() # create the KSP linear solver object * ksp.setOperators(B) * ksp.setUp() * pc = ksp.pc * pc.setType(pc.Type.PYTHON) * pc.setPythonContext(MatrixFreePC()) * ksp.setFromOptions() * * solution = delU# the incremental displacement at this iteration * * b = as_backend_type(-F).vec() * delu = solution.vector().vec() * * ksp.solve(b, delu) * * self.mem.u.vector().axpy(0.25, self.delU.vector()) # poor man's linesearch * counter += 1 Here is the corresponding petsc4py code adapted from the firedrake docs: 1. class MatrixFreePC(object): 2. 3. def setUp(self, pc): 4. B, P = pc.getOperators() 5. # extract the MatrixFreeB object from B 6. ctx = B.getPythonContext() 7. self.A = ctx.A 8. self.u = ctx.u 9. self.v = ctx.v 10. self.k = ctx.k 11. # Here we build the PC object that uses the concrete, 12. # assembled matrix A. We will use this to apply the action 13. # of A^{-1} 14. self.pc = PETSc.PC().create() 15. self.pc.setOptionsPrefix("mf_") 16. self.pc.setOperators(self.A) 17. self.pc.setFromOptions() 18. # Since u and v do not change, we can build the denominator 19. # and the action of A^{-1} on u only once, in the setup 20. # phase. 21. tmp = self.A.createVecLeft() 22. self.pc.apply(self.u, tmp) 23. self._Ainvu = tmp 24. self._denom = 1 + self.k*self.v.dot(self._Ainvu) 25. 26. def apply(self, pc, x, y): 27. # y <- A^{-1}x 28. self.pc.apply(x, y) 29. # alpha <- (v^T A^{-1} x) / (1 + v^T A^{-1} u) 30. alpha = (self.k*self.v.dot(y)) / self._denom 31. # y <- y - alpha * A^{-1}u 32. y.axpy(-alpha, self._Ainvu) 33. 34. 35. class MatrixFreeB(object): 36. 37. def __init__(self, A, u, v, k): 38. self.A = A 39. self.u = u 40. self.v = v 41. self.k = k 42. 43. def mult(self, mat, x, y): 44. # y <- A x 45. self.A.mult(x, y) 46. 47. # alpha <- v^T x 48. alpha = self.v.dot(x) 49. 50. # y <- y + alpha*u 51. y.axpy(alpha, self.u) However, this approach is not efficient as it requires many iterations due to the Newton step being fixed, so I would like to implement it using SNES and use line search. Unfortunately, I have not been able to find any documentation/tutorial
Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC?
> On Feb 4, 2020, at 12:41 PM, Hao DONG wrote: > > Dear all, > > > I have a few questions about the implementation of diagonal ILU PC in PETSc. > I want to solve a very simple system with KSP (in parallel), the nature of > the system (finite difference time-harmonic Maxwell) is probably not > important to the question itself. Long story short, I just need to solve a > set of equations of Ax = b with a block diagonal system matrix, like (not > sure if the mono font works): > >|X| > A =| Y | >|Z| > > Note that A is not really block-diagonal, it’s just a multi-diagonal matrix > determined by the FD mesh, where most elements are close to diagonal. So > instead of a full ILU decomposition, a D-ILU is a good approximation as a > preconditioner, and the number of blocks should not matter too much: > > |Lx | |Ux | > L = | Ly | and U = | Uy | > | Lz| | Uz| > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N > blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR > in my sequential Fortran/Matlab code. > > So like most users, I am looking for a parallel implement with this problem > in PETSc. After looking through the manual, it seems that the most > straightforward way to do it is through PCBJACOBI. Not sure I understand it > right, I just setup a 3-block PCJACOBI and give each of the block a KSP with > PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My > little block of fortran code would look like: > ... > call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > & isubs,ierr) > call PCBJacobiSetLocalBlocks(pc_local, Nsub,& > &isubs(istart:iend),ierr) > ! set up the block jacobi structure > call KSPSetup(ksp_local,ierr) > ! allocate sub ksps > allocate(ksp_sub(Nsub)) > call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > & ksp_sub,ierr) > do i=1,Nsub > call KSPGetPC(ksp_sub(i),pc_sub,ierr) > !ILU preconditioner > call PCSetType(pc_sub,ptype,ierr) > call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > end do > call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > … This code looks essentially right. You should call with -ksp_view to see exactly what PETSc is using for a solver. > > I understand that the parallel performance may not be comparable, so I first > set up a one-process test (with MPIAij, but all the rows are local since > there is only one process). The system is solved without any problem > (identical results within error). But the performance is actually a lot worse > (code built without debugging flags in performance tests) than my own > home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix > format), which is hard to believe. I suspect the difference is from the PC as > the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to > converge to the same relative tol. PETSc uses GMRES by default with a restart of 30 and left preconditioning. It could be different exact choices in the solver (which is why -ksp_view is so useful) can explain the differences in the runs between your code and PETSc's > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 > blocks instead of 3). While my Fortran/Matlab codes see minimal performance > difference (<5%) when I play with the D-ILU setup, increasing the number of > D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a > performance decrease of more than 50% in sequential test. This is odd, the more blocks the smaller each block so the quicker the ILU set up should be. You can run various cases with -log_view and send us the output to see what is happening at each part of the computation in time. > So my implementation IS somewhat different in PETSc. Do I miss something in > the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how > PCBJACOBI works in PETSc? Probably not. > > If this is not the right way to implement a block diagonal ILU as (parallel) > PC, please kindly point me to the right direction. I searched through the > mail list to find some answers, only to find a couple of similar questions... > An example would be nice. You approach seems fundamentally right but I cannot be sure of possible bugs. > > On the other hand, does PETSc support a simple way to use explicit L/U matrix > as a preconditioner? I can import the D-ILU matrix (I already converted my A > matrix into Mat) constructed in my Fortran code to make a better comparison. > Or do I have to construct my own PC using PCshell? If so, is there a good > tutorial/example to
Re: [petsc-users] What is the right way to implement a (block) Diagonal ILU as PC?
Hao: I would suggest to use a parallel sparse direct solver, e.g., superlu_dist or mumps. These solvers can take advantage of your sparse data structure. Once it works, then you may play with other preconditioners, such as bjacobi + lu/ilu. See https://www.mcs.anl.gov/petsc/miscellaneous/external.html Hong Dear all, > > > I have a few questions about the implementation of diagonal ILU PC in > PETSc. I want to solve a very simple system with KSP (in parallel), the > nature of the system (finite difference time-harmonic Maxwell) is probably > not important to the question itself. Long story short, I just need to > solve a set of equations of Ax = b with a block diagonal system matrix, > like (not sure if the mono font works): > >|X| > A =| Y | >|Z| > > Note that A is not really block-diagonal, it’s just a multi-diagonal > matrix determined by the FD mesh, where most elements are close to > diagonal. So instead of a full ILU decomposition, a D-ILU is a good > approximation as a preconditioner, and the number of blocks should not > matter too much: > > |Lx | |Ux | > L = | Ly | and U = | Uy | > | Lz| | Uz| > > Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N > blocks) is quite efficient with Krylov subspace methods like BiCGstab or > QMR in my sequential Fortran/Matlab code. > > So like most users, I am looking for a parallel implement with this > problem in PETSc. After looking through the manual, it seems that the > most straightforward way to do it is through PCBJACOBI. Not sure I > understand it right, I just setup a 3-block PCJACOBI and give each of the > block a KSP with PCILU. Is this supposed to be equivalent to my > D-ILU preconditioner? My little block of fortran code would look like: > ... > * call* PCBJacobiSetTotalBlocks(pc_local,Ntotal, & > & isubs,ierr) > *call* PCBJacobiSetLocalBlocks(pc_local, Nsub,& > &isubs(istart:iend),ierr) > ! set up the block jacobi structure > *call* KSPSetup(ksp_local,ierr) > ! allocate sub ksps > allocate(ksp_sub(Nsub)) > *call* PCBJacobiGetSubKSP(pc_local,Nsub,istart, & > & ksp_sub,ierr) > do i=1,Nsub > *call* KSPGetPC(ksp_sub(i),pc_sub,ierr) > !ILU preconditioner > *call* PCSetType(pc_sub,ptype,ierr) > *call* PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here > *call* KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] > end do > *call* KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & > & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) > … > > I understand that the parallel performance may not be comparable, so I > first set up a one-process test (with MPIAij, but all the rows are local > since there is only one process). The system is solved without any > problem (identical results within error). But the performance is actually a > lot worse (code built without debugging flags in performance tests) than my > own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse > matrix format), which is hard to believe. I suspect the difference is from > the PC as the PETSc version took much more BiCGstab iterations (60-ish vs > 100-ish) to converge to the same relative tol. > > This is further confirmed when I change the setup of D-ILU (using 6 or 9 > blocks instead of 3). While my Fortran/Matlab codes see minimal performance > difference (<5%) when I play with the D-ILU setup, increasing the number of > D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a > performance decrease of more than 50% in sequential test. So my > implementation IS somewhat different in PETSc. Do I miss something in the > PCBJACOBI setup? Or do I have some fundamental misunderstanding of how > PCBJACOBI works in PETSc? > > If this is not the right way to implement a block diagonal ILU as > (parallel) PC, please kindly point me to the right direction. I searched > through the mail list to find some answers, only to find a couple of > similar questions... An example would be nice. > > On the other hand, does PETSc support a simple way to use explicit L/U > matrix as a preconditioner? I can import the D-ILU matrix (I already > converted my A matrix into Mat) constructed in my Fortran code to make a > better comparison. Or do I have to construct my own PC using PCshell? If > so, is there a good tutorial/example to learn about how to use PCSHELL (in > a more advanced sense, like how to setup pc side and transpose)? > > Thanks in advance, > > Hao > >
[petsc-users] What is the right way to implement a (block) Diagonal ILU as PC?
Dear all, I have a few questions about the implementation of diagonal ILU PC in PETSc. I want to solve a very simple system with KSP (in parallel), the nature of the system (finite difference time-harmonic Maxwell) is probably not important to the question itself. Long story short, I just need to solve a set of equations of Ax = b with a block diagonal system matrix, like (not sure if the mono font works): |X| A =| Y | |Z| Note that A is not really block-diagonal, it’s just a multi-diagonal matrix determined by the FD mesh, where most elements are close to diagonal. So instead of a full ILU decomposition, a D-ILU is a good approximation as a preconditioner, and the number of blocks should not matter too much: |Lx | |Ux | L = | Ly | and U = | Uy | | Lz| | Uz| Where [Lx, Ux] = ILU0(X), etc. Indeed, the D-ILU preconditioner (with 3N blocks) is quite efficient with Krylov subspace methods like BiCGstab or QMR in my sequential Fortran/Matlab code. So like most users, I am looking for a parallel implement with this problem in PETSc. After looking through the manual, it seems that the most straightforward way to do it is through PCBJACOBI. Not sure I understand it right, I just setup a 3-block PCJACOBI and give each of the block a KSP with PCILU. Is this supposed to be equivalent to my D-ILU preconditioner? My little block of fortran code would look like: ... call PCBJacobiSetTotalBlocks(pc_local,Ntotal, & & isubs,ierr) call PCBJacobiSetLocalBlocks(pc_local, Nsub,& &isubs(istart:iend),ierr) ! set up the block jacobi structure call KSPSetup(ksp_local,ierr) ! allocate sub ksps allocate(ksp_sub(Nsub)) call PCBJacobiGetSubKSP(pc_local,Nsub,istart, & & ksp_sub,ierr) do i=1,Nsub call KSPGetPC(ksp_sub(i),pc_sub,ierr) !ILU preconditioner call PCSetType(pc_sub,ptype,ierr) call PCFactorSetLevels(pc_sub,1,ierr) ! use ILU(1) here call KSPSetType(ksp_sub(i),KSPPREONLY,ierr)] end do call KSPSetTolerances(ksp_local,KSSiter%tol,PETSC_DEFAULT_REAL, & & PETSC_DEFAULT_REAL,KSSiter%maxit,ierr) … I understand that the parallel performance may not be comparable, so I first set up a one-process test (with MPIAij, but all the rows are local since there is only one process). The system is solved without any problem (identical results within error). But the performance is actually a lot worse (code built without debugging flags in performance tests) than my own home-brew implementation in Fortran (I wrote my own ILU0 in CSR sparse matrix format), which is hard to believe. I suspect the difference is from the PC as the PETSc version took much more BiCGstab iterations (60-ish vs 100-ish) to converge to the same relative tol. This is further confirmed when I change the setup of D-ILU (using 6 or 9 blocks instead of 3). While my Fortran/Matlab codes see minimal performance difference (<5%) when I play with the D-ILU setup, increasing the number of D-ILU blocks from 3 to 9 caused the ksp setup with PCBJACOBI to suffer a performance decrease of more than 50% in sequential test. So my implementation IS somewhat different in PETSc. Do I miss something in the PCBJACOBI setup? Or do I have some fundamental misunderstanding of how PCBJACOBI works in PETSc? If this is not the right way to implement a block diagonal ILU as (parallel) PC, please kindly point me to the right direction. I searched through the mail list to find some answers, only to find a couple of similar questions... An example would be nice. On the other hand, does PETSc support a simple way to use explicit L/U matrix as a preconditioner? I can import the D-ILU matrix (I already converted my A matrix into Mat) constructed in my Fortran code to make a better comparison. Or do I have to construct my own PC using PCshell? If so, is there a good tutorial/example to learn about how to use PCSHELL (in a more advanced sense, like how to setup pc side and transpose)? Thanks in advance, Hao
[petsc-users] petsc-3.12.4.tar.gz now available
Dear PETSc users, The patch release petsc-3.12.4 is now available for download, with change list at 'PETSc-3.12 Changelog' http://www.mcs.anl.gov/petsc/download/index.html Satish
Re: [petsc-users] Flagging the solver to restart
> On Feb 2, 2020, at 11:24 AM, Mohammed Ashour wrote: > > Dear All, > I'm solving a constraint phase-field problem using PetIGA. This question i'm > having is more relevant to PETSc, so I'm posting here. > > I have an algorithm involving iterating on the solution vector until certain > criteria are met before moving forward for the next time step. The sequence > inside the TSSolve is to call TSMonitor first, to print a user-defined set of > values and the move to solve at TSStep and then call TSPostEvaluate. > > So I'm using the TSMonitor to update some variables at time n , those > variables are used the in the residual and jacobian calculations at time n+1, > and then solving and then check if those criteria are met or not in a > function assigned to TS via TSSetPostEvaluate, if the criteria are met, it'll > move forward, if not, it'll engaged the routine TSRollBack(), which based on > my understanding is the proper way yo flag the solver to recalculate n+1. My > question is, is this the proper way to do it? what is the difference between > TSRollBack and TSRestart? You are right that TSRollBack() recalculates the current time step. But I would not suggest to use TSPostEvaluate in your case. Presumably you are not using the PETSc adaptor (e.g. via -ts_adapt_type none) and want to control the stepsize yourself. You can check the criteria in TSPostStep, call TSRollBack() if the criteria are not met and update the variables accordingly. The variables can also be updated in TSPreStep(), but TSMonitor should not be used since it is designed for read-only operations. TSRestart may be needed when you are using non-self-starting integration methods such as multiple step methods and FSAL RK methods (-ts_rk_type <3bs,5dp,5bs,6vr,7vr,8vr>). These methods rely on solutions or residuals from previous time steps, thus need a flag to hard restart the time integration whenever discontinuity is introduced (e.g. a parameter in the RHS function is changed). So TSRestart sets the flag to tell the integrator to treat the next time step like the first time step in a time integration. Hong (Mr.) > Thanks a lot > > --
Re: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM)
Please run with the option -ksp_view so we know the exact solver options you are using. From the lines MatCreateSubMats 1 1.0 1.9397e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatGetOrdering 1 1.0 1.1066e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 MatIncreaseOvrlp 1 1.0 3.0324e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 and the fact you have three matrices I would guess you are using the additive Schwarz preconditioner (ASM) with ILU(0) on the blocks. (which converges the same as ILU on one process but does use much more memory). Note: your code is still built with 32 bit integers. I would guess the basic matrix formed plus the vectors in this example could take ~200 MB. It is the two matrices in the additive Schwarz that is taking the additional memory. What kind of PDEs are you solving and what kind of formulation? ASM plus ILU is the "work mans" type preconditioner, relatively robust but not particularly fast for convergence. Depending on your problem you might be able to do much better convergence wise by using a PCFIELDSPLIT and a PCGAMG on one of the splits. In your own run you see the ILU is chugging along rather slowly to the solution. With your current solvers you can use the option -sub_pc_factor_in_place which will shave off one of the matrices memories. Please try that. Avoiding the ASM you can avoid both extra matrices but at the cost of even slower convergence. Use, for example -pc_type sor The petroleum industry also has a variety of "custom" preconditioners/solvers for particular models and formulations that can beat the convergence of general purpose solvers; and require less memory. Some of these can be implemented or simulated with PETSc. Some of these are implemented in the commercial petroleum simulation codes and it can be difficult to get a handle on exactly what they do because of proprietary issues. I think I have an old text on these approaches in my office, there may be modern books that discuss these. Barry > On Feb 4, 2020, at 6:04 AM, Дмитрий Мельничук > wrote: > > Hello again! > Thank you very much for your replies! > Log is attached. > > 1. The main problem now is following. To solve the matrix that is attached to > my previous e-mail PETSc consumes ~550 MB. > I know definitely that there are commercial softwares in petroleum industry > (e.g., Schlumberger Petrel) that solve the same initial problem consuming > only ~200 MB. > Moreover, I am sure that when I used 32-bit PETSc (GMRES + ASM) a year ago, > it also consumed ~200 MB for this matrix. > > So, my question is: do you have any advice on how to decrease the amount of > RAM consumed for such matrix from 550 MB to 200 MB? Maybe some specific > preconditioner or other ways? > > I will be very grateful for any thoughts! > > 2. The second problem is more particular. > According to resource manager in Windows 10, Fortran solver based on PETSc > consumes 548 MB RAM while solving the system of linear equations. > As I understand it form logs, it is required 459 MB and 52 MB for matrix and > vector storage respectively. After summing of all objects for which memory is > allocated we get only 517 MB. > > Thank you again for your time! Have a nice day. > > Kind regards, > Dmitry > > > 03.02.2020, 19:55, "Smith, Barry F." : > >GMRES also can by default require about 35 work vectors if it reaches the > full restart. You can set a smaller restart with -ksp_gmres_restart 15 for > example but this can also hurt the convergence of GMRES dramatically. People > sometimes use the KSPBCGS algorithm since it does not require all the restart > vectors but it can also converge more slowly. > > Depending on how much memory the sparse matrices use relative to the > vectors the vector memory may matter or not. > >If you are using a recent version of PETSc you can run with -log_view > -log_view_memory and it will show on the right side of the columns how much > memory is being allocated for each of the operations in various ways. > >Barry > > > > On Feb 3, 2020, at 10:34 AM, Matthew Knepley wrote: > > On Mon, Feb 3, 2020 at 10:38 AM Дмитрий Мельничук > wrote: > Hello all! > > Now I am faced with a problem associated with the memory allocation when > calling of KSPSolve . > > GMRES preconditioned by ASM for solving linear algebra system (obtained by > the finite element spatial discretisation of Biot poroelasticity model) was > chosen. > According to the output value of PetscMallocGetCurrentUsage subroutine 176 > MB for matrix and RHS vector storage is required (before KSPSolve calling). > But during solving linear algebra system 543 MB of RAM is required (during > KSPSolve calling). > Thus, the amount of allocated memory after preconditioning stage
Re: [petsc-users] Triple increasing of allocated memory during KSPSolve calling(GMRES preconditioned by ASM)
Hello again!Thank you very much for your replies!Log is attached. 1. The main problem now is following. To solve the matrix that is attached to my previous e-mail PETSc consumes ~550 MB.I know definitely that there are commercial softwares in petroleum industry (e.g., Schlumberger Petrel) that solve the same initial problem consuming only ~200 MB.Moreover, I am sure that when I used 32-bit PETSc (GMRES + ASM) a year ago, it also consumed ~200 MB for this matrix. So, my question is: do you have any advice on how to decrease the amount of RAM consumed for such matrix from 550 MB to 200 MB? Maybe some specific preconditioner or other ways? I will be very grateful for any thoughts! 2. The second problem is more particular.According to resource manager in Windows 10, Fortran solver based on PETSc consumes 548 MB RAM while solving the system of linear equations.As I understand it form logs, it is required 459 MB and 52 MB for matrix and vector storage respectively. After summing of all objects for which memory is allocated we get only 517 MB. Thank you again for your time! Have a nice day. Kind regards,Dmitry03.02.2020, 19:55, "Smith, Barry F." : GMRES also can by default require about 35 work vectors if it reaches the full restart. You can set a smaller restart with -ksp_gmres_restart 15 for example but this can also hurt the convergence of GMRES dramatically. People sometimes use the KSPBCGS algorithm since it does not require all the restart vectors but it can also converge more slowly.Depending on how much memory the sparse matrices use relative to the vectors the vector memory may matter or not. If you are using a recent version of PETSc you can run with -log_view -log_view_memory and it will show on the right side of the columns how much memory is being allocated for each of the operations in various ways. Barry On Feb 3, 2020, at 10:34 AM, Matthew Knepleywrote: On Mon, Feb 3, 2020 at 10:38 AM Дмитрий Мельничук wrote: Hello all! Now I am faced with a problem associated with the memory allocation when calling of KSPSolve . GMRES preconditioned by ASM for solving linear algebra system (obtained by the finite element spatial discretisation of Biot poroelasticity model) was chosen. According to the output value of PetscMallocGetCurrentUsage subroutine 176 MB for matrix and RHS vector storage is required (before KSPSolve calling). But during solving linear algebra system 543 MB of RAM is required (during KSPSolve calling). Thus, the amount of allocated memory after preconditioning stage increased three times. This kind of behaviour is critically for 3D models with several millions of cells. 1) In order to know anything, we have to see the output of -ksp_view, although I see you used an overlap of 2 2) The overlap increases the size of submatrices beyond that of the original matrix. Suppose that you used LU for the sub-preconditioner. You would need at least 2x memory (with ILU(0)) since the matrix dominates memory usage. Moreover, you have overlap and you might have fill-in depending on the solver. 3) The massif tool from valgrind is a good fine-grained way to look at memory allocation Thanks, Matt Is there a way to decrease amout of allocated memory? Is that an expected behaviour for GMRES-ASM combination? As I remember, using previous version of PETSc didn't demonstrate so significante memory increasing. ... Vec :: Vec_F, Vec_U Mat :: Mat_K ... ... call MatAssemblyBegin(Mat_M,Mat_Final_Assembly,ierr) call MatAssemblyEnd(Mat_M,Mat_Final_Assembly,ierr) call VecAssemblyBegin(Vec_F_mod,ierr) call VecAssemblyEnd(Vec_F_mod,ierr) ... ... call PetscMallocGetCurrentUsage(mem, ierr) print *,"Memory used: ",mem ... ... call KSPSetType(Krylov,KSPGMRES,ierr) call KSPGetPC(Krylov,PreCon,ierr) call PCSetType(PreCon,PCASM,ierr) call KSPSetFromOptions(Krylov,ierr) ... call KSPSolve(Krylov,Vec_F,Vec_U,ierr) ... ... options = "-pc_asm_overlap 2 -pc_asm_type basic -ksp_monitor -ksp_converged_reason" Kind regards, Dmitry Melnichuk Matrix.dat (265288024) -- What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead. -- Norbert Wiener https://www.cse.buffalo.edu/~knepley/ Logs_26K_GMRES-ASM-log_view-log_view_memory-malloc_dump_32bit Description: Binary data