> On Feb 5, 2020, at 9:03 AM, Дмитрий Мельничук <[email protected]> wrote:
>
> Barry, appreciate your response, as always.
>
> - You are saying that I am using ASM + ILU(0). However, I use PETSc only with "ASM" as the input parameter for the preconditioner. Does it mean that ILU(0) is the default sub-preconditioner for ASM?

Yes.
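For reference, the sub-solver that ASM ends up using is easy to confirm from the -ksp_view output already used in this thread; a run along the following lines prints it (the executable name ./my_solver is only a placeholder, the options are the ones discussed here):

    ./my_solver -ksp_type gmres -pc_type asm -sub_pc_type ilu -ksp_view

In that output the solver applied on each block is listed under the sub_ options prefix.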
> Can I change it using the option "-sub_pc_type"?

Yes. For example, -sub_pc_type sor will use SOR on each block instead of ILU, which saves a matrix.

> Does it make sense to you within the scope of my general goal, which is memory consumption decrease? Can it be useful to vary the "-sub_ksp_type" option?

Probably not.

> - I have run the computation for the same initial matrix with the "-sub_pc_factor_in_place" option, PC = ASM. Now the process consumed ~400 MB compared to 550 MB without this option.

This is what I expected, good.

> I used "-ksp_view" for this computation, two logs for this computation are attached:
> "ksp_view.txt" - ksp_view option only
> "full_log_ASM_factor_in_place.txt" - full log without the ksp_view option
>
> - Then I changed the primary preconditioner from ASM to ILU(0) and ran the computation: memory consumption was again about ~400 MB, no matter whether I use the "-sub_pc_factor_in_place" option.
>
> - Then I tried to run the computation with ILU(0) and "-pc_factor_in_place", just in case: the computation did not start, I got an error message, the log is attached: "Error_ilu_pc_factor.txt"

Since that matrix is used for the MatMult, you cannot do the factorization in place: it would replace the original matrix entries with the factorization entries.

> - Then I ran the computation with SOR as a preconditioner. PETSc gave me an error message, the log is attached: "Error_gmres_sor.txt"

This is because our SOR cannot handle zeros on the diagonal.

> - As for the kind of PDEs: I am solving the standard poroelasticity problem, the formulation can be found in the attached paper (Zheng_poroelasticity.pdf), pages 2-3.
> The file PDE.jpg is a snapshot of the PDEs from this paper.
>
> So, if you can give me any further advice on how to decrease the consumed amount of memory to approximately the matrix size (~200 MB in this case), it would be great. Do I need to focus on searching for a proper preconditioner? BTW, the single ILU(0) did not give me any memory advantage compared to ASM with "-sub_pc_factor_in_place".

Yes, because in both cases you need two copies of the matrix: one for the multiply and one for the ILU. But you want a preconditioner that doesn't require any new matrices. This is difficult.

You want an efficient preconditioner that requires essentially no additional memory? -ksp_type gmres or bcgs with -pc_type jacobi (SOR won't work because of the zero diagonals). It will not be a good preconditioner. Are you sure you don't have additional memory available for the preconditioner? A good preconditioner might require up to 5 to 6 times the memory of the original matrix.
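Putting that low-memory suggestion into a command line, it would look roughly like this (the executable name ./my_solver is a placeholder; the options are the ones named above, with -ksp_monitor and -ksp_converged_reason carried over from the run options already used in this thread):

    ./my_solver -ksp_type bcgs -pc_type jacobi -ksp_monitor -ksp_converged_reason

or the same with -ksp_type gmres. No extra matrix is created for the preconditioner, but iteration counts will likely be much higher than with ASM + ILU(0).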
> Have a pleasant day!
>
> Kind regards,
> Dmitry
>
>
> 04.02.2020, 19:04, "Smith, Barry F." <[email protected]>:
>
> Please run with the option -ksp_view so we know the exact solver options you are using.
>
> From the lines
>
> MatCreateSubMats 1 1.0 1.9397e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0
> MatGetOrdering   1 1.0 1.1066e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
> MatIncreaseOvrlp 1 1.0 3.0324e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0 0
>
> and the fact that you have three matrices, I would guess you are using the additive Schwarz preconditioner (ASM) with ILU(0) on the blocks (which converges the same as ILU on one process but does use much more memory).
>
> Note: your code is still built with 32-bit integers.
>
> I would guess the basic matrix formed plus the vectors in this example could take ~200 MB. It is the two matrices in the additive Schwarz that are taking the additional memory.
>
> What kind of PDEs are you solving and what kind of formulation?
>
> ASM plus ILU is the "workman's" type of preconditioner, relatively robust but not particularly fast to converge. Depending on your problem you might be able to do much better convergence-wise by using a PCFIELDSPLIT and a PCGAMG on one of the splits. In your own run you can see the ILU is chugging along rather slowly to the solution.
>
> With your current solvers you can use the option -sub_pc_factor_in_place, which will shave off the memory of one of the matrices. Please try that.
>
> By avoiding the ASM you can avoid both extra matrices, but at the cost of even slower convergence. Use, for example, -pc_type sor
>
> The petroleum industry also has a variety of "custom" preconditioners/solvers for particular models and formulations that can beat the convergence of general-purpose solvers and require less memory. Some of these can be implemented or simulated with PETSc. Some of them are implemented in the commercial petroleum simulation codes, and it can be difficult to get a handle on exactly what they do because of proprietary issues. I think I have an old text on these approaches in my office; there may be modern books that discuss them.
>
> Barry
>
>
> On Feb 4, 2020, at 6:04 AM, Дмитрий Мельничук <[email protected]> wrote:
>
> Hello again!
> Thank you very much for your replies!
> The log is attached.
>
> 1. The main problem now is the following. To solve the matrix that is attached to my previous e-mail, PETSc consumes ~550 MB.
> I know for certain that there are commercial software packages in the petroleum industry (e.g., Schlumberger Petrel) that solve the same initial problem consuming only ~200 MB.
> Moreover, I am sure that when I used 32-bit PETSc (GMRES + ASM) a year ago, it also consumed ~200 MB for this matrix.
>
> So, my question is: do you have any advice on how to decrease the amount of RAM consumed for such a matrix from 550 MB to 200 MB? Maybe some specific preconditioner or other ways?
>
> I will be very grateful for any thoughts!
>
> 2. The second problem is more particular.
> According to the resource manager in Windows 10, the Fortran solver based on PETSc consumes 548 MB of RAM while solving the system of linear equations.
> As I understand from the logs, 459 MB and 52 MB are required for matrix and vector storage, respectively. After summing all objects for which memory is allocated we get only 517 MB.
>
> Thank you again for your time! Have a nice day.
>
> Kind regards,
> Dmitry
>
>
> 03.02.2020, 19:55, "Smith, Barry F." <[email protected]>:
>
> GMRES can also, by default, require about 35 work vectors if it reaches the full restart. You can set a smaller restart with -ksp_gmres_restart 15, for example, but this can also hurt the convergence of GMRES dramatically. People sometimes use the KSPBCGS algorithm since it does not require all the restart vectors, but it can also converge more slowly.
>
> Depending on how much memory the sparse matrices use relative to the vectors, the vector memory may or may not matter.
>
> If you are using a recent version of PETSc you can run with -log_view -log_view_memory and it will show, on the right side of the columns, how much memory is being allocated for each of the operations in various ways.
>
> Barry
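As a concrete illustration, those flags can simply be appended to the options string that appears in the Fortran driver quoted further down in this thread; the restart value 15 is just the example number from the paragraph above:

    options = "-pc_asm_overlap 2 -pc_asm_type basic -ksp_monitor -ksp_converged_reason -ksp_gmres_restart 15 -log_view -log_view_memory"

Note that -log_view_memory only adds the per-event memory information when -log_view itself is also given.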
>
> On Feb 3, 2020, at 10:34 AM, Matthew Knepley <[email protected]> wrote:
>
> On Mon, Feb 3, 2020 at 10:38 AM Дмитрий Мельничук <[email protected]> wrote:
> Hello all!
>
> Now I am faced with a problem associated with memory allocation when calling KSPSolve.
>
> GMRES preconditioned by ASM was chosen for solving the linear algebraic system (obtained by the finite element spatial discretisation of the Biot poroelasticity model).
> According to the output value of the PetscMallocGetCurrentUsage subroutine, 176 MB is required for matrix and RHS vector storage (before the KSPSolve call). But during the solution of the linear system 543 MB of RAM is required (during the KSPSolve call).
> Thus, the amount of allocated memory after the preconditioning stage increased three times. This kind of behaviour is critical for 3D models with several million cells.
>
> 1) In order to know anything, we have to see the output of -ksp_view, although I see you used an overlap of 2
>
> 2) The overlap increases the size of the submatrices beyond that of the original matrix. Suppose that you used LU for the sub-preconditioner. You would need at least 2x memory (with ILU(0)) since the matrix dominates memory usage. Moreover, you have overlap and you might have fill-in depending on the solver.
>
> 3) The massif tool from valgrind is a good fine-grained way to look at memory allocation
>
> Thanks,
>
> Matt
>
> Is there a way to decrease the amount of allocated memory?
> Is that expected behaviour for the GMRES-ASM combination?
>
> As I remember, the previous version of PETSc did not demonstrate such a significant memory increase.
>
> ...
> Vec :: Vec_F, Vec_U
> Mat :: Mat_K
> ...
> ...
> call MatAssemblyBegin(Mat_M,Mat_Final_Assembly,ierr)
> call MatAssemblyEnd(Mat_M,Mat_Final_Assembly,ierr)
> ....
> call VecAssemblyBegin(Vec_F_mod,ierr)
> call VecAssemblyEnd(Vec_F_mod,ierr)
> ...
> ...
> call PetscMallocGetCurrentUsage(mem, ierr)
> print *,"Memory used: ",mem
> ...
> ...
> call KSPSetType(Krylov,KSPGMRES,ierr)
> call KSPGetPC(Krylov,PreCon,ierr)
> call PCSetType(PreCon,PCASM,ierr)
> call KSPSetFromOptions(Krylov,ierr)
> ...
> call KSPSolve(Krylov,Vec_F,Vec_U,ierr)
> ...
> ...
> options = "-pc_asm_overlap 2 -pc_asm_type basic -ksp_monitor -ksp_converged_reason"
>
>
> Kind regards,
> Dmitry Melnichuk
> Matrix.dat (265288024)
>
>
> --
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
>
>
> <Logs_26K_GMRES-ASM-log_view-log_view_memory-malloc_dump_32bit>
>
> <ksp_view.txt><PDE.JPG><Zheng_poroelasticity.pdf><full_log_ASM_factor_in_place.txt><Error_gmres_sor.txt><Error_ilu_pc_factor.txt>
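Matt's valgrind suggestion from the message above would look something like the following in practice (the executable name ./my_solver is a placeholder and <pid> stands for the process id that massif appends to its output file; the solver options are the ones used elsewhere in this thread):

    valgrind --tool=massif ./my_solver -ksp_type gmres -pc_type asm -sub_pc_type ilu -sub_pc_factor_in_place
    ms_print massif.out.<pid>

ms_print then gives a snapshot-by-snapshot breakdown of where the heap memory is being allocated.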
