[petsc-users] R: Best workflow for different systems with different block sizes

Paolo Lampitella Thu, 02 Feb 2017 00:07:43 -0800

Barry,

thank you very much.


Hopefully, I will soon be able to come back with some real numbers on some real 
machine, to compare the two approaches.

Paolo

-----Messaggio originale-----
Da: Barry Smith [mailto:[email protected]] 
Inviato: mercoledì 1 febbraio 2017 17:53
A: Paolo Lampitella <[email protected]>
Cc: [email protected]
Oggetto: Re: [petsc-users] Best workflow for different systems with different 
block sizes


   Paolo,

     If the various matrices keep the same nonzero structure throughout the 
simulation there is a slight performance gain to reusing them vs creating and 
destroying them in each outer iteration. Likely not more than say 10% of the 
run time if the linear solves are relatively difficult (i.e. most of the time 
is spent in the linear solves) but it could be a bit more maybe 20% if the 
linear solves are not dominating the time.

    I am not basing this on any quantitive information I have. So it really 
comes down to if you want to run larger problems that require reusing the 
memory.

     Regarding creating the right hand side in your memory and copying to 
PETSc. Creating the right hand side directly in PETSc vectors (with 
VecGetArrayF90()) will be a little faster and require slightly less memory so 
is a "good thing to do" but I suspect you will barely be able to measure the 
difference.

   Barry



> On Feb 1, 2017, at 9:21 AM, Paolo Lampitella <[email protected]> 
> wrote:
> 
> Dear PETSc users and developers,
>  
> I successfully, and very satisfactorily, use PETSc for the linear 
> algebra part of a coupled, preconditioned, density-based, unstructured, 
> cell-centered, finite volume CFD code. In short, the method is the one 
> presented in the paper:
>  
> J.M. Weiss, J.P. Maruszewski, W.A. Smith: Implicit Solution of 
> Preconditioned Navier-Stokes Equations Using Algebraic Multigrid
> http://dx.doi.org/10.2514/2.689
>  
> except for the AMG part as, at the moment, I use GMRES + point Jacobi trough 
> PETSc (this will probably be the subject for another post).
> However, I also have a very simple, internally coded, symmetric block 
> Gauss-Seidel solver.
>  
> The code, written in Fortran with MPI, manages all the aspects of the 
> solver, including outer iterations, with PETSc just handling the 
> resolution of the linear systems at each outer iteration. In particular, note 
> that, for certain combination of models, the solver can end up having 
> different systems of equations to solve, in sequence, at each outer 
> iteration. For example, I might have:
>  
> -          Main equations (with block size = 5 + n species)
> -          Turbulence equations (with block size = number of equations 
> implied by the turbulence model)
> -          Additional Transported Scalar Equations (with block size = number 
> of required scalars)
> -          Etc.
>  
> The way I manage the workflow with the internal GS solver is such that, for 
> each block of equations, at each outer iteration, I do the following:
>  
> -          allocate matrix, solution and rhs
> -          fill the matrix and rhs
> -          solve the system
> -          update the independent variables (system solution is in delta form)
> -          deallocate matrix, solution and rhs
>  
> so that the allocated memory is kept as low as possible at any given 
> time. However, for legacy reasons now obsolete, the PETSc workflow 
> used in the code is different as all the required matrices and rhs are 
> instead created in advance with the routine petsc_create.f90 in attachment. 
> Then iterations start, and at each iteration, each system is solved with the 
> routine petsc_solve.f90, also in attachment (both are included in a dedicated 
> petsc module).
> At the end of the iterations, before the finalization, a petsc_destroy 
> subroutine (not attached) is obviously also called for each matrix/rhs 
> allocated.
> So, in conclusion, I keep in memory all the matrices for the whole time (the 
> matrix structure, of course, doesn’t change with the iterations).
> My first question then is: 
>  
> 1) Is this approach recommended? Wouldn’t, instead, calling my petsc_create 
> inside my petsc_solve be a better option in my case?
> In this case I could avoid storing any petsc matrix or rhs outside the 
> petsc_solve routine.
> Would the overhead implied by the routines called in my petsc_create be 
> sustainable if that subroutine is called at every outer iteration for every 
> system?
>  
> Also, note that the way I fill the RHS and the matrix of the systems 
> for PETSc are different. For the RHS I always allocate mine in the 
> code, which is then copied in the petsc one in the petsc_solve routine. For 
> the matrix, instead, I directly fill the petsc one outside the subroutine, 
> which is then passed already filled to petsc_solve. So, the second question 
> is:
>  
> 2) Independently from the approach at question (1), which method do 
> you suggest to adopt? Storing directly the rhs and the matrix in the petsc 
> ones Or copying them at the solve time? Is there a mainstream approach in 
> such cases?
>  
> I don’t know if this has relevance but, consider that, in my case, every 
> process always writes only its own matrix and rhs entries.
>  
> Unfortunately, at the moment, I have no access to a system where a 
> straightforward test would give a clear answer to this. Still, I guess, the 
> matter is more conceptual than practical.
>  
> Thank you and sorry for the long mail
> <petsc_solve.f90><petsc_create.f90>

[petsc-users] R: Best workflow for different systems with different block sizes

Reply via email to