Hi Frank,

On 6 July 2016 at 00:23, frank <hengj...@uci.edu> wrote:

> Hi,
>
> I am using the CG KSP solver and a multigrid preconditioner to solve a
> linear system in parallel.
> I chose to use 'Telescope' as the preconditioner on the coarse mesh
> for its good performance.
> The petsc options file is attached.
>
> The domain is a 3d box.
> It works well when the grid is 1536*128*384 and the process mesh is
> 96*8*24. When I double the size of the grid and keep the same process mesh
> and PETSc options, I get an "out of memory" error from the super-cluster I
> am using.
>

When you increased the mesh resolution, did you also increase the number
of effective MG levels?
If the number of levels was held constant, then your coarse grid is
increasing in size.
I notice that your coarsest grid solver is PCSVD.
This can become expensive, as PCSVD will convert your coarse level
operator into a dense matrix, and could be the cause of your OOM error.
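For a sense of scale (these numbers are purely illustrative, since I don't
know your actual coarse grid size): PCSVD stores the coarse operator
densely, so its memory grows with the square of the coarse problem size. A
coarse grid with 18432 unknowns already needs 18432^2 * 8 bytes ~ 2.7 GB
for the dense matrix alone, and doubling the grid in each direction
multiplies that by 64.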

Telescope does have to store a couple of temporary matrices, but generally
when used in the context of multigrid coarse level solves these operators
represent a very small fraction of the fine level operator.

We need to isolate whether it is these temporary matrices from telescope
causing the OOM error, or whether the cause is something else (e.g. PCSVD).
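One low-effort first check: run both resolutions with PETSc's memory
reporting enabled and compare where the memory goes, e.g.

  -memory_view
  -log_view

(the exact names depend on your PETSc version; in older releases these
were -memory_info and -log_summary).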



> Each process has access to at least 8 GB of memory, which should be more
> than enough for my application. I am sure that all the other parts of my
> code (except the linear solver) do not use much memory, so I suspect
> something is wrong with the linear solver.
> The error occurs before the linear system is completely solved, so I don't
> have the info from -ksp_view. I am not able to reproduce the error with a
> smaller problem either.
> In addition, I tried to use block Jacobi as the preconditioner with the
> same grid and the same decomposition. The linear solver runs extremely
> slowly, but there is no memory error.
>
> How can I diagnose what exactly causes the error?
>

This is going to be kinda hard, as I notice your configuration uses nested
calls to telescope.
You need to debug the solver configuration.

The only way I know to do this is by invoking telescope one step at a time.
By this I mean: use telescope once, and check the configuration is what you
want.
Then add the next instance of telescope.
For solver debugging purposes, get rid of PCSVD.
The constant null space is propagated with telescope, so you can just use an
iterative method.
Furthermore, for debugging purposes you don't care about the solve time or
even convergence, so set -ksp_max_it 1 everywhere in your solver stack
(e.g. the outermost KSP and on the coarsest level).
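As a concrete sketch of that first step (the level count, reduction factor,
and smoother below are placeholders, not values from your options file;
keep your own):

  # one outer iteration is enough to exercise the whole hierarchy
  -ksp_max_it 1
  -pc_type mg
  -pc_mg_levels 4
  # single telescope instance on the MG coarse level
  -mg_coarse_pc_type telescope
  -mg_coarse_pc_telescope_reduction_factor 64
  # cheap iterative coarse solve in place of PCSVD
  -mg_coarse_telescope_ksp_type cg
  -mg_coarse_telescope_ksp_max_it 1
  -mg_coarse_telescope_pc_type jacobi
  # confirm the assembled configuration
  -ksp_view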

If one instance of telescope works, i.e. no OOM error occurs, add the next
instance of telescope.
If two instances of telescope also work (no OOM), revert back to PCSVD.
If you now hit an OOM error, you should consider adding more levels, or
getting rid of PCSVD as your coarse grid solver.
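For the second step, the nested instance just picks up the longer prefix,
along the lines of (again, all placeholder values):

  # MG on the sub-communicator created by the first telescope
  -mg_coarse_telescope_pc_type mg
  -mg_coarse_telescope_pc_mg_levels 3
  # second telescope instance on that hierarchy's coarse level
  -mg_coarse_telescope_mg_coarse_pc_type telescope
  -mg_coarse_telescope_mg_coarse_pc_telescope_reduction_factor 64
  -mg_coarse_telescope_mg_coarse_telescope_ksp_max_it 1
  -mg_coarse_telescope_mg_coarse_telescope_pc_type jacobi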

Lastly, the option

-repart_da_processors_x 24

has been deprecated.
It now inherits the prefix from the solver running on the sub-communicator.
For your use case, it should be something like
  -mg_coarse_telescope_repart_da_processors_x 24
Use -options_left 1 to verify the option is getting picked up (another
useful tool for solver config debugging).


Cheers
  Dave



> Thank you so much.
>
> Frank
>
