Re: [petsc-users] GAMG and Hypre preconditioner

2023-06-27 Thread Zisheng Ye via petsc-users
Hi Jed

Thanks for your reply. I have sent the log files to petsc-ma...@mcs.anl.gov.

Zisheng

From: Jed Brown 
Sent: Tuesday, June 27, 2023 1:02 PM
To: Zisheng Ye ; petsc-users@mcs.anl.gov 

Subject: Re: [petsc-users] GAMG and Hypre preconditioner

[External Sender]

Zisheng Ye via petsc-users  writes:

> Dear PETSc Team
>
> We are testing the GPU support in PETSc's KSPSolve, especially for the GAMG 
> and Hypre preconditioners. We have encountered several issues and would 
> like to ask for your suggestions.
>
> First, we have a couple of questions when working with a single MPI rank:
>
>   1.  We have tested two backends, CUDA and Kokkos. One commonly encountered 
> error is related to SpGEMM in CUDA when the matrix is large, as listed below:
>
> cudaMalloc((void **), bufferSize2) error( cudaErrorMemoryAllocation): 
> out of memory
>
> For the CUDA backend, one can use "-matmatmult_backend_cpu -matptap_backend_cpu" 
> to avoid these problems. However, there seem to be no equivalent options for the 
> Kokkos backend. Is there a good practice for avoiding this error with both 
> backends, and can it be avoided with the Kokkos backend?

Junchao will know more about KK tuning, but the faster GPU matrix-matrix 
algorithms use extra memory. We should be able to make the host option 
available with Kokkos.
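
For reference, a minimal sketch of a run that combines these options on the
CUDA backend (the executable name ./app is illustrative; the options themselves
are the ones quoted above):

  ./app -pc_type gamg -dm_mat_type aijcusparse -dm_vec_type cuda \
    -matmatmult_backend_cpu -matptap_backend_cpu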

>   2.  We have tested the combination of Hypre and the Kokkos backend. It looks 
> like the two are not compatible with each other: we observed that 
> KSPSolve takes a greater number of iterations to exit, and the residual norm 
> in the post-check is much larger than the one obtained with the 
> CUDA backend. This happens for matrices with block size larger than 1. Is 
> there any explanation for this behavior?
>
> Second, we have a couple more questions when working with multiple MPI ranks:
>
>   1.  We are currently using OpenMPI, as we couldn't get Intel MPI to work as a 
> GPU-aware MPI. Is this a known issue with Intel MPI?

As far as I know, Intel's MPI is only for SYCL/Intel GPUs. In general, 
GPU-aware MPI has been incredibly flaky on all HPC systems despite being 
introduced ten years ago.

>   2.  With OpenMPI we currently see a slowdown when increasing the MPI rank 
> count, as shown in the figure below. Is this normal?

Could you share -log_view output from a couple of representative runs? You could 
send those here or to petsc-ma...@mcs.anl.gov. We need to see what kind of work 
is not scaling so we can attribute what may be causing it.
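
For example, a sketch of one representative run (executable name and rank count
illustrative), where "-log_view :gamg_np4.txt" writes the log to a file you can
attach:

  mpiexec -n 4 ./app -pc_type gamg -log_view :gamg_np4.txt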


Re: [petsc-users] GAMG failure

2023-03-28 Thread Mark Adams
On Tue, Mar 28, 2023 at 12:38 PM Blaise Bourdin  wrote:

>
>
> On Mar 27, 2023, at 9:11 PM, Mark Adams  wrote:
>
> Yes, the eigen estimates are converging slowly.
>
> BTW, have you tried hypre? It is a good solver (lots and lots more woman-years).
> These eigen estimates are conceptually simple, but they can lead to
> problems like this (hypre uses an eigen-estimate-free smoother).
>
> I just moved from petsc 3.3 to main, so my experience with an old version
> of hypre has not been very convincing. Strangely enough, ML has always been
> the most efficient PC for me.
>

ML is a good solver.


> Maybe it’s time to revisit.
> That said, I would really like to get decent performance out of gamg. One
> day, I’d like to be able to account for the special structure of
> phase-field fracture in the construction of the coarse space.
>
>
> But try this (good to have options anyway):
>
> -pc_gamg_esteig_ksp_max_it 20
>
> Chebyshev will scale the estimate that we give by, I think, 5% by default.
> Maybe 10%.
> You can set that with:
>
> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05
>
> 0.2 is the scaling of the high eigen estimate for the low eigen value in
> Chebyshev.
>
>
>
> Jed’s suggestion of using -pc_gamg_reuse_interpolation 0 worked.
>

OK, I have to admit I am surprised.
But I guess with your fracture problem the matrix/physics/dynamics does change a lot.


> I am testing your options at the moment.
>

There are a lot of options and it is cumbersome, but they are finite and
good to know.
Glad it's working,


>
> Thanks a lot,
>
> Blaise
>
> —
> Canada Research Chair in Mathematical and Computational Aspects of Solid
> Mechanics (Tier 1)
> Professor, Department of Mathematics & Statistics
> Hamilton Hall room 409A, McMaster University
> 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada
> https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243
>
>


Re: [petsc-users] GAMG failure

2023-03-28 Thread Jed Brown
This suite has been good for my solid mechanics solvers. (It's written here as 
a coarse grid solver because we do matrix-free p-MG first, but you can use it 
directly.)

https://github.com/hypre-space/hypre/issues/601#issuecomment-1069426997
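
For reference, that options suite is:

-pc_type hypre -pc_hypre_boomeramg_coarsen_type pmis
-pc_hypre_boomeramg_interp_type ext+i -pc_hypre_boomeramg_no_CF
-pc_hypre_boomeramg_P_max 6 -pc_hypre_boomeramg_relax_type_down Chebyshev
-pc_hypre_boomeramg_relax_type_up Chebyshev
-pc_hypre_boomeramg_strong_threshold 0.5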

Blaise Bourdin  writes:

>  On Mar 27, 2023, at 9:11 PM, Mark Adams  wrote:
>
>  Yes, the eigen estimates are converging slowly. 
>
>  BTW, have you tried hypre? It is a good solver (lots and lots more woman-years).
>  These eigen estimates are conceptually simple, but they can lead to problems
>  like this (hypre uses an eigen-estimate-free smoother).
>
> I just moved from petsc 3.3 to main, so my experience with an old version of
> hypre has not been very convincing. Strangely enough, ML has always been the
> most efficient PC for me. Maybe it’s time to revisit.
> That said, I would really like to get decent performance out of gamg. One
> day, I’d like to be able to account for the special structure
> of phase-field fracture in the construction of the coarse space.
>
>  But try this (good to have options anyway):
>
>  -pc_gamg_esteig_ksp_max_it 20
>
>  Chebyshev will scale the estimate that we give by, I think, 5% by default.
>  Maybe 10%.
>  You can set that with:
>
>  -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05
>
>  0.2 is the scaling of the high eigen estimate for the low eigen value in 
> Chebyshev.
>
> Jed’s suggestion of using -pc_gamg_reuse_interpolation 0 worked. I am testing 
> your options at the moment.
>
> Thanks a lot,
>
> Blaise
>
> — 
> Canada Research Chair in Mathematical and Computational Aspects of Solid 
> Mechanics (Tier 1)
> Professor, Department of Mathematics & Statistics
> Hamilton Hall room 409A, McMaster University
> 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada 
> https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243


Re: [petsc-users] GAMG failure

2023-03-28 Thread Blaise Bourdin

On Mar 27, 2023, at 9:11 PM, Mark Adams  wrote:

> Yes, the eigen estimates are converging slowly.
>
> BTW, have you tried hypre? It is a good solver (lots and lots more woman-years).
> These eigen estimates are conceptually simple, but they can lead to problems
> like this (hypre uses an eigen-estimate-free smoother).

I just moved from petsc 3.3 to main, so my experience with an old version of
hypre has not been very convincing. Strangely enough, ML has always been the
most efficient PC for me. Maybe it’s time to revisit.
That said, I would really like to get decent performance out of gamg. One day,
I’d like to be able to account for the special structure of phase-field
fracture in the construction of the coarse space.

> But try this (good to have options anyway):
>
> -pc_gamg_esteig_ksp_max_it 20
>
> Chebyshev will scale the estimate that we give by, I think, 5% by default.
> Maybe 10%.
> You can set that with:
>
> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05
>
> 0.2 is the scaling of the high eigen estimate for the low eigen value in
> Chebyshev.

Jed’s suggestion of using -pc_gamg_reuse_interpolation 0 worked. I am testing
your options at the moment.

Thanks a lot,

Blaise

—
Canada Research Chair in Mathematical and Computational Aspects of Solid
Mechanics (Tier 1)
Professor, Department of Mathematics & Statistics
Hamilton Hall room 409A, McMaster University
1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada
https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243


Re: [petsc-users] GAMG failure

2023-03-27 Thread Mark Adams
Yes, the eigen estimates are converging slowly.

BTW, have you tried hypre? It is a good solver (lots and lots more woman-years).
These eigen estimates are conceptually simple, but they can lead to
problems like this (hypre uses an eigen-estimate-free smoother).

But try this (good to have options anyway):

-pc_gamg_esteig_ksp_max_it 20

Chebyshev will scale the estimate that we give by, I think, 5% by default.
Maybe 10%.
You can set that with:

-mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05

0.2 is the scaling of the high eigen estimate for the low eigen value in
Chebyshev.
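
Put together, a sketch of the option set suggested in this thread (including
the singular-value monitor from the message quoted below):

  -pc_gamg_esteig_ksp_max_it 20
  -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05
  -pc_gamg_esteig_ksp_monitor_singular_value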


On Mon, Mar 27, 2023 at 5:06 PM Blaise Bourdin  wrote:

>
>
> On Mar 24, 2023, at 3:21 PM, Mark Adams  wrote:
>
> * Do you set:
>
> PetscCall(MatSetOption(Amat, MAT_SPD, PETSC_TRUE));
>
> PetscCall(MatSetOption(Amat, MAT_SPD_ETERNAL, PETSC_TRUE));
>
>
> Yes
>
>
> Do that to get CG Eigen estimates. Outright failure is usually caused by a
> bad Eigen estimate.
> -pc_gamg_esteig_ksp_monitor_singular_value
> Will print out the estimates as it is iterating. You can look at that to
> check that the max has converged.
>
>
> I just did, and something is off:
> I do multiple calls to SNESSolve (staggered scheme for phase-field
> fracture), but only get information on the first solve (which is not the
> one failing, of course).
> Here is what I get:
> Residual norms for Displacement_pc_gamg_esteig_ solve.
>   0 KSP Residual norm 7.636421712860e+01 % max 1.e+00 min
> 1.e+00 max/min 1.e+00
>   1 KSP Residual norm 3.402024867977e+01 % max 1.114319928921e+00 min
> 1.114319928921e+00 max/min 1.e+00
>   2 KSP Residual norm 2.124815079671e+01 % max 1.501143586520e+00 min
> 5.739351119078e-01 max/min 2.615528402732e+00
>   3 KSP Residual norm 1.581785698912e+01 % max 1.644351137983e+00 min
> 3.263683482596e-01 max/min 5.038329074347e+00
>   4 KSP Residual norm 1.254871990315e+01 % max 1.714668863819e+00 min
> 2.044075812142e-01 max/min 8.388479789416e+00
>   5 KSP Residual norm 1.051198229090e+01 % max 1.760078533063e+00 min
> 1.409327403114e-01 max/min 1.248878386367e+01
>   6 KSP Residual norm 9.061658306086e+00 % max 1.792995287686e+00 min
> 1.023484740555e-01 max/min 1.751853463603e+01
>   7 KSP Residual norm 8.015529297567e+00 % max 1.821497535985e+00 min
> 7.818018001928e-02 max/min 2.329871248104e+01
>   8 KSP Residual norm 7.201063258957e+00 % max 1.855140071935e+00 min
> 6.178572472468e-02 max/min 3.002538337458e+01
>   9 KSP Residual norm 6.548491711695e+00 % max 1.903578294573e+00 min
> 5.008612895206e-02 max/min 3.800609738466e+01
>  10 KSP Residual norm 6.002109992255e+00 % max 1.961356890125e+00 min
> 4.130572033722e-02 max/min 4.748390475004e+01
>   Residual norms for Displacement_pc_gamg_esteig_ solve.
>   0 KSP Residual norm 2.373573910237e+02 % max 1.e+00 min
> 1.e+00 max/min 1.e+00
>   1 KSP Residual norm 8.845061415709e+01 % max 1.081192207576e+00 min
> 1.081192207576e+00 max/min 1.e+00
>   2 KSP Residual norm 5.607525485152e+01 % max 1.345947059840e+00 min
> 5.768825326129e-01 max/min 2.333138869267e+00
>   3 KSP Residual norm 4.123522550864e+01 % max 1.481153523075e+00 min
> 3.070603564913e-01 max/min 4.823655974348e+00
>   4 KSP Residual norm 3.345765664017e+01 % max 1.551374710727e+00 min
> 1.953487694959e-01 max/min 7.941563771968e+00
>   5 KSP Residual norm 2.859712984893e+01 % max 1.604588395452e+00 min
> 1.313871480574e-01 max/min 1.221267391199e+01
>   6 KSP Residual norm 2.525636054248e+01 % max 1.650487481750e+00 min
> 9.322735730688e-02 max/min 1.770389646804e+01
>   7 KSP Residual norm 2.270711391451e+01 % max 1.697243639599e+00 min
> 6.945419058256e-02 max/min 2.443687883140e+01
>   8 KSP Residual norm 2.074739485241e+01 % max 1.737293728907e+00 min
> 5.319942519758e-02 max/min 3.265624999621e+01
>   9 KSP Residual norm 1.912808268870e+01 % max 1.771708608618e+00 min
> 4.229776586667e-02 max/min 4.188657656771e+01
>  10 KSP Residual norm 1.787394414641e+01 % max 1.802834420843e+00 min
> 3.460455235448e-02 max/min 5.209818645753e+01
>   Residual norms for Displacement_pc_gamg_esteig_ solve.
>   0 KSP Residual norm 1.361990679391e+03 % max 1.e+00 min
> 1.e+00 max/min 1.e+00
>   1 KSP Residual norm 5.377188333825e+02 % max 1.086812916769e+00 min
> 1.086812916769e+00 max/min 1.e+00
>   2 KSP Residual norm 2.819790765047e+02 % max 1.474233179517e+00 min
> 6.475176340551e-01 max/min 2.276745994212e+00
>   3 KSP Residual norm 1.856720658591e+02 % max 1.646049713883e+00 min
> 4.391851040105e-01 max/min 3.747963441500e+00
>   4 KSP Residual norm 1.446507859917e+02 % max 1.760403013135e+00 min
> 2.972886103795e-01 max/min 5.921528614526e+00
>   5 KSP Residual norm 1.212491636433e+02 % max 1.839250080524e+00 min
> 1.921591413785e-01 max/min 9.571494061277e+00
>   6 KSP Residual norm 1.052783637696e+02 % max 1.887062042760e+00 min
> 1.275920366984e-01 max/min 1.478981048966e+01
>   7 KSP 

Re: [petsc-users] GAMG failure

2023-03-27 Thread Jed Brown
Try -pc_gamg_reuse_interpolation 0. I thought this was disabled by default, but 
I see pc_gamg->reuse_prol = PETSC_TRUE in the code.
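
If you prefer to set this in code rather than on the command line, a minimal
sketch (assuming it runs before the PC options are processed):

  PetscCall(PetscOptionsSetValue(NULL, "-pc_gamg_reuse_interpolation", "0"));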

Blaise Bourdin  writes:

>  On Mar 24, 2023, at 3:21 PM, Mark Adams  wrote:
>
>  * Do you set: 
>
>  PetscCall(MatSetOption(Amat, MAT_SPD, PETSC_TRUE));
>
>  PetscCall(MatSetOption(Amat, MAT_SPD_ETERNAL, PETSC_TRUE));
>
> Yes
>
>  Do that to get CG Eigen estimates. Outright failure is usually caused by a 
> bad Eigen estimate.
>  -pc_gamg_esteig_ksp_monitor_singular_value
>  Will print out the estimates as it is iterating. You can look at that to check 
> that the max has converged.
>
> I just did, and something is off:
> I do multiple calls to SNESSolve (staggered scheme for phase-field fracture), 
> but only get information on the first solve (which is
> not the one failing, of course)
> Here is what I get:
> Residual norms for Displacement_pc_gamg_esteig_ solve.
>   0 KSP Residual norm 7.636421712860e+01 % max 1.e+00 min 
> 1.e+00 max/min
> 1.e+00
>   1 KSP Residual norm 3.402024867977e+01 % max 1.114319928921e+00 min 
> 1.114319928921e+00 max/min
> 1.e+00
>   2 KSP Residual norm 2.124815079671e+01 % max 1.501143586520e+00 min 
> 5.739351119078e-01 max/min
> 2.615528402732e+00
>   3 KSP Residual norm 1.581785698912e+01 % max 1.644351137983e+00 min 
> 3.263683482596e-01 max/min
> 5.038329074347e+00
>   4 KSP Residual norm 1.254871990315e+01 % max 1.714668863819e+00 min 
> 2.044075812142e-01 max/min
> 8.388479789416e+00
>   5 KSP Residual norm 1.051198229090e+01 % max 1.760078533063e+00 min 
> 1.409327403114e-01 max/min
> 1.248878386367e+01
>   6 KSP Residual norm 9.061658306086e+00 % max 1.792995287686e+00 min 
> 1.023484740555e-01 max/min
> 1.751853463603e+01
>   7 KSP Residual norm 8.015529297567e+00 % max 1.821497535985e+00 min 
> 7.818018001928e-02 max/min
> 2.329871248104e+01
>   8 KSP Residual norm 7.201063258957e+00 % max 1.855140071935e+00 min 
> 6.178572472468e-02 max/min
> 3.002538337458e+01
>   9 KSP Residual norm 6.548491711695e+00 % max 1.903578294573e+00 min 
> 5.008612895206e-02 max/min
> 3.800609738466e+01
>  10 KSP Residual norm 6.002109992255e+00 % max 1.961356890125e+00 min 
> 4.130572033722e-02 max/min
> 4.748390475004e+01
>   Residual norms for Displacement_pc_gamg_esteig_ solve.
>   0 KSP Residual norm 2.373573910237e+02 % max 1.e+00 min 
> 1.e+00 max/min
> 1.e+00
>   1 KSP Residual norm 8.845061415709e+01 % max 1.081192207576e+00 min 
> 1.081192207576e+00 max/min
> 1.e+00
>   2 KSP Residual norm 5.607525485152e+01 % max 1.345947059840e+00 min 
> 5.768825326129e-01 max/min
> 2.333138869267e+00
>   3 KSP Residual norm 4.123522550864e+01 % max 1.481153523075e+00 min 
> 3.070603564913e-01 max/min
> 4.823655974348e+00
>   4 KSP Residual norm 3.345765664017e+01 % max 1.551374710727e+00 min 
> 1.953487694959e-01 max/min
> 7.941563771968e+00
>   5 KSP Residual norm 2.859712984893e+01 % max 1.604588395452e+00 min 
> 1.313871480574e-01 max/min
> 1.221267391199e+01
>   6 KSP Residual norm 2.525636054248e+01 % max 1.650487481750e+00 min 
> 9.322735730688e-02 max/min
> 1.770389646804e+01
>   7 KSP Residual norm 2.270711391451e+01 % max 1.697243639599e+00 min 
> 6.945419058256e-02 max/min
> 2.443687883140e+01
>   8 KSP Residual norm 2.074739485241e+01 % max 1.737293728907e+00 min 
> 5.319942519758e-02 max/min
> 3.265624999621e+01
>   9 KSP Residual norm 1.912808268870e+01 % max 1.771708608618e+00 min 
> 4.229776586667e-02 max/min
> 4.188657656771e+01
>  10 KSP Residual norm 1.787394414641e+01 % max 1.802834420843e+00 min 
> 3.460455235448e-02 max/min
> 5.209818645753e+01
>   Residual norms for Displacement_pc_gamg_esteig_ solve.
>   0 KSP Residual norm 1.361990679391e+03 % max 1.e+00 min 
> 1.e+00 max/min
> 1.e+00
>   1 KSP Residual norm 5.377188333825e+02 % max 1.086812916769e+00 min 
> 1.086812916769e+00 max/min
> 1.e+00
>   2 KSP Residual norm 2.819790765047e+02 % max 1.474233179517e+00 min 
> 6.475176340551e-01 max/min
> 2.276745994212e+00
>   3 KSP Residual norm 1.856720658591e+02 % max 1.646049713883e+00 min 
> 4.391851040105e-01 max/min
> 3.747963441500e+00
>   4 KSP Residual norm 1.446507859917e+02 % max 1.760403013135e+00 min 
> 2.972886103795e-01 max/min
> 5.921528614526e+00
>   5 KSP Residual norm 1.212491636433e+02 % max 1.839250080524e+00 min 
> 1.921591413785e-01 max/min
> 9.571494061277e+00
>   6 KSP Residual norm 1.052783637696e+02 % max 1.887062042760e+00 min 
> 1.275920366984e-01 max/min
> 1.478981048966e+01
>   7 KSP Residual norm 9.230292625762e+01 % max 1.917891358356e+00 min 
> 8.853577120467e-02 max/min
> 2.166233300122e+01
>   8 KSP Residual norm 8.262607594297e+01 % max 1.935857204308e+00 min 
> 6.706949937710e-02 max/min
> 2.886345093206e+01
>   9 KSP Residual norm 7.616474911000e+01 % max 1.946323901431e+00 min 
> 5.354310733090e-02 max/min
> 3.635059671458e+01
>  10 KSP Residual norm 

Re: [petsc-users] GAMG failure

2023-03-27 Thread Blaise Bourdin

On Mar 24, 2023, at 3:21 PM, Mark Adams  wrote:

> * Do you set:
>
>     PetscCall(MatSetOption(Amat, MAT_SPD, PETSC_TRUE));
>     PetscCall(MatSetOption(Amat, MAT_SPD_ETERNAL, PETSC_TRUE));

Yes

> Do that to get CG Eigen estimates. Outright failure is usually caused by a bad
> Eigen estimate.
> -pc_gamg_esteig_ksp_monitor_singular_value
> Will print out the estimates as it is iterating. You can look at that to check
> that the max has converged.

I just did, and something is off:
I do multiple calls to SNESSolve (staggered scheme for phase-field fracture),
but only get information on the first solve (which is not the one failing, of
course).
Here is what I get:

Residual norms for Displacement_pc_gamg_esteig_ solve.
  0 KSP Residual norm 7.636421712860e+01 % max 1.e+00 min 1.e+00 max/min 1.e+00
  1 KSP Residual norm 3.402024867977e+01 % max 1.114319928921e+00 min 1.114319928921e+00 max/min 1.e+00
  2 KSP Residual norm 2.124815079671e+01 % max 1.501143586520e+00 min 5.739351119078e-01 max/min 2.615528402732e+00
  3 KSP Residual norm 1.581785698912e+01 % max 1.644351137983e+00 min 3.263683482596e-01 max/min 5.038329074347e+00
  4 KSP Residual norm 1.254871990315e+01 % max 1.714668863819e+00 min 2.044075812142e-01 max/min 8.388479789416e+00
  5 KSP Residual norm 1.051198229090e+01 % max 1.760078533063e+00 min 1.409327403114e-01 max/min 1.248878386367e+01
  6 KSP Residual norm 9.061658306086e+00 % max 1.792995287686e+00 min 1.023484740555e-01 max/min 1.751853463603e+01
  7 KSP Residual norm 8.015529297567e+00 % max 1.821497535985e+00 min 7.818018001928e-02 max/min 2.329871248104e+01
  8 KSP Residual norm 7.201063258957e+00 % max 1.855140071935e+00 min 6.178572472468e-02 max/min 3.002538337458e+01
  9 KSP Residual norm 6.548491711695e+00 % max 1.903578294573e+00 min 5.008612895206e-02 max/min 3.800609738466e+01
 10 KSP Residual norm 6.002109992255e+00 % max 1.961356890125e+00 min 4.130572033722e-02 max/min 4.748390475004e+01
  Residual norms for Displacement_pc_gamg_esteig_ solve.
  0 KSP Residual norm 2.373573910237e+02 % max 1.e+00 min 1.e+00 max/min 1.e+00
  1 KSP Residual norm 8.845061415709e+01 % max 1.081192207576e+00 min 1.081192207576e+00 max/min 1.e+00
  2 KSP Residual norm 5.607525485152e+01 % max 1.345947059840e+00 min 5.768825326129e-01 max/min 2.333138869267e+00
  3 KSP Residual norm 4.123522550864e+01 % max 1.481153523075e+00 min 3.070603564913e-01 max/min 4.823655974348e+00
  4 KSP Residual norm 3.345765664017e+01 % max 1.551374710727e+00 min 1.953487694959e-01 max/min 7.941563771968e+00
  5 KSP Residual norm 2.859712984893e+01 % max 1.604588395452e+00 min 1.313871480574e-01 max/min 1.221267391199e+01
  6 KSP Residual norm 2.525636054248e+01 % max 1.650487481750e+00 min 9.322735730688e-02 max/min 1.770389646804e+01
  7 KSP Residual norm 2.270711391451e+01 % max 1.697243639599e+00 min 6.945419058256e-02 max/min 2.443687883140e+01
  8 KSP Residual norm 2.074739485241e+01 % max 1.737293728907e+00 min 5.319942519758e-02 max/min 3.265624999621e+01
  9 KSP Residual norm 1.912808268870e+01 % max 1.771708608618e+00 min 4.229776586667e-02 max/min 4.188657656771e+01
 10 KSP Residual norm 1.787394414641e+01 % max 1.802834420843e+00 min 3.460455235448e-02 max/min 5.209818645753e+01
  Residual norms for Displacement_pc_gamg_esteig_ solve.
  0 KSP Residual norm 1.361990679391e+03 % max 1.e+00 min 1.e+00 max/min 1.e+00
  1 KSP Residual norm 5.377188333825e+02 % max 1.086812916769e+00 min 1.086812916769e+00 max/min 1.e+00
  2 KSP Residual norm 2.819790765047e+02 % max 1.474233179517e+00 min 6.475176340551e-01 max/min 2.276745994212e+00
  3 KSP Residual norm 1.856720658591e+02 % max 1.646049713883e+00 min 4.391851040105e-01 max/min 3.747963441500e+00
  4 KSP Residual norm 1.446507859917e+02 % max 1.760403013135e+00 min 2.972886103795e-01 max/min 5.921528614526e+00
  5 KSP Residual norm 1.212491636433e+02 % max 1.839250080524e+00 min 1.921591413785e-01 max/min 9.571494061277e+00
  6 KSP Residual norm 1.052783637696e+02 % max 1.887062042760e+00 min 1.275920366984e-01 max/min 1.478981048966e+01
  7 KSP Residual norm 9.230292625762e+01 % max 1.917891358356e+00 min 8.853577120467e-02 max/min 2.166233300122e+01
  8 KSP Residual norm 8.262607594297e+01 % max 1.935857204308e+00 min 6.706949937710e-02 max/min 2.886345093206e+01
  9 KSP Residual norm 7.616474911000e+01 % max 1.946323901431e+00 min 5.354310733090e-02 max/min 3.635059671458e+01
 10 KSP Residual norm 7.138356892221e+01 % max 1.954382723686e+00 min 4.367661484659e-02 max/min 4.474666204216e+01
  Residual norms for Displacement_pc_gamg_esteig_ solve.
  0 KSP Residual norm 3.702300162209e+03 % max 1.e+00 min 1.e+00 max/min 1.e+00
  1 KSP Residual norm 1.255008322497e+03 % max 

Re: [petsc-users] GAMG failure

2023-03-24 Thread Mark Adams
* Do you set:

PetscCall(MatSetOption(Amat, MAT_SPD, PETSC_TRUE));
PetscCall(MatSetOption(Amat, MAT_SPD_ETERNAL, PETSC_TRUE));

Do that to get CG Eigen estimates. Outright failure is usually caused by a
bad Eigen estimate.
-pc_gamg_esteig_ksp_monitor_singular_value
Will print out the estimates as it is iterating. You can look at that to
check that the max has converged.

*  -pc_gamg_aggressive_coarsening 0

will slow coarsening as well as threshold.

* you can run with '-info :pc' and send me the output (grep on GAMG)
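
Put together, a sketch of one diagnostic run combining the options above (the
executable name ./app is illustrative):

  ./app -pc_type gamg -pc_gamg_esteig_ksp_monitor_singular_value \
    -pc_gamg_aggressive_coarsening 0 -info :pc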

Mark

On Fri, Mar 24, 2023 at 2:47 PM Jed Brown  wrote:

> You can use -pc_gamg_threshold .02 to slow the coarsening, and use either a
> stronger smoother or more iterations for the estimation (or a looser
> tolerance). I assume your system is SPD and you've set the near-null space.
>
> Blaise Bourdin  writes:
>
> > Hi,
> >
> > I am having issues with GAMG for some very ill-conditioned 2D linearized
> elasticity problems (sharp variation of elastic moduli with thin regions
> of nearly incompressible material). I use snes_type newtonls,
> linesearch_type cp, and pc_type gamg without any further options. pc_type
> Jacobi converges fine (although slowly of course).
> >
> >
> > I am not really surprised that gamg would not converge out of the box,
> but don’t know where to start to investigate the convergence failure. Can
> anybody help?
> >
> > Blaise
> >
> > —
> > Canada Research Chair in Mathematical and Computational Aspects of Solid
> Mechanics (Tier 1)
> > Professor, Department of Mathematics & Statistics
> > Hamilton Hall room 409A, McMaster University
> > 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada
> > https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243
>


Re: [petsc-users] GAMG failure

2023-03-24 Thread Jed Brown
You can use -pc_gamg_threshold .02 to slow the coarsening, and use either a 
stronger smoother or more iterations for the estimation (or a looser 
tolerance). I assume your system is SPD and you've set the near-null space.
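
Attaching the near-null space for elasticity looks roughly like this (a minimal
sketch, assuming a Vec coords of nodal coordinates and a system matrix A):

  MatNullSpace nullsp;
  PetscCall(MatNullSpaceCreateRigidBody(coords, &nullsp)); /* rigid-body modes */
  PetscCall(MatSetNearNullSpace(A, nullsp));
  PetscCall(MatNullSpaceDestroy(&nullsp));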

Blaise Bourdin  writes:

> Hi,
>
> I am having issues with GAMG for some very ill-conditioned 2D linearized 
> elasticity problems (sharp variation of elastic moduli with thin regions of 
> nearly incompressible material). I use snes_type newtonls, linesearch_type 
> cp, and pc_type gamg without any further options. pc_type Jacobi converges 
> fine (although slowly of course).
>
>
> I am not really surprised that gamg would not converge out of the box, but 
> don’t know where to start to investigate the convergence failure. Can anybody 
> help?
>
> Blaise
>
> — 
> Canada Research Chair in Mathematical and Computational Aspects of Solid 
> Mechanics (Tier 1)
> Professor, Department of Mathematics & Statistics
> Hamilton Hall room 409A, McMaster University
> 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada 
> https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243


Re: [petsc-users] gamg out of memory with gpu

2022-12-26 Thread Matthew Knepley
On Mon, Dec 26, 2022 at 10:29 AM Edoardo Centofanti <
edoardo.centofant...@universitadipavia.it> wrote:

> Thank you for your answer. Can you provide the full path of the example
> you have in mind? The one I found does not seem to exploit algebraic
> multigrid, but just geometric multigrid.
>

cd $PETSC_DIR/src/snes/tutorials
./ex5 -da_grid_x 64 -da_grid_y 64 -mms 3 -pc_type gamg

and for GPUs I think you need the options to move things over

  -dm_vec_type cuda -dm_mat_type aijcusparse
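
i.e., something like (a sketch; the grid sizes above are illustrative):

  ./ex5 -da_grid_x 64 -da_grid_y 64 -mms 3 -pc_type gamg \
    -dm_vec_type cuda -dm_mat_type aijcusparse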

  Thanks,

 Matt


> Thanks,
> Edoardo
>
> On Mon, Dec 26, 2022 at 3:39 PM Matthew Knepley <
> knep...@gmail.com> wrote:
>
>> On Mon, Dec 26, 2022 at 4:41 AM Edoardo Centofanti <
>> edoardo.centofant...@universitadipavia.it> wrote:
>>
>>> Hi PETSc Users,
>>>
>>> I am experiencing some issues with the GAMG preconditioner when used
>>> with GPUs.
>>> In particular, it seems to go out of memory very easily (around 5000
>>> dofs are enough to make it throw the "[0]PETSC ERROR: cuda error 2
>>> (cudaErrorMemoryAllocation) : out of memory" error).
>>> I have these issues both with single and multiple GPUs (on the same or
>>> on different nodes). The exact same problems work like a charm with HYPRE
>>> BoomerAMG on GPUs.
>>> With both preconditioners I exploit the device acceleration by giving
>>> the usual command line options "-dm_vec_type cuda" and "-dm_mat_type
>>> aijcusparse" (I am working with structured meshes). My PETSc version is
>>> 3.17.
>>>
>>> Is this a known issue of the GAMG preconditioner?
>>>
>>
>> No. Can you get it to do this with a PETSc example? Say SNES ex5?
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> Thank you in advance,
>>> Edoardo
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> 
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] gamg out of memory with gpu

2022-12-26 Thread Edoardo Centofanti
Thank you for your answer. Can you provide the full path of the example
you have in mind? The one I found does not seem to exploit algebraic
multigrid, but just geometric multigrid.

Thanks,
Edoardo

On Mon, Dec 26, 2022 at 3:39 PM Matthew Knepley 
wrote:

> On Mon, Dec 26, 2022 at 4:41 AM Edoardo Centofanti <
> edoardo.centofant...@universitadipavia.it> wrote:
>
>> Hi PETSc Users,
>>
>> I am experiencing some issues with the GAMG preconditioner when used with
>> GPUs.
>> In particular, it seems to go out of memory very easily (around 5000
>> dofs are enough to make it throw the "[0]PETSC ERROR: cuda error 2
>> (cudaErrorMemoryAllocation) : out of memory" error).
>> I have these issues both with single and multiple GPUs (on the same or on
>> different nodes). The exact same problems work like a charm with HYPRE
>> BoomerAMG on GPUs.
>> With both preconditioners I exploit the device acceleration by giving the
>> usual command line options "-dm_vec_type cuda" and "-dm_mat_type
>> aijcusparse" (I am working with structured meshes). My PETSc version is
>> 3.17.
>>
>> Is this a known issue of the GAMG preconditioner?
>>
>
> No. Can you get it to do this with a PETSc example? Say SNES ex5?
>
>   Thanks,
>
>  Matt
>
>
>> Thank you in advance,
>> Edoardo
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> 
>


Re: [petsc-users] gamg out of memory with gpu

2022-12-26 Thread Matthew Knepley
On Mon, Dec 26, 2022 at 4:41 AM Edoardo Centofanti <
edoardo.centofant...@universitadipavia.it> wrote:

> Hi PETSc Users,
>
> I am experiencing some issues with the GAMG preconditioner when used with
> GPUs.
> In particular, it seems to go out of memory very easily (around 5000
> dofs are enough to make it throw the "[0]PETSC ERROR: cuda error 2
> (cudaErrorMemoryAllocation) : out of memory" error).
> I have these issues both with single and multiple GPUs (on the same or on
> different nodes). The exact same problems work like a charm with HYPRE
> BoomerAMG on GPUs.
> With both preconditioners I exploit the device acceleration by giving the
> usual command line options "-dm_vec_type cuda" and "-dm_mat_type
> aijcusparse" (I am working with structured meshes). My PETSc version is
> 3.17.
>
> Is this a known issue of the GAMG preconditioner?
>

No. Can you get it to do this with a PETSc example? Say SNES ex5?

  Thanks,

 Matt


> Thank you in advance,
> Edoardo
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] GAMG and linearized elasticity

2022-12-13 Thread Jed Brown
Do you have slip/symmetry boundary conditions, where some components are 
constrained? In that case, there is no uniform block size and I think you'll 
need DMPlexCreateRigidBody() and MatSetNearNullSpace().

The PCSetCoordinates() code won't work for non-constant block size.
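
A minimal sketch of that, assuming a DMPlex dm and a system matrix Amat (the
field index 0 for the displacement field is illustrative):

  MatNullSpace rbm;
  PetscCall(DMPlexCreateRigidBody(dm, 0, &rbm)); /* rigid-body modes for field 0 */
  PetscCall(MatSetNearNullSpace(Amat, rbm));
  PetscCall(MatNullSpaceDestroy(&rbm));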

-pc_type gamg should work okay out of the box for elasticity. For hypre, I've 
had good luck with this options suite, which also runs on GPU.

-pc_type hypre -pc_hypre_boomeramg_coarsen_type pmis 
-pc_hypre_boomeramg_interp_type ext+i -pc_hypre_boomeramg_no_CF 
-pc_hypre_boomeramg_P_max 6 -pc_hypre_boomeramg_relax_type_down Chebyshev 
-pc_hypre_boomeramg_relax_type_up Chebyshev 
-pc_hypre_boomeramg_strong_threshold 0.5

Blaise Bourdin  writes:

> Hi,
>
> I am getting close to finishing porting a code from petsc 3.3 / sieve to
> main / dmplex, but am now encountering difficulties.
> I am reasonably sure that the Jacobian and residual are correct. The codes 
> handle boundary
> conditions differently (MatZeroRowCols vs dmplex constraints) so it is not 
> trivial to compare
> them. Running with snes_type ksponly pc_type Jacobi or hypre gives me the 
> same results in
> roughly the same number of iterations.
>
> In my old code, gamg would work out of the box. When using petsc-main, 
> -pc_type gamg -pc_gamg_type agg works for _some_ problems using P1-Lagrange
> elements, but never for 
> P2-Lagrange. The typical error message is in gamg_agg.txt
>
> When using -pc_gamg_type classical, a problem where the KSP would converge in
> 47 iterations in
> 3.3 now takes 1400.  ksp_view_3.3.txt and ksp_view_main.txt show the output 
> of -ksp_view
> for both versions. I don’t notice anything obvious.
>
> Strangely, removing the call to PCSetCoordinates does not have any impact on 
> the
> convergence.
>
> I am sure that I am missing something, or not passing the right options. 
> What’s a good
> starting point for 3D elasticity?
> Regards,
> Blaise
>
> — 
> Canada Research Chair in Mathematical and Computational Aspects of Solid 
> Mechanics
> (Tier 1)
> Professor, Department of Mathematics & Statistics
> Hamilton Hall room 409A, McMaster University
> 1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada 
> https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243
> [0]PETSC ERROR: - Error Message 
> --
> [0]PETSC ERROR: Petsc has generated inconsistent data
> [0]PETSC ERROR: Computed maximum singular value as zero
> [0]PETSC ERROR: WARNING! There are option(s) set that were not used! Could be 
> the program crashed before they were used or a spelling mistake, etc!
> [0]PETSC ERROR: Option left: name:-displacement_ksp_converged_reason value: 
> ascii source: file
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.18.2-341-g16200351da0  GIT 
> Date: 2022-12-12 23:42:20 +
> [0]PETSC ERROR: 
> /home/bourdinb/Development/mef90/mef90-dmplex/bbserv-gcc11.2.1-mvapich2-2.3.7-O/bin/ThermoElasticity
>  on a bbserv-gcc11.2.1-mvapich2-2.3.7-O named bb01 by bourdinb Tue Dec 13 
> 17:02:19 2022
> [0]PETSC ERROR: Configure options --CFLAGS=-Wunused 
> --FFLAGS="-ffree-line-length-none -fallow-argument-mismatch -Wunused" 
> --COPTFLAGS="-O2 -march=znver2" --CXXOPTFLAGS="-O2 -march=znver2" 
> --FOPTFLAGS="-O2 -march=znver2" --download-chaco=1 --download-exodusii=1 
> --download-fblaslapack=1 --download-hdf5=1 --download-hypre=1 
> --download-metis=1 --download-ml=1 --download-mumps=1 --download-netcdf=1 
> --download-p4est=1 --download-parmetis=1 --download-pnetcdf=1 
> --download-scalapack=1 --download-sowing=1 
> --download-sowing-cc=/opt/rh/devtoolset-9/root/usr/bin/gcc 
> --download-sowing-cxx=/opt/rh/devtoolset-9/root/usr/bin/g++ 
> --download-sowing-cpp=/opt/rh/devtoolset-9/root/usr/bin/cpp 
> --download-sowing-cxxcpp=/opt/rh/devtoolset-9/root/usr/bin/cpp 
> --download-superlu=1 --download-triangle=1 --download-yaml=1 
> --download-zlib=1 --with-debugging=0 
> --with-mpi-dir=/opt/HPC/mvapich2/2.3.7-gcc11.2.1 --with-pic 
> --with-shared-libraries=1 --with-mpiexec=srun --with-x11=0
> [0]PETSC ERROR: #1 PCGAMGOptProlongator_AGG() at 
> /1/HPC/petsc/main/src/ksp/pc/impls/gamg/agg.c:779
> [0]PETSC ERROR: #2 PCSetUp_GAMG() at 
> /1/HPC/petsc/main/src/ksp/pc/impls/gamg/gamg.c:639
> [0]PETSC ERROR: #3 PCSetUp() at 
> /1/HPC/petsc/main/src/ksp/pc/interface/precon.c:994
> [0]PETSC ERROR: #4 KSPSetUp() at 
> /1/HPC/petsc/main/src/ksp/ksp/interface/itfunc.c:405
> [0]PETSC ERROR: #5 KSPSolve_Private() at 
> /1/HPC/petsc/main/src/ksp/ksp/interface/itfunc.c:824
> [0]PETSC ERROR: #6 KSPSolve() at 
> /1/HPC/petsc/main/src/ksp/ksp/interface/itfunc.c:1070
> [0]PETSC ERROR: #7 SNESSolve_KSPONLY() at 
> /1/HPC/petsc/main/src/snes/impls/ksponly/ksponly.c:48
> [0]PETSC ERROR: #8 SNESSolve() at 
> /1/HPC/petsc/main/src/snes/interface/snes.c:4693
> [0]PETSC ERROR: #9 
> 

Re: [petsc-users] GAMG crash during setup when using multiple GPUs

2022-02-11 Thread Sajid Ali Syed
Hi Mark,

Thanks for the information.

@Junchao: Given that there are known issues with GPU aware MPI, it might be 
best to wait until there is an updated version of cray-mpich (which hopefully 
contains the relevant fixes).

Thank You,
Sajid Ali (he/him) | Research Associate
Scientific Computing Division
Fermi National Accelerator Laboratory
s-sajid-ali.github.io


From: Mark Adams 
Sent: Thursday, February 10, 2022 8:47 PM
To: Junchao Zhang 
Cc: Sajid Ali Syed ; petsc-users@mcs.anl.gov 

Subject: Re: [petsc-users] GAMG crash during setup when using multiple GPUs

Perlmutter has problems with GPU aware MPI.
This is being actively worked on at NERSC.

Mark

On Thu, Feb 10, 2022 at 9:22 PM Junchao Zhang  wrote:
Hi, Sajid Ali,
  I have no clue. I have access to perlmutter.  I am thinking how to debug that.
  If your app is open-sourced and easy to build, then I can build and debug it. 
Otherwise, suppose you build and install petsc (only with options needed by 
your app) to a shared directory, and I can access your executable (which uses 
RPATH for libraries), then maybe I can debug it (I only need to install my own 
petsc to the shared directory)

--Junchao Zhang


On Thu, Feb 10, 2022 at 6:04 PM Sajid Ali Syed  wrote:
Hi Junchao,

With "-use_gpu_aware_mpi 0" there is no error. I'm attaching the log for this 
case with this email.

I also ran with gpu aware mpi to see if I could reproduce the error and got the 
error but from a different location. This logfile is also attached.

This was using the newest cray-mpich on NERSC-perlmutter (8.1.12). Let me know 
if I can share further information to help with debugging this.

Thank You,
Sajid Ali (he/him) | Research Associate
Scientific Computing Division
Fermi National Accelerator Laboratory
s-sajid-ali.github.io


From: Junchao Zhang 
Sent: Thursday, February 10, 2022 1:43 PM
To: Sajid Ali Syed 
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] GAMG crash during setup when using multiple GPUs

Also, try "-use_gpu_aware_mpi 0" to see if there is a difference.

--Junchao Zhang


On Thu, Feb 10, 2022 at 1:40 PM Junchao Zhang  wrote:
Did it fail without GPU at 64 MPI ranks?

--Junchao Zhang


On Thu, Feb 10, 2022 at 1:22 PM Sajid Ali Syed  wrote:

Hi PETSc-developers,

I’m seeing the following crash that occurs during the setup phase of the 
preconditioner when using multiple GPUs. The relevant error trace is shown 
below:

(GTL DEBUG: 26) cuIpcOpenMemHandle: resource already mapped, 
CUDA_ERROR_ALREADY_MAPPED, line no 272
[24]PETSC ERROR: - Error Message 
--
[24]PETSC ERROR: General MPI error
[24]PETSC ERROR: MPI error 1 Invalid buffer pointer
[24]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[24]PETSC ERROR: Petsc Development GIT revision: 
f351d5494b5462f62c419e00645ac2e477b88cae  GIT Date: 2022-02-08 15:08:19 +
...
[24]PETSC ERROR: #1 PetscSFLinkWaitRequests_MPI() at 
/tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/impls/basic/sfmpi.c:54
[24]PETSC ERROR: #2 PetscSFLinkFinishCommunication() at 
/tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/include/../src/vec/is/sf/impls/basic/sfpack.h:274
[24]PETSC ERROR: #3 PetscSFBcastEnd_Basic() at 
/tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/impls/basic/sfbasic.c:218
[24]PETSC ERROR: #4 PetscSFBcastEnd() at 
/tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/sf.c:1499
[24]PETSC ERROR: #5 VecScatterEnd_Internal() at 
/tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/vscat.c:87
[24]PETSC ERROR: #6 VecScatterEnd() at 
/tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/vscat.c:1366
[24]PETSC ERROR: #7 MatMult_MPIAIJCUSPARSE() at 
/tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6k

Re: [petsc-users] GAMG crash during setup when using multiple GPUs

2022-02-10 Thread Mark Adams
Perlmutter has problems with GPU aware MPI.
This is being actively worked on at NERSC.

Mark

On Thu, Feb 10, 2022 at 9:22 PM Junchao Zhang 
wrote:

> Hi, Sajid Ali,
>   I have no clue. I have access to perlmutter.  I am thinking how to debug
> that.
>   If your app is open-sourced and easy to build, then I can build and
> debug it. Otherwise, suppose you build and install petsc (only with options
> needed by your app) to a shared directory, and I can access your executable
> (which uses RPATH for libraries), then maybe I can debug it (I only need to
> install my own petsc to the shared directory)
>
> --Junchao Zhang
>
>
> On Thu, Feb 10, 2022 at 6:04 PM Sajid Ali Syed  wrote:
>
>> Hi Junchao,
>>
>> With "-use_gpu_aware_mpi 0" there is no error. I'm attaching the log for
>> this case with this email.
>>
>> I also ran with gpu aware mpi to see if I could reproduce the error and
>> got the error but from a different location. This logfile is also attached.
>>
>> This was using the newest cray-mpich on NERSC-perlmutter (8.1.12). Let me
>> know if I can share further information to help with debugging this.
>>
>> Thank You,
>> Sajid Ali (he/him) | Research Associate
>> Scientific Computing Division
>> Fermi National Accelerator Laboratory
>> s-sajid-ali.github.io
>>
>> ----------
>> *From:* Junchao Zhang 
>> *Sent:* Thursday, February 10, 2022 1:43 PM
>> *To:* Sajid Ali Syed 
>> *Cc:* petsc-users@mcs.anl.gov 
>> *Subject:* Re: [petsc-users] GAMG crash during setup when using multiple
>> GPUs
>>
>> Also, try "-use_gpu_aware_mpi 0" to see if there is a difference.
>>
>> --Junchao Zhang
>>
>>
>> On Thu, Feb 10, 2022 at 1:40 PM Junchao Zhang 
>> wrote:
>>
>> Did it fail without GPU at 64 MPI ranks?
>>
>> --Junchao Zhang
>>
>>
>> On Thu, Feb 10, 2022 at 1:22 PM Sajid Ali Syed  wrote:
>>
>> Hi PETSc-developers,
>>
>> I’m seeing the following crash that occurs during the setup phase of the
>> preconditioner when using multiple GPUs. The relevant error trace is shown
>> below:
>>
>> (GTL DEBUG: 26) cuIpcOpenMemHandle: resource already mapped, 
>> CUDA_ERROR_ALREADY_MAPPED, line no 272
>> [24]PETSC ERROR: - Error Message 
>> --
>> [24]PETSC ERROR: General MPI error
>> [24]PETSC ERROR: MPI error 1 Invalid buffer pointer
>> [24]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [24]PETSC ERROR: Petsc Development GIT revision: 
>> f351d5494b5462f62c419e00645ac2e477b88cae  GIT Date: 2022-02-08 15:08:19 +
>> ...
>> [24]PETSC ERROR: #1 PetscSFLinkWaitRequests_MPI() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/impls/basic/sfmpi.c:54
>> [24]PETSC ERROR: #2 PetscSFLinkFinishCommunication() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/include/../src/vec/is/sf/impls/basic/sfpack.h:274
>> [24]PETSC ERROR: #3 PetscSFBcastEnd_Basic() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/impls/basic/sfbasic.c:218
>> [24]PETSC ERROR: #4 PetscSFBcastEnd() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/sf.c:1499
>> [24]PETSC ERROR: #5 VecScatterEnd_Internal() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/vscat.c:87
>> [24]PETSC ERROR: #6 VecScatterEnd() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/vscat.c:1366
>> [24]PETSC ERROR: #7 MatMult_MPIAIJCUSPARSE() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:302
>> [24]PETSC ERROR: #8 MatMult() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttlj

Re: [petsc-users] GAMG crash during setup when using multiple GPUs

2022-02-10 Thread Junchao Zhang
Hi, Sajid Ali,
  I have no clue. I have access to perlmutter.  I am thinking how to debug
that.
  If your app is open-sourced and easy to build, then I can build and debug
it. Otherwise, suppose you build and install petsc (only with options
needed by your app) to a shared directory, and I can access your executable
(which uses RPATH for libraries), then maybe I can debug it (I only need to
install my own petsc to the shared directory)

--Junchao Zhang


On Thu, Feb 10, 2022 at 6:04 PM Sajid Ali Syed  wrote:

> Hi Junchao,
>
> With "-use_gpu_aware_mpi 0" there is no error. I'm attaching the log for
> this case with this email.
>
> I also ran with gpu aware mpi to see if I could reproduce the error and
> got the error but from a different location. This logfile is also attached.
>
> This was using the newest cray-mpich on NERSC-perlmutter (8.1.12). Let me
> know if I can share further information to help with debugging this.
>
> Thank You,
> Sajid Ali (he/him) | Research Associate
> Scientific Computing Division
> Fermi National Accelerator Laboratory
> s-sajid-ali.github.io
>
> --
> *From:* Junchao Zhang 
> *Sent:* Thursday, February 10, 2022 1:43 PM
> *To:* Sajid Ali Syed 
> *Cc:* petsc-users@mcs.anl.gov 
> *Subject:* Re: [petsc-users] GAMG crash during setup when using multiple
> GPUs
>
> Also, try "-use_gpu_aware_mpi 0" to see if there is a difference.
>
> --Junchao Zhang
>
>
> On Thu, Feb 10, 2022 at 1:40 PM Junchao Zhang 
> wrote:
>
> Did it fail without GPU at 64 MPI ranks?
>
> --Junchao Zhang
>
>
> On Thu, Feb 10, 2022 at 1:22 PM Sajid Ali Syed  wrote:
>
> Hi PETSc-developers,
>
> I’m seeing the following crash that occurs during the setup phase of the
> preconditioner when using multiple GPUs. The relevant error trace is shown
> below:
>
> (GTL DEBUG: 26) cuIpcOpenMemHandle: resource already mapped, 
> CUDA_ERROR_ALREADY_MAPPED, line no 272
> [24]PETSC ERROR: - Error Message 
> --
> [24]PETSC ERROR: General MPI error
> [24]PETSC ERROR: MPI error 1 Invalid buffer pointer
> [24]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [24]PETSC ERROR: Petsc Development GIT revision: 
> f351d5494b5462f62c419e00645ac2e477b88cae  GIT Date: 2022-02-08 15:08:19 +
> ...
> [24]PETSC ERROR: #1 PetscSFLinkWaitRequests_MPI() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/impls/basic/sfmpi.c:54
> [24]PETSC ERROR: #2 PetscSFLinkFinishCommunication() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/include/../src/vec/is/sf/impls/basic/sfpack.h:274
> [24]PETSC ERROR: #3 PetscSFBcastEnd_Basic() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/impls/basic/sfbasic.c:218
> [24]PETSC ERROR: #4 PetscSFBcastEnd() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/sf.c:1499
> [24]PETSC ERROR: #5 VecScatterEnd_Internal() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/vscat.c:87
> [24]PETSC ERROR: #6 VecScatterEnd() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/vscat.c:1366
> [24]PETSC ERROR: #7 MatMult_MPIAIJCUSPARSE() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:302
> [24]PETSC ERROR: #8 MatMult() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/mat/interface/matrix.c:2438
> [24]PETSC ERROR: #9 PCApplyBAorAB() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/interface/precon.c:730
> [24]PETSC ERROR: #10 KSP_PCApplyBAorAB() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/include/petsc/private/kspimpl.h:421
> [24]PETSC ERROR: #11 KSPGMRESCycle() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro

Re: [petsc-users] GAMG crash during setup when using multiple GPUs

2022-02-10 Thread Junchao Zhang
Also, try "-use_gpu_aware_mpi 0" to see if there is a difference.

--Junchao Zhang


On Thu, Feb 10, 2022 at 1:40 PM Junchao Zhang 
wrote:

> Did it fail without GPU at 64 MPI ranks?
>
> --Junchao Zhang
>
>
> On Thu, Feb 10, 2022 at 1:22 PM Sajid Ali Syed  wrote:
>
>> Hi PETSc-developers,
>>
>> I’m seeing the following crash that occurs during the setup phase of the
>> preconditioner when using multiple GPUs. The relevant error trace is shown
>> below:
>>
>> (GTL DEBUG: 26) cuIpcOpenMemHandle: resource already mapped, 
>> CUDA_ERROR_ALREADY_MAPPED, line no 272
>> [24]PETSC ERROR: - Error Message 
>> --
>> [24]PETSC ERROR: General MPI error
>> [24]PETSC ERROR: MPI error 1 Invalid buffer pointer
>> [24]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>> [24]PETSC ERROR: Petsc Development GIT revision: 
>> f351d5494b5462f62c419e00645ac2e477b88cae  GIT Date: 2022-02-08 15:08:19 +
>> ...
>> [24]PETSC ERROR: #1 PetscSFLinkWaitRequests_MPI() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/impls/basic/sfmpi.c:54
>> [24]PETSC ERROR: #2 PetscSFLinkFinishCommunication() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/include/../src/vec/is/sf/impls/basic/sfpack.h:274
>> [24]PETSC ERROR: #3 PetscSFBcastEnd_Basic() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/impls/basic/sfbasic.c:218
>> [24]PETSC ERROR: #4 PetscSFBcastEnd() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/sf.c:1499
>> [24]PETSC ERROR: #5 VecScatterEnd_Internal() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/vscat.c:87
>> [24]PETSC ERROR: #6 VecScatterEnd() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/vscat.c:1366
>> [24]PETSC ERROR: #7 MatMult_MPIAIJCUSPARSE() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:302
>> [24]PETSC ERROR: #8 MatMult() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/mat/interface/matrix.c:2438
>> [24]PETSC ERROR: #9 PCApplyBAorAB() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/interface/precon.c:730
>> [24]PETSC ERROR: #10 KSP_PCApplyBAorAB() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/include/petsc/private/kspimpl.h:421
>> [24]PETSC ERROR: #11 KSPGMRESCycle() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/impls/gmres/gmres.c:162
>> [24]PETSC ERROR: #12 KSPSolve_GMRES() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/impls/gmres/gmres.c:247
>> [24]PETSC ERROR: #13 KSPSolve_Private() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/interface/itfunc.c:925
>> [24]PETSC ERROR: #14 KSPSolve() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/interface/itfunc.c:1103
>> [24]PETSC ERROR: #15 PCGAMGOptProlongator_AGG() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/impls/gamg/agg.c:1127
>> [24]PETSC ERROR: #16 PCSetUp_GAMG() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/impls/gamg/gamg.c:626
>> [24]PETSC ERROR: #17 PCSetUp() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/interface/precon.c:1017
>> [24]PETSC ERROR: #18 KSPSetUp() at 
>> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/interface/itfunc.c:417
>> [24]PETSC ERROR: #19 main() at poisson3d.c:69
>> [24]PETSC ERROR: PETSc Option Table entries:
>> [24]PETSC ERROR: -dm_mat_type aijcusparse
>> [24]PETSC ERROR: -dm_vec_type cuda
>> [24]PETSC ERROR: -ksp_monitor
>> [24]PETSC ERROR: -ksp_norm_type unpreconditioned
>> [24]PETSC ERROR: -ksp_type cg
>> [24]PETSC ERROR: -ksp_view
>> [24]PETSC ERROR: -log_view
>> [24]PETSC ERROR: -mg_levels_esteig_ksp_type cg
>> [24]PETSC ERROR: -mg_levels_ksp_type chebyshev
>> [24]PETSC ERROR: -mg_levels_pc_type jacobi
>> [24]PETSC ERROR: -pc_gamg_agg_nsmooths 1
>> [24]PETSC ERROR: -pc_gamg_square_graph 1
>> [24]PETSC ERROR: -pc_gamg_threshold 0.0
>> [24]PETSC ERROR: -pc_gamg_threshold_scale 0.0
>> [24]PETSC ERROR: -pc_gamg_type agg
>> [24]PETSC ERROR: -pc_type gamg
>> [24]PETSC ERROR: End of Error Message ---send entire 
>> error message to petsc-ma...@mcs.anl.gov--

Re: [petsc-users] GAMG crash during setup when using multiple GPUs

2022-02-10 Thread Junchao Zhang
Did it fail without GPU at 64 MPI ranks?

--Junchao Zhang


On Thu, Feb 10, 2022 at 1:22 PM Sajid Ali Syed  wrote:

> Hi PETSc-developers,
>
> I’m seeing the following crash that occurs during the setup phase of the
> preconditioner when using multiple GPUs. The relevant error trace is shown
> below:
>
> (GTL DEBUG: 26) cuIpcOpenMemHandle: resource already mapped, 
> CUDA_ERROR_ALREADY_MAPPED, line no 272
> [24]PETSC ERROR: - Error Message 
> --
> [24]PETSC ERROR: General MPI error
> [24]PETSC ERROR: MPI error 1 Invalid buffer pointer
> [24]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [24]PETSC ERROR: Petsc Development GIT revision: 
> f351d5494b5462f62c419e00645ac2e477b88cae  GIT Date: 2022-02-08 15:08:19 +
> ...
> [24]PETSC ERROR: #1 PetscSFLinkWaitRequests_MPI() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/impls/basic/sfmpi.c:54
> [24]PETSC ERROR: #2 PetscSFLinkFinishCommunication() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/include/../src/vec/is/sf/impls/basic/sfpack.h:274
> [24]PETSC ERROR: #3 PetscSFBcastEnd_Basic() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/impls/basic/sfbasic.c:218
> [24]PETSC ERROR: #4 PetscSFBcastEnd() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/sf.c:1499
> [24]PETSC ERROR: #5 VecScatterEnd_Internal() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/vscat.c:87
> [24]PETSC ERROR: #6 VecScatterEnd() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/vec/is/sf/interface/vscat.c:1366
> [24]PETSC ERROR: #7 MatMult_MPIAIJCUSPARSE() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/mat/impls/aij/mpi/mpicusparse/mpiaijcusparse.cu:302
> [24]PETSC ERROR: #8 MatMult() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/mat/interface/matrix.c:2438
> [24]PETSC ERROR: #9 PCApplyBAorAB() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/interface/precon.c:730
> [24]PETSC ERROR: #10 KSP_PCApplyBAorAB() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/include/petsc/private/kspimpl.h:421
> [24]PETSC ERROR: #11 KSPGMRESCycle() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/impls/gmres/gmres.c:162
> [24]PETSC ERROR: #12 KSPSolve_GMRES() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/impls/gmres/gmres.c:247
> [24]PETSC ERROR: #13 KSPSolve_Private() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/interface/itfunc.c:925
> [24]PETSC ERROR: #14 KSPSolve() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/interface/itfunc.c:1103
> [24]PETSC ERROR: #15 PCGAMGOptProlongator_AGG() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/impls/gamg/agg.c:1127
> [24]PETSC ERROR: #16 PCSetUp_GAMG() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/impls/gamg/gamg.c:626
> [24]PETSC ERROR: #17 PCSetUp() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/pc/interface/precon.c:1017
> [24]PETSC ERROR: #18 KSPSetUp() at 
> /tmp/sajid/spack-stage/spack-stage-petsc-main-mnj56kbexro3fipf6kheyttljzwss7fo/spack-src/src/ksp/ksp/interface/itfunc.c:417
> [24]PETSC ERROR: #19 main() at poisson3d.c:69
> [24]PETSC ERROR: PETSc Option Table entries:
> [24]PETSC ERROR: -dm_mat_type aijcusparse
> [24]PETSC ERROR: -dm_vec_type cuda
> [24]PETSC ERROR: -ksp_monitor
> [24]PETSC ERROR: -ksp_norm_type unpreconditioned
> [24]PETSC ERROR: -ksp_type cg
> [24]PETSC ERROR: -ksp_view
> [24]PETSC ERROR: -log_view
> [24]PETSC ERROR: -mg_levels_esteig_ksp_type cg
> [24]PETSC ERROR: -mg_levels_ksp_type chebyshev
> [24]PETSC ERROR: -mg_levels_pc_type jacobi
> [24]PETSC ERROR: -pc_gamg_agg_nsmooths 1
> [24]PETSC ERROR: -pc_gamg_square_graph 1
> [24]PETSC ERROR: -pc_gamg_threshold 0.0
> [24]PETSC ERROR: -pc_gamg_threshold_scale 0.0
> [24]PETSC ERROR: -pc_gamg_type agg
> [24]PETSC ERROR: -pc_type gamg
> [24]PETSC ERROR: End of Error Message ---send entire 
> error message to petsc-ma...@mcs.anl.gov--
>
> Attached with this email is the full error log and the submit script for an
> 8-node/64-GPU/64 MPI rank job. I’ll also note that the same program did not
> crash 

Re: [petsc-users] GAMG memory consumption

2021-11-24 Thread Dave May
I think your run with -pc_type mg is defining a multigrid hierarchy with
only a single level. (A single-level mg PC would also explain the 100+
iterations required to converge.) The gamg configuration is definitely
coarsening your problem and has a deeper hierarchy.  A single level
hierarchy will require less memory than a multilevel hierarchy.
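
For illustration, a minimal sketch of how one could confirm the hierarchy depth
programmatically, in addition to reading -ksp_view output (it assumes a KSP
named ksp whose operators and options are already set, and uses the PetscCall()
macro from recent PETSc):

  PC       pc;
  PetscInt nlevels;
  PetscCall(KSPSetUp(ksp));               /* forces the MG/GAMG hierarchy to be built */
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCMGGetLevels(pc, &nlevels)); /* GAMG is built on PCMG, so this works for both */
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "multigrid levels: %d\n", (int)nlevels));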

Cheers,
Dave

On Wed 24. Nov 2021 at 19:03, Matthew Knepley  wrote:

> On Wed, Nov 24, 2021 at 12:26 PM Karthikeyan Chockalingam - STFC UKRI <
> karthikeyan.chockalin...@stfc.ac.uk> wrote:
>
>> Hello,
>>
>>
>>
>> I would like to understand why more memory is consumed by -pc_type gamg
>> compared to -pc_type mg for the same problem size
>>
>>
>>
>> ksp/ksp/tutorial: ./ex45 -da_grid_x 368 -da_grid_y 368 -da_grid_z 368
>> -ksp_type cg
>>
>>
>>
>> -pc_type mg
>>
>>
>>
>> Maximum (over computational time) process memory:total 1.9399e+10
>> max 9.7000e+09 min 9.6992e+09
>>
>>
>>
>> -pc_type gamg
>>
>>
>>
>> Maximum (over computational time) process memory:total 4.9671e+10
>> max 2.4836e+10 min 2.4835e+10
>>
>>
>>
>>
>> Am I right in understanding that the memory limiting factor is ‘max
>> 2.4836e+10’ as it is the maximum memory used at any given time?
>>
>
> Yes, I believe so.
>
> GAMG is using A_C = P^T A P, where P is the prolongation from coarse to
> fine, in order to compute the coarse operator A_C, rather than
> rediscretization, since it does not have any notion of discretization or
> coarse meshes. This takes more memory.
>
>   Thanks,
>
> Matt
>
>
>> I have attached the -log_view output of both the preconditioners.
>>
>>
>>
>> Best regards,
>>
>> Karthik.
>>
>>
>>
>> This email and any attachments are intended solely for the use of the
>> named recipients. If you are not the intended recipient you must not use,
>> disclose, copy or distribute this email or any of its attachments and
>> should notify the sender immediately and delete this email from your
>> system. UK Research and Innovation (UKRI) has taken every reasonable
>> precaution to minimise risk of this email or any attachments containing
>> viruses or malware but the recipient should carry out its own virus and
>> malware checks before opening the attachments. UKRI does not accept any
>> liability for any losses or damages which the recipient may sustain due to
>> presence of any viruses.
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> 
>


Re: [petsc-users] GAMG memory consumption

2021-11-24 Thread Mark Adams
As Matt said GAMG uses more memory.
But these numbers look odd: max == min and total = max + min, for both
cases.
I would use
https://petsc.org/release/docs/manualpages/Sys/PetscMallocDump.html to look
at this more closely.
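
For illustration, a minimal sketch of that kind of instrumentation (assumptions:
PetscMallocDump() reports only when PETSc malloc logging is active, e.g. when
run with -malloc_debug, and PetscCall() requires a recent PETSc):

  #include <petscsys.h>

  int main(int argc, char **argv)
  {
    PetscLogDouble mem;

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCall(PetscMemorySetGetMaximumUsage()); /* enable high-water-mark tracking early */
    /* ... assemble matrices, set up the preconditioner, solve ... */
    PetscCall(PetscMemoryGetMaximumUsage(&mem));
    PetscCall(PetscPrintf(PETSC_COMM_WORLD, "max process memory: %g bytes\n", (double)mem));
    PetscCall(PetscMallocDump(PETSC_STDOUT)); /* list PETSc allocations not yet freed */
    PetscCall(PetscFinalize());
    return 0;
  }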

On Wed, Nov 24, 2021 at 1:03 PM Matthew Knepley  wrote:

> On Wed, Nov 24, 2021 at 12:26 PM Karthikeyan Chockalingam - STFC UKRI <
> karthikeyan.chockalin...@stfc.ac.uk> wrote:
>
>> Hello,
>>
>>
>>
>> I would like to understand why more memory is consumed by -pc_type gamg
>> compared to -pc_type mg for the same problem size
>>
>>
>>
>> ksp/ksp/tutorial: ./ex45 -da_grid_x 368 -da_grid_y 368 -da_grid_z 368
>> -ksp_type cg
>>
>>
>>
>> -pc_type mg
>>
>>
>>
>> Maximum (over computational time) process memory:total 1.9399e+10
>> max 9.7000e+09 min 9.6992e+09
>>
>>
>>
>> -pc_type gamg
>>
>>
>>
>> Maximum (over computational time) process memory:total 4.9671e+10
>> max 2.4836e+10 min 2.4835e+10
>>
>>
>>
>>
>> Am I right in understanding that the memory limiting factor is ‘max
>> 2.4836e+10’ as it is the maximum memory used at any given time?
>>
>
> Yes, I believe so.
>
> GAMG is using A_C = P^T A P, where P is the prolongation from coarse to
> fine, in order to compute the coarse operator A_C, rather than
> rediscretization, since it does not have any notion of discretization or
> coarse meshes. This takes more memory.
>
>   Thanks,
>
> Matt
>
>
>> I have attached the -log_view output of both the preconditioners.
>>
>>
>>
>> Best regards,
>>
>> Karthik.
>>
>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> 
>


Re: [petsc-users] GAMG memory consumption

2021-11-24 Thread Matthew Knepley
On Wed, Nov 24, 2021 at 12:26 PM Karthikeyan Chockalingam - STFC UKRI <
karthikeyan.chockalin...@stfc.ac.uk> wrote:

> Hello,
>
>
>
> I would like to understand why more memory is consumed by -pc_type gamg
> compared to -pc_type mg for the same problem size
>
>
>
> ksp/ksp/tutorial: ./ex45 -da_grid_x 368 -da_grid_y 368 -da_grid_z 368
> -ksp_type cg
>
>
>
> -pc_type mg
>
>
>
> Maximum (over computational time) process memory:total 1.9399e+10
> max 9.7000e+09 min 9.6992e+09
>
>
>
> -pc_type gamg
>
>
>
> Maximum (over computational time) process memory:total 4.9671e+10
> max 2.4836e+10 min 2.4835e+10
>
>
>
>
> Am I right in understanding that the memory limiting factor is ‘max
> 2.4836e+10’ as it is the maximum memory used at any given time?
>

Yes, I believe so.

GAMG is using A_C = P^T A P, where P is the prolongation from coarse to
fine, in order to compute the coarse operator A_C, rather than
rediscretization, since it does not have any notion of discretization or
coarse meshes. This takes more memory.
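
For reference, that Galerkin product is exposed directly in the API; a minimal
sketch, assuming A is the fine-grid operator and P the prolongation (both of
type Mat):

  Mat Ac; /* coarse operator */
  PetscCall(MatPtAP(A, P, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &Ac)); /* Ac = P^T A P */
  /* if A changes but P is kept, the symbolic setup can be reused: */
  PetscCall(MatPtAP(A, P, MAT_REUSE_MATRIX, PETSC_DEFAULT, &Ac));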

  Thanks,

Matt


> I have attached the -log_view output of both the preconditioners.
>
>
>
> Best regards,
>
> Karthik.
>
>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] gamg student questions

2021-10-17 Thread Matthew Knepley
On Sun, Oct 17, 2021 at 9:04 AM Mark Adams  wrote:

> Hi Daniel, [this is a PETSc users list question so let me move it there]
>
> The behavior that you are seeing is a bit odd but not surprising.
>
> First, you should start with simple problems and get AMG (you might want
> to try this exercise with hypre as well: --download-hypre and use -pc_type
> hypre, or BDDC, see below).
>

We have two examples that do this:

  1) SNES ex56: This shows good performance of GAMG on Q1 and Q2 elasticity

  2) SNES ex17: This sets up a lot of finite element elasticity problems
where you can experiment with GAMG, ML, Hypre, BDDC, and other
preconditioners
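
For illustration, one plausible way to build and run the first of these (a
sketch; the paths assume a recent PETSc source tree, and the options echo the
ones Mark lists later in this digest):

  cd $PETSC_DIR/src/snes/tutorials && make ex56
  mpiexec -n 4 ./ex56 -cells 8,12,16 -max_conv_its 3 -petscspace_degree 2 \
    -ksp_type cg -pc_type gamg -use_mat_nearnullspace true -ksp_converged_reason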

As a rule of thumb, if my solver is taking more than 100 iterations
(usually for 1e-8 tolerance), something is very wrong. Either the problem
is setup incorrectly, the solver is
configured incorrectly, or I need to switch solvers.

  Thanks,

 Matt


> There are, alas, a lot of tuning parameters in AMG/DD and I recommend a
> homotopy process: you can start with issues that deal with your
> discretization on a simple cube, linear elasticity, cube elements, modest
> Poisson ratio, etc., and first get "textbook multigrid efficiency" (TME),
> which for elasticity and a V(2,2) cycle in GAMG is about one digit of error
> reduction per iteration and perfectly monotonic until it hits floating
> point precision.
>
> I would set this problem up and I would hope it runs OK, but the
> problems that you want to do are probably pretty hard (high order FE,
> plasticity, incompressibility) so there will be more work to do.
>
> That said, PETSc has nice domain decomposition solvers that are more
> optimized and maintained for elasticity. Now that I think about it, you
> should probably look at these (
> https://petsc.org/release/docs/manualpages/PC/PCBDDC.html
> https://petsc.org/release/docs/manual/ksp/#balancing-domain-decomposition-by-constraints).
> I think they prefer, but do not require, that you do not assemble your
> element matrices, but let them do it. The docs will make that clear.
>
> BDDC is great but it is not magic, and it is no less complex, so I would
> still recommend the same process of getting TME and then moving to the
> problems that you want to solve.
>
> Good luck,
> Mark
>
>
>
> On Sat, Oct 16, 2021 at 10:50 PM Daniel N Pickard  wrote:
>
>> Hi Dr Adams,
>>
>>
>> I am using the gamg in petsc to solve some elasticity problems for
>> modeling bones. I am new to profiling with petsc, but I am observing that
>> around a thousand iterations my norm has gone down 3 orders of magnitude
>> but the solver slows down and progress sort of stalls. The norm
>> also doesn't decrease monotonically, but jumps around a bit. I also notice
>> that if I request to only use 1 multigrid level, the preconditioner is
>> much cheaper and not as powerful so the code takes more iterations, but
>> runs 2-3x faster. Is it expected that large models require lots of
>> iterations and convergence slows down as we get more accurate? What exactly
>> should I be looking for when I am profiling to try to understand how to run
>> faster? I see that a lot of my ratio's are 2.7, but I think that is because
>> my mesh partitioner is not doing a great job making equal domains. What are
>> the giveaways in the log_view that tell you that petsc could be optimized
>> more?
>>
>>
>> Also when I look at the solution with just 4 orders of magnitude of
>> convergence I can see that the solver has not made much progress in the
>> interior of the domain, but seems to have smoothed out the boundary where
>> forces were applied very well. Does this mean I should use a larger
>> threshold to get more coarse grids that can fix the low frequency error?
>>
>>
>> Thanks,
>>
>> Daniel Pickard
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] gamg student questions

2021-10-17 Thread Mark Adams
Hi Daniel, [this is a PETSc users list question so let me move it there]

The behavior that you are seeing is a bit odd but not surprising.

First, you should start with simple problems and get AMG (you might want to
try this exercise with hypre as well: --download-hypre and use -pc_type
hypre, or BDDC, see below).

There are, alas, a lot of tuning parameters in AMG/DD and I recommend a
homotopy process: you can start with issues that deal with your
discretization on a simple cube, linear elasticity, cube elements, modest
Poisson ratio, etc., and first get "textbook multigrid efficiency" (TME),
which for elasticity and a V(2,2) cycle in GAMG is about one digit of error
reduction per iteration and perfectly monotonic until it hits floating
point precision.

I would set this problem up and I would hope it runs OK, but the
problems that you want to do are probably pretty hard (high order FE,
plasticity, incompressibility) so there will be more work to do.

That said, PETSc has nice domain decomposition solvers that are more
optimized and maintained for elasticity. Now that I think about it, you
should probably look at these (
https://petsc.org/release/docs/manualpages/PC/PCBDDC.html
https://petsc.org/release/docs/manual/ksp/#balancing-domain-decomposition-by-constraints).
I think they prefer, but do not require, that you do not assemble your
element matrices, but let them do it. The docs will make that clear.

BDDC is great but it is not magic, and it is no less complex, so I would
still recommend the same process of getting TME and then moving to the
problems that you want to solve.

Good luck,
Mark



On Sat, Oct 16, 2021 at 10:50 PM Daniel N Pickard  wrote:

> Hi Dr Adams,
>
>
> I am using the gamg in petsc to solve some elasticity problems for
> modeling bones. I am new to profiling with petsc, but I am observing that
> around a thousand iterations my norm has gone down 3 orders of magnitude
> but the solver slows down and progress sort of stalls. The norm
> also doesn't decrease monotonically, but jumps around a bit. I also notice
> that if I request to only use 1 multigrid level, the preconditioner is
> much cheaper and not as powerful so the code takes more iterations, but
> runs 2-3x faster. Is it expected that large models require lots of
> iterations and convergence slows down as we get more accurate? What exactly
> should I be looking for when I am profiling to try to understand how to run
> faster? I see that a lot of my ratios are 2.7, but I think that is because
> my mesh partitioner is not doing a great job making equal domains. What are
> the giveaways in the log_view that tell you that petsc could be optimized
> more?
>
>
> Also when I look at the solution with just 4 orders of magnitude of
> convergence I can see that the solver has not made much progress in the
> interior of the domain, but seems to have smoothed out the boundary where
> forces were applied very well. Does this mean I should use a larger
> threshold to get more coarse grids that can fix the low frequency error?
>
>
> Thanks,
>
> Daniel Pickard
>


Re: [petsc-users] GAMG preconditioning

2021-04-12 Thread Barry Smith

  Please send -log_view for the ilu and GAMG case.

  Barry


> On Apr 12, 2021, at 10:34 AM, Milan Pelletier via petsc-users 
>  wrote:
> 
> Dear all,
> 
> I am currently trying to use PETSc with CG solver and GAMG preconditioner.
> I have started with the following set of parameters:
> -ksp_type cg
> -pc_type gamg
> -pc_gamg_agg_nsmooths 1 
> -pc_gamg_threshold 0.02 
> -mg_levels_ksp_type chebyshev 
> -mg_levels_pc_type sor 
> -mg_levels_ksp_max_it 2
> 
> Unfortunately, the preconditioning seems to run extremely slowly. I tried to 
> play around with the numbers, to check if I could notice some difference, but 
> could not observe significant changes. 
> As a comparison, the KSPSetup call with GAMG PC takes more than 10 times 
> longer than completing the whole computation (preconditioning + ~400 KSP 
> iterations to convergence) of the similar case using the following parameters 
> :
> -ksp_type cg
> -pc_type ilu
> -pc_factor_levels 0
> 
> The matrix size for my case is ~1,850,000*1,850,000 elements, with 
> ~38,000,000 non-zero terms (i.e. ~20 per row). For both ILU and AMG cases I 
> use matseqaij/vecseq storage (as a first step I work with only 1 MPI process).
> 
> Is there something wrong in the parameter set I have been using?
> I understand that the preconditioning overhead with AMG is higher than with 
> ILU, but I would also expect CG/GAMG to be competitive against CG/ILU, 
> especially considering the relatively big problem size.
> 
> For information, I am using the PETSc version built from commit 
> 6840fe907c1a3d26068082d180636158471d79a2 (release branch from April 7, 2021). 
> 
> Any clue or idea would be greatly appreciated!
> Thanks for your help,
> 
> Best regards,
> Milan Pelletier
> 
> 



Re: [petsc-users] GAMG preconditioning

2021-04-12 Thread Mark Adams
Can you briefly describe your application?

AMG usually only works well for straightforward elliptic problems, at least
right out of the box.


On Mon, Apr 12, 2021 at 11:35 AM Milan Pelletier via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Dear all,
>
> I am currently trying to use PETSc with CG solver and GAMG preconditioner.
> I have started with the following set of parameters:
> -ksp_type cg
> -pc_type gamg
> -pc_gamg_agg_nsmooths 1
> -pc_gamg_threshold 0.02
> -mg_levels_ksp_type chebyshev
> -mg_levels_pc_type sor
> -mg_levels_ksp_max_it 2
>
> Unfortunately, the preconditioning seems to run extremely slowly. I tried
> to play around with the numbers, to check if I could notice some
> difference, but could not observe significant changes.
> As a comparison, the KSPSetup call with GAMG PC takes more than 10 times
> longer than completing the whole computation (preconditioning + ~400 KSP
> iterations to convergence) of the similar case using the following
> parameters :
> -ksp_type cg
> -pc_type ilu
> -pc_factor_levels 0
>
> The matrix size for my case is ~1,850,000*1,850,000 elements, with
> ~38,000,000 non-zero terms (i.e. ~20 per row). For both ILU and AMG cases I
> use matseqaij/vecseq storage (as a first step I work with only 1 MPI
> process).
>
> Is there something wrong in the parameter set I have been using?
> I understand that the preconditioning overhead with AMG is higher than
> with ILU, but I would also expect CG/GAMG to be competitive against CG/ILU,
> especially considering the relatively big problem size.
>
> For information, I am using the PETSc version built from commit
> 6840fe907c1a3d26068082d180636158471d79a2 (release branch from April 7,
> 2021).
>
> Any clue or idea would be greatly appreciated!
> Thanks for your help,
>
> Best regards,
> Milan Pelletier
>
>
>


Re: [petsc-users] GAMG parameters for ideal coarsening ratio

2020-03-17 Thread Mark Adams
On Tue, Mar 17, 2020 at 1:42 PM Sajid Ali 
wrote:

> Hi Mark/Jed,
>
> The problem I'm solving is scalar helmholtz in 2D, (u_t = A*u_xx + A*u_yy
> + F_t*u, with the familiar 5 point central difference as the derivative
> approximation,
>

I assume this is definite Helmholtz. The time integrator will also add a
mass term. I'm assuming F_t looks like a mass matrix.


> I'm also attaching the result of -info | grep GAMG if that helps). My goal
> is to get weak and strong scaling results for the FD solver (leading me to
> double check all my parameters). I ran the sweep again as Mark suggested
> and it looks like my base params were close to optimal ( negative threshold
> and 10 levels of squaring
>

For low order discretizations, squaring every level, as you are doing,
sounds right. And the mass matrix confuses GAMG's filtering heuristics, so no
filter sounds reasonable.

Note, hypre would do better than GAMG on this problem.
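
For anyone trying that comparison, a typical starting point (assuming PETSc was
configured with --download-hypre) is:

  -pc_type hypre -pc_hypre_type boomeramg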


> with gmres/jacobi smoothers (chebyshev/sor is slower)).
>

You don't want to use GMRES as a smoother (unless you have
indefinite Helmholtz). SOR will be more expensive but often converges a lot
faster. chebyshev/jacobi would probably be better for you.

And you want CG (-ksp_type cg) if this system is symmetric positive
definite.
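
For illustration, those recommendations map to option settings along these
lines (a sketch, not a tested configuration):

  -ksp_type cg
  -mg_levels_ksp_type chebyshev
  -mg_levels_pc_type jacobi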


>
>
> While I think that the base parameters should work well for strong
> scaling, do I have to modify any of my parameters for a weak scaling run ?
> Does GAMG automatically increase the number of mg-levels as grid size
> increases or is it upon the user to do that ?
>
> @Mark : Is there a GAMG implementation paper I should cite ? I've already
> added a citation for the Comput. Mech. (2007) 39: 497–507 as a reference
> for the general idea of applying agglomeration type multigrid
> preconditioning to helmholtz operators.
>
>
> Thank You,
> Sajid Ali | PhD Candidate
> Applied Physics
> Northwestern University
> s-sajid-ali.github.io
>
>


Re: [petsc-users] GAMG parameters for ideal coarsening ratio

2020-03-17 Thread Sajid Ali
 Hi Mark/Jed,

The problem I'm solving is scalar helmholtz in 2D, (u_t = A*u_xx + A*u_yy +
F_t*u, with the familiar 5 point central difference as the derivative
approximation, I'm also attaching the result of -info | grep GAMG if that
helps). My goal is to get weak and strong scaling results for the FD solver
(leading me to double check all my parameters). I ran the sweep again as
Mark suggested and it looks like my base params were close to optimal (
negative threshold and 10 levels of squaring with gmres/jacobi smoothers
(chebyshev/sor is slower)).


While I think that the base parameters should work well for strong scaling,
do I have to modify any of my parameters for a weak scaling run ? Does GAMG
automatically increase the number of mg-levels as grid size increases or is
it upon the user to do that ?

@Mark : Is there a GAMG implementation paper I should cite ? I've already
added a citation for the Comput. Mech. (2007) 39: 497–507 as a reference
for the general idea of applying agglomeration type multigrid
preconditioning to helmholtz operators.


Thank You,
Sajid Ali | PhD Candidate
Applied Physics
Northwestern University
s-sajid-ali.github.io


Re: [petsc-users] GAMG parameters for ideal coarsening ratio

2020-03-16 Thread Jed Brown
Sajid Ali  writes:

> Hi PETSc-developers,
>
> As per the manual, the ideal gamg parameters are those which result in
> MatPtAP time being roughly similar to (or just slightly larger than) KSP
> solve times. The way to adjust this is by changing the threshold for
> coarsening and/or squaring the graph. I was working with a grid of size
> 2^14 by 2^14 in a linear & time-independent TS with the following params :
>
> #PETSc Option Table entries:
> -ksp_monitor
> -ksp_rtol 1e-5
> -ksp_type fgmres
> -ksp_view
> -log_view
> -mg_levels_ksp_type gmres
> -mg_levels_pc_type jacobi
> -pc_gamg_coarse_eq_limit 1000
> -pc_gamg_reuse_interpolation true
> -pc_gamg_square_graph 10
> -pc_gamg_threshold -0.04
> -pc_gamg_type agg
> -pc_gamg_use_parallel_coarse_grid_solver
> -pc_mg_monitor
> -pc_type gamg
> -prop_steps 8
> -ts_monitor
> -ts_type cn
> #End of PETSc Option Table entries
>
> With this I get a grid complexity of 1.33047, 6 multigrid levels,
> MatPtAP/KSPSolve ratio of 0.24, and the linear solve at each TS step takes
> 5 iterations (with approx one order of magnitude reduction in residual per
> step for iterations 2 through 5 and two orders for the first). The
> convergence and grid complexity look good, but the ratio of grid coarsening
> time to ksp-solve time is far from ideal. I've attached the log file from
> this set of base parameters as well.
>
> To investigate the effect of coarsening rates, I ran a parameter sweep over
> the coarsening parameters (threshold and sq. graph) and I'm confused by the
> results. For some reason either the number of gamg levels turns out to be
> too high or it is set to 1. When I try to manually set the number of levels
> to 4 (with pc_mg_levels 4 and thres. -0.04/ squaring of 10) I see
> performance much worse than the base parameters. Any advice as to what I'm
> missing in my search for a set of params where MatPtAP to KSPSolve is ~ 1 ?

Your solver looks efficient and the time to setup roughly matches the
solve time:

PCSetUp                8 1.0 1.2202e+02 1.0 4.39e+09 1.0 4.9e+05 6.5e+03 6.3e+02 36 12 19 27 21  36 12 19 27 22  9201
PCApply               40 1.0 1.1077e+02 1.0 2.63e+10 1.0 2.0e+06 3.8e+03 2.0e+03 33 72 79 65 68  33 72 79 65 68 60662

If you have a specific need to reduce setup time or reduce solve time
(e.g., if you'll do many solves with the same setup), you might be able
to adjust.  But your iteration count is pretty low so probably not a lot
of room in that direction.
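
For illustration, when many solves share one setup, the preconditioner can be
kept across KSPSolve() calls; a minimal sketch, assuming a configured KSP named
ksp (the same is available on the command line as -ksp_reuse_preconditioner):

  PetscCall(KSPSetReusePreconditioner(ksp, PETSC_TRUE)); /* keep the GAMG setup across solves */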


Re: [petsc-users] GAMG scalability for serendipity 20 nodes hexahedra

2019-06-27 Thread TARDIEU Nicolas via petsc-users


Thank you very much for your answer, Mark.
Do you think it is worth it to play around with aggregation variants? Plain 
aggregation "à la Notay" for instance.

Nicolas
 



 

From: mfad...@lbl.gov 
Sent: Wednesday, June 26, 2019 22:37
To: TARDIEU Nicolas
Cc: PETSc users list
Subject: Re: [petsc-users] GAMG scalability for serendipity 20 nodes hexahedra
  

I get growth with Q2 elements also. I've never seen anyone report scaling of 
high order elements with generic AMG.


First, discretizations are very important for AMG solvers (for all optimal 
solvers, really). I've never looked at serendipity elements. It might be a good idea to 
try Q2 as well.


SNES ex56 is 3D elasticity on a cube with tensor elements. Below are parameters 
that I have been using. I see some evidence that more smoothing steps 
(-mg_levels_ksp_max_it N) helps "scaling" but not necessarily solve time.


An example of what I see, running ex56 with -cells 8,12,16  -max_conv_its 5 and 
the below params I get these iteration counts: 19, 20, 31, 31, 38.


My guess is that you need higher order interpolation for higher order elements 
and when you add a new level you get an increase in condition number (ie, it is 
not an optimal MG method). But, the original smoothed aggregation paper did 
have high order discretizations, and their theory said it was still optimal, as I 
recall.


Mark


-log_view
-max_conv_its 5
-petscspace_degree 2
-snes_max_it 2
-ksp_max_it 100
-ksp_type cg
-ksp_rtol 1.e-11
-ksp_atol 1.e-71
-ksp_norm_type unpreconditioned
-snes_rtol 1.e-10
-pc_type gamg
-pc_gamg_type agg
-pc_gamg_agg_nsmooths 1
-pc_gamg_coarse_eq_limit 1000
-pc_gamg_process_eq_limit 200
-pc_gamg_reuse_interpolation true
-ksp_converged_reason
-snes_monitor_short
-ksp_monitor_short
-snes_converged_reason
-use_mat_nearnullspace true
-mg_levels_ksp_max_it 4
-mg_levels_ksp_type chebyshev
-mg_levels_esteig_ksp_type cg
-gamg_est_ksp_type cg
-gamg_est_ksp_max_it 10
-mg_levels_esteig_ksp_max_it 10
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_pc_type jacobi
-petscpartitioner_type simple
-mat_block_size 3
-matptap_via scalable
-run_type 1
-pc_gamg_repartition false
-pc_gamg_threshold 0.0
-pc_gamg_threshold_scale .25
-pc_gamg_square_graph 1
-check_pointer_intensity 0
-snes_type ksponly
-ex56_dm_view
-options_left




 


On Wed, Jun 26, 2019 at 8:21 AM TARDIEU Nicolas via petsc-users 
 wrote:
 
Dear PETSc team,


I have run a simple weak scalability test based on a canonical 3D elasticity 
problem: a cube, meshed with 8-node hexahedra, clamped on one of its faces and 
submitted to a pressure load on the opposite face. 
I am using the FGMRES ksp with GAMG as preconditioner. I have set the rigid 
body modes using MatNullSpaceCreateRigidBody and it works like a charm. The 
solver exhibits perfect scalability up to 800 cores (I haven't tested with 
more cores). The ksp always converges in 11 or 12 iterations. Let me emphasize 
that I use GAMG default options.
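
For readers unfamiliar with that near-nullspace setup, a minimal sketch
(assuming coords is a Vec of nodal coordinates with block size equal to the
spatial dimension, and A is the assembled elasticity Mat; PetscCall() is the
error-checking macro from recent PETSc):

  MatNullSpace nearnull;
  PetscCall(MatNullSpaceCreateRigidBody(coords, &nearnull));
  PetscCall(MatSetNearNullSpace(A, nearnull)); /* GAMG uses this to build interpolation */
  PetscCall(MatNullSpaceDestroy(&nearnull));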



Nevertheless, if I switch to a quadratic mesh with 20-node serendipity 
hexahedra, the weak scalability deteriorates. For instance, the number of 
iterations for the ksp increases from 20 for the smallest problem to 
30 for the biggest. 
Here is my question: I wonder what is the right tuning for GAMG to recover the 
same weak scalability as in the linear case? I apologize if this is a stupid 
question...





 
I look forward to hearing from you,
Nicolas

  


This message and any attachments (the 'Message') are intended solely for the 
addressees. The information contained in this Message is confidential. Any use 
of information contained in this Message not in accord with its purpose, any 
dissemination or disclosure,  either whole or partial, is prohibited except 
formal approval.
If you are not the addressee, you may not copy, forward, disclose or use any 
part of it. If you have received this message in error, please delete it and 
all copies from your system and notify the sender immediately by return message. 

Re: [petsc-users] GAMG scalability for serendipity 20 nodes hexahedra

2019-06-26 Thread Mark Adams via petsc-users
I get growth with Q2 elements also. I've never seen anyone report scaling
of high order elements with generic AMG.

First, discretizations are very important for AMG solvers (for all optimal
solvers, really). I've never looked at serendipity elements. It might be a
good idea to try Q2 as well.

SNES ex56 is 3D elasticity on a cube with tensor elements. Below are
parameters that I have been using. I see some evidence that more smoothing
steps (-mg_levels_ksp_max_it N) helps "scaling" but not necessarily solve
time.

An example of what I see, running ex56 with -cells 8,12,16  -max_conv_its 5
and the below params I get these iteration counts: 19, 20, 31, 31, 38.

My guess is that you need higher order interpolation for higher order
elements and when you add a new level you get an increase in condition
number (ie, it is not an optimal MG method). But, the original smoothed
aggregation paper did have high order discretizations, and their theory said it
was still optimal, as I recall.

Mark

-log_view
-max_conv_its 5
-petscspace_degree 2
-snes_max_it 2
-ksp_max_it 100
-ksp_type cg
-ksp_rtol 1.e-11
-ksp_atol 1.e-71
-ksp_norm_type unpreconditioned
-snes_rtol 1.e-10
-pc_type gamg
-pc_gamg_type agg
-pc_gamg_agg_nsmooths 1
-pc_gamg_coarse_eq_limit 1000
-pc_gamg_process_eq_limit 200
-pc_gamg_reuse_interpolation true
-ksp_converged_reason
-snes_monitor_short
-ksp_monitor_short
-snes_converged_reason
-use_mat_nearnullspace true
-mg_levels_ksp_max_it 4
-mg_levels_ksp_type chebyshev
-mg_levels_esteig_ksp_type cg
-gamg_est_ksp_type cg
-gamg_est_ksp_max_it 10
-mg_levels_esteig_ksp_max_it 10
-mg_levels_ksp_chebyshev_esteig 0,0.05,0,1.05
-mg_levels_pc_type jacobi
-petscpartitioner_type simple
-mat_block_size 3
-matptap_via scalable
-run_type 1
-pc_gamg_repartition false
-pc_gamg_threshold 0.0
-pc_gamg_threshold_scale .25
-pc_gamg_square_graph 1
-check_pointer_intensity 0
-snes_type ksponly
-ex56_dm_view
-options_left



On Wed, Jun 26, 2019 at 8:21 AM TARDIEU Nicolas via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Dear PETSc team,
>
>
> I have run a simple weak scalability test based on a canonical 3D elasticity
> problem: a cube, meshed with 8-node hexahedra, clamped on one of its faces
> and submitted to a pressure load on the opposite face.
>
> I am using the FGMRES ksp with GAMG as preconditioner. I have set the
> rigid body modes using MatNullSpaceCreateRigidBody and it works like a
> charm. The solver exhibits perfect scalability up to 800 cores (I haven't
> tested with more cores). The ksp always converges in 11 or 12 iterations.
> Let me emphasize that I use GAMG default options.
>
>
> Nevertheless, if I switch to a quadratic mesh with 20-node serendipity
> hexahedra, the weak scalability deteriorates. For instance, the number of
> iterations for the ksp increases from 20 for the smallest problem
> to 30 for the biggest.
>
> Here is my question: I wonder what is the right tuning for GAMG to
> recover the same weak scalability as in the linear case? I apologize if
> this is a stupid question...
>
> I look forward to hearing from you,
> Nicolas
>
>
> 
>
> This message and any attachments (the 'Message') are intended solely for
> the addressees. The information contained in this Message is confidential.
> Any use of information contained in this Message not in accord with its
> purpose, any dissemination or disclosure, either whole or partial, is
> prohibited except formal approval.
>
> If you are not the addressee, you may not copy, forward, disclose or use
> any part of it. If you have received this message in error, please delete
> it and all copies from your system and notify the sender immediately by
> return message.
>
> E-mail communication cannot be guaranteed to be timely secure, error or
> virus-free.
>


Re: [petsc-users] GAMG parallel convergence sensitivity

2019-03-14 Thread Jed Brown via petsc-users
Mark Lohry  writes:

> It seems to me with these semi-implicit methods the CFL limit is still so
> close to the explicit limit (that paper stops at 30), I don't really see
> the purpose unless you're running purely incompressible? That's just my
> ignorance speaking though. I'm currently running fully implicit for
> everything, with CFLs around 1e3 - 1e5 or so.

It depends what you're trying to resolve.  Sounds like maybe you're
stepping toward steady state.  The paper is wishing to resolve vortex
and baroclinic dynamics while stepping over acoustics and barotropic
waves.


Re: [petsc-users] GAMG parallel convergence sensitivity

2019-03-13 Thread Jed Brown via petsc-users
Mark Lohry via petsc-users  writes:

> For what it's worth, I'm regularly solving much larger problems (1M-100M
> unknowns, unsteady) with this discretization and AMG setup on 500+ cores
> with impressively great convergence, dramatically better than ILU/ASM. This
> just happens to be the first time I've experimented with this extremely low
> Mach number, which is known to have a whole host of issues and generally
> needs low-mach preconditioners, I was just a bit surprised by this specific
> failure mechanism.

A common technique for low-Mach preconditioning is to convert to
primitive variables (much better conditioned for the solve) and use a
Schur fieldsplit into the pressure space.  For modest time step, you can
use SIMPLE-like method ("selfp" in PCFieldSplit lingo) to approximate
that Schur complement.  You can also rediscretize to form that
approximation.  This paper has a bunch of examples of choices for the
state variables and derivation of the continuous pressure preconditioner
in each case.  (They present it as a classical semi-implicit method, but
that would be the Schur complement preconditioner if using FieldSplit
with a fully implicit or IMEX method.)

https://doi.org/10.1137/090775889
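
For illustration, that approach corresponds roughly to a PCFieldSplit
configuration along these lines (a sketch; it assumes the velocity/pressure
splits have been registered with the PC, and the factorization type shown is
one common choice, not the only one):

  -pc_type fieldsplit
  -pc_fieldsplit_type schur
  -pc_fieldsplit_schur_fact_type lower
  -pc_fieldsplit_schur_precondition selfp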


Re: [petsc-users] GAMG parallel convergence sensitivity

2019-03-13 Thread Mark Adams via petsc-users
>
>
>
> Any thoughts here? Is there anything obviously wrong with my setup?
>

Fast and robust solvers for NS require specialized methods that are not
provided in PETSc and the methods tend to require tighter integration with
the meshing and discretization than the algebraic interface supports.

I see you are using 20 smoothing steps. That is very high. Generally you
want to use the v-cycle more (ie, lower number of smoothing steps and more
iterations).

And, full MG is a bit tricky. I would not use it, but if it helps, fine.


> Any way to reduce the dependence of the convergence iterations on the
> parallelism?
>

This comes from the bjacobi smoother. Use jacobi and you will not have a
parallelism problem; bjacobi approaches jacobi in the limit of parallelism anyway.


> -- obviously I expect the iteration count to be higher in parallel, but I
> didn't expect such catastrophic failure.
>
>
You are beyond what AMG is designed for. If you press this problem it will
break any solver and will break generic AMG relatively early.

This makes it hard to give much advice. You really just need to test things
and use what works best. There are special purpose methods that you can
implement in PETSc but that is a topic for a significant project.


Re: [petsc-users] GAMG scaling

2018-12-24 Thread Mark Adams via petsc-users
On Tue, Dec 25, 2018 at 12:10 AM Jed Brown  wrote:

> Mark Adams  writes:
>
> > On Mon, Dec 24, 2018 at 4:56 PM Jed Brown  wrote:
> >
> >> Mark Adams via petsc-users  writes:
> >>
> >> > Anyway, my data for this is in my SC 2004 paper (MakeNextMat_private
> in
> >> > attached, NB, this is code that I wrote in grad school). It is memory
> >> > efficient and simple, just four nested loops i,j,I,J: C(I,J) =
> >> > P(i,I)*A(i,j)*P(j,J). In eyeballing the numbers and from new data
> that I
> >> am
> >> > getting from my bone modeling colleagues, that use this old code on
> >> > Stampede now, the times look reasonable compared to GAMG. This is
> >> optimized
> >> > for elasticity, where I unroll loops (so it is really six nested
> loops).
> >>
> >> Is the A above meant to include some ghosted rows?
> >>
> >
> > You could but I was thinking of having i in the outer loop. In C(I,J) =
> > P(i,I)*A(i,j)*P(j,J), the iteration over 'i' need only be the local rows
> of
> > A (and the left term P).
>
> Okay, so you need to gather those rows of P referenced by the
> off-diagonal parts of A.


yes, and this looks correct ..


> Once you have them, do
>
>   for i:
> v[:] = 0 # sparse vector
> for j:
>   v[:] += A[i,j] * P[j,:]
> for I:
>   C[I,:] += P[i,I] * v[:]
>
> One inefficiency is that you don't actually get "hits" on all the
> entries of C[I,:], but that much remains no matter how you reorder loops
> (unless you make I the outermost).
>

> >> > In thinking about this now, I think you want to make a local copy of P
> >> with
> >> > rows (j) for every column in A that you have locally, then transpose
> this
> >> > local thing for the P(j,J) term. A sparse AXPY on j. (My code uses a
> >> > special tree data structure but a matrix is simpler.)
> >>
> >> Why transpose for P(j,J)?
> >>
> >
> > (premature) optimization. I was thinking 'j' being in the inner loop and
> > doing sparse inner product, but now that I think about it there are other
> > options.
>
> Sparse inner products tend to be quite inefficient.  Explicit blocking
> helps some, but I would try to avoid it.
>

Yea, the design space here is non-trivial.

BTW, I have a Cal ME grad student that I've been working with on getting my
old parallel FE / Prometheus code running on Stampede for his bone modeling
problems. He started from zero in HPC but he is eager and has been picking
it up. If there is interest we could get performance data with the existing
code, as a benchmark, and we could generate matrices, if anyone wants to
look into this.


Re: [petsc-users] GAMG scaling

2018-12-24 Thread Jed Brown via petsc-users
Mark Adams  writes:

> On Mon, Dec 24, 2018 at 4:56 PM Jed Brown  wrote:
>
>> Mark Adams via petsc-users  writes:
>>
>> > Anyway, my data for this is in my SC 2004 paper (MakeNextMat_private in
>> > attached, NB, this is code that I wrote in grad school). It is memory
>> > efficient and simple, just four nested loops i,j,I,J: C(I,J) =
>> > P(i,I)*A(i,j)*P(j,J). In eyeballing the numbers and from new data that I
>> am
>> > getting from my bone modeling colleagues, that use this old code on
>> > Stampede now, the times look reasonable compared to GAMG. This is
>> optimized
>> > for elasticity, where I unroll loops (so it is really six nested loops).
>>
>> Is the A above meant to include some ghosted rows?
>>
>
> You could but I was thinking of having i in the outer loop. In C(I,J) =
> P(i,I)*A(i,j)*P(j,J), the iteration over 'i' need only be the local rows of
> A (and the left term P).

Okay, so you need to gather those rows of P referenced by the
off-diagonal parts of A.  Once you have them, do

  for i:
v[:] = 0 # sparse vector
for j:
  v[:] += A[i,j] * P[j,:]
for I:
  C[I,:] += P[i,I] * v[:]

One inefficiency is that you don't actually get "hits" on all the
entries of C[I,:], but that much remains no matter how you reorder loops
(unless you make I the outermost).
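
For illustration, a serial rendering of that loop structure over CSR storage
(a sketch: the accumulator v and the result C are densified for clarity, which
corresponds to the "dense axpy over an array of length PN" variant discussed
later in this digest; all names are hypothetical, and a real implementation
would keep C sparse):

  #include <stdlib.h>
  #include <string.h>

  /* C += P^T A P for CSR A (n x n) and CSR P (n x pn); C is a dense pn x pn
     array zeroed by the caller. Row i of A*P is accumulated into v, then
     scattered into the rows of C selected by row i of P. */
  static void ptap_rowwise(int n, int pn,
                           const int *ai, const int *aj, const double *av,
                           const int *pi, const int *pj, const double *pv,
                           double *C)
  {
    double *v = calloc((size_t)pn, sizeof(*v)); /* dense accumulator, length pn */
    for (int i = 0; i < n; i++) {
      memset(v, 0, (size_t)pn * sizeof(*v));
      for (int k = ai[i]; k < ai[i + 1]; k++) {  /* v += A(i,j) * P(j,:) */
        const int j = aj[k];
        for (int l = pi[j]; l < pi[j + 1]; l++) v[pj[l]] += av[k] * pv[l];
      }
      for (int l = pi[i]; l < pi[i + 1]; l++) {  /* C(I,:) += P(i,I) * v */
        const int I = pj[l];
        for (int J = 0; J < pn; J++) C[(size_t)I * pn + J] += pv[l] * v[J];
      }
    }
    free(v);
  }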

>> > In thinking about this now, I think you want to make a local copy of P
>> with
>> > rows (j) for every column in A that you have locally, then transpose this
>> > local thing for the P(j,J) term. A sparse AXPY on j. (My code uses a
>> > special tree data structure but a matrix is simpler.)
>>
>> Why transpose for P(j,J)?
>>
>
> (premature) optimization. I was thinking 'j' being in the inner loop and
> doing sparse inner product, but now that I think about it there are other
> options.

Sparse inner products tend to be quite inefficient.  Explicit blocking
helps some, but I would try to avoid it.


Re: [petsc-users] GAMG scaling

2018-12-24 Thread Mark Adams via petsc-users
On Mon, Dec 24, 2018 at 4:56 PM Jed Brown  wrote:

> Mark Adams via petsc-users  writes:
>
> > Anyway, my data for this is in my SC 2004 paper (MakeNextMat_private in
> > attached, NB, this is code that I wrote in grad school). It is memory
> > efficient and simple, just four nested loops i,j,I,J: C(I,J) =
> > P(i,I)*A(i,j)*P(j,J). In eyeballing the numbers and from new data that I
> am
> > getting from my bone modeling colleagues, that use this old code on
> > Stampede now, the times look reasonable compared to GAMG. This is
> optimized
> > for elasticity, where I unroll loops (so it is really six nested loops).
>
> Is the A above meant to include some ghosted rows?
>

You could but I was thinking of having i in the outer loop. In C(I,J) =
P(i,I)*A(i,j)*P(j,J), the iteration over 'i' need only be the local rows of
A (and the left term P).


>
> > In thinking about this now, I think you want to make a local copy of P
> with
> > rows (j) for every column in A that you have locally, then transpose this
> > local thing for the P(j,J) term. A sparse AXPY on j. (My code uses a
> > special tree data structure but a matrix is simpler.)
>
> Why transpose for P(j,J)?
>

(premature) optimization. I was thinking 'j' being in the inner loop and
doing sparse inner product, but now that I think about it there are other
options.


Re: [petsc-users] GAMG scaling

2018-12-24 Thread Jed Brown via petsc-users
Mark Adams via petsc-users  writes:

> Anyway, my data for this is in my SC 2004 paper (MakeNextMat_private in
> attached, NB, this is code that I wrote in grad school). It is memory
> efficient and simple, just four nested loops i,j,I,J: C(I,J) =
> P(i,I)*A(i,j)*P(j,J). In eyeballing the numbers and from new data that I am
> getting from my bone modeling colleagues, that use this old code on
> Stampede now, the times look reasonable compared to GAMG. This is optimized
> for elasticity, where I unroll loops (so it is really six nested loops).

Is the A above meant to include some ghosted rows?

> In thinking about this now, I think you want to make a local copy of P with
> rows (j) for every column in A that you have locally, then transpose this
> local thing for the P(j,J) term. A sparse AXPY on j. (My code uses a
> special tree data structure but a matrix is simpler.)

Why transpose for P(j,J)?


Re: [petsc-users] GAMG scaling

2018-12-22 Thread Mark Adams via petsc-users
Wow, this is an old thread.

Sorry if I sound like an old fart talking about the good old days but I
originally did RAP in Prometheus, in a non-work-optimal way that might be
of interest. Not hard to implement. I bring this up because we continue to
struggle with this damn thing. I think this approach is perfectly scalable
and pretty low overhead, and simple.

Note, I talked to the hypre people about this in like 97 when they were
implementing RAP and perhaps they are doing it this way ... the 4x slower
way.

Anyway, my data for this is in my SC 2004 paper (MakeNextMat_private in
attached, NB, this is code that I wrote in grad school). It is memory
efficient and simple, just four nested loops i,j,I,J: C(I,J) =
P(i,I)*A(i,j)*P(j,J). In eyeballing the numbers and from new data that I am
getting from my bone modeling colleagues, that use this old code on
Stampede now, the times look reasonable compared to GAMG. This is optimized
for elasticity, where I unroll loops (so it is really six nested loops).

In thinking about this now, I think you want to make a local copy of P with
rows (j) for every column in A that you have locally, then transpose this
local thing for the P(j,J) term. A sparse AXPY on j. (My code uses a
special tree data structure but a matrix is simpler.)


On Sat, Dec 22, 2018 at 3:39 AM Mark Adams  wrote:

> OK, so this thread has drifted, see title :)
>
> On Fri, Dec 21, 2018 at 10:01 PM Fande Kong  wrote:
>
>> Sorry, hit the wrong button.
>>
>>
>>
>> On Fri, Dec 21, 2018 at 7:56 PM Fande Kong  wrote:
>>
>>>
>>>
>>> On Fri, Dec 21, 2018 at 9:44 AM Mark Adams  wrote:
>>>
 Also, you mentioned that you are using 10 levels. This is very strange
 with GAMG. You can run with -info and grep on GAMG to see the sizes and the
 number of non-zeros per level. You should coarsen at a rate of about 2^D to
 3^D with GAMG (with 10 levels this would imply a very large fine grid
 problem so I suspect there is something strange going on with coarsening).
 Mark

>>>
>>> Hi Mark,
>>>
>>>
>> Thanks for your email. We did not try GAMG much for our problems since we
>> still have troubles to figure out how to effectively use GAMG so far.
>> Instead, we are building our own customized  AMG  that needs to use PtAP to
>> construct coarse matrices.  The customized AMG works pretty well for our
>> specific simulations. The bottleneck right now is that PtAP might
>> take too much memory, and the code crashes within the function "PtAP". I
>> definitely need a memory profiler to confirm my statement here.
>>
>> Thanks,
>>
>> Fande Kong,
>>
>>
>>
>>>
>>>
>>>

 On Fri, Dec 21, 2018 at 11:36 AM Zhang, Hong via petsc-users <
 petsc-users@mcs.anl.gov> wrote:

> Fande:
> I will explore it and get back to you.
> Does anyone know how to profile memory usage?
> Hong
>
> Thanks, Hong,
>>
>> I just briefly went through the code. I was wondering if it is
>> possible to destroy "c->ptap" (that caches a lot of intermediate data) to
>> release the memory after the coarse matrix is assembled. I understand you
>> may still want to reuse these data structures by default but for my
>> simulation, the preconditioner is fixed and there is no reason to keep 
>> the
>> "c->ptap".
>>
>
>> It would be great, if we could have this optional functionality.
>>
>> Fande Kong,
>>
>> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong 
>> wrote:
>>
>>> We use nonscalable implementation as default, and switch to scalable
>>> for matrices over finer grids. You may use option '-matptap_via 
>>> scalable'
>>> to force scalable PtAP  implementation for all PtAP. Let me know if it
>>> works.
>>> Hong
>>>
>>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
>>> wrote:
>>>

   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable
 automatically for "large" problems, which is determined by some 
 heuristic.

Barry


 > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
 petsc-users@mcs.anl.gov> wrote:
 >
 >
 >
 > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong 
 wrote:
 > Fande:
 > Hong,
 > Thanks for your improvements on PtAP that is critical for MG-type
 algorithms.
 >
 > On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
 > Mark,
 > Below is the copy of my email sent to you on Feb 27:
 >
 > I implemented scalable MatPtAP and did comparisons of three
 implementations using ex56.c on alcf cetus machine (this machine has 
 small
 memory, 1GB/core):
 > - nonscalable PtAP: use an array of length PN to do dense axpy
 > - scalable PtAP:   do sparse axpy without use of PN array
 >
 > What PN means here?
 > Global number of columns of P.

Re: [petsc-users] GAMG scaling

2018-12-22 Thread Mark Adams via petsc-users
OK, so this thread has drifted, see title :)

On Fri, Dec 21, 2018 at 10:01 PM Fande Kong  wrote:

> Sorry, hit the wrong button.
>
>
>
> On Fri, Dec 21, 2018 at 7:56 PM Fande Kong  wrote:
>
>>
>>
>> On Fri, Dec 21, 2018 at 9:44 AM Mark Adams  wrote:
>>
>>> Also, you mentioned that you are using 10 levels. This is very strange
>>> with GAMG. You can run with -info and grep on GAMG to see the sizes and the
>>> number of non-zeros per level. You should coarsen at a rate of about 2^D to
>>> 3^D with GAMG (with 10 levels this would imply a very large fine grid
>>> problem so I suspect there is something strange going on with coarsening).
>>> Mark
>>>
>>
>> Hi Mark,
>>
>>
> Thanks for your email. We did not try GAMG much for our problems since we
> still have troubles to figure out how to effectively use GAMG so far.
> Instead, we are building our own customized  AMG  that needs to use PtAP to
> construct coarse matrices.  The customized AMG works pretty well for our
> specific simulations. The bottleneck right now is that PtAP might
> take too much memory, and the code crashes within the function "PtAP". I
> definitely need a memory profiler to confirm my statement here.
>
> Thanks,
>
> Fande Kong,
>
>
>
>>
>>
>>
>>>
>>> On Fri, Dec 21, 2018 at 11:36 AM Zhang, Hong via petsc-users <
>>> petsc-users@mcs.anl.gov> wrote:
>>>
 Fande:
 I will explore it and get back to you.
 Does anyone know how to profile memory usage?
 Hong

 Thanks, Hong,
>
> I just briefly went through the code. I was wondering if it is
> possible to destroy "c->ptap" (that caches a lot of intermediate data) to
> release the memory after the coarse matrix is assembled. I understand you
> may still want to reuse these data structures by default but for my
> simulation, the preconditioner is fixed and there is no reason to keep the
> "c->ptap".
>

> It would be great, if we could have this optional functionality.
>
> Fande Kong,
>
> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong 
> wrote:
>
>> We use nonscalable implementation as default, and switch to scalable
>> for matrices over finer grids. You may use option '-matptap_via scalable'
>> to force scalable PtAP  implementation for all PtAP. Let me know if it
>> works.
>> Hong
>>
>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
>> wrote:
>>
>>>
>>>   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically
>>> for "large" problems, which is determined by some heuristic.
>>>
>>>Barry
>>>
>>>
>>> > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
>>> petsc-users@mcs.anl.gov> wrote:
>>> >
>>> >
>>> >
>>> > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong 
>>> wrote:
>>> > Fande:
>>> > Hong,
>>> > Thanks for your improvements on PtAP that is critical for MG-type
>>> algorithms.
>>> >
>>> > On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
>>> > Mark,
>>> > Below is the copy of my email sent to you on Feb 27:
>>> >
>>> > I implemented scalable MatPtAP and did comparisons of three
>>> implementations using ex56.c on alcf cetus machine (this machine has 
>>> small
>>> memory, 1GB/core):
>>> > - nonscalable PtAP: use an array of length PN to do dense axpy
>>> > - scalable PtAP:   do sparse axpy without use of PN array
>>> >
>>> > What does PN mean here?
>>> > Global number of columns of P.
>>> >
>>> > - hypre PtAP.
>>> >
>>> > The results are attached. Summary:
>>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than
>>> hypre PtAP
>>> > - scalable PtAP is 4x faster than hypre PtAP
>>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>>> >
>>> > I was wondering how much more memory PETSc PtAP uses than hypre? I
>>> am implementing an AMG algorithm based on PETSc right now, and it is
>>> working well. But we have found a bottleneck with PtAP. For the same P
>>> and A, PETSc PtAP fails to generate a coarse matrix because it runs out
>>> of memory, while hypre can still generate the coarse matrix.
>>> >
>>> > I do not want to just use the HYPRE one because we had to
>>> duplicate matrices if I used HYPRE PtAP.
>>> >
>>> > It would be nice if you guys have already done some comparisons of
>>> these implementations' memory usage.
>>> > Do you encounter memory issues with scalable PtAP?
>>> >
>>> > By default do we use the scalable PtAP?? Do we have to specify
>>> some options to use the scalable version of PtAP?  If so, it would be 
>>> nice
>>> to use the scalable version by default.  I am totally missing something
>>> here.
>>> >
>>> > Thanks,
>>> >
>>> > Fande
>>> >
>>> >
>>> > Karl had a student in the summer who improved MatPtAP(). Do you
>>> use the latest version of 

Re: [petsc-users] GAMG scaling

2018-12-21 Thread Fande Kong via petsc-users
Sorry, hit the wrong button.



On Fri, Dec 21, 2018 at 7:56 PM Fande Kong  wrote:

>
>
> On Fri, Dec 21, 2018 at 9:44 AM Mark Adams  wrote:
>
>> Also, you mentioned that you are using 10 levels. This is very strange
>> with GAMG. You can run with -info and grep on GAMG to see the sizes and the
>> number of non-zeros per level. You should coarsen at a rate of about 2^D to
>> 3^D with GAMG (with 10 levels this would imply a very large fine grid
>> problem so I suspect there is something strange going on with coarsening).
>> Mark
>>
>
> Hi Mark,
>
>
Thanks for your email. We did not try GAMG much for our problems since we
still have trouble figuring out how to use GAMG effectively so far.
Instead, we are building our own customized AMG that needs to use PtAP to
construct coarse matrices. The customized AMG works pretty well for our
specific simulations. The bottleneck right now is that PtAP might
take too much memory, and the code crashes within the function "PtAP". I
definitely need a memory profiler to confirm my statement here.

Thanks,

Fande Kong,



>
>
>
>>
>> On Fri, Dec 21, 2018 at 11:36 AM Zhang, Hong via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>>
>>> Fande:
>>> I will explore it and get back to you.
>>> Does anyone know how to profile memory usage?
>>> Hong
>>>
>>> Thanks, Hong,

 I just briefly went through the code. I was wondering if it is possible
 to destroy "c->ptap" (that caches a lot of intermediate data) to release
 the memory after the coarse matrix is assembled. I understand you may still
 want to reuse these data structures by default but for my simulation, the
 preconditioner is fixed and there is no reason to keep the "c->ptap".

>>>
It would be great if we could have this optional functionality.

 Fande Kong,

 On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong  wrote:

> We use nonscalable implementation as default, and switch to scalable
> for matrices over finer grids. You may use option '-matptap_via scalable'
> to force scalable PtAP  implementation for all PtAP. Let me know if it
> works.
> Hong
>
> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
> wrote:
>
>>
>>   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically
>> for "large" problems, which is determined by some heuristic.
>>
>>Barry
>>
>>
>> > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>> >
>> >
>> >
>> > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong 
>> wrote:
>> > Fande:
>> > Hong,
>> > Thanks for your improvements on PtAP, which is critical for MG-type
>> algorithms.
>> >
>> > On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
>> > Mark,
>> > Below is the copy of my email sent to you on Feb 27:
>> >
>> > I implemented scalable MatPtAP and did comparisons of three
>> implementations using ex56.c on alcf cetus machine (this machine has 
>> small
>> memory, 1GB/core):
>> > - nonscalable PtAP: use an array of length PN to do dense axpy
>> > - scalable PtAP:   do sparse axpy without use of PN array
>> >
>> > What does PN mean here?
>> > Global number of columns of P.
>> >
>> > - hypre PtAP.
>> >
>> > The results are attached. Summary:
>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre
>> PtAP
>> > - scalable PtAP is 4x faster than hypre PtAP
>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>> >
>> > I was wondering how much more memory PETSc PtAP uses than hypre? I
>> am implementing an AMG algorithm based on PETSc right now, and it is
>> working well. But we have found a bottleneck with PtAP. For the same P and
>> A, PETSc PtAP fails to generate a coarse matrix because it runs out of
>> memory, while hypre can still generate the coarse matrix.
>> >
>> > I do not want to just use the HYPRE one because we had to duplicate
>> matrices if I used HYPRE PtAP.
>> >
>> > It would be nice if you guys have already done some comparisons of
>> these implementations' memory usage.
>> > Do you encounter memory issues with scalable PtAP?
>> >
>> > By default do we use the scalable PtAP?? Do we have to specify some
>> options to use the scalable version of PtAP?  If so, it would be nice to
>> use the scalable version by default.  I am totally missing something 
>> here.
>> >
>> > Thanks,
>> >
>> > Fande
>> >
>> >
>> > Karl had a student in the summer who improved MatPtAP(). Do you use
>> the latest version of petsc?
>> > HYPRE may use less memory than PETSc because it does not save and
>> reuse the matrices.
>> >
>> > I do not understand why generating coarse matrix fails due to out
>> of memory. Do you use direct solver at coarse grid?
>> > Hong
>> >
>> 

Re: [petsc-users] GAMG scaling

2018-12-21 Thread Fande Kong via petsc-users
Thanks so much, Hong,

If any new finding, please let me know.


On Fri, Dec 21, 2018 at 9:36 AM Zhang, Hong  wrote:

> Fande:
> I will explore it and get back to you.
> Does anyone know how to profile memory usage?
>

We are using gperftools
https://gperftools.github.io/gperftools/heapprofile.html
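
A minimal sketch of how we invoke it (assuming the executable is linked
against gperftools' tcmalloc, e.g. with -ltcmalloc; "myapp" and its options
are placeholders):

  # each process dumps heap snapshots like /tmp/myapp.hprof.0001.heap;
  # with several ranks you may want a per-rank prefix to avoid collisions
  HEAPPROFILE=/tmp/myapp.hprof mpiexec -n 8 ./myapp -options_file opts.txt
  # turn one snapshot into a text report
  pprof --text ./myapp /tmp/myapp.hprof.0001.heap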

Fande,



> Hong
>
> Thanks, Hong,
>>
>> I just briefly went through the code. I was wondering if it is possible
>> to destroy "c->ptap" (that caches a lot of intermediate data) to release
>> the memory after the coarse matrix is assembled. I understand you may still
>> want to reuse these data structures by default but for my simulation, the
>> preconditioner is fixed and there is no reason to keep the "c->ptap".
>>
>
>> It would be great if we could have this optional functionality.
>>
>> Fande Kong,
>>
>> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong  wrote:
>>
>>> We use nonscalable implementation as default, and switch to scalable for
>>> matrices over finer grids. You may use option '-matptap_via scalable' to
>>> force scalable PtAP  implementation for all PtAP. Let me know if it works.
>>> Hong
>>>
>>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
>>> wrote:
>>>

   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically
 for "large" problems, which is determined by some heuristic.

Barry


 > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
 petsc-users@mcs.anl.gov> wrote:
 >
 >
 >
 > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong 
 wrote:
 > Fande:
 > Hong,
> Thanks for your improvements on PtAP, which is critical for MG-type
 algorithms.
 >
 > On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
 > Mark,
 > Below is the copy of my email sent to you on Feb 27:
 >
 > I implemented scalable MatPtAP and did comparisons of three
 implementations using ex56.c on alcf cetus machine (this machine has small
 memory, 1GB/core):
 > - nonscalable PtAP: use an array of length PN to do dense axpy
 > - scalable PtAP:   do sparse axpy without use of PN array
 >
> What does PN mean here?
 > Global number of columns of P.
 >
 > - hypre PtAP.
 >
 > The results are attached. Summary:
 > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre
 PtAP
 > - scalable PtAP is 4x faster than hypre PtAP
 > - hypre uses less memory (see job.ne399.n63.np1000.sh)
 >
 > I was wondering how much more memory PETSc PtAP uses than hypre? I am
 implementing an AMG algorithm based on PETSc right now, and it is working
well. But we have found a bottleneck with PtAP. For the same P and A, PETSc
PtAP fails to generate a coarse matrix because it runs out of memory, while
hypre can still generate the coarse matrix.
 >
 > I do not want to just use the HYPRE one because we had to duplicate
 matrices if I used HYPRE PtAP.
 >
> It would be nice if you guys have already done some comparisons of
these implementations' memory usage.
> Do you encounter memory issues with scalable PtAP?
 >
 > By default do we use the scalable PtAP?? Do we have to specify some
 options to use the scalable version of PtAP?  If so, it would be nice to
 use the scalable version by default.  I am totally missing something here.
 >
 > Thanks,
 >
 > Fande
 >
 >
 > Karl had a student in the summer who improved MatPtAP(). Do you use
 the latest version of petsc?
 > HYPRE may use less memory than PETSc because it does not save and
 reuse the matrices.
 >
 > I do not understand why generating coarse matrix fails due to out of
 memory. Do you use direct solver at coarse grid?
 > Hong
 >
> Based on the above observation, I set the default PtAP algorithm to
'nonscalable'.
> When PN > the local estimated number of nonzeros of C=PtAP, the default
switches to 'scalable'.
> The user can override the default.
 >
 > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
 > MatPtAP   3.6224e+01 (nonscalable for small mats,
 scalable for larger ones)
 > scalable MatPtAP 4.6129e+01
> hypre            1.9389e+02
 >
> This work is on petsc-master. Give it a try. If you encounter any
 problem, let me know.
 >
 > Hong
 >
 > On Wed, May 3, 2017 at 10:01 AM, Mark Adams  wrote:
 > (Hong), what is the current state of optimizing RAP for scaling?
 >
> Nate is driving 3D elasticity problems at scale with GAMG and we
 are working out performance problems. They are hitting problems at ~1.5B
 dof problems on a basic Cray (XC30 I think).
 >
 > Thanks,
 > Mark
 >




Re: [petsc-users] GAMG scaling

2018-12-21 Thread Matthew Knepley via petsc-users
On Fri, Dec 21, 2018 at 12:55 PM Zhang, Hong  wrote:

> Matt:
>
>> Does anyone know how to profile memory usage?
>>>
>>
>> The best serial way is to use Massif, which is part of valgrind. I think
>> it might work in parallel if you
>> only look at one process at a time.
>>
>
> Can you give an example of using  Massif?
> For example, how to use it on petsc/src/ksp/ksp/examples/tutorials/ex56.c
> with np=8?
>

I have not used it in a while, so I have nothing lying around. However,
the manual is very good:

http://valgrind.org/docs/manual/ms-manual.html
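
For the np=8 ex56 case, something along these lines should work (the ex56
options are placeholders; each rank writes its own massif.out.<pid> file):

  cd src/ksp/ksp/examples/tutorials
  mpiexec -n 8 valgrind --tool=massif --massif-out-file=massif.out.%p \
      ./ex56 -ne 13 -pc_type gamg
  ms_print massif.out.12345   # pick any one rank's output file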

  Thanks,

Matt


> Hong
>
>>
>>
>>> Hong
>>>
>>> Thanks, Hong,

 I just briefly went through the code. I was wondering if it is possible
 to destroy "c->ptap" (that caches a lot of intermediate data) to release
 the memory after the coarse matrix is assembled. I understand you may still
 want to reuse these data structures by default but for my simulation, the
 preconditioner is fixed and there is no reason to keep the "c->ptap".

>>>
It would be great if we could have this optional functionality.

 Fande Kong,

 On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong  wrote:

> We use nonscalable implementation as default, and switch to scalable
> for matrices over finer grids. You may use option '-matptap_via scalable'
> to force scalable PtAP  implementation for all PtAP. Let me know if it
> works.
> Hong
>
> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
> wrote:
>
>>
>>   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically
>> for "large" problems, which is determined by some heuristic.
>>
>>Barry
>>
>>
>> > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>> >
>> >
>> >
>> > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong 
>> wrote:
>> > Fande:
>> > Hong,
>> > Thanks for your improvements on PtAP, which is critical for MG-type
>> algorithms.
>> >
>> > On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
>> > Mark,
>> > Below is the copy of my email sent to you on Feb 27:
>> >
>> > I implemented scalable MatPtAP and did comparisons of three
>> implementations using ex56.c on alcf cetus machine (this machine has 
>> small
>> memory, 1GB/core):
>> > - nonscalable PtAP: use an array of length PN to do dense axpy
>> > - scalable PtAP:   do sparse axpy without use of PN array
>> >
>> > What does PN mean here?
>> > Global number of columns of P.
>> >
>> > - hypre PtAP.
>> >
>> > The results are attached. Summary:
>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre
>> PtAP
>> > - scalable PtAP is 4x faster than hypre PtAP
>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>> >
>> > I was wondering how much more memory PETSc PtAP uses than hypre? I
>> am implementing an AMG algorithm based on PETSc right now, and it is
>> working well. But we have found a bottleneck with PtAP. For the same P and
>> A, PETSc PtAP fails to generate a coarse matrix because it runs out of
>> memory, while hypre can still generate the coarse matrix.
>> >
>> > I do not want to just use the HYPRE one because we had to duplicate
>> matrices if I used HYPRE PtAP.
>> >
>> > It would be nice if you guys have already done some comparisons of
>> these implementations' memory usage.
>> > Do you encounter memory issues with scalable PtAP?
>> >
>> > By default do we use the scalable PtAP?? Do we have to specify some
>> options to use the scalable version of PtAP?  If so, it would be nice to
>> use the scalable version by default.  I am totally missing something 
>> here.
>> >
>> > Thanks,
>> >
>> > Fande
>> >
>> >
>> > Karl had a student in the summer who improved MatPtAP(). Do you use
>> the latest version of petsc?
>> > HYPRE may use less memory than PETSc because it does not save and
>> reuse the matrices.
>> >
>> > I do not understand why generating coarse matrix fails due to out
>> of memory. Do you use direct solver at coarse grid?
>> > Hong
>> >
>> > Based on the above observation, I set the default PtAP algorithm to
>> 'nonscalable'.
>> > When PN > the local estimated number of nonzeros of C=PtAP, the default
>> switches to 'scalable'.
>> > The user can override the default.
>> >
>> > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I
>> get
>> > MatPtAP   3.6224e+01 (nonscalable for small mats,
>> scalable for larger ones)
>> > scalable MatPtAP 4.6129e+01
>> > hypre            1.9389e+02
>> >
>> > This work is on petsc-master. Give it a try. If you encounter any
>> problem, let me know.
>> >
>> > Hong
>> >
>> > On Wed, May 3, 2017 at 

Re: [petsc-users] GAMG scaling

2018-12-21 Thread Zhang, Hong via petsc-users
Matt:
Does anyone know how to profile memory usage?

The best serial way is to use Massif, which is part of valgrind. I think it 
might work in parallel if you
only look at one process at a time.

Can you give an example of using  Massif?
For example, how to use it on petsc/src/ksp/ksp/examples/tutorials/ex56.c with 
np=8?
Hong

Hong

Thanks, Hong,

I just briefly went through the code. I was wondering if it is possible to 
destroy "c->ptap" (that caches a lot of intermediate data) to release the 
memory after the coarse matrix is assembled. I understand you may still want to 
reuse these data structures by default but for my simulation, the 
preconditioner is fixed and there is no reason to keep the "c->ptap".

It would be great if we could have this optional functionality.

Fande Kong,

On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong  wrote:
We use nonscalable implementation as default, and switch to scalable for 
matrices over finer grids. You may use option '-matptap_via scalable' to force 
scalable PtAP  implementation for all PtAP. Let me know if it works.
Hong

On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F.  wrote:

  See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically for 
"large" problems, which is determined by some heuristic.

   Barry


> On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users  wrote:
>
>
>
> On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong  wrote:
> Fande:
> Hong,
> Thanks for your improvements on PtAP, which is critical for MG-type algorithms.
>
> On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
> Mark,
> Below is the copy of my email sent to you on Feb 27:
>
> I implemented scalable MatPtAP and did comparisons of three implementations 
> using ex56.c on alcf cetus machine (this machine has small memory, 1GB/core):
> - nonscalable PtAP: use an array of length PN to do dense axpy
> - scalable PtAP:   do sparse axpy without use of PN array
>
> What does PN mean here?
> Global number of columns of P.
>
> - hypre PtAP.
>
> The results are attached. Summary:
> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
> - scalable PtAP is 4x faster than hypre PtAP
> - hypre uses less memory (see 
> job.ne399.n63.np1000.sh)
>
> I was wondering how much more memory PETSc PtAP uses than hypre? I am 
> implementing an AMG algorithm based on PETSc right now, and it is working 
> well. But we have found a bottleneck with PtAP. For the same P and A, PETSc
> PtAP fails to generate a coarse matrix because it runs out of memory, while
> hypre can still generate the coarse matrix.
>
> I do not want to just use the HYPRE one because we had to duplicate matrices 
> if I used HYPRE PtAP.
>
> It would be nice if you guys have already done some comparisons of these
> implementations' memory usage.
> Do you encounter memory issues with scalable PtAP?
>
> By default do we use the scalable PtAP?? Do we have to specify some options 
> to use the scalable version of PtAP?  If so, it would be nice to use the 
> scalable version by default.  I am totally missing something here.
>
> Thanks,
>
> Fande
>
>
> Karl had a student in the summer who improved MatPtAP(). Do you use the 
> latest version of petsc?
> HYPRE may use less memory than PETSc because it does not save and reuse the 
> matrices.
>
> I do not understand why generating coarse matrix fails due to out of memory. 
> Do you use direct solver at coarse grid?
> Hong
>
> Based on the above observation, I set the default PtAP algorithm to 'nonscalable'.
> When PN > the local estimated number of nonzeros of C=PtAP, the default
> switches to 'scalable'.
> The user can override the default.
>
> For the case of np=8000, ne=599 (see 
> job.ne599.n500.np8000.sh), I get
> MatPtAP   3.6224e+01 (nonscalable for small mats, scalable 
> for larger ones)
> scalable MatPtAP 4.6129e+01
> hypre            1.9389e+02
>
> This work is on petsc-master. Give it a try. If you encounter any problem,
> let me know.
>
> Hong
>
> On Wed, May 3, 2017 at 10:01 AM, Mark Adams  wrote:
> (Hong), what is the current state of optimizing RAP for scaling?
>
> Nate is driving 3D elasticity problems at scale with GAMG and we are
> working out performance problems. They are hitting problems at ~1.5B dof 
> problems on a basic Cray (XC30 I think).
>
> Thanks,
> Mark
>



--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/


Re: [petsc-users] GAMG scaling

2018-12-21 Thread Matthew Knepley via petsc-users
On Fri, Dec 21, 2018 at 11:36 AM Zhang, Hong via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Fande:
> I will explore it and get back to you.
> Does anyone know how to profile memory usage?
>

The best serial way is to use Massif, which is part of valgrind. I think it
might work in parallel if you
only look at one process at a time.

  Matt


> Hong
>
> Thanks, Hong,
>>
>> I just briefly went through the code. I was wondering if it is possible
>> to destroy "c->ptap" (that caches a lot of intermediate data) to release
>> the memory after the coarse matrix is assembled. I understand you may still
>> want to reuse these data structures by default but for my simulation, the
>> preconditioner is fixed and there is no reason to keep the "c->ptap".
>>
>
>> It would be great if we could have this optional functionality.
>>
>> Fande Kong,
>>
>> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong  wrote:
>>
>>> We use nonscalable implementation as default, and switch to scalable for
>>> matrices over finer grids. You may use option '-matptap_via scalable' to
>>> force scalable PtAP  implementation for all PtAP. Let me know if it works.
>>> Hong
>>>
>>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
>>> wrote:
>>>

   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically
 for "large" problems, which is determined by some heuristic.

Barry


 > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
 petsc-users@mcs.anl.gov> wrote:
 >
 >
 >
 > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong 
 wrote:
 > Fande:
 > Hong,
> Thanks for your improvements on PtAP, which is critical for MG-type
 algorithms.
 >
 > On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
 > Mark,
 > Below is the copy of my email sent to you on Feb 27:
 >
 > I implemented scalable MatPtAP and did comparisons of three
 implementations using ex56.c on alcf cetus machine (this machine has small
 memory, 1GB/core):
 > - nonscalable PtAP: use an array of length PN to do dense axpy
 > - scalable PtAP:   do sparse axpy without use of PN array
 >
> What does PN mean here?
 > Global number of columns of P.
 >
 > - hypre PtAP.
 >
 > The results are attached. Summary:
 > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre
 PtAP
 > - scalable PtAP is 4x faster than hypre PtAP
 > - hypre uses less memory (see job.ne399.n63.np1000.sh)
 >
 > I was wondering how much more memory PETSc PtAP uses than hypre? I am
 implementing an AMG algorithm based on PETSc right now, and it is working
well. But we have found a bottleneck with PtAP. For the same P and A, PETSc
PtAP fails to generate a coarse matrix because it runs out of memory, while
hypre can still generate the coarse matrix.
 >
 > I do not want to just use the HYPRE one because we had to duplicate
 matrices if I used HYPRE PtAP.
 >
> It would be nice if you guys have already done some comparisons of
these implementations' memory usage.
> Do you encounter memory issues with scalable PtAP?
 >
 > By default do we use the scalable PtAP?? Do we have to specify some
 options to use the scalable version of PtAP?  If so, it would be nice to
 use the scalable version by default.  I am totally missing something here.
 >
 > Thanks,
 >
 > Fande
 >
 >
 > Karl had a student in the summer who improved MatPtAP(). Do you use
 the latest version of petsc?
 > HYPRE may use less memory than PETSc because it does not save and
 reuse the matrices.
 >
 > I do not understand why generating coarse matrix fails due to out of
 memory. Do you use direct solver at coarse grid?
 > Hong
 >
> Based on the above observation, I set the default PtAP algorithm to
'nonscalable'.
> When PN > the local estimated number of nonzeros of C=PtAP, the default
switches to 'scalable'.
> The user can override the default.
 >
 > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
 > MatPtAP   3.6224e+01 (nonscalable for small mats,
 scalable for larger ones)
 > scalable MatPtAP 4.6129e+01
> hypre            1.9389e+02
 >
> This work is on petsc-master. Give it a try. If you encounter any
 problem, let me know.
 >
 > Hong
 >
 > On Wed, May 3, 2017 at 10:01 AM, Mark Adams  wrote:
 > (Hong), what is the current state of optimizing RAP for scaling?
 >
> Nate is driving 3D elasticity problems at scale with GAMG and we
 are working out performance problems. They are hitting problems at ~1.5B
 dof problems on a basic Cray (XC30 I think).
 >
 > Thanks,
 > Mark
 >



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their

Re: [petsc-users] GAMG scaling

2018-12-21 Thread Mark Adams via petsc-users
Also, you mentioned that you are using 10 levels. This is very strange with
GAMG. You can run with -info and grep on GAMG to see the sizes and the
number of non-zeros per level. You should coarsen at a rate of about 2^D to
3^D with GAMG (with 10 levels this would imply a very large fine grid
problem so I suspect there is something strange going on with coarsening).
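
For example (the executable and its other options are placeholders):

  mpiexec -n 8 ./myapp -pc_type gamg -info 2>&1 | grep GAMG

Each level then shows up with its size and nonzero counts, so the coarsening
rate can be read off directly.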
Mark

On Fri, Dec 21, 2018 at 11:36 AM Zhang, Hong via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Fande:
> I will explore it and get back to you.
> Does anyone know how to profile memory usage?
> Hong
>
> Thanks, Hong,
>>
>> I just briefly went through the code. I was wondering if it is possible
>> to destroy "c->ptap" (that caches a lot of intermediate data) to release
>> the memory after the coarse matrix is assembled. I understand you may still
>> want to reuse these data structures by default but for my simulation, the
>> preconditioner is fixed and there is no reason to keep the "c->ptap".
>>
>
>> It would be great if we could have this optional functionality.
>>
>> Fande Kong,
>>
>> On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong  wrote:
>>
>>> We use nonscalable implementation as default, and switch to scalable for
>>> matrices over finer grids. You may use option '-matptap_via scalable' to
>>> force scalable PtAP  implementation for all PtAP. Let me know if it works.
>>> Hong
>>>
>>> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
>>> wrote:
>>>

   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically
 for "large" problems, which is determined by some heuristic.

Barry


 > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
 petsc-users@mcs.anl.gov> wrote:
 >
 >
 >
 > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong 
 wrote:
 > Fande:
 > Hong,
> Thanks for your improvements on PtAP, which is critical for MG-type
 algorithms.
 >
 > On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
 > Mark,
 > Below is the copy of my email sent to you on Feb 27:
 >
 > I implemented scalable MatPtAP and did comparisons of three
 implementations using ex56.c on alcf cetus machine (this machine has small
 memory, 1GB/core):
 > - nonscalable PtAP: use an array of length PN to do dense axpy
 > - scalable PtAP:   do sparse axpy without use of PN array
 >
> What does PN mean here?
 > Global number of columns of P.
 >
 > - hypre PtAP.
 >
 > The results are attached. Summary:
 > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre
 PtAP
 > - scalable PtAP is 4x faster than hypre PtAP
 > - hypre uses less memory (see job.ne399.n63.np1000.sh)
 >
 > I was wondering how much more memory PETSc PtAP uses than hypre? I am
 implementing an AMG algorithm based on PETSc right now, and it is working
well. But we have found a bottleneck with PtAP. For the same P and A, PETSc
PtAP fails to generate a coarse matrix because it runs out of memory, while
hypre can still generate the coarse matrix.
 >
 > I do not want to just use the HYPRE one because we had to duplicate
 matrices if I used HYPRE PtAP.
 >
> It would be nice if you guys have already done some comparisons of
these implementations' memory usage.
> Do you encounter memory issues with scalable PtAP?
 >
 > By default do we use the scalable PtAP?? Do we have to specify some
 options to use the scalable version of PtAP?  If so, it would be nice to
 use the scalable version by default.  I am totally missing something here.
 >
 > Thanks,
 >
 > Fande
 >
 >
 > Karl had a student in the summer who improved MatPtAP(). Do you use
 the latest version of petsc?
 > HYPRE may use less memory than PETSc because it does not save and
 reuse the matrices.
 >
 > I do not understand why generating coarse matrix fails due to out of
 memory. Do you use direct solver at coarse grid?
 > Hong
 >
> Based on the above observation, I set the default PtAP algorithm to
'nonscalable'.
> When PN > the local estimated number of nonzeros of C=PtAP, the default
switches to 'scalable'.
> The user can override the default.
 >
 > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
 > MatPtAP   3.6224e+01 (nonscalable for small mats,
 scalable for larger ones)
 > scalable MatPtAP 4.6129e+01
> hypre            1.9389e+02
 >
> This work is on petsc-master. Give it a try. If you encounter any
 problem, let me know.
 >
 > Hong
 >
 > On Wed, May 3, 2017 at 10:01 AM, Mark Adams  wrote:
 > (Hong), what is the current state of optimizing RAP for scaling?
 >
> Nate is driving 3D elasticity problems at scale with GAMG and we
 are working out performance problems. They are hitting problems at ~1.5B
 dof problems on a basic Cray 

Re: [petsc-users] GAMG scaling

2018-12-21 Thread Zhang, Hong via petsc-users
Fande:
I will explore it and get back to you.
Does anyone know how to profile memory usage?
Hong

Thanks, Hong,

I just briefly went through the code. I was wondering if it is possible to 
destroy "c->ptap" (that caches a lot of intermediate data) to release the 
memory after the coarse matrix is assembled. I understand you may still want to 
reuse these data structures by default but for my simulation, the 
preconditioner is fixed and there is no reason to keep the "c->ptap".

It would be great if we could have this optional functionality.

Fande Kong,

On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong  wrote:
We use nonscalable implementation as default, and switch to scalable for 
matrices over finer grids. You may use option '-matptap_via scalable' to force 
scalable PtAP  implementation for all PtAP. Let me know if it works.
Hong

On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F.  wrote:

  See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically for 
"large" problems, which is determined by some heuristic.

   Barry


> On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users  wrote:
>
>
>
> On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong  wrote:
> Fande:
> Hong,
> Thanks for your improvements on PtAP, which is critical for MG-type algorithms.
>
> On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
> Mark,
> Below is the copy of my email sent to you on Feb 27:
>
> I implemented scalable MatPtAP and did comparisons of three implementations 
> using ex56.c on alcf cetus machine (this machine has small memory, 1GB/core):
> - nonscalable PtAP: use an array of length PN to do dense axpy
> - scalable PtAP:   do sparse axpy without use of PN array
>
> What does PN mean here?
> Global number of columns of P.
>
> - hypre PtAP.
>
> The results are attached. Summary:
> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
> - scalable PtAP is 4x faster than hypre PtAP
> - hypre uses less memory (see 
> job.ne399.n63.np1000.sh)
>
> I was wondering how much more memory PETSc PtAP uses than hypre? I am 
> implementing an AMG algorithm based on PETSc right now, and it is working 
> well. But we have found a bottleneck with PtAP. For the same P and A, PETSc
> PtAP fails to generate a coarse matrix because it runs out of memory, while
> hypre can still generate the coarse matrix.
>
> I do not want to just use the HYPRE one because we had to duplicate matrices 
> if I used HYPRE PtAP.
>
> It would be nice if you guys have already done some comparisons of these
> implementations' memory usage.
> Do you encounter memory issues with scalable PtAP?
>
> By default do we use the scalable PtAP?? Do we have to specify some options 
> to use the scalable version of PtAP?  If so, it would be nice to use the 
> scalable version by default.  I am totally missing something here.
>
> Thanks,
>
> Fande
>
>
> Karl had a student in the summer who improved MatPtAP(). Do you use the 
> latest version of petsc?
> HYPRE may use less memory than PETSc because it does not save and reuse the 
> matrices.
>
> I do not understand why generating coarse matrix fails due to out of memory. 
> Do you use direct solver at coarse grid?
> Hong
>
> Based on the above observation, I set the default PtAP algorithm to 'nonscalable'.
> When PN > the local estimated number of nonzeros of C=PtAP, the default
> switches to 'scalable'.
> The user can override the default.
>
> For the case of np=8000, ne=599 (see 
> job.ne599.n500.np8000.sh), I get
> MatPtAP   3.6224e+01 (nonscalable for small mats, scalable 
> for larger ones)
> scalable MatPtAP 4.6129e+01
> hypre            1.9389e+02
>
> This work is on petsc-master. Give it a try. If you encounter any problem,
> let me know.
>
> Hong
>
> On Wed, May 3, 2017 at 10:01 AM, Mark Adams  wrote:
> (Hong), what is the current state of optimizing RAP for scaling?
>
> Nate is driving 3D elasticity problems at scale with GAMG and we are
> working out performance problems. They are hitting problems at ~1.5B dof 
> problems on a basic Cray (XC30 I think).
>
> Thanks,
> Mark
>



Re: [petsc-users] GAMG scaling

2018-12-20 Thread Fande Kong via petsc-users
Thanks, Hong,

I just briefly went through the code. I was wondering if it is possible to
destroy "c->ptap" (that caches a lot of intermediate data) to release the
memory after the coarse matrix is assembled. I understand you may still
want to reuse these data structures by default but for my simulation, the
preconditioner is fixed and there is no reason to keep the "c->ptap".

It would be great if we could have this optional functionality.

Fande Kong,
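
To make the request concrete, here is a rough sketch of the shape such an
API could take; the free call at the end is hypothetical (its name and
availability are illustrative, not something this thread establishes):

  /* build the coarse operator once */
  Mat C;
  PetscErrorCode ierr;
  ierr = MatPtAP(A, P, MAT_INITIAL_MATRIX, 2.0, &C);CHKERRQ(ierr);
  /* ... hand C to the coarse level of the preconditioner ... */
  /* hypothetical: drop the cached intermediate PtAP data, trading the
     ability to cheaply recompute C when A changes for lower memory use */
  ierr = MatFreeIntermediateDataStructures(C);CHKERRQ(ierr);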

On Thu, Dec 20, 2018 at 9:45 PM Zhang, Hong  wrote:

> We use nonscalable implementation as default, and switch to scalable for
> matrices over finer grids. You may use option '-matptap_via scalable' to
> force scalable PtAP  implementation for all PtAP. Let me know if it works.
> Hong
>
> On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F. 
> wrote:
>
>>
>>   See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically for
>> "large" problems, which is determined by some heuristic.
>>
>>Barry
>>
>>
>> > On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users <
>> petsc-users@mcs.anl.gov> wrote:
>> >
>> >
>> >
>> > On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong  wrote:
>> > Fande:
>> > Hong,
>> > Thanks for your improvements on PtAP, which is critical for MG-type
>> algorithms.
>> >
>> > On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
>> > Mark,
>> > Below is the copy of my email sent to you on Feb 27:
>> >
>> > I implemented scalable MatPtAP and did comparisons of three
>> implementations using ex56.c on alcf cetus machine (this machine has small
>> memory, 1GB/core):
>> > - nonscalable PtAP: use an array of length PN to do dense axpy
>> > - scalable PtAP:   do sparse axpy without use of PN array
>> >
>> > What does PN mean here?
>> > Global number of columns of P.
>> >
>> > - hypre PtAP.
>> >
>> > The results are attached. Summary:
>> > - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
>> > - scalable PtAP is 4x faster than hypre PtAP
>> > - hypre uses less memory (see job.ne399.n63.np1000.sh)
>> >
>> > I was wondering how much more memory PETSc PtAP uses than hypre? I am
>> implementing an AMG algorithm based on PETSc right now, and it is working
>> well. But we have found a bottleneck with PtAP. For the same P and A, PETSc
>> PtAP fails to generate a coarse matrix because it runs out of memory, while
>> hypre can still generate the coarse matrix.
>> >
>> > I do not want to just use the HYPRE one because we had to duplicate
>> matrices if I used HYPRE PtAP.
>> >
>> > It would be nice if you guys have already done some comparisons of
>> these implementations' memory usage.
>> > Do you encounter memory issues with scalable PtAP?
>> >
>> > By default do we use the scalable PtAP?? Do we have to specify some
>> options to use the scalable version of PtAP?  If so, it would be nice to
>> use the scalable version by default.  I am totally missing something here.
>> >
>> > Thanks,
>> >
>> > Fande
>> >
>> >
>> > Karl had a student in the summer who improved MatPtAP(). Do you use the
>> latest version of petsc?
>> > HYPRE may use less memory than PETSc because it does not save and reuse
>> the matrices.
>> >
>> > I do not understand why generating coarse matrix fails due to out of
>> memory. Do you use direct solver at coarse grid?
>> > Hong
>> >
>> > Based on the above observation, I set the default PtAP algorithm to
>> 'nonscalable'.
>> > When PN > the local estimated number of nonzeros of C=PtAP, the default
>> switches to 'scalable'.
>> > The user can override the default.
>> >
>> > For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
>> > MatPtAP   3.6224e+01 (nonscalable for small mats,
>> scalable for larger ones)
>> > scalable MatPtAP 4.6129e+01
>> > hypre            1.9389e+02
>> >
>> > This work is on petsc-master. Give it a try. If you encounter any
>> problem, let me know.
>> >
>> > Hong
>> >
>> > On Wed, May 3, 2017 at 10:01 AM, Mark Adams  wrote:
>> > (Hong), what is the current state of optimizing RAP for scaling?
>> >
>> > Nate is driving 3D elasticity problems at scale with GAMG and we are
>> working out performance problems. They are hitting problems at ~1.5B dof
>> problems on a basic Cray (XC30 I think).
>> >
>> > Thanks,
>> > Mark
>> >
>>
>>


Re: [petsc-users] GAMG scaling

2018-12-20 Thread Zhang, Hong via petsc-users
We use nonscalable implementation as default, and switch to scalable for 
matrices over finer grids. You may use option '-matptap_via scalable' to force 
scalable PtAP  implementation for all PtAP. Let me know if it works.
Hong
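
For example (the executable and its other options are placeholders):

  mpiexec -n 1000 ./myapp -pc_type gamg -matptap_via scalable -log_view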

On Thu, Dec 20, 2018 at 8:16 PM Smith, Barry F.  wrote:

  See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically for 
"large" problems, which is determined by some heuristic.

   Barry


> On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users  wrote:
>
>
>
> On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong  wrote:
> Fande:
> Hong,
> Thanks for your improvements on PtAP, which is critical for MG-type algorithms.
>
> On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
> Mark,
> Below is the copy of my email sent to you on Feb 27:
>
> I implemented scalable MatPtAP and did comparisons of three implementations 
> using ex56.c on alcf cetus machine (this machine has small memory, 1GB/core):
> - nonscalable PtAP: use an array of length PN to do dense axpy
> - scalable PtAP:   do sparse axpy without use of PN array
>
> What does PN mean here?
> Global number of columns of P.
>
> - hypre PtAP.
>
> The results are attached. Summary:
> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
> - scalable PtAP is 4x faster than hypre PtAP
> - hypre uses less memory (see 
> job.ne399.n63.np1000.sh)
>
> I was wondering how much more memory PETSc PtAP uses than hypre? I am 
> implementing an AMG algorithm based on PETSc right now, and it is working 
> well. But we have found a bottleneck with PtAP. For the same P and A, PETSc
> PtAP fails to generate a coarse matrix because it runs out of memory, while
> hypre can still generate the coarse matrix.
>
> I do not want to just use the HYPRE one because we had to duplicate matrices 
> if I used HYPRE PtAP.
>
> It would be nice if you guys have already done some comparisons of these
> implementations' memory usage.
> Do you encounter memory issues with scalable PtAP?
>
> By default do we use the scalable PtAP?? Do we have to specify some options 
> to use the scalable version of PtAP?  If so, it would be nice to use the 
> scalable version by default.  I am totally missing something here.
>
> Thanks,
>
> Fande
>
>
> Karl had a student in the summer who improved MatPtAP(). Do you use the 
> latest version of petsc?
> HYPRE may use less memory than PETSc because it does not save and reuse the 
> matrices.
>
> I do not understand why generating coarse matrix fails due to out of memory. 
> Do you use direct solver at coarse grid?
> Hong
>
> Based on the above observation, I set the default PtAP algorithm to 'nonscalable'.
> When PN > the local estimated number of nonzeros of C=PtAP, the default
> switches to 'scalable'.
> The user can override the default.
>
> For the case of np=8000, ne=599 (see 
> job.ne599.n500.np8000.sh), I get
> MatPtAP   3.6224e+01 (nonscalable for small mats, scalable 
> for larger ones)
> scalable MatPtAP 4.6129e+01
> hypre            1.9389e+02
>
> This work is on petsc-master. Give it a try. If you encounter any problem,
> let me know.
>
> Hong
>
> On Wed, May 3, 2017 at 10:01 AM, Mark Adams  wrote:
> (Hong), what is the current state of optimizing RAP for scaling?
>
> Nate is driving 3D elasticity problems at scale with GAMG and we are
> working out performance problems. They are hitting problems at ~1.5B dof 
> problems on a basic Cray (XC30 I think).
>
> Thanks,
> Mark
>



Re: [petsc-users] GAMG scaling

2018-12-20 Thread Smith, Barry F. via petsc-users


  See MatPtAP_MPIAIJ_MPIAIJ(). It switches to scalable automatically for 
"large" problems, which is determined by some heuristic.

   Barry


> On Dec 20, 2018, at 6:46 PM, Fande Kong via petsc-users 
>  wrote:
> 
> 
> 
> On Thu, Dec 20, 2018 at 4:43 PM Zhang, Hong  wrote:
> Fande:
> Hong,
> Thanks for your improvements on PtAP, which is critical for MG-type algorithms.
> 
> On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
> Mark,
> Below is the copy of my email sent to you on Feb 27:
> 
> I implemented scalable MatPtAP and did comparisons of three implementations 
> using ex56.c on alcf cetus machine (this machine has small memory, 1GB/core):
> - nonscalable PtAP: use an array of length PN to do dense axpy
> - scalable PtAP:   do sparse axpy without use of PN array
> 
> What does PN mean here?
> Global number of columns of P. 
> 
> - hypre PtAP.
> 
> The results are attached. Summary:
> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
> - scalable PtAP is 4x faster than hypre PtAP
> - hypre uses less memory (see job.ne399.n63.np1000.sh)
> 
> I was wondering how much more memory PETSc PtAP uses than hypre? I am 
> implementing an AMG algorithm based on PETSc right now, and it is working 
> well. But we have found a bottleneck with PtAP. For the same P and A, PETSc
> PtAP fails to generate a coarse matrix because it runs out of memory, while
> hypre can still generate the coarse matrix.
> 
> I do not want to just use the HYPRE one because we had to duplicate matrices 
> if I used HYPRE PtAP.
> 
> It would be nice if you guys have already done some comparisons of these
> implementations' memory usage.
> Do you encounter memory issues with scalable PtAP?
> 
> By default do we use the scalable PtAP?? Do we have to specify some options 
> to use the scalable version of PtAP?  If so, it would be nice to use the 
> scalable version by default.  I am totally missing something here. 
> 
> Thanks,
> 
> Fande
> 
>  
> Karl had a student in the summer who improved MatPtAP(). Do you use the 
> latest version of petsc?
> HYPRE may use less memory than PETSc because it does not save and reuse the 
> matrices.
> 
> I do not understand why generating coarse matrix fails due to out of memory. 
> Do you use direct solver at coarse grid?
> Hong
> 
> Based on the above observation, I set the default PtAP algorithm to
> 'nonscalable'.
> When PN > the local estimated number of nonzeros of C=PtAP, the default
> switches to 'scalable'.
> The user can override the default.
> 
> For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
> MatPtAP   3.6224e+01 (nonscalable for small mats, scalable 
> for larger ones)
> scalable MatPtAP 4.6129e+01
> hypre            1.9389e+02
> 
> This work is on petsc-master. Give it a try. If you encounter any problem,
> let me know.
> 
> Hong
> 
> On Wed, May 3, 2017 at 10:01 AM, Mark Adams  wrote:
> (Hong), what is the current state of optimizing RAP for scaling?
> 
> Nate is driving 3D elasticity problems at scale with GAMG and we are
> working out performance problems. They are hitting problems at ~1.5B dof 
> problems on a basic Cray (XC30 I think).
> 
> Thanks,
> Mark
> 



Re: [petsc-users] GAMG scaling

2018-12-20 Thread Smith, Barry F. via petsc-users



> On Dec 20, 2018, at 5:51 PM, Zhang, Hong via petsc-users 
>  wrote:
> 
> Fande:
> Hong,
> Thanks for your improvements on PtAP, which is critical for MG-type algorithms.
> 
> On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
> Mark,
> Below is the copy of my email sent to you on Feb 27:
> 
> I implemented scalable MatPtAP and did comparisons of three implementations 
> using ex56.c on alcf cetus machine (this machine has small memory, 1GB/core):
> - nonscalable PtAP: use an array of length PN to do dense axpy
> - scalable PtAP:   do sparse axpy without use of PN array
> 
> What does PN mean here?
> Global number of columns of P. 
> 
> - hypre PtAP.
> 
> The results are attached. Summary:
> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
> - scalable PtAP is 4x faster than hypre PtAP
> - hypre uses less memory (see job.ne399.n63.np1000.sh)
> 
> I was wondering how much more memory PETSc PtAP uses than hypre? I am 
> implementing an AMG algorithm based on PETSc right now, and it is working 
> well. But we have found a bottleneck with PtAP. For the same P and A, PETSc
> PtAP fails to generate a coarse matrix because it runs out of memory, while
> hypre can still generate the coarse matrix.
> 
> I do not want to just use the HYPRE one because we had to duplicate matrices 
> if I used HYPRE PtAP.
> 
> It would be nice if you guys have already done some comparisons of these
> implementations' memory usage.
> Do you encounter memory issues with scalable PtAP? Karl had a student in the
> summer who improved MatPtAP(). Do you use the latest version of petsc?
> HYPRE may use less memory than PETSc because it does not save and reuse the 
> matrices.

   Could PETSc have an option where it does not save and reuse the matrices? 
And thus require less memory but with more compute time for multiple setups? 
How much memory would it save, 20%, 50%? 

   Barry

> 
> I do not understand why generating coarse matrix fails due to out of memory. 
> Do you use direct solver at coarse grid?
> Hong
> 
> Based on the above observation, I set the default PtAP algorithm to
> 'nonscalable'.
> When PN > the local estimated number of nonzeros of C=PtAP, the default
> switches to 'scalable'.
> The user can override the default.
> 
> For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
> MatPtAP   3.6224e+01 (nonscalable for small mats, scalable 
> for larger ones)
> scalable MatPtAP 4.6129e+01
> hypre            1.9389e+02
> 
> This work is on petsc-master. Give it a try. If you encounter any problem,
> let me know.
> 
> Hong
> 
> On Wed, May 3, 2017 at 10:01 AM, Mark Adams  wrote:
> (Hong), what is the current state of optimizing RAP for scaling?
> 
> Nate is driving 3D elasticity problems at scale with GAMG and we are
> working out performance problems. They are hitting problems at ~1.5B dof 
> problems on a basic Cray (XC30 I think).
> 
> Thanks,
> Mark
> 



Re: [petsc-users] GAMG scaling

2018-12-20 Thread Zhang, Hong via petsc-users
Fande:
Hong,
Thanks for your improvements on PtAP, which is critical for MG-type algorithms.

On Wed, May 3, 2017 at 10:17 AM Hong  wrote:
Mark,
Below is the copy of my email sent to you on Feb 27:

I implemented scalable MatPtAP and did comparisons of three implementations 
using ex56.c on alcf cetus machine (this machine has small memory, 1GB/core):
- nonscalable PtAP: use an array of length PN to do dense axpy
- scalable PtAP:   do sparse axpy without use of PN array

What does PN mean here?
Global number of columns of P.

- hypre PtAP.

The results are attached. Summary:
- nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
- scalable PtAP is 4x faster than hypre PtAP
- hypre uses less memory (see 
job.ne399.n63.np1000.sh)

I was wondering how much more memory PETSc PtAP uses than hypre? I am 
implementing an AMG algorithm based on PETSc right now, and it is working well. 
But we have found a bottleneck with PtAP. For the same P and A, PETSc PtAP fails
to generate a coarse matrix because it runs out of memory, while hypre can
still generate the coarse matrix.

I do not want to just use the HYPRE one because we had to duplicate matrices if 
I used HYPRE PtAP.

It would be nice if you guys have already done some comparisons of these
implementations' memory usage.
Do you encounter memory issues with scalable PtAP? Karl had a student in the
summer who improved MatPtAP(). Do you use the latest version of petsc?
HYPRE may use less memory than PETSc because it does not save and reuse the 
matrices.

I do not understand why generating coarse matrix fails due to out of memory. Do 
you use direct solver at coarse grid?
Hong

Based on the above observation, I set the default PtAP algorithm to 'nonscalable'.
When PN > the local estimated number of nonzeros of C=PtAP, the default switches
to 'scalable'.
The user can override the default.

For the case of np=8000, ne=599 (see 
job.ne599.n500.np8000.sh), I get
MatPtAP   3.6224e+01 (nonscalable for small mats, scalable for 
larger ones)
scalable MatPtAP 4.6129e+01
hypre            1.9389e+02

This work is on petsc-master. Give it a try. If you encounter any problem, let
me know.

Hong

On Wed, May 3, 2017 at 10:01 AM, Mark Adams  wrote:
(Hong), what is the current state of optimizing RAP for scaling?

Nate is driving 3D elasticity problems at scale with GAMG and we are working
out performance problems. They are hitting problems at ~1.5B dof problems on a 
basic Cray (XC30 I think).

Thanks,
Mark



Re: [petsc-users] GAMG Parallel Performance

2018-11-15 Thread Smith, Barry F. via petsc-users



> On Nov 15, 2018, at 1:02 PM, Mark Adams  wrote:
> 
> There is a lot of load imbalance in VecMAXPY also. The partitioning could be 
> bad and if not it's the machine.


> 
> On Thu, Nov 15, 2018 at 1:56 PM Smith, Barry F. via petsc-users 
>  wrote:
> 
> Something is odd about your configuration. Just consider the time for 
> VecMAXPY which is an embarrassingly parallel operation. On 1000 MPI processes 
> it produces
> 
>    Time                                                  flop rate
>  VecMAXPY 575 1.0 8.4132e-01 1.5 1.36e+09 1.0 0.0e+00 0.0e+00 
> 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,600,021
> 
> on 1500 processes it produces
> 
>  VecMAXPY 583 1.0 1.0786e+00 3.4 9.38e+08 1.0 0.0e+00 0.0e+00 
> 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,289,187
> 
> That is, it actually takes longer (the time goes from .84 seconds to 1.08
> seconds and the flop rate drops from 1,600,021 down to 1,289,187). You would
> never expect this kind of behavior.
> 
> and on 2000 processes it produces
> 
> VecMAXPY 583 1.0 7.1103e-01 2.7 7.03e+08 1.0 0.0e+00 0.0e+00 
> 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,955,563
> 
> so it speeds up again but not by very much. This is very mysterious and not 
> what you would expect.
> 
>I'm inclined to believe something is out of whack on your computer, are 
> you sure all nodes on the computer are equivalent? Same processors, same 
> clock speeds? What happens if you run the 1000 process case several times, do 
> you get very similar numbers for VecMAXPY()? You should but I am guessing you 
> may not.
> 
> Barry
> 
>   Note that this performance issue doesn't really have anything to do with 
> the preconditioner you are using.
> 
> 
> 
> 
> 
> > On Nov 15, 2018, at 10:50 AM, Karin via petsc-users 
> >  wrote:
> > 
> > Dear PETSc team,
> > 
> > I am solving a linear transient dynamic problem, based on a discretization 
> > with finite elements. To do that, I am using FGMRES with GAMG as a 
> > preconditioner. I consider here 10 time steps. 
> > The problem has around 118e6 dof and I am running on 1000, 1500 and 2000
> > procs. So I have something like 100e3, 78e3 and 50e3 dof/proc.
> > I notice that the performance deteriorates when I increase the number of 
> > processes. 
> > You can find attached the log_view of the execution and the
> > detailed definition of the KSP.
> > 
> > Is the problem too small to run on that number of processes or is there 
> > something wrong with my use of GAMG?
> > 
> > I thank you in advance for your help,
> > Nicolas
> > 
> 



Re: [petsc-users] GAMG Parallel Performance

2018-11-15 Thread Mark Adams via petsc-users
There is a lot of load imbalance in VecMAXPY also. The partitioning could
be bad and if not it's the machine.

On Thu, Nov 15, 2018 at 1:56 PM Smith, Barry F. via petsc-users <
petsc-users@mcs.anl.gov> wrote:

>
> Something is odd about your configuration. Just consider the time for
> VecMAXPY which is an embarrassingly parallel operation. On 1000 MPI
> processes it produces
>
>    Time                                                  flop rate
>  VecMAXPY 575 1.0 8.4132e-01 1.5 1.36e+09 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,600,021
>
> on 1500 processes it produces
>
>  VecMAXPY 583 1.0 1.0786e+00 3.4 9.38e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,289,187
>
> That is, it actually takes longer (the time goes from .84 seconds to 1.08
> seconds and the flop rate drops from 1,600,021 down to 1,289,187). You would
> never expect this kind of behavior.
>
> and on 2000 processes it produces
>
> VecMAXPY 583 1.0 7.1103e-01 2.7 7.03e+08 1.0 0.0e+00 0.0e+00
> 0.0e+00  0  2  0  0  0   0  2  0  0  0 1,955,563
>
> so it speeds up again but not by very much. This is very mysterious and
> not what you would expect.
>
>I'm inclined to believe something is out of whack on your computer, are
> you sure all nodes on the computer are equivalent? Same processors, same
> clock speeds? What happens if you run the 1000 process case several times,
> do you get very similar numbers for VecMAXPY()? You should but I am
> guessing you may not.
>
> Barry
>
>   Note that this performance issue doesn't really have anything to do with
> the preconditioner you are using.
>
>
>
>
>
> > On Nov 15, 2018, at 10:50 AM, Karin via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
> >
> > Dear PETSc team,
> >
> > I am solving a linear transient dynamic problem, based on a
> discretization with finite elements. To do that, I am using FGMRES with
> GAMG as a preconditioner. I consider here 10 time steps.
> > The problem has around 118e6 dof and I am running on 1000, 1500 and
> 2000 procs. So I have something like 100e3, 78e3 and 50e3 dof/proc.
> > I notice that the performance deteriorates when I increase the number of
> processes.
> > You can find attached the log_view of the execution and the
> detailed definition of the KSP.
> >
> > Is the problem too small to run on that number of processes or is there
> something wrong with my use of GAMG?
> >
> > I thank you in advance for your help,
> > Nicolas
> >
> 
>
>


Re: [petsc-users] GAMG Parallel Performance

2018-11-15 Thread Smith, Barry F. via petsc-users


Something is odd about your configuration. Just consider the time for 
VecMAXPY which is an embarrassingly parallel operation. On 1000 MPI processes 
it produces

   Time                                                  flop rate
 VecMAXPY 575 1.0 8.4132e-01 1.5 1.36e+09 1.0 0.0e+00 0.0e+00 
0.0e+00  0  2  0  0  0   0  2  0  0  0 1,600,021

on 1500 processes it produces

 VecMAXPY 583 1.0 1.0786e+00 3.4 9.38e+08 1.0 0.0e+00 0.0e+00 
0.0e+00  0  2  0  0  0   0  2  0  0  0 1,289,187

That is, it actually takes longer (the time goes from .84 seconds to 1.08
seconds and the flop rate drops from 1,600,021 down to 1,289,187). You would
never expect this kind of behavior.

and on 2000 processes it produces

VecMAXPY 583 1.0 7.1103e-01 2.7 7.03e+08 1.0 0.0e+00 0.0e+00 
0.0e+00  0  2  0  0  0   0  2  0  0  0 1,955,563

so it speeds up again but not by very much. This is very mysterious and not 
what you would expect.

   I'm inclined to believe something is out of whack on your computer, are you 
sure all nodes on the computer are equivalent? Same processors, same clock 
speeds? What happens if you run the 1000 process case several times, do you get 
very similar numbers for VecMAXPY()? You should but I am guessing you may not.

Barry

  Note that this performance issue doesn't really have anything to do with the 
preconditioner you are using.
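
One quick way to check (placeholder executable; on a real cluster these runs
would go through the batch system):

  for i in 1 2 3; do
    mpiexec -n 1000 ./myapp -log_view 2>&1 | grep VecMAXPY
  done

If the VecMAXPY time and flop rate vary a lot across otherwise identical
runs, the machine or the node allocation is the likely culprit.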





> On Nov 15, 2018, at 10:50 AM, Karin via petsc-users 
>  wrote:
> 
> Dear PETSc team,
> 
> I am solving a linear transient dynamic problem, based on a discretization 
> with finite elements. To do that, I am using FGMRES with GAMG as a 
> preconditioner. I consider here 10 time steps. 
> The problem has around 118e6 dof and I am running on 1000, 1500 and 2000 
> procs. So I have something like 100e3, 78e3 and 50e3 dof/proc.
> I notice that the performance deteriorates when I increase the number of 
> processes. 
> You can find attached the log_view of the execution and the detailed 
> definition of the KSP.
> 
> Is the problem too small to run on that number of processes or is there 
> something wrong with my use of GAMG?
> 
> I thank you in advance for your help,
> Nicolas
> 



Re: [petsc-users] GAMG Parallel Performance

2018-11-15 Thread Matthew Knepley via petsc-users
On Thu, Nov 15, 2018 at 11:52 AM Karin via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Dear PETSc team,
>
> I am solving a linear transient dynamic problem, based on a discretization
> with finite elements. To do that, I am using FGMRES with GAMG as a
> preconditioner. I consider here 10 time steps.
> The problem has around 118e6 dof and I am running on 1000, 1500 and 2000
> procs. So I have something like 100e3, 78e3 and 50e3 dof/proc.
> I notice that the performance deteriorates when I increase the number of
> processes.
> You can find attached the log_view of the execution and the
> detailed definition of the KSP.
>
> Is the problem too small to run on that number of processes or is there
> something wrong with my use of GAMG?
>

I am having a hard time understanding the data. Just to be clear, I
understand you to be running the exact same problem on 1000, 1500, and 2000
processes, so we are looking for strong speedup. The PCSetUp time actually
sped up a little, which is great, and it's still a small percentage (notice
that your whole solve is only half the runtime). Let's just look at a big
time component, MatMult:

P = 1000

MatMult 7342 1.0 4.4956e+01 1.4 4.09e+10 1.2 9.6e+07
4.3e+03 0.0e+00 23 53 81 86  0  23 53 81 86  0 859939


P = 2000

MatMult 7470 1.0 4.7611e+01 1.9 2.11e+10 1.2 2.0e+08
2.9e+03 0.0e+00 11 53 81 86  0  11 53 81 86  0 827107


So there was no speedup at all. It is doing 1/2 the flops per process, but
taking almost exactly the same time. This looks like your 2000 process run
is on exactly the same number of nodes as your 1000 process run, but you
just use more processes. Your 1000 process run was maxing out the bandwidth
of those nodes, and thus 2000 runs no faster. Is this true? Otherwise, I am
misunderstanding the run.
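
A rough back-of-envelope supports the bandwidth reading (assuming standard
AIJ storage; the logs do not state the nodes' memory bandwidth, so the last
step is illustrative):

  flops per nonzero      ~ 2    (one multiply, one add)
  bytes per nonzero      ~ 12   (8-byte scalar + 4-byte column index)
  arithmetic intensity   ~ 2/12 = 1/6 flop/byte
  P = 1000 MatMult rate  ~ 860 GF/s aggregate (859,939 in the MF/s column)
  implied memory traffic ~ 860e9 * 6 bytes/s ~ 5 TB/s aggregate

If ~5 TB/s already saturates the memory systems of the nodes in use, adding
ranks on the same nodes cannot raise the MatMult rate, which matches the
P = 2000 numbers above.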

  Thanks,

Matt


> I thank you in advance for your help,
> Nicolas
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] GAMG advice

2017-11-10 Thread Mark Adams
On Thu, Nov 9, 2017 at 2:19 PM, David Nolte  wrote:

> Hi Mark,
>
> thanks for clarifying.
> When I wrote the initial question I had somehow overlooked the fact that
> the GAMG standard smoother was Chebychev while ML uses SOR. All the other
> comments concerning threshold etc were based on this mistake.
>
> The following settings work quite well, of course LU is used on the coarse
> level.
>
> -pc_type gamg
> -pc_gamg_type agg
> -pc_gamg_threshold 0.03
> -pc_gamg_square_graph 10    # no effect ?
> -pc_gamg_sym_graph
> -mg_levels_ksp_type richardson
> -mg_levels_pc_type sor
>
> -pc_gamg_agg_nsmooths 0 does not seem to improve the convergence.
>

Looks reasonable. And this smoothing is good for elliptic operator
convergence, but it makes the operator more expensive. It's worth doing for
elliptic operators, but in my experience not for others. If your convergence
rate does not change, then you probably want -pc_gamg_agg_nsmooths 0. This
is a cheaper (if smoothing does not help convergence a lot), simpler method,
and you want to use it.
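
For reference (standard smoothed-aggregation notation, not from this
thread): with -pc_gamg_agg_nsmooths 1 the tentative prolongator P_tent is
replaced by one damped-Jacobi sweep,

    P = (I - omega * D^{-1} A) P_tent,

so the Galerkin coarse operator P^T A P picks up a wider stencil and more
nonzeros per row; -pc_gamg_agg_nsmooths 0 keeps P = P_tent (plain
aggregation), the cheaper option described above.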


>
> The ksp view now looks like this: (does this seem reasonable?)
>
>
> KSP Object: 4 MPI processes
>   type: fgmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
>   maximum iterations=1
>   tolerances:  relative=1e-06, absolute=1e-50, divergence=1.
>   right preconditioning
>   using nonzero initial guess
>   using UNPRECONDITIONED norm type for convergence test
> PC Object: 4 MPI processes
>   type: gamg
> MG: type is MULTIPLICATIVE, levels=5 cycles=v
>   Cycles per PCApply=1
>   Using Galerkin computed coarse grid matrices
>   GAMG specific options
> Threshold for dropping small values from graph 0.03
> AGG specific options
>   Symmetric graph true
>   Coarse grid solver -- level ---
> KSP Object:(mg_coarse_) 4 MPI processes
>   type: preonly
>   maximum iterations=1, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>   left preconditioning
>   using NONE norm type for convergence test
> PC Object:(mg_coarse_) 4 MPI processes
>   type: bjacobi
> block Jacobi: number of blocks = 4
> Local solve is same for all blocks, in the following KSP and PC
> objects:
>   KSP Object:  (mg_coarse_sub_)   1 MPI processes
> type: preonly
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using NONE norm type for convergence test
>   PC Object:  (mg_coarse_sub_)   1 MPI processes
> type: lu
>   LU: out-of-place factorization
>   tolerance for zero pivot 2.22045e-14
>   using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
>   matrix ordering: nd
>   factor fill ratio given 5., needed 1.
> Factored matrix follows:
>   Mat Object:   1 MPI processes
> type: seqaij
> rows=38, cols=38
> package used to perform factorization: petsc
> total: nonzeros=1444, allocated nonzeros=1444
> total number of mallocs used during MatSetValues calls =0
>   using I-node routines: found 8 nodes, limit used is 5
> linear system matrix = precond matrix:
> Mat Object: 1 MPI processes
>   type: seqaij
>   rows=38, cols=38
>   total: nonzeros=1444, allocated nonzeros=1444
>   total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 8 nodes, limit used is 5
>   linear system matrix = precond matrix:
>   Mat Object:   4 MPI processes
> type: mpiaij
> rows=38, cols=38
> total: nonzeros=1444, allocated nonzeros=1444
> total number of mallocs used during MatSetValues calls =0
>   using I-node (on process 0) routines: found 8 nodes, limit used
> is 5
>   Down solver (pre-smoother) on level 1 ---
> KSP Object:(mg_levels_1_) 4 MPI processes
>   type: richardson
> Richardson: damping factor=1.
>   maximum iterations=2
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>   left preconditioning
>   using nonzero initial guess
>   using NONE norm type for convergence test
> PC Object:(mg_levels_1_) 4 MPI processes
>   type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations = 1,
> omega = 1.
>   linear system matrix = precond matrix:
>   Mat Object:   4 MPI processes
> type: mpiaij
> rows=168, cols=168
> total: nonzeros=19874, allocated nonzeros=19874

Re: [petsc-users] GAMG advice

2017-11-09 Thread David Nolte
Hi Mark,

thanks for clarifying.
When I wrote the initial question I had somehow overlooked the fact that
the GAMG standard smoother was Chebyshev while ML uses SOR. All the
other comments concerning threshold etc. were based on this mistake.

The following settings work quite well, of course LU is used on the
coarse level.

    -pc_type gamg
    -pc_gamg_type agg
    -pc_gamg_threshold 0.03
    -pc_gamg_square_graph 10        # no effect ?
    -pc_gamg_sym_graph
    -mg_levels_ksp_type richardson
    -mg_levels_pc_type sor

-pc_gamg_agg_nsmooths 0 does not seem to improve the convergence.

The ksp view now looks like this: (does this seem reasonable?)


KSP Object: 4 MPI processes
  type: fgmres
    GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement
    GMRES: happy breakdown tolerance 1e-30
  maximum iterations=1
  tolerances:  relative=1e-06, absolute=1e-50, divergence=1.
  right preconditioning
  using nonzero initial guess
  using UNPRECONDITIONED norm type for convergence test
PC Object: 4 MPI processes
  type: gamg
    MG: type is MULTIPLICATIVE, levels=5 cycles=v
  Cycles per PCApply=1
  Using Galerkin computed coarse grid matrices
  GAMG specific options
    Threshold for dropping small values from graph 0.03
    AGG specific options
  Symmetric graph true
  Coarse grid solver -- level ---
    KSP Object:    (mg_coarse_) 4 MPI processes
  type: preonly
  maximum iterations=1, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using NONE norm type for convergence test
    PC Object:    (mg_coarse_) 4 MPI processes
  type: bjacobi
    block Jacobi: number of blocks = 4
    Local solve is same for all blocks, in the following KSP and PC
objects:
  KSP Object:  (mg_coarse_sub_)   1 MPI processes
    type: preonly
    maximum iterations=1, initial guess is zero
    tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
    left preconditioning
    using NONE norm type for convergence test
  PC Object:  (mg_coarse_sub_)   1 MPI processes
    type: lu
  LU: out-of-place factorization
  tolerance for zero pivot 2.22045e-14
  using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
  matrix ordering: nd
  factor fill ratio given 5., needed 1.
    Factored matrix follows:
  Mat Object:   1 MPI processes
    type: seqaij
    rows=38, cols=38
    package used to perform factorization: petsc
    total: nonzeros=1444, allocated nonzeros=1444
    total number of mallocs used during MatSetValues calls =0
  using I-node routines: found 8 nodes, limit used is 5
    linear system matrix = precond matrix:
    Mat Object: 1 MPI processes
  type: seqaij
  rows=38, cols=38
  total: nonzeros=1444, allocated nonzeros=1444
  total number of mallocs used during MatSetValues calls =0
    using I-node routines: found 8 nodes, limit used is 5
  linear system matrix = precond matrix:
  Mat Object:   4 MPI processes
    type: mpiaij
    rows=38, cols=38
    total: nonzeros=1444, allocated nonzeros=1444
    total number of mallocs used during MatSetValues calls =0
  using I-node (on process 0) routines: found 8 nodes, limit
used is 5
  Down solver (pre-smoother) on level 1 ---
    KSP Object:    (mg_levels_1_) 4 MPI processes
  type: richardson
    Richardson: damping factor=1.
  maximum iterations=2
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using nonzero initial guess
  using NONE norm type for convergence test
    PC Object:    (mg_levels_1_) 4 MPI processes
  type: sor
    SOR: type = local_symmetric, iterations = 1, local iterations =
1, omega = 1.
  linear system matrix = precond matrix:
  Mat Object:   4 MPI processes
    type: mpiaij
    rows=168, cols=168
    total: nonzeros=19874, allocated nonzeros=19874
    total number of mallocs used during MatSetValues calls =0
  using I-node (on process 0) routines: found 17 nodes, limit
used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 2 ---
    KSP Object:    (mg_levels_2_) 4 MPI processes
  type: richardson
    Richardson: damping factor=1.
  maximum iterations=2
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using nonzero initial guess
  using NONE norm type for convergence test
    PC Object:    (mg_levels_2_) 4 MPI processes
  type: 

Re: [petsc-users] GAMG advice

2017-11-08 Thread Mark Adams
On Wed, Nov 1, 2017 at 5:45 PM, David Nolte  wrote:

> Thanks Barry.
> By simply replacing chebychev by richardson I get similar performance
> with GAMG and ML


That too (I assumed you were using the same, I could not see cheby in your
view data).

I guess SOR works for the coarse grid solver because the coarse grid is
small. It should help to use lu.


> (GAMG even slightly faster):
>

This is "random" fluctuations.


>
> -pc_type gamg
> -pc_gamg_type agg
> -pc_gamg_threshold 0.03
> -pc_gamg_square_graph 10
> -pc_gamg_sym_graph
> -mg_levels_ksp_type richardson
> -mg_levels_pc_type sor
>
> Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix
> is asymmetric?


yes,


> For serial runs it doesn't seem to matter,


yes,


> but in
> parallel the PC setup hangs (after calls of
> PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set.
>

yep,


>
> David
>
>
> On 10/21/2017 12:10 AM, Barry Smith wrote:
> >   David,
> >
> >GAMG picks the number of levels based on how the coarsening process
> etc proceeds. You cannot hardwire it to a particular value. You can run
> with -info to get more info potentially on the decisions GAMG is making.
> >
> >   Barry
> >
> >> On Oct 20, 2017, at 2:06 PM, David Nolte  wrote:
> >>
> >> PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option
> >> was not taken into account:
> >> type: gamg
> >> MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >>
> >>
> >>
> >> On 10/20/2017 03:32 PM, David Nolte wrote:
> >>> Dear all,
> >>>
> >>> I have some problems using GAMG as a preconditioner for (F)GMRES.
> >>> Background: I am solving the incompressible, unsteady Navier-Stokes
> >>> equations with a coupled mixed FEM approach, using P1/P1 elements for
> >>> velocity and pressure on an unstructured tetrahedron mesh with about
> >>> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG,
> >>> hence, no zeros on the diagonal of the pressure block. Time
> >>> discretization with semi-implicit backward Euler. The flow is a
> >>> convection dominated flow through a nozzle.
> >>>
> >>> So far, for this setup, I have been quite happy with a simple FGMRES/ML
> >>> solver for the full system (rather bruteforce, I admit, but much faster
> >>> than any block/Schur preconditioners I tried):
> >>>
> >>> -ksp_converged_reason
> >>> -ksp_monitor_true_residual
> >>> -ksp_type fgmres
> >>> -ksp_rtol 1.0e-6
> >>> -ksp_initial_guess_nonzero
> >>>
> >>> -pc_type ml
> >>> -pc_ml_Threshold 0.03
> >>> -pc_ml_maxNlevels 3
> >>>
> >>> This setup converges in ~100 iterations (see below the ksp_view output)
> >>> to rtol:
> >>>
> >>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
> >>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
> >>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
> >>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
> >>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
> >>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
> >>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
> >>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
> >>>
> >>>
> >>> Now I'd like to try GAMG instead of ML. However, I don't know how to
> set
> >>> it up to get similar performance.
> >>> The obvious/naive
> >>>
> >>> -pc_type gamg
> >>> -pc_gamg_type agg
> >>>
> >>> # with and without
> >>> -pc_gamg_threshold 0.03
> >>> -pc_mg_levels 3
> >>>
> >>> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
> >>> proc), for instance:
> >>> np = 1:
> >>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
> >>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
> >>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
> >>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
> >>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
> >>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
> >>>
> >>> np = 8:
> >>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
> >>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
> >>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> >>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> >>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> >>>
> >>> A very high threshold seems to improve the GAMG PC, for instance with
> >>> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
> >>> What else should I try?
> >>>
> >>> I would very much appreciate any advice on configuring GAMG and
> >>> differences w.r.t ML to be taken into account (not a multigrid expert
> >>> though).
> >>>
> >>> Thanks, best wishes
> >>> David
> >>>
> >>>
> >>> --
> >>> ksp_view for -pc_type gamg  

Re: [petsc-users] GAMG advice

2017-11-08 Thread Mark Adams
On Fri, Oct 20, 2017 at 11:10 PM, Barry Smith  wrote:

>
>   David,
>
>GAMG picks the number of levels based on how the coarsening process etc
> proceeds. You cannot hardwire it to a particular value.


Yes you can. GAMG will respect -pc_mg_levels N, but we don't recommend
using it.


> You can run with -info to get more info potentially on the decisions GAMG
> is making.
>

This is noisy, but grep on GAMG and you will see the levels, sizes, etc.
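
For example (an illustrative command line; substitute your own executable
and options):

mpiexec -n 8 ./app -pc_type gamg -info 2>&1 | grep GAMG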


>
>   Barry
>
> > On Oct 20, 2017, at 2:06 PM, David Nolte  wrote:
> >
> > PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option
> > was not taken into account:
> > type: gamg
> > MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >
> >
> >
> > On 10/20/2017 03:32 PM, David Nolte wrote:
> >> Dear all,
> >>
> >> I have some problems using GAMG as a preconditioner for (F)GMRES.
> >> Background: I am solving the incompressible, unsteady Navier-Stokes
> >> equations with a coupled mixed FEM approach, using P1/P1 elements for
> >> velocity and pressure on an unstructured tetrahedron mesh with about
> >> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG,
> >> hence, no zeros on the diagonal of the pressure block. Time
> >> discretization with semi-implicit backward Euler. The flow is a
> >> convection dominated flow through a nozzle.
> >>
> >> So far, for this setup, I have been quite happy with a simple FGMRES/ML
> >> solver for the full system (rather bruteforce, I admit, but much faster
> >> than any block/Schur preconditioners I tried):
> >>
> >> -ksp_converged_reason
> >> -ksp_monitor_true_residual
> >> -ksp_type fgmres
> >> -ksp_rtol 1.0e-6
> >> -ksp_initial_guess_nonzero
> >>
> >> -pc_type ml
> >> -pc_ml_Threshold 0.03
> >> -pc_ml_maxNlevels 3
> >>
> >> This setup converges in ~100 iterations (see below the ksp_view output)
> >> to rtol:
> >>
> >> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
> >> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
> >> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
> >> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
> >> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
> >> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
> >> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
> >> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
> >>
> >>
> >> Now I'd like to try GAMG instead of ML. However, I don't know how to set
> >> it up to get similar performance.
> >> The obvious/naive
> >>
> >> -pc_type gamg
> >> -pc_gamg_type agg
> >>
> >> # with and without
> >> -pc_gamg_threshold 0.03
> >> -pc_mg_levels 3
> >>
> >> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
> >> proc), for instance:
> >> np = 1:
> >> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
> >> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
> >> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
> >> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
> >> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
> >> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
> >>
> >> np = 8:
> >> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
> >> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
> >> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> >> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> >> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> >> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> >>
> >> A very high threshold seems to improve the GAMG PC, for instance with
> >> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
> >> What else should I try?
> >>
> >> I would very much appreciate any advice on configuring GAMG and
> >> differences w.r.t ML to be taken into account (not a multigrid expert
> >> though).
> >>
> >> Thanks, best wishes
> >> David
> >>
> >>
> >> --
> >> ksp_view for -pc_type gamg  -pc_gamg_threshold 0.75 -pc_mg_levels 3
> >>
> >> KSP Object: 1 MPI processes
> >>   type: fgmres
> >> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> >> Orthogonalization with no iterative refinement
> >> GMRES: happy breakdown tolerance 1e-30
> >>   maximum iterations=1
> >>   tolerances:  relative=1e-06, absolute=1e-50, divergence=1.
> >>   right preconditioning
> >>   using nonzero initial guess
> >>   using UNPRECONDITIONED norm type for convergence test
> >> PC Object: 1 MPI processes
> >>   type: gamg
> >> MG: type is MULTIPLICATIVE, levels=1 cycles=v
> >>   Cycles per PCApply=1
> >>   Using Galerkin computed coarse grid matrices
> >>   GAMG specific options
> >> Threshold for dropping small values from graph 0.75
> >> AGG specific options
> >>   Symmetric graph false
> 

Re: [petsc-users] GAMG advice

2017-11-08 Thread Mark Adams
>
>
> Now I'd like to try GAMG instead of ML. However, I don't know how to set
> it up to get similar performance.
> The obvious/naive
>
> -pc_type gamg
> -pc_gamg_type agg
>
> # with and without
> -pc_gamg_threshold 0.03
> -pc_mg_levels 3
>
>
This looks fine. I would not set the number of levels but if it helps then
go for it.


> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
> proc), for instance:
> np = 1:
> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
>
> np = 8:
> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>
> A very high threshold seems to improve the GAMG PC, for instance with
> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
> What else should I try?
>

Not sure. ML uses the same algorithm as GAMG (so the threshold means pretty
much the same thing). ML is a good solver, and its leader, Ray Tuminaro, has
had a lot of NS experience. But I'm not sure what the differences are that
are causing this performance gap.

* It looks like you are using sor for the coarse grid solver in gamg:

  Coarse grid solver -- level ---
KSP Object:(mg_levels_0_) 1 MPI processes
  type: preonly
  maximum iterations=2, initial guess is zero
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using NONE norm type for convergence test
PC Object:(mg_levels_0_) 1 MPI processes
  type: sor
SOR: type = local_symmetric, iterations = 1, local iterations =

You should/must use lu, like in ML. This will kill you. (See the sketch
after this list.)

* smoothed aggregation vs unsmoothed: GAMG's view data does not say if it
is smoothing. Damn, I need to fix that. For NS, you probably want
unsmoothed (-pc_gamg_agg_nsmooths 0). I'm not sure what the ML parameter
for this is, nor do I know the default. It should make a noticeable
difference (good or bad).

* Threshold for dropping small values from graph 0.75 -- this is crazy :)

This is all that I can think of now.
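
Two sketches for the points above (the options use standard PETSc prefixes;
the drop-rule formula is approximate). For the coarse solve:

-mg_coarse_ksp_type preonly
-mg_coarse_pc_type lu

(in parallel, -mg_coarse_pc_type redundant, or bjacobi with LU sub-solves as
in the ksp_view earlier in this thread, since plain LU is sequential). On
the threshold: if entries are dropped roughly when
|a_ij| < threshold * sqrt(|a_ii * a_jj|), then for a 2D 5-point Laplacian
row (a_ii = 4, off-diagonals -1) that ratio is 0.25, so any threshold above
0.25 disconnects the whole graph while 0.03 keeps every edge; hence 0.75
being "crazy", with values in roughly the 0.0-0.05 range being typical.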

Mark


>
> I would very much appreciate any advice on configuring GAMG and
> differences w.r.t ML to be taken into account (not a multigrid expert
> though).
>
> Thanks, best wishes
> David
>
>
> --
> ksp_view for -pc_type gamg  -pc_gamg_threshold 0.75 -pc_mg_levels 3
>
> KSP Object: 1 MPI processes
>   type: fgmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
>   maximum iterations=1
>   tolerances:  relative=1e-06, absolute=1e-50, divergence=1.
>   right preconditioning
>   using nonzero initial guess
>   using UNPRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
>   type: gamg
> MG: type is MULTIPLICATIVE, levels=1 cycles=v
>   Cycles per PCApply=1
>   Using Galerkin computed coarse grid matrices
>   GAMG specific options
> Threshold for dropping small values from graph 0.75
> AGG specific options
>   Symmetric graph false
>   Coarse grid solver -- level ---
> KSP Object:(mg_levels_0_) 1 MPI processes
>   type: preonly
>   maximum iterations=2, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>   left preconditioning
>   using NONE norm type for convergence test
> PC Object:(mg_levels_0_) 1 MPI processes
>   type: sor
> SOR: type = local_symmetric, iterations = 1, local iterations =
> 1, omega = 1.
>   linear system matrix = precond matrix:
>   Mat Object:   1 MPI processes
> type: seqaij
> rows=1745224, cols=1745224
> total: nonzeros=99452608, allocated nonzeros=99452608
> total number of mallocs used during MatSetValues calls =0
>   using I-node routines: found 1037847 nodes, limit used is 5
>   linear system matrix = precond matrix:
>   Mat Object:   1 MPI processes
> type: seqaij
> rows=1745224, cols=1745224
> total: nonzeros=99452608, allocated nonzeros=99452608
> total number of mallocs used during MatSetValues calls =0
>   using I-node routines: found 1037847 nodes, limit used is 5
>
>
> --
> ksp_view for -pc_type ml:
>
> KSP Object: 8 MPI processes
>   type: fgmres
> GMRES: 

Re: [petsc-users] GAMG advice

2017-11-01 Thread David Nolte
Thanks Barry.
By simply replacing chebyshev with richardson I get similar performance
with GAMG and ML (GAMG even slightly faster):

-pc_type gamg
-pc_gamg_type agg
-pc_gamg_threshold 0.03
-pc_gamg_square_graph 10
-pc_gamg_sym_graph
-mg_levels_ksp_type richardson
-mg_levels_pc_type sor

Is it still true that I need to set "-pc_gamg_sym_graph" if the matrix
is asymmetric? For serial runs it doesn't seem to matter, but in
parallel the PC setup hangs (after calls to
PCGAMGFilterGraph()) if -pc_gamg_sym_graph is not set.

David
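
For intuition on why this matters (an illustrative sketch, not GAMG's
internal code): -pc_gamg_sym_graph makes the aggregation work on a
symmetrized graph, conceptually the pattern of A + A^T, so the parallel
MIS-based coarsening sees a symmetric adjacency; built by hand that would
look roughly like:

#include <petscmat.h>

/* Sketch: symmetrized aggregation graph G = A + A^T (only the pattern
   matters). Illustrates what -pc_gamg_sym_graph implies; not PETSc's
   actual code. PETSC_SUCCESS needs PETSc >= 3.19; older versions return 0. */
static PetscErrorCode SymmetrizeGraph(Mat A, Mat *G)
{
  Mat At;

  PetscFunctionBeginUser;
  PetscCall(MatTranspose(A, MAT_INITIAL_MATRIX, &At));
  PetscCall(MatDuplicate(A, MAT_COPY_VALUES, G));
  PetscCall(MatAXPY(*G, 1.0, At, DIFFERENT_NONZERO_PATTERN));
  PetscCall(MatDestroy(&At));
  PetscFunctionReturn(PETSC_SUCCESS);
}

Without the symmetrization, ranks can disagree about which graph edges
exist, one plausible mechanism for the parallel hang described above.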


On 10/21/2017 12:10 AM, Barry Smith wrote:
>   David,
>
>GAMG picks the number of levels based on how the coarsening process etc 
> proceeds. You cannot hardwire it to a particular value. You can run with 
> -info to get more info potentially on the decisions GAMG is making.
>
>   Barry
>
>> On Oct 20, 2017, at 2:06 PM, David Nolte  wrote:
>>
>> PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option
>> was not taken into account:  
>> type: gamg
>> MG: type is MULTIPLICATIVE, levels=1 cycles=v
>>
>>
>>
>> On 10/20/2017 03:32 PM, David Nolte wrote:
>>> Dear all,
>>>
>>> I have some problems using GAMG as a preconditioner for (F)GMRES.
>>> Background: I am solving the incompressible, unsteady Navier-Stokes
>>> equations with a coupled mixed FEM approach, using P1/P1 elements for
>>> velocity and pressure on an unstructured tetrahedron mesh with about
>>> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG,
>>> hence, no zeros on the diagonal of the pressure block. Time
>>> discretization with semi-implicit backward Euler. The flow is a
>>> convection dominated flow through a nozzle.
>>>
>>> So far, for this setup, I have been quite happy with a simple FGMRES/ML
>>> solver for the full system (rather bruteforce, I admit, but much faster
>>> than any block/Schur preconditioners I tried):
>>>
>>> -ksp_converged_reason
>>> -ksp_monitor_true_residual
>>> -ksp_type fgmres
>>> -ksp_rtol 1.0e-6
>>> -ksp_initial_guess_nonzero
>>>
>>> -pc_type ml
>>> -pc_ml_Threshold 0.03
>>> -pc_ml_maxNlevels 3
>>>
>>> This setup converges in ~100 iterations (see below the ksp_view output)
>>> to rtol:
>>>
>>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
>>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
>>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
>>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
>>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
>>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
>>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
>>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
>>>
>>>
>>> Now I'd like to try GAMG instead of ML. However, I don't know how to set
>>> it up to get similar performance.
>>> The obvious/naive
>>>
>>> -pc_type gamg
>>> -pc_gamg_type agg
>>>
>>> # with and without
>>> -pc_gamg_threshold 0.03
>>> -pc_mg_levels 3
>>>
>>> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
>>> proc), for instance:
>>> np = 1:
>>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
>>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
>>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
>>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
>>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
>>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
>>>
>>> np = 8:
>>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
>>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
>>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
>>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
>>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>>>
>>> A very high threshold seems to improve the GAMG PC, for instance with
>>> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
>>> What else should I try?
>>>
>>> I would very much appreciate any advice on configuring GAMG and
>>> differences w.r.t ML to be taken into account (not a multigrid expert
>>> though).
>>>
>>> Thanks, best wishes
>>> David
>>>
>>>
>>> --
>>> ksp_view for -pc_type gamg  -pc_gamg_threshold 0.75 

Re: [petsc-users] GAMG advice

2017-10-20 Thread Barry Smith

  David,

   GAMG picks the number of levels based on how the coarsening process etc.
proceeds. You cannot hardwire it to a particular value. You can run with -info
to potentially get more information on the decisions GAMG is making.

  Barry

> On Oct 20, 2017, at 2:06 PM, David Nolte  wrote:
> 
> PS: I didn't realize at first, it looks as if the -pc_mg_levels 3 option
> was not taken into account:  
> type: gamg
> MG: type is MULTIPLICATIVE, levels=1 cycles=v
> 
> 
> 
> On 10/20/2017 03:32 PM, David Nolte wrote:
>> Dear all,
>> 
>> I have some problems using GAMG as a preconditioner for (F)GMRES.
>> Background: I am solving the incompressible, unsteady Navier-Stokes
>> equations with a coupled mixed FEM approach, using P1/P1 elements for
>> velocity and pressure on an unstructured tetrahedron mesh with about
>> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG,
>> hence, no zeros on the diagonal of the pressure block. Time
>> discretization with semi-implicit backward Euler. The flow is a
>> convection dominated flow through a nozzle.
>> 
>> So far, for this setup, I have been quite happy with a simple FGMRES/ML
>> solver for the full system (rather bruteforce, I admit, but much faster
>> than any block/Schur preconditioners I tried):
>> 
>> -ksp_converged_reason
>> -ksp_monitor_true_residual
>> -ksp_type fgmres
>> -ksp_rtol 1.0e-6
>> -ksp_initial_guess_nonzero
>> 
>> -pc_type ml
>> -pc_ml_Threshold 0.03
>> -pc_ml_maxNlevels 3
>> 
>> This setup converges in ~100 iterations (see below the ksp_view output)
>> to rtol:
>> 
>> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
>> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
>> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
>> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
>> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
>> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
>> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
>> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
>> 
>> 
>> Now I'd like to try GAMG instead of ML. However, I don't know how to set
>> it up to get similar performance.
>> The obvious/naive
>> 
>> -pc_type gamg
>> -pc_gamg_type agg
>> 
>> # with and without
>> -pc_gamg_threshold 0.03
>> -pc_mg_levels 3
>> 
>> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
>> proc), for instance:
>> np = 1:
>> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
>> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
>> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
>> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
>> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
>> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
>> 
>> np = 8:
>> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
>> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
>> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
>> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>> 
>> A very high threshold seems to improve the GAMG PC, for instance with
>> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
>> What else should I try?
>> 
>> I would very much appreciate any advice on configuring GAMG and
>> differences w.r.t ML to be taken into account (not a multigrid expert
>> though).
>> 
>> Thanks, best wishes
>> David
>> 
>> 
>> --
>> ksp_view for -pc_type gamg  -pc_gamg_threshold 0.75 -pc_mg_levels 3
>> 
>> KSP Object: 1 MPI processes
>>   type: fgmres
>> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
>> Orthogonalization with no iterative refinement
>> GMRES: happy breakdown tolerance 1e-30
>>   maximum iterations=1
>>   tolerances:  relative=1e-06, absolute=1e-50, divergence=1.
>>   right preconditioning
>>   using nonzero initial guess
>>   using UNPRECONDITIONED norm type for convergence test
>> PC Object: 1 MPI processes
>>   type: gamg
>> MG: type is MULTIPLICATIVE, levels=1 cycles=v
>>   Cycles per PCApply=1
>>   Using Galerkin computed coarse grid matrices
>>   GAMG specific options
>> Threshold for dropping small values from graph 0.75
>> AGG specific options
>>   Symmetric graph false
>>   Coarse grid solver -- level ---
>> KSP Object:(mg_levels_0_) 1 MPI processes
>>   type: preonly
>>   maximum iterations=2, initial guess is zero
>>   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>>   left preconditioning
>>   using NONE norm type for convergence test
>> PC Object:(mg_levels_0_) 1 MPI processes
>>   type: sor
>> SOR: type = 

Re: [petsc-users] GAMG advice

2017-10-20 Thread David Nolte
PS: I didn't realize it at first, but it looks as if the -pc_mg_levels 3 option
was not taken into account:
type: gamg
    MG: type is MULTIPLICATIVE, levels=1 cycles=v



On 10/20/2017 03:32 PM, David Nolte wrote:
> Dear all,
>
> I have some problems using GAMG as a preconditioner for (F)GMRES.
> Background: I am solving the incompressible, unsteady Navier-Stokes
> equations with a coupled mixed FEM approach, using P1/P1 elements for
> velocity and pressure on an unstructured tetrahedron mesh with about
> 2mio DOFs (and up to 15mio). The method is stabilized with SUPG/PSPG,
> hence, no zeros on the diagonal of the pressure block. Time
> discretization with semi-implicit backward Euler. The flow is a
> convection dominated flow through a nozzle.
>
> So far, for this setup, I have been quite happy with a simple FGMRES/ML
> solver for the full system (rather bruteforce, I admit, but much faster
> than any block/Schur preconditioners I tried):
>
>     -ksp_converged_reason
>     -ksp_monitor_true_residual
>     -ksp_type fgmres
>     -ksp_rtol 1.0e-6
>     -ksp_initial_guess_nonzero
>
>     -pc_type ml
>     -pc_ml_Threshold 0.03
>     -pc_ml_maxNlevels 3
>
> This setup converges in ~100 iterations (see below the ksp_view output)
> to rtol:
>
> 119 KSP unpreconditioned resid norm 4.004030812027e-05 true resid norm
> 4.004030812037e-05 ||r(i)||/||b|| 1.621791251517e-06
> 120 KSP unpreconditioned resid norm 3.256863709982e-05 true resid norm
> 3.256863709982e-05 ||r(i)||/||b|| 1.319158947617e-06
> 121 KSP unpreconditioned resid norm 2.751959681502e-05 true resid norm
> 2.751959681503e-05 ||r(i)||/||b|| 1.114652795021e-06
> 122 KSP unpreconditioned resid norm 2.420611122789e-05 true resid norm
> 2.420611122788e-05 ||r(i)||/||b|| 9.804434897105e-07
>
>
> Now I'd like to try GAMG instead of ML. However, I don't know how to set
> it up to get similar performance.
> The obvious/naive
>
>     -pc_type gamg
>     -pc_gamg_type agg
>
> # with and without
>     -pc_gamg_threshold 0.03
>     -pc_mg_levels 3
>
> converges very slowly on 1 proc and much worse on 8 (~200k dofs per
> proc), for instance:
> np = 1:
> 980 KSP unpreconditioned resid norm 1.065009356215e-02 true resid norm
> 1.065009356215e-02 ||r(i)||/||b|| 4.532259705508e-04
> 981 KSP unpreconditioned resid norm 1.064978578182e-02 true resid norm
> 1.064978578182e-02 ||r(i)||/||b|| 4.532128726342e-04
> 982 KSP unpreconditioned resid norm 1.064956706598e-02 true resid norm
> 1.064956706598e-02 ||r(i)||/||b|| 4.532035649508e-04
>
> np = 8:
> 980 KSP unpreconditioned resid norm 3.179946748495e-02 true resid norm
> 3.179946748495e-02 ||r(i)||/||b|| 1.353259896710e-03
> 981 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
> 982 KSP unpreconditioned resid norm 3.179946748317e-02 true resid norm
> 3.179946748317e-02 ||r(i)||/||b|| 1.353259896634e-03
>
> A very high threshold seems to improve the GAMG PC, for instance with
> 0.75 I get convergence to rtol=1e-6 after 744 iterations.
> What else should I try?
>
> I would very much appreciate any advice on configuring GAMG and
> differences w.r.t ML to be taken into account (not a multigrid expert
> though).
>
> Thanks, best wishes
> David
>
>
> --
> ksp_view for -pc_type gamg      -pc_gamg_threshold 0.75 -pc_mg_levels 3
>
> KSP Object: 1 MPI processes
>   type: fgmres
>     GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>     GMRES: happy breakdown tolerance 1e-30
>   maximum iterations=1
>   tolerances:  relative=1e-06, absolute=1e-50, divergence=1.
>   right preconditioning
>   using nonzero initial guess
>   using UNPRECONDITIONED norm type for convergence test
> PC Object: 1 MPI processes
>   type: gamg
>     MG: type is MULTIPLICATIVE, levels=1 cycles=v
>   Cycles per PCApply=1
>   Using Galerkin computed coarse grid matrices
>   GAMG specific options
>     Threshold for dropping small values from graph 0.75
>     AGG specific options
>   Symmetric graph false
>   Coarse grid solver -- level ---
>     KSP Object:    (mg_levels_0_) 1 MPI processes
>   type: preonly
>   maximum iterations=2, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>   left preconditioning
>   using NONE norm type for convergence test
>     PC Object:    (mg_levels_0_) 1 MPI processes
>   type: sor
>     SOR: type = local_symmetric, iterations = 1, local iterations =
> 1, omega = 1.
>   linear system matrix = precond matrix:
>   Mat Object:   1 MPI processes
>     type: seqaij
>     rows=1745224, cols=1745224
>     total: nonzeros=99452608, allocated nonzeros=99452608
>     total number of mallocs used during MatSetValues calls =0
>   using I-node routines: found 1037847 nodes, limit used is 5
>   linear system matrix = precond matrix:

Re: [petsc-users] GAMG scaling

2017-05-04 Thread Hong
Mark,
Fixed
https://bitbucket.org/petsc/petsc/commits/68eacb73b84ae7f3fd7363217d47f23a8f967155

Run ex56 gives
mpiexec -n 8 ./ex56 -ne 13 ... -h |grep via
  -mattransposematmult_via  Algorithmic approach (choose one of)
scalable nonscalable matmatmult (MatTransposeMatMult)
  -matmatmult_via  Algorithmic approach (choose one of)
scalable nonscalable hypre (MatMatMult)
  -matptap_via  Algorithmic approach (choose one of) scalable
nonscalable hypre (MatPtAP)
...

I'll merge it to master after regression tests.

Hong

On Thu, May 4, 2017 at 10:33 AM, Hong  wrote:

> Mark:
>>
>> I am not seeing these options with -help ...
>>
> Hmm, this might be a bug - I'll check it.
> Hong
>
>
>>
>> On Wed, May 3, 2017 at 10:05 PM, Hong  wrote:
>>
> >>> I basically used 'runex56' and set '-ne' to be compatible with np.
>>> Then I used option
>>> '-matptap_via scalable'
>>> '-matptap_via hypre'
>>> '-matptap_via nonscalable'
>>>
>>> I attached a job script below.
>>>
>>> In master branch, I set default as 'nonscalable' for small - medium size
>>> matrices, and automatically switch to 'scalable' when matrix size gets
>>> larger.
>>>
>>> Petsc solver uses MatPtAP,  which does local RAP to reduce communication
>>> and accelerate computation.
>>> I suggest you simply use default setting. Let me know if you encounter
>>> trouble.
>>>
>>> Hong
>>>
>>> job.ne174.n8.np125.sh:
>>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56
>>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
>>> -pc_gamg_reuse_interpolation true -ksp_converged_reason
>>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
>>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
>>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
>>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
>>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
>>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
>>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
>>> -pc_gamg_repartition false -pc_mg_cycle_type v
>>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
>>> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via scalable >
>>> log.ne174.n8.np125.scalable
>>>
>>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56
>>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
>>> -pc_gamg_reuse_interpolation true -ksp_converged_reason
>>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
>>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
>>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
>>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
>>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
>>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
>>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
>>> -pc_gamg_repartition false -pc_mg_cycle_type v
>>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
>>> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via hypre >
>>> log.ne174.n8.np125.hypre
>>>
>>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56
>>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
>>> -pc_gamg_reuse_interpolation true -ksp_converged_reason
>>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
>>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
>>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
>>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
>>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
>>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
>>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
>>> -pc_gamg_repartition false -pc_mg_cycle_type v
>>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
>>> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via nonscalable >
>>> log.ne174.n8.np125.nonscalable
>>>
>>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56
>>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
>>> -pc_gamg_reuse_interpolation true -ksp_converged_reason
>>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
>>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
>>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
>>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
>>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
>>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
>>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
>>> -pc_gamg_repartition false -pc_mg_cycle_type v
>>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
>>> -mg_coarse_ksp_type cg -ksp_monitor -log_view > log.ne174.n8.np125
>>>
>>> On Wed, May 3, 2017 at 2:08 PM, Mark Adams  wrote:
>>>

Re: [petsc-users] GAMG scaling

2017-05-04 Thread Mark Adams
Thanks Hong,

I am not seeing these options with -help ...

On Wed, May 3, 2017 at 10:05 PM, Hong  wrote:

> I basically used 'runex56' and set '-ne' to be compatible with np.
> Then I used option
> '-matptap_via scalable'
> '-matptap_via hypre'
> '-matptap_via nonscalable'
>
> I attached a job script below.
>
> In master branch, I set default as 'nonscalable' for small - medium size
> matrices, and automatically switch to 'scalable' when matrix size gets
> larger.
>
> Petsc solver uses MatPtAP,  which does local RAP to reduce communication
> and accelerate computation.
> I suggest you simply use default setting. Let me know if you encounter
> trouble.
>
> Hong
>
> job.ne174.n8.np125.sh:
> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne
> 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
> -pc_gamg_reuse_interpolation true -ksp_converged_reason
> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
> -pc_gamg_repartition false -pc_mg_cycle_type v 
> -pc_gamg_use_parallel_coarse_grid_solver
> -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg -ksp_monitor -log_view
> -matptap_via scalable > log.ne174.n8.np125.scalable
>
> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne
> 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
> -pc_gamg_reuse_interpolation true -ksp_converged_reason
> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
> -pc_gamg_repartition false -pc_mg_cycle_type v 
> -pc_gamg_use_parallel_coarse_grid_solver
> -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg -ksp_monitor -log_view
> -matptap_via hypre > log.ne174.n8.np125.hypre
>
> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne
> 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
> -pc_gamg_reuse_interpolation true -ksp_converged_reason
> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
> -pc_gamg_repartition false -pc_mg_cycle_type v 
> -pc_gamg_use_parallel_coarse_grid_solver
> -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg -ksp_monitor -log_view
> -matptap_via nonscalable > log.ne174.n8.np125.nonscalable
>
> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne
> 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
> -pc_gamg_reuse_interpolation true -ksp_converged_reason
> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
> -pc_gamg_repartition false -pc_mg_cycle_type v 
> -pc_gamg_use_parallel_coarse_grid_solver
> -mg_coarse_pc_type jacobi -mg_coarse_ksp_type cg -ksp_monitor -log_view >
> log.ne174.n8.np125
>
> On Wed, May 3, 2017 at 2:08 PM, Mark Adams  wrote:
>
>> Hong, the input files do not seem to be accessible. What are the command
>> line option? (I don't see a "rap" or "scale" in the source).
>>
>>
>>
>> On Wed, May 3, 2017 at 12:17 PM, Hong  wrote:
>>
>>> Mark,
>>> Below is the copy of my email sent to you on Feb 27:
>>>
>>> I implemented scalable MatPtAP and did comparisons of three
>>> implementations using ex56.c on alcf cetus machine (this machine has
>>> small memory, 1GB/core):
>>> - nonscalable PtAP: use an array of length PN to do dense axpy
>>> - scalable PtAP:   do sparse axpy without use of PN array
>>> - hypre PtAP.
>>>
>>> The results are attached. Summary:
>>> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
>>> - scalable PtAP is 4x faster than hypre PtAP
>>> - hypre uses less memory 

Re: [petsc-users] GAMG scaling

2017-05-03 Thread Hong
I basically used 'runex56' and set '-ne' to be compatible with np.
Then I used option
'-matptap_via scalable'
'-matptap_via hypre'
'-matptap_via nonscalable'

I attached a job script below.

In master branch, I set default as 'nonscalable' for small - medium size
matrices, and automatically switch to 'scalable' when matrix size gets
larger.

Petsc solver uses MatPtAP,  which does local RAP to reduce communication
and accelerate computation.
I suggest you simply use default setting. Let me know if you encounter
trouble.

Hong
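
For readers outside the thread: MatPtAP computes the Galerkin triple product
C = P^T * A * P in one call (the "RAP" product with R = P^T), and it is
usually the communication-heavy part of the GAMG setup. A minimal usage
sketch, assuming Mat A and prolongator P already exist:

Mat C;
/* C = Pt*A*P; the fill estimate (2.0) is just a preallocation guess */
PetscCall(MatPtAP(A, P, MAT_INITIAL_MATRIX, 2.0, &C));

The -matptap_via option in the scripts below selects which internal
algorithm this single call uses.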

job.ne174.n8.np125.sh:
runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne
174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
-pc_gamg_reuse_interpolation true -ksp_converged_reason
-use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
-mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
-mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
-mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
-gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
-mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
-pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
-pc_gamg_repartition false -pc_mg_cycle_type v
-pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
-mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via scalable >
log.ne174.n8.np125.scalable

runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne
174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
-pc_gamg_reuse_interpolation true -ksp_converged_reason
-use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
-mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
-mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
-mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
-gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
-mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
-pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
-pc_gamg_repartition false -pc_mg_cycle_type v
-pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
-mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via hypre >
log.ne174.n8.np125.hypre

runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne
174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
-pc_gamg_reuse_interpolation true -ksp_converged_reason
-use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
-mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
-mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
-mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
-gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
-mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
-pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
-pc_gamg_repartition false -pc_mg_cycle_type v
-pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
-mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via nonscalable >
log.ne174.n8.np125.nonscalable

runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56 -ne
174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
-pc_gamg_reuse_interpolation true -ksp_converged_reason
-use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
-mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
-mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
-mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
-gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
-mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
-pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
-pc_gamg_repartition false -pc_mg_cycle_type v
-pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
-mg_coarse_ksp_type cg -ksp_monitor -log_view > log.ne174.n8.np125

On Wed, May 3, 2017 at 2:08 PM, Mark Adams  wrote:

> Hong, the input files do not seem to be accessible. What are the command
> line option? (I don't see a "rap" or "scale" in the source).
>
>
>
> On Wed, May 3, 2017 at 12:17 PM, Hong  wrote:
>
>> Mark,
>> Below is the copy of my email sent to you on Feb 27:
>>
>> I implemented scalable MatPtAP and did comparisons of three
>> implementations using ex56.c on alcf cetus machine (this machine has
>> small memory, 1GB/core):
>> - nonscalable PtAP: use an array of length PN to do dense axpy
>> - scalable PtAP:   do sparse axpy without use of PN array
>> - hypre PtAP.
>>
>> The results are attached. Summary:
>> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
>> - scalable PtAP is 4x faster than hypre PtAP
>> - hypre uses less memory (see job.ne399.n63.np1000.sh)
>>
>> Based on above observation, I set the default PtAP algorithm as
>> 'nonscalable'.
>> When PN > local estimated nonzero of C=PtAP, then switch default to
>> 'scalable'.
>> User can overwrite default.
>>
>> For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I 

Re: [petsc-users] GAMG scaling

2017-05-03 Thread Mark Adams
Hong, the input files do not seem to be accessible. What are the command
line option? (I don't see a "rap" or "scale" in the source).



On Wed, May 3, 2017 at 12:17 PM, Hong  wrote:

> Mark,
> Below is the copy of my email sent to you on Feb 27:
>
> I implemented scalable MatPtAP and did comparisons of three
> implementations using ex56.c on alcf cetus machine (this machine has
> small memory, 1GB/core):
> - nonscalable PtAP: use an array of length PN to do dense axpy
> - scalable PtAP:   do sparse axpy without use of PN array
> - hypre PtAP.
>
> The results are attached. Summary:
> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
> - scalable PtAP is 4x faster than hypre PtAP
> - hypre uses less memory (see job.ne399.n63.np1000.sh)
>
> Based on above observation, I set the default PtAP algorithm as
> 'nonscalable'.
> When PN > local estimated nonzero of C=PtAP, then switch default to
> 'scalable'.
> User can overwrite default.
>
> For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
> MatPtAP          3.6224e+01  (nonscalable for small mats, scalable
> for larger ones)
> scalable MatPtAP 4.6129e+01
> hypre            1.9389e+02
>
> This work is in petsc-master. Give it a try. If you encounter any problem,
> let me know.
>
> Hong
>
> On Wed, May 3, 2017 at 10:01 AM, Mark Adams  wrote:
>
>> (Hong), what is the current state of optimizing RAP for scaling?
>>
>> Nate is driving 3D elasticity problems at scale with GAMG and we are
>> working out performance problems. They are hitting problems at ~1.5B dof
>> on a basic Cray (XC30, I think).
>>
>> Thanks,
>> Mark
>>
>
>


Re: [petsc-users] GAMG scaling

2017-05-03 Thread Hong
Mark,
Below is the copy of my email sent to you on Feb 27:

I implemented scalable MatPtAP and did comparisons of three implementations
using ex56.c on alcf cetus machine (this machine has small memory,
1GB/core):
- nonscalable PtAP: use an array of length PN to do dense axpy
- scalable PtAP:   do sparse axpy without use of PN array
- hypre PtAP.
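
(To make the dense-vs-sparse axpy distinction concrete, here is a minimal C
sketch; it is an illustration, not the PETSc source, and the helper names are
hypothetical. It assumes cidx/cval have spare capacity for insertions.)

  #include <string.h>   /* memmove */

  /* "nonscalable": dense axpy into a work array w of length N (= PN).
     Fast per row, but every process allocates O(N) memory. */
  static void axpy_dense(double *w, double alpha,
                         int nnz, const int *idx, const double *val)
  {
    for (int k = 0; k < nnz; k++) w[idx[k]] += alpha * val[k];
  }

  /* "scalable": sparse axpy into a sorted accumulator (cidx, cval) of
     current length cn; memory tracks the local nonzeros of C, not N. */
  static int axpy_sparse(int cn, int *cidx, double *cval, double alpha,
                         int nnz, const int *idx, const double *val)
  {
    for (int k = 0; k < nnz; k++) {
      int j = 0;
      while (j < cn && cidx[j] < idx[k]) j++;        /* insertion point  */
      if (j < cn && cidx[j] == idx[k]) {
        cval[j] += alpha * val[k];                   /* existing entry   */
      } else {                                       /* insert new entry */
        memmove(cidx + j + 1, cidx + j, (size_t)(cn - j) * sizeof(*cidx));
        memmove(cval + j + 1, cval + j, (size_t)(cn - j) * sizeof(*cval));
        cidx[j] = idx[k];
        cval[j] = alpha * val[k];
        cn++;
      }
    }
    return cn;                                       /* new length */
  }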

The results are attached. Summary:
- nonscalable PtAP is 2x faster than scalable, 8x faster than hypre PtAP
- scalable PtAP is 4x faster than hypre PtAP
- hypre uses less memory (see job.ne399.n63.np1000.sh)

Based on the above observations, I set the default PtAP algorithm to
'nonscalable'.
When PN > the local estimated number of nonzeros of C=PtAP, the default
switches to 'scalable'.
Users can override the default.
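
(For reference, the override is a command-line option; the runjob lines at
the top of this archive pass it explicitly, e.g.

  -matptap_via nonscalable

and, by the same pattern, -matptap_via scalable should select the sparse-axpy
variant by hand.)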

For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
MatPtAP          3.6224e+01  (nonscalable for small mats, scalable for larger ones)
scalable MatPtAP 4.6129e+01
hypre            1.9389e+02

This work is in petsc-master. Give it a try. If you encounter any problem,
let me know.

Hong

On Wed, May 3, 2017 at 10:01 AM, Mark Adams  wrote:

> (Hong), what is the current state of optimizing RAP for scaling?
>
> Nate is driving 3D elasticity problems at scale with GAMG and we are
> working out performance problems. They are hitting problems at ~1.5B dof
> on a basic Cray (XC30, I think).
>
> Thanks,
> Mark
>


out_ex56_cetus_short
Description: Binary data


Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-19 Thread Kong, Fande
Thanks, Mark,

Now, the total compute time using GAMG is competitive with ASM. It looks like
I cannot use something like "-mg_level_1_ksp_type gmres" because this
option makes the compute time much worse.

Fande,

On Thu, Apr 13, 2017 at 9:14 AM, Mark Adams  wrote:

>
>
> On Wed, Apr 12, 2017 at 7:04 PM, Kong, Fande  wrote:
>
>>
>>
>> On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams  wrote:
>>
>>> You seem to have two levels here and 3M eqs on the fine grid and 37 on
>>> the coarse grid. I don't understand that.
>>>
>>> You are also calling the AMG setup a lot, but not spending much time
>>> in it. Try running with -info and grep on "GAMG".
>>>
>>
>> I got the following output:
>>
>> [0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1,
>> nnz/row (ave)=71, np=384
>> [0] PCGAMGFilterGraph():  100.% nnz after filtering, with threshold
>> 0., 73.6364 nnz ave. (N=3020875)
>> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
>> [0] PCGAMGProlongator_AGG(): New grid 18162 nodes
>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978702e+00
>> min=2.559747e-02 PC=jacobi
>> [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384,
>> neq(loc)=40
>> [0] PCSetUp_GAMG(): 1) N=18162, n data cols=1, nnz/row (ave)=94, 384
>> active pes
>> [0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00795
>> [0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1,
>> nnz/row (ave)=71, np=384
>> [0] PCGAMGFilterGraph():  100.% nnz after filtering, with threshold
>> 0., 73.6364 nnz ave. (N=3020875)
>> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
>> [0] PCGAMGProlongator_AGG(): New grid 18145 nodes
>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978584e+00
>> min=2.557887e-02 PC=jacobi
>> [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384,
>> neq(loc)=37
>> [0] PCSetUp_GAMG(): 1) N=18145, n data cols=1, nnz/row (ave)=94, 384
>> active pes
>>
>
> You are still doing two levels. Just use the parameters that I told you
> and you should see that 1) this coarsest (last) grid has "1 active pes" and
> 2) the overall solve time and convergence rate are much better.
>
>
>> [0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00792
>> GAMG specific options
>> PCGAMGGraph_AGG   40 1.0 8.0759e+00 1.0 3.56e+07 2.3 1.6e+06 1.9e+04
>> 7.6e+02  2  0  2  4  2   2  0  2  4  2  1170
>> PCGAMGCoarse_AGG  40 1.0 7.1698e+01 1.0 4.05e+09 2.3 4.0e+06 5.1e+04
>> 1.2e+03 18 37  5 27  3  18 37  5 27  3 14632
>> PCGAMGProl_AGG40 1.0 9.2650e-01 1.2 0.00e+00 0.0 9.8e+05 2.9e+03
>> 9.6e+02  0  0  1  0  2   0  0  1  0  2 0
>> PCGAMGPOpt_AGG40 1.0 2.4484e+00 1.0 4.72e+08 2.3 3.1e+06 2.3e+03
>> 1.9e+03  1  4  4  1  4   1  4  4  1  4 51328
>> GAMG: createProl  40 1.0 8.3786e+01 1.0 4.56e+09 2.3 9.6e+06 2.5e+04
>> 4.8e+03 21 42 12 32 10  21 42 12 32 10 14134
>> GAMG: partLevel   40 1.0 6.7755e+00 1.1 2.59e+08 2.3 2.9e+06 2.5e+03
>> 1.5e+03  2  2  4  1  3   2  2  4  1  3  9431
>>
>>
>>
>>
>>
>>
>>
>>
>>>
>>>
>>> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande  wrote:
>>> > Thanks, Barry.
>>> >
>>> > It works.
>>> >
>>> > GAMG is three times better than ASM in terms of the number of linear
>>> > iterations, but it is five times slower than ASM. Any suggestions to
>>> improve
>>> > the performance of GAMG? Log files are attached.
>>> >
>>> > Fande,
>>> >
>>> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith 
>>> wrote:
>>> >>
>>> >>
>>> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande  wrote:
>>> >> >
>>> >> > Thanks, Mark and Barry,
>>> >> >
>>> >> > It works pretty well in terms of the number of linear iterations
>>> (using
>>> >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time.
>>> I am
>>> >> > using the two-level method via "-pc_mg_levels 2". The reason why
>>> the compute
>>> >> > time is larger than other preconditioning options is that a matrix
>>> free
>>> >> > method is used in the fine level and in my particular problem the
>>> function
>>> >> > evaluation is expensive.
>>> >> >
>>> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free
>>> Newton,
>>> >> > but I do not think I want to make the preconditioning part
>>> matrix-free.  Do
>>> >> > you guys know how to turn off the matrix-free method for GAMG?
>>> >>
>>> >>-pc_use_amat false
>>> >>
>>> >> >
>>> >> > Here is the detailed solver:
>>> >> >
>>> >> > SNES Object: 384 MPI processes
>>> >> >   type: newtonls
>>> >> >   maximum iterations=200, maximum function evaluations=1
>>> >> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
>>> >> >   total number of linear solver iterations=20
>>> >> >   total number of function evaluations=166
>>> >> >   norm schedule ALWAYS
>>> >> >   SNESLineSearch Object:   384 MPI processes
>>> >> > type: bt
>>> >> >   interpolation: cubic
>>> >> >   

Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-13 Thread Mark Adams
On Wed, Apr 12, 2017 at 7:04 PM, Kong, Fande  wrote:

>
>
> On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams  wrote:
>
>> You seem to have two levels here and 3M eqs on the fine grid and 37 on
>> the coarse grid. I don't understand that.
>>
>> You are also calling the AMG setup a lot, but not spending much time
>> in it. Try running with -info and grep on "GAMG".
>>
>
> I got the following output:
>
> [0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1,
> nnz/row (ave)=71, np=384
> [0] PCGAMGFilterGraph():  100.% nnz after filtering, with threshold
> 0., 73.6364 nnz ave. (N=3020875)
> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
> [0] PCGAMGProlongator_AGG(): New grid 18162 nodes
> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978702e+00
> min=2.559747e-02 PC=jacobi
> [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384,
> neq(loc)=40
> [0] PCSetUp_GAMG(): 1) N=18162, n data cols=1, nnz/row (ave)=94, 384
> active pes
> [0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00795
> [0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1,
> nnz/row (ave)=71, np=384
> [0] PCGAMGFilterGraph():  100.% nnz after filtering, with threshold
> 0., 73.6364 nnz ave. (N=3020875)
> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
> [0] PCGAMGProlongator_AGG(): New grid 18145 nodes
> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978584e+00
> min=2.557887e-02 PC=jacobi
> [0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384,
> neq(loc)=37
> [0] PCSetUp_GAMG(): 1) N=18145, n data cols=1, nnz/row (ave)=94, 384
> active pes
>

You are still doing two levels. Just use the parameters that I told you and
you should see that 1) this coarsest (last) grid has "1 active pes" and 2)
the overall solve time and convergence rate are much better.


> [0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00792
> GAMG specific options
> PCGAMGGraph_AGG   40 1.0 8.0759e+00 1.0 3.56e+07 2.3 1.6e+06 1.9e+04
> 7.6e+02  2  0  2  4  2   2  0  2  4  2  1170
> PCGAMGCoarse_AGG  40 1.0 7.1698e+01 1.0 4.05e+09 2.3 4.0e+06 5.1e+04
> 1.2e+03 18 37  5 27  3  18 37  5 27  3 14632
> PCGAMGProl_AGG40 1.0 9.2650e-01 1.2 0.00e+00 0.0 9.8e+05 2.9e+03
> 9.6e+02  0  0  1  0  2   0  0  1  0  2 0
> PCGAMGPOpt_AGG40 1.0 2.4484e+00 1.0 4.72e+08 2.3 3.1e+06 2.3e+03
> 1.9e+03  1  4  4  1  4   1  4  4  1  4 51328
> GAMG: createProl  40 1.0 8.3786e+01 1.0 4.56e+09 2.3 9.6e+06 2.5e+04
> 4.8e+03 21 42 12 32 10  21 42 12 32 10 14134
> GAMG: partLevel   40 1.0 6.7755e+00 1.1 2.59e+08 2.3 2.9e+06 2.5e+03
> 1.5e+03  2  2  4  1  3   2  2  4  1  3  9431
>
>
>
>
>
>
>
>
>>
>>
>> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande  wrote:
>> > Thanks, Barry.
>> >
>> > It works.
>> >
>> > GAMG is three times better than ASM in terms of the number of linear
>> > iterations, but it is five times slower than ASM. Any suggestions to
>> improve
>> > the performance of GAMG? Log files are attached.
>> >
>> > Fande,
>> >
>> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith  wrote:
>> >>
>> >>
>> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande  wrote:
>> >> >
>> >> > Thanks, Mark and Barry,
>> >> >
>> >> > It works pretty well in terms of the number of linear iterations
>> (using
>> >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time.
>> I am
>> >> > using the two-level method via "-pc_mg_levels 2". The reason why the
>> compute
>> >> > time is larger than other preconditioning options is that a matrix
>> free
>> >> > method is used in the fine level and in my particular problem the
>> function
>> >> > evaluation is expensive.
>> >> >
>> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton,
>> >> > but I do not think I want to make the preconditioning part
>> matrix-free.  Do
>> >> > you guys know how to turn off the matrix-free method for GAMG?
>> >>
>> >>-pc_use_amat false
>> >>
>> >> >
>> >> > Here is the detailed solver:
>> >> >
>> >> > SNES Object: 384 MPI processes
>> >> >   type: newtonls
>> >> >   maximum iterations=200, maximum function evaluations=1
>> >> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
>> >> >   total number of linear solver iterations=20
>> >> >   total number of function evaluations=166
>> >> >   norm schedule ALWAYS
>> >> >   SNESLineSearch Object:   384 MPI processes
>> >> > type: bt
>> >> >   interpolation: cubic
>> >> >   alpha=1.00e-04
>> >> > maxstep=1.00e+08, minlambda=1.00e-12
>> >> > tolerances: relative=1.00e-08, absolute=1.00e-15,
>> >> > lambda=1.00e-08
>> >> > maximum iterations=40
>> >> >   KSP Object:   384 MPI processes
>> >> > type: gmres
>> >> >   GMRES: restart=100, using Classical (unmodified) Gram-Schmidt
>> >> > Orthogonalization with no iterative refinement
>> >> >   GMRES: 

Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-13 Thread Mark Adams
On Wed, Apr 12, 2017 at 1:31 PM, Kong, Fande  wrote:

> Hi Mark,
>
> Thanks for your reply.
>
> On Wed, Apr 12, 2017 at 9:16 AM, Mark Adams  wrote:
>
>> The problem comes from setting the number of MG levels (-pc_mg_levels 2).
>> Not your fault, it looks like the GAMG logic is faulty, in your version at
>> least.
>>
>
> What I want is for GAMG to coarsen the fine matrix once and then stop
> doing anything.  I did not see any benefit to having more levels if the
> number of processors is small.
>

The number of levels is a math issue and has nothing to do with
parallelism. If you do just one level of coarsening, your coarse grid is very
large and expensive to solve, so you want to keep coarsening. There is rarely
a need to use -pc_mg_levels.
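
(A rough illustration with the numbers from this thread: coarsening ~3M
equations by roughly 8x per level reaches a coarse-grid target of a few
hundred equations, e.g. -pc_gamg_coarse_eq_limit 200 as used elsewhere in
this archive, after about log(3e6/200)/log(8) ~ 4.6, so 5, levels; capping
at -pc_mg_levels 2 instead strands an 18145-equation coarse problem.)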


>
>
>>
>> GAMG will force the coarsest grid to one processor by default, in newer
>> versions. You can override the default with:
>>
>> -pc_gamg_use_parallel_coarse_grid_solver
>>
>> Your coarse grid solver is ASM with these 37 equations per process and 512
>> processes. That is bad.
>>
>
> Why is this bad? Is the subdomain problem too small?
>

Because ASM with 512 blocks is a weak solver. You want the coarse grid to
be solved exactly.
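
(One way to get an exact parallel coarse solve, sketched here under the
assumption that PETSc was configured with a parallel direct solver such as
SuperLU_DIST, is something like:

  -mg_coarse_ksp_type preonly -mg_coarse_pc_type lu
  -mg_coarse_pc_factor_mat_solver_package superlu_dist

Alternatively, GAMG's newer default reduces the coarsest grid to one process
and factors it there, which also gives an exact coarse solve.)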


>
>
>> Note, you could run this on one process to see the proper convergence
>> rate.
>>
>
> Convergence rate for which part? coarse solver, subdomain solver?
>

The overall convergence rate.


>
>
>> You can fix this with parameters:
>>
>> >   -pc_gamg_process_eq_limit <50>: Limit (goal) on number of equations
>> per process on coarse grids (PCGAMGSetProcEqLim)
>> >   -pc_gamg_coarse_eq_limit <50>: Limit on number of equations for the
>> coarse grid (PCGAMGSetCoarseEqLim)
>>
>> If you really want two levels then set something like
>> -pc_gamg_coarse_eq_limit 18145 (or higher).
>>
>
>
> Maybe have something like: make the coarse problem 1/8 as large as the
> original problem? Otherwise, this number is just problem dependent.
>

GAMG will stop automatically so that you do not need problem-dependent
parameters.


>
>
>
>> You can run with -info and grep on GAMG and you will see meta-data for
>> each level. You should see "npe=1" for the coarsest, last, grid. Or use a
>> parallel direct solver.
>>
>
> I will try.
>
>
>>
>> Note, you should not see much degradation as you increase the number of
>> levels. 18145 eqs on a 3D problem will probably be noticeable. I generally
>> aim for about 3000.
>>
>
> It should be fine as long as the coarse problem is solved by a parallel
> solver.
>

>
> Fande,
>
>
>>
>>
>> On Mon, Apr 10, 2017 at 12:17 PM, Kong, Fande  wrote:
>>
>>>
>>>
>>> On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams  wrote:
>>>
 You seem to have two levels here and 3M eqs on the fine grid and 37 on
 the coarse grid.
>>>
>>>
>>> 37 is on the subdomain.
>>>
>>>  rows=18145, cols=18145 on the entire coarse grid.
>>>
>>>
>>>
>>>
>>>
 I don't understand that.

 You are also calling the AMG setup a lot, but not spending much time
 in it. Try running with -info and grep on "GAMG".


 On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande  wrote:
 > Thanks, Barry.
 >
 > It works.
 >
 > GAMG is three times better than ASM in terms of the number of linear
 > iterations, but it is five times slower than ASM. Any suggestions to
 improve
 > the performance of GAMG? Log files are attached.
 >
 > Fande,
 >
 > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith 
 wrote:
 >>
 >>
 >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande 
 wrote:
 >> >
 >> > Thanks, Mark and Barry,
 >> >
 >> > It works pretty well in terms of the number of linear iterations
 (using
 >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute
 time. I am
 >> > using the two-level method via "-pc_mg_levels 2". The reason why
 the compute
 >> > time is larger than other preconditioning options is that a matrix
 free
 >> > method is used in the fine level and in my particular problem the
 function
 >> > evaluation is expensive.
 >> >
 >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free
 Newton,
 >> > but I do not think I want to make the preconditioning part
 matrix-free.  Do
 >> > you guys know how to turn off the matrix-free method for GAMG?
 >>
 >>-pc_use_amat false
 >>
 >> >
 >> > Here is the detailed solver:
 >> >
 >> > SNES Object: 384 MPI processes
 >> >   type: newtonls
 >> >   maximum iterations=200, maximum function evaluations=1
 >> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
 >> >   total number of linear solver iterations=20
 >> >   total number of function evaluations=166
 >> >   norm schedule ALWAYS
 >> >   SNESLineSearch Object:   384 MPI processes
 >> 

Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-12 Thread Kong, Fande
On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams  wrote:

> You seem to have two levels here and 3M eqs on the fine grid and 37 on
> the coarse grid. I don't understand that.
>
> You are also calling the AMG setup a lot, but not spending much time
> in it. Try running with -info and grep on "GAMG".
>

I got the following output:

[0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1,
nnz/row (ave)=71, np=384
[0] PCGAMGFilterGraph():  100.% nnz after filtering, with threshold 0.,
73.6364 nnz ave. (N=3020875)
[0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
[0] PCGAMGProlongator_AGG(): New grid 18162 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978702e+00
min=2.559747e-02 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384,
neq(loc)=40
[0] PCSetUp_GAMG(): 1) N=18162, n data cols=1, nnz/row (ave)=94, 384 active
pes
[0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00795
[0] PCSetUp_GAMG(): level 0) N=3020875, n data rows=1, n data cols=1,
nnz/row (ave)=71, np=384
[0] PCGAMGFilterGraph():  100.% nnz after filtering, with threshold 0.,
73.6364 nnz ave. (N=3020875)
[0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
[0] PCGAMGProlongator_AGG(): New grid 18145 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.978584e+00
min=2.557887e-02 PC=jacobi
[0] PCGAMGCreateLevel_GAMG(): Aggregate processors noop: new_size=384,
neq(loc)=37
[0] PCSetUp_GAMG(): 1) N=18145, n data cols=1, nnz/row (ave)=94, 384 active
pes
[0] PCSetUp_GAMG(): 2 levels, grid complexity = 1.00792
GAMG specific options
PCGAMGGraph_AGG   40 1.0 8.0759e+00 1.0 3.56e+07 2.3 1.6e+06 1.9e+04
7.6e+02  2  0  2  4  2   2  0  2  4  2  1170
PCGAMGCoarse_AGG  40 1.0 7.1698e+01 1.0 4.05e+09 2.3 4.0e+06 5.1e+04
1.2e+03 18 37  5 27  3  18 37  5 27  3 14632
PCGAMGProl_AGG40 1.0 9.2650e-01 1.2 0.00e+00 0.0 9.8e+05 2.9e+03
9.6e+02  0  0  1  0  2   0  0  1  0  2 0
PCGAMGPOpt_AGG40 1.0 2.4484e+00 1.0 4.72e+08 2.3 3.1e+06 2.3e+03
1.9e+03  1  4  4  1  4   1  4  4  1  4 51328
GAMG: createProl  40 1.0 8.3786e+01 1.0 4.56e+09 2.3 9.6e+06 2.5e+04
4.8e+03 21 42 12 32 10  21 42 12 32 10 14134
GAMG: partLevel   40 1.0 6.7755e+00 1.1 2.59e+08 2.3 2.9e+06 2.5e+03
1.5e+03  2  2  4  1  3   2  2  4  1  3  9431








>
>
> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande  wrote:
> > Thanks, Barry.
> >
> > It works.
> >
> > GAMG is three times better than ASM in terms of the number of linear
> > iterations, but it is five times slower than ASM. Any suggestions to
> improve
> > the performance of GAMG? Log files are attached.
> >
> > Fande,
> >
> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith  wrote:
> >>
> >>
> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande  wrote:
> >> >
> >> > Thanks, Mark and Barry,
> >> >
> >> > It works pretty well in terms of the number of linear iterations
> (using
> >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I
> am
> >> > using the two-level method via "-pc_mg_levels 2". The reason why the
> compute
> >> > time is larger than other preconditioning options is that a matrix
> free
> >> > method is used in the fine level and in my particular problem the
> function
> >> > evaluation is expensive.
> >> >
> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton,
> >> > but I do not think I want to make the preconditioning part
> matrix-free.  Do
> >> > you guys know how to turn off the matrix-free method for GAMG?
> >>
> >>-pc_use_amat false
> >>
> >> >
> >> > Here is the detailed solver:
> >> >
> >> > SNES Object: 384 MPI processes
> >> >   type: newtonls
> >> >   maximum iterations=200, maximum function evaluations=1
> >> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
> >> >   total number of linear solver iterations=20
> >> >   total number of function evaluations=166
> >> >   norm schedule ALWAYS
> >> >   SNESLineSearch Object:   384 MPI processes
> >> > type: bt
> >> >   interpolation: cubic
> >> >   alpha=1.00e-04
> >> > maxstep=1.00e+08, minlambda=1.00e-12
> >> > tolerances: relative=1.00e-08, absolute=1.00e-15,
> >> > lambda=1.00e-08
> >> > maximum iterations=40
> >> >   KSP Object:   384 MPI processes
> >> > type: gmres
> >> >   GMRES: restart=100, using Classical (unmodified) Gram-Schmidt
> >> > Orthogonalization with no iterative refinement
> >> >   GMRES: happy breakdown tolerance 1e-30
> >> > maximum iterations=100, initial guess is zero
> >> > tolerances:  relative=0.001, absolute=1e-50, divergence=1.
> >> > right preconditioning
> >> > using UNPRECONDITIONED norm type for convergence test
> >> >   PC Object:   384 MPI processes
> >> > type: gamg
> >> >   MG: type is MULTIPLICATIVE, levels=2 cycles=v
> >> > Cycles per PCApply=1
> >> > Using Galerkin computed coarse 

Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-12 Thread Kong, Fande
Hi Mark,

Thanks for your reply.

On Wed, Apr 12, 2017 at 9:16 AM, Mark Adams  wrote:

> The problem comes from setting the number of MG levels (-pc_mg_levels 2).
> Not your fault, it looks like the GAMG logic is faulty, in your version at
> least.
>

What I want is for GAMG to coarsen the fine matrix once and then stop doing
anything.  I did not see any benefit to having more levels if the number of
processors is small.


>
> GAMG will force the coarsest grid to one processor by default, in newer
> versions. You can override the default with:
>
> -pc_gamg_use_parallel_coarse_grid_solver
>
> Your coarse grid solver is ASM with these 37 equations per process and 512
> processes. That is bad.
>

Why is this bad? Is the subdomain problem too small?


> Note, you could run this on one process to see the proper convergence
> rate.
>

Convergence rate for which part? coarse solver, subdomain solver?


> You can fix this with parameters:
>
> >   -pc_gamg_process_eq_limit <50>: Limit (goal) on number of equations
> per process on coarse grids (PCGAMGSetProcEqLim)
> >   -pc_gamg_coarse_eq_limit <50>: Limit on number of equations for the
> coarse grid (PCGAMGSetCoarseEqLim)
>
> If you really want two levels then set something like
> -pc_gamg_coarse_eq_limit 18145 (or higher).
>


Maybe have something like: make the coarse problem 1/8 as large as the
original problem? Otherwise, this number is just problem dependent.



> You can run with -info and grep on GAMG and you will see meta-data for
> each level. You should see "npe=1" for the coarsest, last, grid. Or use a
> parallel direct solver.
>

I will try.


>
> Note, you should not see much degradation as you increase the number of
> levels. 18145 eqs on a 3D problem will probably be noticeable. I generally
> aim for about 3000.
>

It should be fine as long as the coarse problem is solved by a parallel
solver.


Fande,


>
>
> On Mon, Apr 10, 2017 at 12:17 PM, Kong, Fande  wrote:
>
>>
>>
>> On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams  wrote:
>>
>>> You seem to have two levels here and 3M eqs on the fine grid and 37 on
>>> the coarse grid.
>>
>>
>> 37 is on the subdomain.
>>
>>  rows=18145, cols=18145 on the entire coarse grid.
>>
>>
>>
>>
>>
>>> I don't understand that.
>>>
>>> You are also calling the AMG setup a lot, but not spending much time
>>> in it. Try running with -info and grep on "GAMG".
>>>
>>>
>>> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande  wrote:
>>> > Thanks, Barry.
>>> >
>>> > It works.
>>> >
>>> > GAMG is three times better than ASM in terms of the number of linear
>>> > iterations, but it is five times slower than ASM. Any suggestions to
>>> improve
>>> > the performance of GAMG? Log files are attached.
>>> >
>>> > Fande,
>>> >
>>> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith 
>>> wrote:
>>> >>
>>> >>
>>> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande  wrote:
>>> >> >
>>> >> > Thanks, Mark and Barry,
>>> >> >
>>> >> > It works pretty well in terms of the number of linear iterations
>>> (using
>>> >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time.
>>> I am
>>> >> > using the two-level method via "-pc_mg_levels 2". The reason why
>>> the compute
>>> >> > time is larger than other preconditioning options is that a matrix
>>> free
>>> >> > method is used in the fine level and in my particular problem the
>>> function
>>> >> > evaluation is expensive.
>>> >> >
>>> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free
>>> Newton,
>>> >> > but I do not think I want to make the preconditioning part
>>> matrix-free.  Do
>>> >> > you guys know how to turn off the matrix-free method for GAMG?
>>> >>
>>> >>-pc_use_amat false
>>> >>
>>> >> >
>>> >> > Here is the detailed solver:
>>> >> >
>>> >> > SNES Object: 384 MPI processes
>>> >> >   type: newtonls
>>> >> >   maximum iterations=200, maximum function evaluations=1
>>> >> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
>>> >> >   total number of linear solver iterations=20
>>> >> >   total number of function evaluations=166
>>> >> >   norm schedule ALWAYS
>>> >> >   SNESLineSearch Object:   384 MPI processes
>>> >> > type: bt
>>> >> >   interpolation: cubic
>>> >> >   alpha=1.00e-04
>>> >> > maxstep=1.00e+08, minlambda=1.00e-12
>>> >> > tolerances: relative=1.00e-08, absolute=1.00e-15,
>>> >> > lambda=1.00e-08
>>> >> > maximum iterations=40
>>> >> >   KSP Object:   384 MPI processes
>>> >> > type: gmres
>>> >> >   GMRES: restart=100, using Classical (unmodified) Gram-Schmidt
>>> >> > Orthogonalization with no iterative refinement
>>> >> >   GMRES: happy breakdown tolerance 1e-30
>>> >> > maximum iterations=100, initial guess is zero
>>> >> > tolerances:  relative=0.001, absolute=1e-50, divergence=1.
>>> >> > right preconditioning

Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-12 Thread Mark Adams
The problem comes from setting the number of MG levels (-pc_mg_levels 2).
Not your fault, it looks like the GAMG logic is faulty, in your version at
least.

GAMG will force the coarsest grid to one processor by default, in newer
versions. You can override the default with:

-pc_gamg_use_parallel_coarse_grid_solver

Your coarse grid solver is ASM with these 37 equations per process and 512
processes. That is bad. Note, you could run this on one process to see the
proper convergence rate.  You can fix this with parameters:

>   -pc_gamg_process_eq_limit <50>: Limit (goal) on number of equations per
process on coarse grids (PCGAMGSetProcEqLim)
>   -pc_gamg_coarse_eq_limit <50>: Limit on number of equations for the
coarse grid (PCGAMGSetCoarseEqLim)

If you really want two levels then set something like
-pc_gamg_coarse_eq_limit 18145 (or higher). You can run with -info and grep
on GAMG and you will see meta-data for each level. You should see "npe=1" for
the coarsest, last, grid. Or use a parallel direct solver.

Note, you should not see much degradation as you increase the number of
levels. 18145 eqs on a 3D problem will probably be noticeable. I generally
aim for about 3000.
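
(A minimal C sketch of the same advice via the routines named above,
PCGAMGSetProcEqLim and PCGAMGSetCoarseEqLim; the specific limits are
illustrative, not prescriptive:)

  #include <petscksp.h>

  static PetscErrorCode SetupGAMG(KSP ksp)
  {
    PC             pc;
    PetscErrorCode ierr;

    ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
    ierr = PCSetType(pc, PCGAMG);CHKERRQ(ierr);
    /* goal (not a hard bound) on equations per process on coarse grids */
    ierr = PCGAMGSetProcEqLim(pc, 50);CHKERRQ(ierr);
    /* stop coarsening near ~3000 total equations, per the rule of thumb */
    ierr = PCGAMGSetCoarseEqLim(pc, 3000);CHKERRQ(ierr);
    /* command-line options (e.g. -pc_gamg_coarse_eq_limit) still win */
    ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);
    return 0;
  }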


On Mon, Apr 10, 2017 at 12:17 PM, Kong, Fande  wrote:

>
>
> On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams  wrote:
>
>> You seem to have two levels here and 3M eqs on the fine grid and 37 on
>> the coarse grid.
>
>
> 37 is on the subdomain.
>
>  rows=18145, cols=18145 on the entire coarse grid.
>
>
>
>
>
>> I don't understand that.
>>
>> You are also calling the AMG setup a lot, but not spending much time
>> in it. Try running with -info and grep on "GAMG".
>>
>>
>> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande  wrote:
>> > Thanks, Barry.
>> >
>> > It works.
>> >
>> > GAMG is three times better than ASM in terms of the number of linear
>> > iterations, but it is five times slower than ASM. Any suggestions to
>> improve
>> > the performance of GAMG? Log files are attached.
>> >
>> > Fande,
>> >
>> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith  wrote:
>> >>
>> >>
>> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande  wrote:
>> >> >
>> >> > Thanks, Mark and Barry,
>> >> >
>> >> > It works pretty well in terms of the number of linear iterations
>> (using
>> >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time.
>> I am
>> >> > using the two-level method via "-pc_mg_levels 2". The reason why the
>> compute
>> >> > time is larger than other preconditioning options is that a matrix
>> free
>> >> > method is used in the fine level and in my particular problem the
>> function
>> >> > evaluation is expensive.
>> >> >
>> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton,
>> >> > but I do not think I want to make the preconditioning part
>> matrix-free.  Do
>> >> > you guys know how to turn off the matrix-free method for GAMG?
>> >>
>> >>-pc_use_amat false
>> >>
>> >> >
>> >> > Here is the detailed solver:
>> >> >
>> >> > SNES Object: 384 MPI processes
>> >> >   type: newtonls
>> >> >   maximum iterations=200, maximum function evaluations=1
>> >> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
>> >> >   total number of linear solver iterations=20
>> >> >   total number of function evaluations=166
>> >> >   norm schedule ALWAYS
>> >> >   SNESLineSearch Object:   384 MPI processes
>> >> > type: bt
>> >> >   interpolation: cubic
>> >> >   alpha=1.00e-04
>> >> > maxstep=1.00e+08, minlambda=1.00e-12
>> >> > tolerances: relative=1.00e-08, absolute=1.00e-15,
>> >> > lambda=1.00e-08
>> >> > maximum iterations=40
>> >> >   KSP Object:   384 MPI processes
>> >> > type: gmres
>> >> >   GMRES: restart=100, using Classical (unmodified) Gram-Schmidt
>> >> > Orthogonalization with no iterative refinement
>> >> >   GMRES: happy breakdown tolerance 1e-30
>> >> > maximum iterations=100, initial guess is zero
>> >> > tolerances:  relative=0.001, absolute=1e-50, divergence=1.
>> >> > right preconditioning
>> >> > using UNPRECONDITIONED norm type for convergence test
>> >> >   PC Object:   384 MPI processes
>> >> > type: gamg
>> >> >   MG: type is MULTIPLICATIVE, levels=2 cycles=v
>> >> > Cycles per PCApply=1
>> >> > Using Galerkin computed coarse grid matrices
>> >> > GAMG specific options
>> >> >   Threshold for dropping small values from graph 0.
>> >> >   AGG specific options
>> >> > Symmetric graph true
>> >> > Coarse grid solver -- level ---
>> >> >   KSP Object:  (mg_coarse_)   384 MPI processes
>> >> > type: preonly
>> >> > maximum iterations=1, initial guess is zero
>> >> > tolerances:  relative=1e-05, absolute=1e-50,
>> divergence=1.
>> >> > left 

Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-10 Thread Kong, Fande
On Sun, Apr 9, 2017 at 6:04 AM, Mark Adams  wrote:

> You seem to have two levels here and 3M eqs on the fine grid and 37 on
> the coarse grid.


37 is on the subdomain.

 rows=18145, cols=18145 on the entire coarse grid.





> I don't understand that.
>
> You are also calling the AMG setup a lot, but not spending much time
> in it. Try running with -info and grep on "GAMG".
>
>
> On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande  wrote:
> > Thanks, Barry.
> >
> > It works.
> >
> > GAMG is three times better than ASM in terms of the number of linear
> > iterations, but it is five times slower than ASM. Any suggestions to
> improve
> > the performance of GAMG? Log files are attached.
> >
> > Fande,
> >
> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith  wrote:
> >>
> >>
> >> > On Apr 6, 2017, at 9:39 AM, Kong, Fande  wrote:
> >> >
> >> > Thanks, Mark and Barry,
> >> >
> >> > It works pretty well in terms of the number of linear iterations
> (using
> >> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I
> am
> >> > using the two-level method via "-pc_mg_levels 2". The reason why the
> compute
> >> > time is larger than other preconditioning options is that a matrix
> free
> >> > method is used in the fine level and in my particular problem the
> function
> >> > evaluation is expensive.
> >> >
> >> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton,
> >> > but I do not think I want to make the preconditioning part
> matrix-free.  Do
> >> > you guys know how to turn off the matrix-free method for GAMG?
> >>
> >>-pc_use_amat false
> >>
> >> >
> >> > Here is the detailed solver:
> >> >
> >> > SNES Object: 384 MPI processes
> >> >   type: newtonls
> >> >   maximum iterations=200, maximum function evaluations=1
> >> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
> >> >   total number of linear solver iterations=20
> >> >   total number of function evaluations=166
> >> >   norm schedule ALWAYS
> >> >   SNESLineSearch Object:   384 MPI processes
> >> > type: bt
> >> >   interpolation: cubic
> >> >   alpha=1.00e-04
> >> > maxstep=1.00e+08, minlambda=1.00e-12
> >> > tolerances: relative=1.00e-08, absolute=1.00e-15,
> >> > lambda=1.00e-08
> >> > maximum iterations=40
> >> >   KSP Object:   384 MPI processes
> >> > type: gmres
> >> >   GMRES: restart=100, using Classical (unmodified) Gram-Schmidt
> >> > Orthogonalization with no iterative refinement
> >> >   GMRES: happy breakdown tolerance 1e-30
> >> > maximum iterations=100, initial guess is zero
> >> > tolerances:  relative=0.001, absolute=1e-50, divergence=1.
> >> > right preconditioning
> >> > using UNPRECONDITIONED norm type for convergence test
> >> >   PC Object:   384 MPI processes
> >> > type: gamg
> >> >   MG: type is MULTIPLICATIVE, levels=2 cycles=v
> >> > Cycles per PCApply=1
> >> > Using Galerkin computed coarse grid matrices
> >> > GAMG specific options
> >> >   Threshold for dropping small values from graph 0.
> >> >   AGG specific options
> >> > Symmetric graph true
> >> > Coarse grid solver -- level ---
> >> >   KSP Object:  (mg_coarse_)   384 MPI processes
> >> > type: preonly
> >> > maximum iterations=1, initial guess is zero
> >> > tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> >> > left preconditioning
> >> > using NONE norm type for convergence test
> >> >   PC Object:  (mg_coarse_)   384 MPI processes
> >> > type: bjacobi
> >> >   block Jacobi: number of blocks = 384
> >> >   Local solve is same for all blocks, in the following KSP and
> >> > PC objects:
> >> > KSP Object:(mg_coarse_sub_) 1 MPI processes
> >> >   type: preonly
> >> >   maximum iterations=1, initial guess is zero
> >> >   tolerances:  relative=1e-05, absolute=1e-50,
> divergence=1.
> >> >   left preconditioning
> >> >   using NONE norm type for convergence test
> >> > PC Object:(mg_coarse_sub_) 1 MPI processes
> >> >   type: lu
> >> > LU: out-of-place factorization
> >> > tolerance for zero pivot 2.22045e-14
> >> > using diagonal shift on blocks to prevent zero pivot
> >> > [INBLOCKS]
> >> > matrix ordering: nd
> >> > factor fill ratio given 5., needed 1.31367
> >> >   Factored matrix follows:
> >> > Mat Object: 1 MPI processes
> >> >   type: seqaij
> >> >   rows=37, cols=37
> >> >   package used to perform factorization: petsc
> >> >   total: nonzeros=913, allocated nonzeros=913
> >> >   

Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-09 Thread Mark Adams
You seem to have two levels here and 3M eqs on the fine grid and 37 on
the coarse grid. I don't understand that.

You are also calling the AMG setup a lot, but not spending much time
in it. Try running with -info and grep on "GAMG".
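
(For example, something along these lines, with a generic MPI launcher
standing in for whatever your machine uses:

  mpiexec -n 384 ./ex56 <your options> -info 2>&1 | grep GAMG

This pulls out the PCSetUp_GAMG() / PCGAMGCreateLevel_GAMG() lines quoted
elsewhere in this thread.)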


On Fri, Apr 7, 2017 at 5:29 PM, Kong, Fande  wrote:
> Thanks, Barry.
>
> It works.
>
> GAMG is three times better than ASM in terms of the number of linear
> iterations, but it is five times slower than ASM. Any suggestions to improve
> the performance of GAMG? Log files are attached.
>
> Fande,
>
> On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith  wrote:
>>
>>
>> > On Apr 6, 2017, at 9:39 AM, Kong, Fande  wrote:
>> >
>> > Thanks, Mark and Barry,
>> >
>> > It works pretty well in terms of the number of linear iterations (using
>> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am
>> > using the two-level method via "-pc_mg_levels 2". The reason why the 
>> > compute
>> > time is larger than other preconditioning options is that a matrix free
>> > method is used in the fine level and in my particular problem the function
>> > evaluation is expensive.
>> >
>> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton,
>> > but I do not think I want to make the preconditioning part matrix-free.  Do
>> > you guys know how to turn off the matrix-free method for GAMG?
>>
>>-pc_use_amat false
>>
>> >
>> > Here is the detailed solver:
>> >
>> > SNES Object: 384 MPI processes
>> >   type: newtonls
>> >   maximum iterations=200, maximum function evaluations=1
>> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
>> >   total number of linear solver iterations=20
>> >   total number of function evaluations=166
>> >   norm schedule ALWAYS
>> >   SNESLineSearch Object:   384 MPI processes
>> > type: bt
>> >   interpolation: cubic
>> >   alpha=1.00e-04
>> > maxstep=1.00e+08, minlambda=1.00e-12
>> > tolerances: relative=1.00e-08, absolute=1.00e-15,
>> > lambda=1.00e-08
>> > maximum iterations=40
>> >   KSP Object:   384 MPI processes
>> > type: gmres
>> >   GMRES: restart=100, using Classical (unmodified) Gram-Schmidt
>> > Orthogonalization with no iterative refinement
>> >   GMRES: happy breakdown tolerance 1e-30
>> > maximum iterations=100, initial guess is zero
>> > tolerances:  relative=0.001, absolute=1e-50, divergence=1.
>> > right preconditioning
>> > using UNPRECONDITIONED norm type for convergence test
>> >   PC Object:   384 MPI processes
>> > type: gamg
>> >   MG: type is MULTIPLICATIVE, levels=2 cycles=v
>> > Cycles per PCApply=1
>> > Using Galerkin computed coarse grid matrices
>> > GAMG specific options
>> >   Threshold for dropping small values from graph 0.
>> >   AGG specific options
>> > Symmetric graph true
>> > Coarse grid solver -- level ---
>> >   KSP Object:  (mg_coarse_)   384 MPI processes
>> > type: preonly
>> > maximum iterations=1, initial guess is zero
>> > tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>> > left preconditioning
>> > using NONE norm type for convergence test
>> >   PC Object:  (mg_coarse_)   384 MPI processes
>> > type: bjacobi
>> >   block Jacobi: number of blocks = 384
>> >   Local solve is same for all blocks, in the following KSP and
>> > PC objects:
>> > KSP Object:(mg_coarse_sub_) 1 MPI processes
>> >   type: preonly
>> >   maximum iterations=1, initial guess is zero
>> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>> >   left preconditioning
>> >   using NONE norm type for convergence test
>> > PC Object:(mg_coarse_sub_) 1 MPI processes
>> >   type: lu
>> > LU: out-of-place factorization
>> > tolerance for zero pivot 2.22045e-14
>> > using diagonal shift on blocks to prevent zero pivot
>> > [INBLOCKS]
>> > matrix ordering: nd
>> > factor fill ratio given 5., needed 1.31367
>> >   Factored matrix follows:
>> > Mat Object: 1 MPI processes
>> >   type: seqaij
>> >   rows=37, cols=37
>> >   package used to perform factorization: petsc
>> >   total: nonzeros=913, allocated nonzeros=913
>> >   total number of mallocs used during MatSetValues calls
>> > =0
>> > not using I-node routines
>> >   linear system matrix = precond matrix:
>> >   Mat Object:   1 MPI processes
>> > type: seqaij
>> > rows=37, cols=37
>> > total: nonzeros=695, allocated nonzeros=695
>> > total number 

Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-07 Thread Kong, Fande
On Fri, Apr 7, 2017 at 3:52 PM, Barry Smith  wrote:

>
> > On Apr 7, 2017, at 4:46 PM, Kong, Fande  wrote:
> >
> >
> >
> > On Fri, Apr 7, 2017 at 3:39 PM, Barry Smith  wrote:
> >
> >   Using Petsc Release Version 3.7.5, unknown
> >
> >So are you using the release or are you using master branch?
> >
> > I am working on the maint branch.
> >
> > I did something two months ago:
> >
>  git clone -b maint https://bitbucket.org/petsc/petsc petsc.
> >
> >
> > I am interested in improving the GAMG performance.
>
>   Why, why not use the best solver for your problem?
>

I am just curious. I want to understand the potential of interesting
preconditioners.



>
> > Is it possible? It cannot beat ASM at all? The multilevel method should
> > be better than the one-level method if the number of processor cores is large.
>
>The ASM is taking 30 iterations; this is fantastic. It is really going
> to be tough to get GAMG to be faster (setup time for GAMG is high).
>
>What happens to both with 10 times as many processes? 100 times as many?
>


Did not try many processes yet.

Fande,



>
>
>Barry
>
> >
> > Fande,
> >
> >
> >If you use master the ASM will be even faster.
> >
> > What's new in master?
> >
> >
> > Fande,
> >
> >
> >
> > > On Apr 7, 2017, at 4:29 PM, Kong, Fande  wrote:
> > >
> > > Thanks, Barry.
> > >
> > > It works.
> > >
> > > GAMG is three times better than ASM in terms of the number of linear
> iterations, but it is five times slower than ASM. Any suggestions to
> improve the performance of GAMG? Log files are attached.
> > >
> > > Fande,
> > >
> > > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith 
> wrote:
> > >
> > > > On Apr 6, 2017, at 9:39 AM, Kong, Fande  wrote:
> > > >
> > > > Thanks, Mark and Barry,
> > > >
> > > > It works pretty well in terms of the number of linear iterations
> (using "-pc_gamg_sym_graph true"), but it is horrible in the compute time.
> I am using the two-level method via "-pc_mg_levels 2". The reason why the
> compute time is larger than other preconditioning options is that a matrix
> free method is used in the fine level and in my particular problem the
> function evaluation is expensive.
> > > >
> > > > I am using "-snes_mf_operator 1" to turn on the Jacobian-free
> Newton, but I do not think I want to make the preconditioning part
> matrix-free.  Do you guys know how to turn off the matrix-free method for
> GAMG?
> > >
> > >-pc_use_amat false
> > >
> > > >
> > > > Here is the detailed solver:
> > > >
> > > > SNES Object: 384 MPI processes
> > > >   type: newtonls
> > > >   maximum iterations=200, maximum function evaluations=1
> > > >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
> > > >   total number of linear solver iterations=20
> > > >   total number of function evaluations=166
> > > >   norm schedule ALWAYS
> > > >   SNESLineSearch Object:   384 MPI processes
> > > > type: bt
> > > >   interpolation: cubic
> > > >   alpha=1.00e-04
> > > > maxstep=1.00e+08, minlambda=1.00e-12
> > > > tolerances: relative=1.00e-08, absolute=1.00e-15,
> lambda=1.00e-08
> > > > maximum iterations=40
> > > >   KSP Object:   384 MPI processes
> > > > type: gmres
> > > >   GMRES: restart=100, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> > > >   GMRES: happy breakdown tolerance 1e-30
> > > > maximum iterations=100, initial guess is zero
> > > > tolerances:  relative=0.001, absolute=1e-50, divergence=1.
> > > > right preconditioning
> > > > using UNPRECONDITIONED norm type for convergence test
> > > >   PC Object:   384 MPI processes
> > > > type: gamg
> > > >   MG: type is MULTIPLICATIVE, levels=2 cycles=v
> > > > Cycles per PCApply=1
> > > > Using Galerkin computed coarse grid matrices
> > > > GAMG specific options
> > > >   Threshold for dropping small values from graph 0.
> > > >   AGG specific options
> > > > Symmetric graph true
> > > > Coarse grid solver -- level ---
> > > >   KSP Object:  (mg_coarse_)   384 MPI processes
> > > > type: preonly
> > > > maximum iterations=1, initial guess is zero
> > > > tolerances:  relative=1e-05, absolute=1e-50,
> divergence=1.
> > > > left preconditioning
> > > > using NONE norm type for convergence test
> > > >   PC Object:  (mg_coarse_)   384 MPI processes
> > > > type: bjacobi
> > > >   block Jacobi: number of blocks = 384
> > > >   Local solve is 

Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-07 Thread Barry Smith

> On Apr 7, 2017, at 4:46 PM, Kong, Fande  wrote:
> 
> 
> 
> On Fri, Apr 7, 2017 at 3:39 PM, Barry Smith  wrote:
> 
>   Using Petsc Release Version 3.7.5, unknown
> 
>So are you using the release or are you using master branch?
> 
> I am working on the maint branch. 
> 
> I did something two months ago:
> 
>  git clone -b maint https://bitbucket.org/petsc/petsc petsc.
> 
> 
> I am interested in improving the GAMG performance.

  Why, why not use the best solver for your problem?

> Is it possible? It cannot beat ASM at all? The multilevel method should be
> better than the one-level method if the number of processor cores is large.

   The ASM is taking 30 iterations; this is fantastic. It is really going to be
tough to get GAMG to be faster (setup time for GAMG is high).
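
   (One setup-cost lever that appears in the job scripts elsewhere in this
archive is -pc_gamg_reuse_interpolation true, which reuses the prolongation
across solves and so amortizes part of the GAMG setup; whether it applies
here depends on how often the operator changes.)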

   What happens to both with 10 times as many processes? 100 times as many?


   Barry

> 
> Fande,
>  
> 
>If you use master the ASM will be even faster.
> 
> What's new in master?
> 
> 
> Fande,
>  
> 
> 
> > On Apr 7, 2017, at 4:29 PM, Kong, Fande  wrote:
> >
> > Thanks, Barry.
> >
> > It works.
> >
> > GAMG is three times better than ASM in terms of the number of linear 
> > iterations, but it is five times slower than ASM. Any suggestions to 
> > improve the performance of GAMG? Log files are attached.
> >
> > Fande,
> >
> > On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith  wrote:
> >
> > > On Apr 6, 2017, at 9:39 AM, Kong, Fande  wrote:
> > >
> > > Thanks, Mark and Barry,
> > >
> > > It works pretty well in terms of the number of linear iterations (using 
> > > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am 
> > > using the two-level method via "-pc_mg_levels 2". The reason why the 
> > > compute time is larger than other preconditioning options is that a 
> > > matrix free method is used in the fine level and in my particular problem 
> > > the function evaluation is expensive.
> > >
> > > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, but 
> > > I do not think I want to make the preconditioning part matrix-free.  Do 
> > > you guys know how to turn off the matrix-free method for GAMG?
> >
> >-pc_use_amat false
> >
> > >
> > > Here is the detailed solver:
> > >
> > > SNES Object: 384 MPI processes
> > >   type: newtonls
> > >   maximum iterations=200, maximum function evaluations=1
> > >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
> > >   total number of linear solver iterations=20
> > >   total number of function evaluations=166
> > >   norm schedule ALWAYS
> > >   SNESLineSearch Object:   384 MPI processes
> > > type: bt
> > >   interpolation: cubic
> > >   alpha=1.00e-04
> > > maxstep=1.00e+08, minlambda=1.00e-12
> > > tolerances: relative=1.00e-08, absolute=1.00e-15, 
> > > lambda=1.00e-08
> > > maximum iterations=40
> > >   KSP Object:   384 MPI processes
> > > type: gmres
> > >   GMRES: restart=100, using Classical (unmodified) Gram-Schmidt 
> > > Orthogonalization with no iterative refinement
> > >   GMRES: happy breakdown tolerance 1e-30
> > > maximum iterations=100, initial guess is zero
> > > tolerances:  relative=0.001, absolute=1e-50, divergence=1.
> > > right preconditioning
> > > using UNPRECONDITIONED norm type for convergence test
> > >   PC Object:   384 MPI processes
> > > type: gamg
> > >   MG: type is MULTIPLICATIVE, levels=2 cycles=v
> > > Cycles per PCApply=1
> > > Using Galerkin computed coarse grid matrices
> > > GAMG specific options
> > >   Threshold for dropping small values from graph 0.
> > >   AGG specific options
> > > Symmetric graph true
> > > Coarse grid solver -- level ---
> > >   KSP Object:  (mg_coarse_)   384 MPI processes
> > > type: preonly
> > > maximum iterations=1, initial guess is zero
> > > tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> > > left preconditioning
> > > using NONE norm type for convergence test
> > >   PC Object:  (mg_coarse_)   384 MPI processes
> > > type: bjacobi
> > >   block Jacobi: number of blocks = 384
> > >   Local solve is same for all blocks, in the following KSP and PC 
> > > objects:
> > > KSP Object:(mg_coarse_sub_) 1 MPI processes
> > >   type: preonly
> > >   maximum iterations=1, initial guess is zero
> > >   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> > >   left preconditioning
> > >   using NONE norm type for convergence test
> > > PC Object:(mg_coarse_sub_) 1 MPI processes
> > >   type: lu
> > > LU: out-of-place factorization
> > > tolerance for zero pivot 

Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-07 Thread Barry Smith

  Using Petsc Release Version 3.7.5, unknown 

   So are you using the release or are you using master branch?

   If you use master the ASM will be even faster.


> On Apr 7, 2017, at 4:29 PM, Kong, Fande  wrote:
> 
> Thanks, Barry.
> 
> It works.
> 
> GAMG is three times better than ASM in terms of the number of linear 
> iterations, but it is five times slower than ASM. Any suggestions to improve 
> the performance of GAMG? Log files are attached.
> 
> Fande,
> 
> On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith  wrote:
> 
> > On Apr 6, 2017, at 9:39 AM, Kong, Fande  wrote:
> >
> > Thanks, Mark and Barry,
> >
> > It works pretty well in terms of the number of linear iterations (using 
> > "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am 
> > using the two-level method via "-pc_mg_levels 2". The reason why the 
> > compute time is larger than other preconditioning options is that a matrix 
> > free method is used in the fine level and in my particular problem the 
> > function evaluation is expensive.
> >
> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, but I 
> > do not think I want to make the preconditioning part matrix-free.  Do you 
> > guys know how to turn off the matrix-free method for GAMG?
> 
>-pc_use_amat false
> 
> >
> > Here is the detailed solver:
> >
> > SNES Object: 384 MPI processes
> >   type: newtonls
> >   maximum iterations=200, maximum function evaluations=1
> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
> >   total number of linear solver iterations=20
> >   total number of function evaluations=166
> >   norm schedule ALWAYS
> >   SNESLineSearch Object:   384 MPI processes
> > type: bt
> >   interpolation: cubic
> >   alpha=1.00e-04
> > maxstep=1.00e+08, minlambda=1.00e-12
> > tolerances: relative=1.00e-08, absolute=1.00e-15, 
> > lambda=1.00e-08
> > maximum iterations=40
> >   KSP Object:   384 MPI processes
> > type: gmres
> >   GMRES: restart=100, using Classical (unmodified) Gram-Schmidt 
> > Orthogonalization with no iterative refinement
> >   GMRES: happy breakdown tolerance 1e-30
> > maximum iterations=100, initial guess is zero
> > tolerances:  relative=0.001, absolute=1e-50, divergence=1.
> > right preconditioning
> > using UNPRECONDITIONED norm type for convergence test
> >   PC Object:   384 MPI processes
> > type: gamg
> >   MG: type is MULTIPLICATIVE, levels=2 cycles=v
> > Cycles per PCApply=1
> > Using Galerkin computed coarse grid matrices
> > GAMG specific options
> >   Threshold for dropping small values from graph 0.
> >   AGG specific options
> > Symmetric graph true
> > Coarse grid solver -- level ---
> >   KSP Object:  (mg_coarse_)   384 MPI processes
> > type: preonly
> > maximum iterations=1, initial guess is zero
> > tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> > left preconditioning
> > using NONE norm type for convergence test
> >   PC Object:  (mg_coarse_)   384 MPI processes
> > type: bjacobi
> >   block Jacobi: number of blocks = 384
> >   Local solve is same for all blocks, in the following KSP and PC 
> > objects:
> > KSP Object:(mg_coarse_sub_) 1 MPI processes
> >   type: preonly
> >   maximum iterations=1, initial guess is zero
> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> >   left preconditioning
> >   using NONE norm type for convergence test
> > PC Object:(mg_coarse_sub_) 1 MPI processes
> >   type: lu
> > LU: out-of-place factorization
> > tolerance for zero pivot 2.22045e-14
> > using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
> > matrix ordering: nd
> > factor fill ratio given 5., needed 1.31367
> >   Factored matrix follows:
> > Mat Object: 1 MPI processes
> >   type: seqaij
> >   rows=37, cols=37
> >   package used to perform factorization: petsc
> >   total: nonzeros=913, allocated nonzeros=913
> >   total number of mallocs used during MatSetValues calls =0
> > not using I-node routines
> >   linear system matrix = precond matrix:
> >   Mat Object:   1 MPI processes
> > type: seqaij
> > rows=37, cols=37
> > total: nonzeros=695, allocated nonzeros=695
> > total number of mallocs used during MatSetValues calls =0
> >   not using I-node routines
> > linear system matrix = precond matrix:
> > Mat Object: 

Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-07 Thread Kong, Fande
Thanks, Barry.

It works.

GAMG is three times better than ASM in terms of the number of linear
iterations, but it is five times slower than ASM. Any suggestions to
improve the performance of GAMG? Log files are attached.

Fande,

On Thu, Apr 6, 2017 at 3:39 PM, Barry Smith  wrote:

>
> > On Apr 6, 2017, at 9:39 AM, Kong, Fande  wrote:
> >
> > Thanks, Mark and Barry,
> >
> > It works pretty well in terms of the number of linear iterations (using
> "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am
> using the two-level method via "-pc_mg_levels 2". The reason why the
> compute time is larger than other preconditioning options is that a matrix
> free method is used in the fine level and in my particular problem the
> function evaluation is expensive.
> >
> > I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton,
> but I do not think I want to make the preconditioning part matrix-free.  Do
> you guys know how to turn off the matrix-free method for GAMG?
>
>-pc_use_amat false
>
> >
> > Here is the detailed solver:
> >
> > SNES Object: 384 MPI processes
> >   type: newtonls
> >   maximum iterations=200, maximum function evaluations=1
> >   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
> >   total number of linear solver iterations=20
> >   total number of function evaluations=166
> >   norm schedule ALWAYS
> >   SNESLineSearch Object:   384 MPI processes
> > type: bt
> >   interpolation: cubic
> >   alpha=1.00e-04
> > maxstep=1.00e+08, minlambda=1.00e-12
> > tolerances: relative=1.00e-08, absolute=1.00e-15,
> lambda=1.00e-08
> > maximum iterations=40
> >   KSP Object:   384 MPI processes
> > type: gmres
> >   GMRES: restart=100, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> >   GMRES: happy breakdown tolerance 1e-30
> > maximum iterations=100, initial guess is zero
> > tolerances:  relative=0.001, absolute=1e-50, divergence=1.
> > right preconditioning
> > using UNPRECONDITIONED norm type for convergence test
> >   PC Object:   384 MPI processes
> > type: gamg
> >   MG: type is MULTIPLICATIVE, levels=2 cycles=v
> > Cycles per PCApply=1
> > Using Galerkin computed coarse grid matrices
> > GAMG specific options
> >   Threshold for dropping small values from graph 0.
> >   AGG specific options
> > Symmetric graph true
> > Coarse grid solver -- level ---
> >   KSP Object:  (mg_coarse_)   384 MPI processes
> > type: preonly
> > maximum iterations=1, initial guess is zero
> > tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> > left preconditioning
> > using NONE norm type for convergence test
> >   PC Object:  (mg_coarse_)   384 MPI processes
> > type: bjacobi
> >   block Jacobi: number of blocks = 384
> >   Local solve is same for all blocks, in the following KSP and
> PC objects:
> > KSP Object:(mg_coarse_sub_) 1 MPI processes
> >   type: preonly
> >   maximum iterations=1, initial guess is zero
> >   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> >   left preconditioning
> >   using NONE norm type for convergence test
> > PC Object:(mg_coarse_sub_) 1 MPI processes
> >   type: lu
> > LU: out-of-place factorization
> > tolerance for zero pivot 2.22045e-14
> > using diagonal shift on blocks to prevent zero pivot
> [INBLOCKS]
> > matrix ordering: nd
> > factor fill ratio given 5., needed 1.31367
> >   Factored matrix follows:
> > Mat Object: 1 MPI processes
> >   type: seqaij
> >   rows=37, cols=37
> >   package used to perform factorization: petsc
> >   total: nonzeros=913, allocated nonzeros=913
> >   total number of mallocs used during MatSetValues calls
> =0
> > not using I-node routines
> >   linear system matrix = precond matrix:
> >   Mat Object:   1 MPI processes
> > type: seqaij
> > rows=37, cols=37
> > total: nonzeros=695, allocated nonzeros=695
> > total number of mallocs used during MatSetValues calls =0
> >   not using I-node routines
> > linear system matrix = precond matrix:
> > Mat Object: 384 MPI processes
> >   type: mpiaij
> >   rows=18145, cols=18145
> >   total: nonzeros=1709115, allocated nonzeros=1709115
> >   total number of mallocs used during MatSetValues calls =0
> > not using I-node (on process 0) routines
> >  

Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-06 Thread Barry Smith

> On Apr 6, 2017, at 9:39 AM, Kong, Fande  wrote:
> 
> Thanks, Mark and Barry,
> 
> It works pretty well in terms of the number of linear iterations (using 
> "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am 
> using the two-level method via "-pc_mg_levels 2". The reason why the compute 
> time is larger than other preconditioning options is that a matrix free 
> method is used in the fine level and in my particular problem the function 
> evaluation is expensive. 
> 
> I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, but I 
> do not think I want to make the preconditioning part matrix-free.  Do you 
> guys know how to turn off the matrix-free method for GAMG?

   -pc_use_amat false

> 
> Here is the detailed solver:
> 
> SNES Object: 384 MPI processes
>   type: newtonls
>   maximum iterations=200, maximum function evaluations=1
>   tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
>   total number of linear solver iterations=20
>   total number of function evaluations=166
>   norm schedule ALWAYS
>   SNESLineSearch Object:   384 MPI processes
> type: bt
>   interpolation: cubic
>   alpha=1.00e-04
> maxstep=1.00e+08, minlambda=1.00e-12
> tolerances: relative=1.00e-08, absolute=1.00e-15, 
> lambda=1.00e-08
> maximum iterations=40
>   KSP Object:   384 MPI processes
> type: gmres
>   GMRES: restart=100, using Classical (unmodified) Gram-Schmidt 
> Orthogonalization with no iterative refinement
>   GMRES: happy breakdown tolerance 1e-30
> maximum iterations=100, initial guess is zero
> tolerances:  relative=0.001, absolute=1e-50, divergence=1.
> right preconditioning
> using UNPRECONDITIONED norm type for convergence test
>   PC Object:   384 MPI processes
> type: gamg
>   MG: type is MULTIPLICATIVE, levels=2 cycles=v
> Cycles per PCApply=1
> Using Galerkin computed coarse grid matrices
> GAMG specific options
>   Threshold for dropping small values from graph 0.
>   AGG specific options
> Symmetric graph true
> Coarse grid solver -- level ---
>   KSP Object:  (mg_coarse_)   384 MPI processes
> type: preonly
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using NONE norm type for convergence test
>   PC Object:  (mg_coarse_)   384 MPI processes
> type: bjacobi
>   block Jacobi: number of blocks = 384
>   Local solve is same for all blocks, in the following KSP and PC 
> objects:
> KSP Object:(mg_coarse_sub_) 1 MPI processes
>   type: preonly
>   maximum iterations=1, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>   left preconditioning
>   using NONE norm type for convergence test
> PC Object:(mg_coarse_sub_) 1 MPI processes
>   type: lu
> LU: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
> matrix ordering: nd
> factor fill ratio given 5., needed 1.31367
>   Factored matrix follows:
> Mat Object: 1 MPI processes
>   type: seqaij
>   rows=37, cols=37
>   package used to perform factorization: petsc
>   total: nonzeros=913, allocated nonzeros=913
>   total number of mallocs used during MatSetValues calls =0
> not using I-node routines
>   linear system matrix = precond matrix:
>   Mat Object:   1 MPI processes
> type: seqaij
> rows=37, cols=37
> total: nonzeros=695, allocated nonzeros=695
> total number of mallocs used during MatSetValues calls =0
>   not using I-node routines
> linear system matrix = precond matrix:
> Mat Object: 384 MPI processes
>   type: mpiaij
>   rows=18145, cols=18145
>   total: nonzeros=1709115, allocated nonzeros=1709115
>   total number of mallocs used during MatSetValues calls =0
> not using I-node (on process 0) routines
> Down solver (pre-smoother) on level 1 ---
>   KSP Object:  (mg_levels_1_)   384 MPI processes
> type: chebyshev
>   Chebyshev: eigenvalue estimates:  min = 0.19, max = 1.46673
>   Chebyshev: eigenvalues estimated using gmres with translations  [0. 
> 0.1; 0. 1.1]
>   KSP Object:  (mg_levels_1_esteig_)   384 MPI 
> processes
> type: gmres
>   GMRES: 

Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-06 Thread Mark Adams
On Thu, Apr 6, 2017 at 7:39 AM, Kong, Fande  wrote:
> Thanks, Mark and Barry,
>
> It works pretty well in terms of the number of linear iterations (using
> "-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am
> using the two-level method via "-pc_mg_levels 2". The reason why the compute
> time is larger than other preconditioning options is that a matrix free
> method is used in the fine level and in my particular problem the function
> evaluation is expensive.
>
> I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, but I
> do not think I want to make the preconditioning part matrix-free.  Do you
> guys know how to turn off the matrix-free method for GAMG?

You do have an option to use the operator or the preconditioner
operator (matrix) for the fine grid smoother, but I thought it uses
the PC matrix by default. I don't recall the parameters nor do I see
this in the view output.  Others should be able to help.
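
A minimal sketch of the setup being discussed (hedged; "snes", "J", and
"FormJacobian" are placeholder names, not code from this thread). With
-snes_mf_operator, SNES applies the true Jacobian matrix-free while the
assembled matrix passed as Pmat is what GAMG coarsens and smooths with;
PCSetUseAmat(pc, PETSC_FALSE) is the API form of -pc_use_amat false:

  /* Assembled (approximate) Jacobian J is the Pmat; with -snes_mf_operator,
     SNES swaps the Amat for a matrix-free MFFD operator at run time. */
  KSP ksp;
  PC  pc;
  SNESSetJacobian(snes, J, J, FormJacobian, NULL);
  SNESGetKSP(snes, &ksp);
  KSPGetPC(ksp, &pc);
  PCSetType(pc, PCGAMG);
  PCSetUseAmat(pc, PETSC_FALSE);  /* equivalent to -pc_use_amat false */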


Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-06 Thread Kong, Fande
Thanks, Mark and Barry,

It works pretty well in terms of the number of linear iterations (using
"-pc_gamg_sym_graph true"), but it is horrible in the compute time. I am
using the two-level method via "-pc_mg_levels 2". The reason why the
compute time is larger than other preconditioning options is that a matrix
free method is used in the fine level and in my particular problem the
function evaluation is expensive.

I am using "-snes_mf_operator 1" to turn on the Jacobian-free Newton, but I
do not think I want to make the preconditioning part matrix-free.  Do you
guys know how to turn off the matrix-free method for GAMG?

Here is the detailed solver:

SNES Object: 384 MPI processes
  type: newtonls
  maximum iterations=200, maximum function evaluations=1
  tolerances: relative=1e-08, absolute=1e-08, solution=1e-50
  total number of linear solver iterations=20
  total number of function evaluations=166
  norm schedule ALWAYS
  SNESLineSearch Object:   384 MPI processes
    type: bt
      interpolation: cubic
      alpha=1.00e-04
    maxstep=1.00e+08, minlambda=1.00e-12
    tolerances: relative=1.00e-08, absolute=1.00e-15, lambda=1.00e-08
    maximum iterations=40
  KSP Object:   384 MPI processes
    type: gmres
      GMRES: restart=100, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
      GMRES: happy breakdown tolerance 1e-30
    maximum iterations=100, initial guess is zero
    tolerances:  relative=0.001, absolute=1e-50, divergence=1.
    right preconditioning
    using UNPRECONDITIONED norm type for convergence test
  PC Object:   384 MPI processes
    type: gamg
      MG: type is MULTIPLICATIVE, levels=2 cycles=v
        Cycles per PCApply=1
        Using Galerkin computed coarse grid matrices
        GAMG specific options
          Threshold for dropping small values from graph 0.
          AGG specific options
            Symmetric graph true
    Coarse grid solver -- level -------------------------------
      KSP Object:  (mg_coarse_)   384 MPI processes
        type: preonly
        maximum iterations=1, initial guess is zero
        tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
        left preconditioning
        using NONE norm type for convergence test
      PC Object:  (mg_coarse_)   384 MPI processes
        type: bjacobi
          block Jacobi: number of blocks = 384
          Local solve is same for all blocks, in the following KSP and PC objects:
        KSP Object:    (mg_coarse_sub_) 1 MPI processes
          type: preonly
          maximum iterations=1, initial guess is zero
          tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
          left preconditioning
          using NONE norm type for convergence test
        PC Object:    (mg_coarse_sub_) 1 MPI processes
          type: lu
            LU: out-of-place factorization
            tolerance for zero pivot 2.22045e-14
            using diagonal shift on blocks to prevent zero pivot [INBLOCKS]
            matrix ordering: nd
            factor fill ratio given 5., needed 1.31367
              Factored matrix follows:
                Mat Object: 1 MPI processes
                  type: seqaij
                  rows=37, cols=37
                  package used to perform factorization: petsc
                  total: nonzeros=913, allocated nonzeros=913
                  total number of mallocs used during MatSetValues calls =0
                    not using I-node routines
          linear system matrix = precond matrix:
          Mat Object:   1 MPI processes
            type: seqaij
            rows=37, cols=37
            total: nonzeros=695, allocated nonzeros=695
            total number of mallocs used during MatSetValues calls =0
              not using I-node routines
        linear system matrix = precond matrix:
        Mat Object: 384 MPI processes
          type: mpiaij
          rows=18145, cols=18145
          total: nonzeros=1709115, allocated nonzeros=1709115
          total number of mallocs used during MatSetValues calls =0
            not using I-node (on process 0) routines
    Down solver (pre-smoother) on level 1 -------------------------------
      KSP Object:  (mg_levels_1_)   384 MPI processes
        type: chebyshev
          Chebyshev: eigenvalue estimates:  min = 0.19, max = 1.46673
          Chebyshev: eigenvalues estimated using gmres with translations  [0. 0.1; 0. 1.1]
          KSP Object:  (mg_levels_1_esteig_)   384 MPI processes
            type: gmres
              GMRES: restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
              GMRES: happy breakdown tolerance 1e-30
            maximum iterations=10, initial guess is zero
            tolerances:  relative=1e-12, absolute=1e-50, divergence=1.
            left preconditioning
            using PRECONDITIONED norm type for convergence test

Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-06 Thread Mark Adams
On Tue, Apr 4, 2017 at 10:10 AM, Barry Smith  wrote:
>
>> Does this mean that GAMG works for the symmetrical matrix only?
>
>   No, it means that for non symmetric nonzero structure you need the extra 
> flag. So use the extra flag. The reason we don't always use the flag is 
> because it adds extra cost and isn't needed if the matrix already has a 
> symmetric nonzero structure.

BTW, if you have symmetric non-zero structure you can just set
'-pc_gamg_threshold -1.0'; note the "or" in the message.

If you want to mess with the threshold then you need to use the
symmetrized flag.
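
For reference, the two recipes quoted in this thread reduce to the following
run-time options (3.7-era GAMG):

  -pc_type gamg -pc_gamg_sym_graph true     (unsymmetric nonzero structure: symmetrize the graph)
  -pc_type gamg -pc_gamg_threshold -1.0     (structurally symmetric matrix: keep the whole graph)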


Re: [petsc-users] GAMG for the unsymmetrical matrix

2017-04-04 Thread Barry Smith

> Does this mean that GAMG works for the symmetrical matrix only?

  No, it means that for non symmetric nonzero structure you need the extra 
flag. So use the extra flag. The reason we don't always use the flag is because 
it adds extra cost and isn't needed if the matrix already has a symmetric 
nonzero structure.


  Barry

> On Apr 4, 2017, at 11:46 AM, Kong, Fande  wrote:
> 
> Hi All,
> 
> I am using GAMG to solve a group of coupled diffusion equations, but the 
> resulting matrix is not symmetrical. I got the following error messages:
> 
> 
> [0]PETSC ERROR: Petsc has generated inconsistent data
> [0]PETSC ERROR: Have un-symmetric graph (apparently). Use '-pc_gamg_sym_graph 
> true' to symetrize the graph or '-pc_gamg_threshold -1.0' if the matrix is 
> structurally symmetric.
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html for 
> trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.7.5, unknown 
> [0]PETSC ERROR: /home/kongf/workhome/projects/yak/yak-opt on a 
> arch-linux2-c-opt named r2i2n0 by kongf Mon Apr  3 16:19:59 2017
> [0]PETSC ERROR: /home/kongf/workhome/projects/yak/yak-opt on a 
> arch-linux2-c-opt named r2i2n0 by kongf Mon Apr  3 16:19:59 2017
> [0]PETSC ERROR: #1 smoothAggs() line 462 in 
> /home/kongf/workhome/projects/petsc/src/ksp/pc/impls/gamg/agg.c
> [0]PETSC ERROR: #2 PCGAMGCoarsen_AGG() line 998 in 
> /home/kongf/workhome/projects/petsc/src/ksp/pc/impls/gamg/agg.c
> [0]PETSC ERROR: #3 PCSetUp_GAMG() line 571 in 
> /home/kongf/workhome/projects/petsc/src/ksp/pc/impls/gamg/gamg.c
> [0]PETSC ERROR: #3 PCSetUp_GAMG() line 571 in 
> /home/kongf/workhome/projects/petsc/src/ksp/pc/impls/gamg/gamg.c
> 
> Does this mean that GAMG works for the symmetrical matrix only?
> 
> Fande,



Re: [petsc-users] GAMG huge hash being requested

2017-02-21 Thread Lawrence Mitchell
Hi Justin,

On 21/02/17 06:01, Justin Chang wrote:
> Okay thanks

Now done.

Cheers,

Lawrence





Re: [petsc-users] GAMG huge hash being requested

2017-02-20 Thread Justin Chang
Okay thanks

On Sun, Feb 19, 2017 at 2:32 PM, Lawrence Mitchell <
lawrence.mitch...@imperial.ac.uk> wrote:

>
>
> > On 19 Feb 2017, at 18:55, Justin Chang  wrote:
> >
> > Okay, it doesn't seem like the Firedrake fork (which is what I am using)
> has this latest fix. Lawrence, when do you think it's possible you folks
> can incorporate these fixes
>
> I'll fast forward our branch pointer on Monday.
>
> Lawrence
>


Re: [petsc-users] GAMG huge hash being requested

2017-02-19 Thread Lawrence Mitchell


> On 19 Feb 2017, at 18:55, Justin Chang  wrote:
> 
> Okay, it doesn't seem like the Firedrake fork (which is what I am using) has 
> this latest fix. Lawrence, when do you think it's possible you folks can 
> incorporate these fixes

I'll fast forward our branch pointer on Monday. 

Lawrence 


Re: [petsc-users] GAMG huge hash being requested

2017-02-19 Thread Justin Chang
Okay, it doesn't seem like the Firedrake fork (which is what I am using)
has this latest fix. Lawrence, when do you think it's possible you folks
can incorporate these fixes?

On Sun, Feb 19, 2017 at 8:56 AM, Matthew Knepley  wrote:

> Satish fixed this error. I believe the fix is now in master.
>
>   Thanks,
>
>  Matt
>
> On Sun, Feb 19, 2017 at 3:05 AM, Justin Chang  wrote:
>
>> Hi all,
>>
>> So I am attempting to employ the DG1 finite element method on the Poisson
>> equation using GAMG. When I attempt to solve a problem with roughly 4
>> million DOFs across 20 cores, I get this error:
>>
>> Traceback (most recent call last):
>>   File "pFiredrake.py", line 109, in 
>> solve(a==L,solution,options_prefix='fe_',solver_parameters=
>> solver_params)
>>   File 
>> "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
>> line 122, in solve
>> _solve_varproblem(*args, **kwargs)
>>   File 
>> "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
>> line 152, in _solve_varproblem
>> solver.solve()
>>   File 
>> "/home/jchang23/Software/firedrake/src/firedrake/firedrake/variational_solver.py",
>> line 220, in solve
>> self.snes.solve(None, v)
>>   File "PETSc/SNES.pyx", line 537, in petsc4py.PETSc.SNES.solve
>> (src/petsc4py.PETSc.c:172359)
>> petsc4py.PETSc.Error: error code 63
>> [ 6] SNESSolve() line 4128 in /tmp/pip-FNpsya-build/src/snes
>> /interface/snes.c
>> [ 6] SNESSolve_KSPONLY() line 40 in /tmp/pip-FNpsya-build/src/snes
>> /impls/ksponly/ksponly.c
>> [ 6] KSPSolve() line 620 in /tmp/pip-FNpsya-build/src/ksp/
>> ksp/interface/itfunc.c
>> [ 6] KSPSetUp() line 393 in /tmp/pip-FNpsya-build/src/ksp/
>> ksp/interface/itfunc.c
>> [ 6] PCSetUp() line 968 in /tmp/pip-FNpsya-build/src/ksp/
>> pc/interface/precon.c
>> [ 6] PCSetUp_GAMG() line 524 in /tmp/pip-FNpsya-build/src/ksp/
>> pc/impls/gamg/gamg.c
>> [ 6] PCGAMGCoarsen_AGG() line 955 in /tmp/pip-FNpsya-build/src/ksp/
>> pc/impls/gamg/agg.c
>> [ 6] MatTransposeMatMult() line 9962 in /tmp/pip-FNpsya-build/src/mat/
>> interface/matrix.c
>> [ 6] MatTransposeMatMult_MPIAIJ_MPIAIJ() line 902 in
>> /tmp/pip-FNpsya-build/src/mat/impls/aij/mpi/mpimatmatmult.c
>> [ 6] MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() line 1676 in
>> /tmp/pip-FNpsya-build/src/mat/impls/aij/mpi/mpimatmatmult.c
>> [ 6] PetscTableCreate() line 52 in /tmp/pip-FNpsya-build/src/sys/
>> utils/ctable.c
>> [ 6] PetscTableCreateHashSize() line 28 in /tmp/pip-FNpsya-build/src/sys/
>> utils/ctable.c
>> [ 6] Argument out of range
>> [ 6] A really huge hash is being requested.. cannot process: 4096000
>> 
>> --
>> MPI_ABORT was invoked on rank 6 in communicator MPI_COMM_WORLD
>> with errorcode 1.
>>
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>> 
>> --
>> Traceback (most recent call last):
>>   File "pFiredrake.py", line 109, in 
>> Traceback (most recent call last):
>> Traceback (most recent call last):
>>   File "pFiredrake.py", line 109, in 
>>   File "pFiredrake.py", line 109, in 
>> solve(a==L,solution,options_prefix='fe_',solver_parameters=
>> solver_params)
>>   File 
>> "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
>> line 122, in solve
>> solve(a==L,solution,options_prefix='fe_',solver_parameters=
>> solver_params)
>> Traceback (most recent call last):
>>   File "pFiredrake.py", line 109, in 
>> Traceback (most recent call last):
>>   File "pFiredrake.py", line 109, in 
>> _solve_varproblem(*args, **kwargs)
>> solve(a==L,solution,options_prefix='fe_',solver_parameters=
>> solver_params)
>>   File 
>> "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
>> line 152, in _solve_varproblem
>> Traceback (most recent call last):
>>   File "pFiredrake.py", line 109, in 
>>   File 
>> "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
>> line 122, in solve
>> Traceback (most recent call last):
>>   File "pFiredrake.py", line 109, in 
>> Traceback (most recent call last):
>>   File "pFiredrake.py", line 109, in 
>> solve(a==L,solution,options_prefix='fe_',solver_parameters=
>> solver_params)
>>   File 
>> "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
>> line 122, in solve
>> solve(a==L,solution,options_prefix='fe_',solver_parameters=
>> solver_params)
>>   File 
>> "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
>> line 122, in solve
>>   File 
>> "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
>> line 122, in solve
>> solve(a==L,solution,options_prefix='fe_',solver_parameters=
>> solver_params)
>>   File 
>> 

Re: [petsc-users] GAMG huge hash being requested

2017-02-19 Thread Matthew Knepley
Satish fixed this error. I believe the fix is now in master.

  Thanks,

 Matt

On Sun, Feb 19, 2017 at 3:05 AM, Justin Chang  wrote:

> Hi all,
>
> So I am attempting to employ the DG1 finite element method on the Poisson
> equation using GAMG. When I attempt to solve a problem with roughly 4
> million DOFs across 20 cores, I get this error:
>
> Traceback (most recent call last):
>   File "pFiredrake.py", line 109, in 
> solve(a==L,solution,options_prefix='fe_',solver_
> parameters=solver_params)
>   File "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
> line 122, in solve
> _solve_varproblem(*args, **kwargs)
>   File "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
> line 152, in _solve_varproblem
> solver.solve()
>   File "/home/jchang23/Software/firedrake/src/firedrake/
> firedrake/variational_solver.py", line 220, in solve
> self.snes.solve(None, v)
>   File "PETSc/SNES.pyx", line 537, in petsc4py.PETSc.SNES.solve
> (src/petsc4py.PETSc.c:172359)
> petsc4py.PETSc.Error: error code 63
> [ 6] SNESSolve() line 4128 in /tmp/pip-FNpsya-build/src/
> snes/interface/snes.c
> [ 6] SNESSolve_KSPONLY() line 40 in /tmp/pip-FNpsya-build/src/
> snes/impls/ksponly/ksponly.c
> [ 6] KSPSolve() line 620 in /tmp/pip-FNpsya-build/src/ksp/
> ksp/interface/itfunc.c
> [ 6] KSPSetUp() line 393 in /tmp/pip-FNpsya-build/src/ksp/
> ksp/interface/itfunc.c
> [ 6] PCSetUp() line 968 in /tmp/pip-FNpsya-build/src/ksp/
> pc/interface/precon.c
> [ 6] PCSetUp_GAMG() line 524 in /tmp/pip-FNpsya-build/src/ksp/
> pc/impls/gamg/gamg.c
> [ 6] PCGAMGCoarsen_AGG() line 955 in /tmp/pip-FNpsya-build/src/ksp/
> pc/impls/gamg/agg.c
> [ 6] MatTransposeMatMult() line 9962 in /tmp/pip-FNpsya-build/src/mat/
> interface/matrix.c
> [ 6] MatTransposeMatMult_MPIAIJ_MPIAIJ() line 902 in
> /tmp/pip-FNpsya-build/src/mat/impls/aij/mpi/mpimatmatmult.c
> [ 6] MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() line 1676 in
> /tmp/pip-FNpsya-build/src/mat/impls/aij/mpi/mpimatmatmult.c
> [ 6] PetscTableCreate() line 52 in /tmp/pip-FNpsya-build/src/sys/
> utils/ctable.c
> [ 6] PetscTableCreateHashSize() line 28 in /tmp/pip-FNpsya-build/src/sys/
> utils/ctable.c
> [ 6] Argument out of range
> [ 6] A really huge hash is being requested.. cannot process: 4096000
> --
> MPI_ABORT was invoked on rank 6 in communicator MPI_COMM_WORLD
> with errorcode 1.
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
> You may or may not see output from other processes, depending on
> exactly when Open MPI kills them.
> --
> Traceback (most recent call last):
>   File "pFiredrake.py", line 109, in 
> Traceback (most recent call last):
> Traceback (most recent call last):
>   File "pFiredrake.py", line 109, in 
>   File "pFiredrake.py", line 109, in 
> solve(a==L,solution,options_prefix='fe_',solver_
> parameters=solver_params)
>   File "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
> line 122, in solve
> solve(a==L,solution,options_prefix='fe_',solver_
> parameters=solver_params)
> Traceback (most recent call last):
>   File "pFiredrake.py", line 109, in 
> Traceback (most recent call last):
>   File "pFiredrake.py", line 109, in 
> _solve_varproblem(*args, **kwargs)
> solve(a==L,solution,options_prefix='fe_',solver_
> parameters=solver_params)
>   File "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
> line 152, in _solve_varproblem
> Traceback (most recent call last):
>   File "pFiredrake.py", line 109, in 
>   File "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
> line 122, in solve
> Traceback (most recent call last):
>   File "pFiredrake.py", line 109, in 
> Traceback (most recent call last):
>   File "pFiredrake.py", line 109, in 
> solve(a==L,solution,options_prefix='fe_',solver_
> parameters=solver_params)
>   File "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
> line 122, in solve
> solve(a==L,solution,options_prefix='fe_',solver_
> parameters=solver_params)
>   File "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
> line 122, in solve
>   File 
> "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
> line 122, in solve
> solve(a==L,solution,options_prefix='fe_',solver_
> parameters=solver_params)
>   File "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
> line 122, in solve
> _solve_varproblem(*args, **kwargs)
>   File "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
> line 152, in _solve_varproblem
> _solve_varproblem(*args, **kwargs)
>   File "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
> line 152, in _solve_varproblem
> _solve_varproblem(*args, **kwargs)
>   File 

Re: [petsc-users] GAMG

2016-11-01 Thread Mark Adams
>
>
> The labeling is right, I re-checked. That's the funny part, I can't get
> GAMG to work with PCSetCoordinates (which BTW, I think its documentation
> does not address the issue of DOF ordering).
>

Yep, this needs to be made clear. I guess people do actually look at the
manual so I will add that.

As far as symmetry and BCs: if all dofs are set at a node, so the node is
all zero when you zero out BCs, then this should cause a problem in parallel
(you may need several processors to hit it). You really should remove them
from the matrix and adjust the RHS accordingly, but if you set
-pc_gamg_threshold X to a negative number (the default is zero) and keep the
zeros in the matrix after zeroing out rows, then it should work (the graph
algorithms will keep your zero node, so the graph is symmetric).
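
A hedged sketch of the "remove them from the matrix and adjust the RHS"
route (names are placeholders, not code from this thread):
MatZeroRowsColumnsIS() zeroes both rows and columns of the Dirichlet dofs
and folds the known boundary values into the right-hand side, so the
operator stays symmetric, unlike MatZeroRows() alone:

  /* A  = assembled stiffness matrix, bc = IS of Dirichlet dof indices,
     x  = Vec holding the prescribed boundary values, b = right-hand side.
     Rows and columns in bc are zeroed, 1.0 is put on the diagonal, and
     b is updated with the eliminated column contributions. */
  MatZeroRowsColumnsIS(A, bc, 1.0, x, b);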


Re: [petsc-users] GAMG

2016-10-31 Thread Jeremy Theler
On Mon, 2016-10-31 at 08:44 -0600, Jed Brown wrote:

> > After understanding Matt's point about the near nullspace (and reading
> > some interesting comments from Jed on scicomp stackexchange) I did built
> > my own vectors (I had to take a look at MatNullSpaceCreateRigidBody()
> > because I found out by running the code the nullspace should be an
> > orthonormal basis, it should say so in the docs).
> 
> Where?
> "vecs - the vectors that span the null space (excluding the constant 
> vector); these vectors must be orthonormal."
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatNullSpaceCreate.html

ok, I might have passed on that but I started with 

http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatSetNearNullSpace.html

that says “attaches a null space to a matrix, which is often the null
space (rigid body modes) of the operator without boundary conditions
This null space will be used to provide near null space vectors to a
multigrid preconditioner built from this matrix.”

It wouldn't hurt to remind dumb users like me that “...it is often the
set of _orthonormalized_ rigid body modes...”


> And if you run in debug mode (default), as you always should until you
> are confident that your code is correct, MatNullSpaceCreate tests that
> your vectors are orthonormal.

That's how I realized I needed to normalize. Then I found
MatNullSpaceCreateRigidBody() and copied the code to orthogonalize.

Wouldn't it be better to orthonormalize inside MatSetNullSpace()? I bet
an orthogonalization from PETSc's code would beat any user-side code.
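
A hedged sketch of that route (placeholder names; assumes 3D elasticity with
node-based, interlaced u,v,w ordering and a Vec "coords" holding the nodal
coordinates in the same order). MatNullSpaceCreateRigidBody() builds the six
rigid-body modes from the coordinates and orthonormalizes them internally,
so no user-side orthogonalization is needed:

  MatNullSpace nullsp;
  MatSetBlockSize(A, 3);                         /* 3 dofs per node */
  VecSetBlockSize(coords, 3);
  MatNullSpaceCreateRigidBody(coords, &nullsp);  /* orthonormalized rigid-body modes */
  MatSetNearNullSpace(A, nullsp);
  MatNullSpaceDestroy(&nullsp);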

> > Now, there are some results I do not understand. I tried these six
> > combinations:
> >
> > order    near-nullspace    iterations   norm
> > -------  ----------------  ----------   ------
> > unknown  explicit          10           1.6e-6
> > unknown  PCSetCoordinates  15           1.7e-7
> > unknown  none              15           2.4e-7
> > node     explicit          fails with error -11
> > node     PCSetCoordinates  fails with error -11
> > node     none              13           3.8e-7
> 
> Did you set a block size for the "node-based" orderings?  Are you sure
> the above is labeled correctly?  Anyway, PCSetCoordinates uses
> "node-based" ordering.  Implementation performance will generally be
> better with node-based ordering -- it has better memory streaming and
> cache behavior.

Yes. Indeed, when I save the stiffness matrix as a binary file I get
a .info file that contains

-matload_block_size 3

The labeling is right, I re-checked. That's the funny part, I can't get
GAMG to work with PCSetCoordinates (which BTW, I think its documentation
does not address the issue of DOF ordering).

Any idea of what can be happening to me?


> The AIJ matrix format will also automatically do an "inode" optimization
> to reduce memory bandwidth and enable block smoothing (default
> configuration uses SOR smoothing).  You can use -mat_no_inode to try
> turning that off.


That option does not make any difference.

> 
> > Error -11 is 
> > PETSc's linear solver did not converge with reason
> > 'DIVERGED_PCSETUP_FAILED' (-11)
> Isn't there an actual error message?

Sorry, KSPGetConvergedReason() returns -11 and then my code prints that
error string. Find attached the output with -info.

Thanks
--
jeremy



[0] PetscInitialize(): PETSc successfully started: number of processors = 1
[0] PetscGetHostName(): Rejecting domainname, likely is NIS tom.(none)
[0] PetscInitialize(): Running on machine: tom
[0] SlepcInitialize(): SLEPc successfully started
697 3611
[0] PetscCommDuplicate(): Duplicating a communicator 2 2 max tags = 1
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 2091 X 2091; storage space: 835506 
unneeded,74079 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 
2091) < 0.6. Do not use CompressedRow routines.
[0] MatSeqAIJCheckInode(): Found 697 nodes of 2091. Limit used: 5. Using Inode 
routines
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 2091 X 2091; storage space: 0 
unneeded,74079 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 
2091) < 0.6. Do not use CompressedRow routines.
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 2091 X 2091; storage space: 0 
unneeded,74079 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 81
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows 0)/(num_localrows 
2091) < 0.6. Do not use CompressedRow routines.
[0] PCSetUp(): Setting up PC for first time
[0] PCSetUp_GAMG(): level 0) N=2091, n data rows=3, n data cols=6, nnz/row 

Re: [petsc-users] GAMG

2016-10-31 Thread Jed Brown
"Kong, Fande"  writes:
> If the boundary values are not zero, no way to maintain symmetry unless we
> reduce the extra part of  the matrix. Not updating the columns is better in
> this situation.

The inhomogeneity of the boundary condition has nothing to do with operator 
symmetry.

I like this formulation for Dirichlet conditions.

https://scicomp.stackexchange.com/questions/3298/appropriate-space-for-weak-solutions-to-an-elliptical-pde-with-mixed-inhomogeneo/3300#3300
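
Spelling the symmetry point out (a sketch in my own notation, with I =
interior dofs, B = Dirichlet dofs, g the boundary data):

  zeroing rows only:         [ A_II  A_IB ] [ u_I ]   [ f_I ]
                             [  0     I   ] [ u_B ] = [  g  ]     (unsymmetric)

  zeroing rows and columns:  A_II u_I = f_I - A_IB g,  u_B = g    (A_II symmetric if A is)

The lifted right-hand side f_I - A_IB g is what MatZeroRowsColumns() produces
when it is passed the boundary values.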




Re: [petsc-users] GAMG

2016-10-31 Thread Matthew Knepley
On Mon, Oct 31, 2016 at 10:29 AM, Kong, Fande  wrote:

> On Mon, Oct 31, 2016 at 8:44 AM, Jed Brown  wrote:
>
>> Jeremy Theler  writes:
>>
>> > Hi again
>> >
>> > I have been working on these issues. Long story short: it is about the
>> > ordering of the unknown fields in the vector.
>> >
>> > Long story:
>> > The physics is linear elastic problem, you can see it does work with LU
>> > over a simple cube (warp the displacements to see it does represent an
>> > elastic problem, E=200e3, nu=0.3):
>> >
>> > https://caeplex.com/demo/results.php?id=5817146bdb561
>> >
>> >
>> > Say my three displacements (unknowns) are u,v,w. I can define the
>> > unknown vector as (is this called node-based ordering?)
>> >
>> > [u1 v1 w1 u2 v2 w2 ... un vn wn]^T
>> >
>> > Another option is (is this called unknown-based ordering?)
>> >
>> > [u1 u2 ... un v1 v2 ... vn w1 w2 ... wn]^T
>> >
>> >
>> > With lu/preonly the results are the same, although the stiffness matrices
>> > for each case are attached as PNGs. And of course, the near-nullspace
>> > vectors are different. So PCSetCoordinates() should work with one
>> > ordering and not with another one, an issue I did not take into
>> > consideration.
>> >
>> > After understanding Matt's point about the near nullspace (and reading
>> > some interesting comments from Jed on scicomp stackexchange) I did build
>> > my own vectors (I had to take a look at MatNullSpaceCreateRigidBody()
>> > because I found out by running the code the nullspace should be an
>> > orthonormal basis, it should say so in the docs).
>>
>> Where?
>>
>> "vecs   - the vectors that span the null space (excluding the constant
>> vector); these vectors must be orthonormal."
>>
>> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages
>> /Mat/MatNullSpaceCreate.html
>>
>> And if you run in debug mode (default), as you always should until you
>> are confident that your code is correct, MatNullSpaceCreate tests that
>> your vectors are orthonormal.
>>
>> > Now, there are some results I do not understand. I tried these six
>> > combinations:
>> >
>> > order    near-nullspace    iterations   norm
>> > -------  ----------------  ----------   ------
>> > unknown  explicit          10           1.6e-6
>> > unknown  PCSetCoordinates  15           1.7e-7
>> > unknown  none              15           2.4e-7
>> > node     explicit          fails with error -11
>> > node     PCSetCoordinates  fails with error -11
>> > node     none              13           3.8e-7
>>
>> Did you set a block size for the "node-based" orderings?  Are you sure
>> the above is labeled correctly?  Anyway, PCSetCoordinates uses
>> "node-based" ordering.  Implementation performance will generally be
>> better with node-based ordering -- it has better memory streaming and
>> cache behavior.
>>
>> The AIJ matrix format will also automatically do an "inode" optimization
>> to reduce memory bandwidth and enable block smoothing (default
>> configuration uses SOR smoothing).  You can use -mat_no_inode to try
>> turning that off.
>>
>> > Error -11 is
>> > PETSc's linear solver did not converge with reason
>> > 'DIVERGED_PCSETUP_FAILED' (-11)
>>
>> Isn't there an actual error message?
>>
>> > Any explanation (for dumbs)?
>> > Another thing to take into account: I am setting the dirichlet BCs with
>> > MatZeroRows(), but I am not updating the columns to keep symmetry. Can
>> > this pose a problem for GAMG?
>>
>> Usually minor, but it is better to maintain symmetry.
>>
>
> If the boundary values are not zero, no way to maintain symmetry unless we
> reduce the extra part of  the matrix. Not updating the columns is better
> in this situation.
>

?

You just eliminate the unknowns.

   Matt


>
> Fande,
>
>
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener


Re: [petsc-users] GAMG

2016-10-31 Thread Kong, Fande
On Mon, Oct 31, 2016 at 8:44 AM, Jed Brown  wrote:

> Jeremy Theler  writes:
>
> > Hi again
> >
> > I have been working on these issues. Long story short: it is about the
> > ordering of the unknown fields in the vector.
> >
> > Long story:
> > The physics is linear elastic problem, you can see it does work with LU
> > over a simple cube (warp the displacements to see it does represent an
> > elastic problem, E=200e3, nu=0.3):
> >
> > https://caeplex.com/demo/results.php?id=5817146bdb561
> >
> >
> > Say my three displacements (unknowns) are u,v,w. I can define the
> > unknown vector as (is this called node-based ordering?)
> >
> > [u1 v1 w1 u2 v2 w2 ... un vn wn]^T
> >
> > Another option is (is this called unknown-based ordering?)
> >
> > [u1 u2 ... un v1 v2 ... vn w1 w2 ... wn]^T
> >
> >
> > With lu/preonly the results are the same, although the stiffness matrices
> > for each case are attached as PNGs. And of course, the near-nullspace
> > vectors are different. So PCSetCoordinates() should work with one
> > ordering and not with another one, an issue I did not take into
> > consideration.
> >
> > After understanding Matt's point about the near nullspace (and reading
> > some interesting comments from Jed on scicomp stackexchange) I did build
> > my own vectors (I had to take a look at MatNullSpaceCreateRigidBody()
> > because I found out by running the code the nullspace should be an
> > orthonormal basis, it should say so in the docs).
>
> Where?
>
> "vecs   - the vectors that span the null space (excluding the constant
> vector); these vectors must be orthonormal."
>
> https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/
> MatNullSpaceCreate.html
>
> And if you run in debug mode (default), as you always should until you
> are confident that your code is correct, MatNullSpaceCreate tests that
> your vectors are orthonormal.
>
> > Now, there are some results I do not understand. I tried these six
> > combinations:
> >
> > order    near-nullspace    iterations   norm
> > -------  ----------------  ----------   ------
> > unknown  explicit          10           1.6e-6
> > unknown  PCSetCoordinates  15           1.7e-7
> > unknown  none              15           2.4e-7
> > node     explicit          fails with error -11
> > node     PCSetCoordinates  fails with error -11
> > node     none              13           3.8e-7
>
> Did you set a block size for the "node-based" orderings?  Are you sure
> the above is labeled correctly?  Anyway, PCSetCoordinates uses
> "node-based" ordering.  Implementation performance will generally be
> better with node-based ordering -- it has better memory streaming and
> cache behavior.
>
> The AIJ matrix format will also automatically do an "inode" optimization
> to reduce memory bandwidth and enable block smoothing (default
> configuration uses SOR smoothing).  You can use -mat_no_inode to try
> turning that off.
>
> > Error -11 is
> > PETSc's linear solver did not converge with reason
> > 'DIVERGED_PCSETUP_FAILED' (-11)
>
> Isn't there an actual error message?
>
> > Any explanation (for dumbs)?
> > Another thing to take into account: I am setting the dirichlet BCs with
> > MatZeroRows(), but I am not updating the columns to keep symmetry. Can
> > this pose a problem for GAMG?
>
> Usually minor, but it is better to maintain symmetry.
>

If the boundary values are not zero, no way to maintain symmetry unless we
reduce the extra part of  the matrix. Not updating the columns is better in
this situation.

Fande,


Re: [petsc-users] GAMG

2016-10-31 Thread Jed Brown
Jeremy Theler  writes:

> Hi again
>
> I have been working on these issues. Long story short: it is about the
> ordering of the unknown fields in the vector.
>
> Long story:
> The physics is linear elastic problem, you can see it does work with LU
> over a simple cube (warp the displacements to see it does represent an
> elastic problem, E=200e3, nu=0.3):
>
> https://caeplex.com/demo/results.php?id=5817146bdb561
>
>
> Say my three displacements (unknowns) are u,v,w. I can define the
> unknown vector as (is this called node-based ordering?)
>
> [u1 v1 w1 u2 v2 w2 ... un vn wn]^T
>
> Another option is (is this called unknown-based ordering?)
>
> [u1 u2 ... un v1 v2 ... vn w1 w2 ... wn]^T
>
>
> With lu/preonly the results are the same, although the stiffness matrices
> for each case are attached as PNGs. And of course, the near-nullspace
> vectors are different. So PCSetCoordinates() should work with one
> ordering and not with another one, an issue I did not take into
> consideration.
>
> After understanding Matt's point about the near nullspace (and reading
> some interesting comments from Jed on scicomp stackexchange) I did build
> my own vectors (I had to take a look at MatNullSpaceCreateRigidBody()
> because I found out by running the code the nullspace should be an
> orthonormal basis, it should say so in the docs).

Where?

"vecs   - the vectors that span the null space (excluding the constant vector); 
these vectors must be orthonormal."

https://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Mat/MatNullSpaceCreate.html

And if you run in debug mode (default), as you always should until you
are confident that your code is correct, MatNullSpaceCreate tests that
your vectors are orthonormal.

> Now, there are some results I do not understand. I tried these six
> combinations:
>
> order    near-nullspace    iterations   norm
> -------  ----------------  ----------   ------
> unknown  explicit          10           1.6e-6
> unknown  PCSetCoordinates  15           1.7e-7
> unknown  none              15           2.4e-7
> node     explicit          fails with error -11
> node     PCSetCoordinates  fails with error -11
> node     none              13           3.8e-7

Did you set a block size for the "node-based" orderings?  Are you sure
the above is labeled correctly?  Anyway, PCSetCoordinates uses
"node-based" ordering.  Implementation performance will generally be
better with node-based ordering -- it has better memory streaming and
cache behavior.

The AIJ matrix format will also automatically do an "inode" optimization
to reduce memory bandwidth and enable block smoothing (default
configuration uses SOR smoothing).  You can use -mat_no_inode to try
turning that off.

> Error -11 is 
> PETSc's linear solver did not converge with reason
> 'DIVERGED_PCSETUP_FAILED' (-11)

Isn't there an actual error message?

> Any explanation (for dumbs)?
> Another thing to take into account: I am setting the dirichlet BCs with
> MatZeroRows(), but I am not updating the columns to keep symmetry. Can
> this pose a problem for GAMG?

Usually minor, but it is better to maintain symmetry.




Re: [petsc-users] GAMG

2016-10-28 Thread Mark Adams
>
>
>>
> AMG (the agglomeration kind) needs to know the near null space of your
> operator in order
> to work. You have an elasticity problem (I think), and if you take that
> operator without boundary
> conditions, the energy is invariant to translations and rotations. The
> space of translations and
> rotations is a 6D space (3 translations, 3 rotations). You need to express
> these in the basis for
> your problem (I assume linear elements, P1).
>

Actually, these vectors are purely geometric. If these rigid body modes are
not your kernel then you have a bad discretization or you are not doing 3D
elasticity.

Anyway, this reminds me that the problem goes away w/o the RBMs.  The fine
grid eigen estimate was large and will not be affected by the null space
business. The second grid had a huge eigenvalue and that could be affected
by the null space.

What is your Poisson ratio?


> This is what PCSetCoordinates() tries to do. Something
> is going wrong, but its hard for us to say what since I have no idea what
> your problem looks like.
> So you can make these vectors yourself and provide them to GAMG using
> MatSetNearNullSpace().
>
>Matt
>
>
>> --
>> jeremy
>>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


Re: [petsc-users] GAMG

2016-10-28 Thread Matthew Knepley
On Fri, Oct 28, 2016 at 8:38 AM, Jeremy Theler  wrote:

>
> >
> > If I do not call PCSetCoordinates() the error goes away but
> > convergence
> > is slow.
> > Is it possible that your coordinates lie on a 2D surface? All this
> > does is make the 6 basis vectors
> > for translations and rotations. You can just make these yourself and
> > call MatSetNearNullSpace()
> > and see what you get.
> >
> No, they do not lie on a 2D surface :-/
>
> Sorry but I did not get the point about the 6 basis vectors and
> MatSetNearNullSpace().
>

AMG (the agglomeration kind) needs to know the near null space of your
operator in order
to work. You have an elasticity problem (I think), and if you take that
operator without boundary
conditions, the energy is invariant to translations and rotations. The
space of translations and
rotations is a 6D space (3 translations, 3 rotations). You need to express
these in the basis for
your problem (I assume linear elements, P1). This is what
PCSetCoordinates() tries to do. Something
is going wrong, but its hard for us to say what since I have no idea what
your problem looks like.
So you can make these vectors yourself and provide them to GAMG using
MatSetNearNullSpace().

   Matt
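
To make that 6D space concrete, a sketch in standard notation (not copied
from the thread): for a node at coordinates (x, y, z), its (u, v, w) entries
in the six near-nullspace vectors are, before orthonormalization,

  t1 = (1, 0, 0)    t2 = (0, 1, 0)    t3 = (0, 0, 1)      (translations)
  r1 = (0, -z, y)   r2 = (z, 0, -x)   r3 = (-y, x, 0)     (rotations about x, y, z)

As noted elsewhere in this thread, MatNullSpaceCreateRigidBody() builds this
set from the coordinates and orthonormalizes it.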


> --
> jeremy
>
-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener


Re: [petsc-users] GAMG

2016-10-28 Thread Mark Adams
So this is a fully 3D problem, or is it a very flat disc? What is the worst
aspect ratio (or whatever) of an element, approximately? That is, is this a
bad mesh?

You might want to start with a simple problem like a cube. The eigen
estimates (Smooth P0: max eigen=1.09e+01) are huge and they are a lower
bound.

You might also try -gamg_est_ksp_max_it 50 and see if these eigen
estimates go up much (and use -gamg_est_ksp_type cg). GAMG's eigen
estimates are working, but I manufacture a seed vector, which is
slightly different from what Cheby does.

Also, what version of PETSc are you using? It would be best to use git to
clone the repository. This would give you maint or master branch which have
a fix for the cheby eigen estimator that your version might not have (use
-ksp_view and grep for "noisy" to see if you have an up to date version).


On Fri, Oct 28, 2016 at 10:12 AM, jeremy theler  wrote:

> I will try these options in a couple of hours (I have to go out now). I
> forgot to mention that the geometry has revolution symmetry around the z
> axis (just the geometry, not the problem because it has a non-symmetric
> temperature distribution).
> I am solving with only one proc, there are approx 50k nodes so 150k dofs.
> Thanks again.
>
> On Fri, Oct 28, 2016, 11:07 Mark Adams  wrote:
>
>> Also, try solving the problem with a one level iterative method and
>> Chebyshev, like:
>>
>> -ksp_type chebyshev
>> -pc_type jacobi
>>
>> It will take a long time to solve but I just want to see if it has the
>> same error.
>>
>>
>> On Fri, Oct 28, 2016 at 10:04 AM, Mark Adams  wrote:
>>
>> GAMG's eigen estimator worked but the values are very high.  You have a
>> very low number of equations per processor; is this a thin body? Are the
>> elements badly stretched?
>>
>> Do this again with these parameters:
>>
>> -mg_levels_ksp_type chebyshev
>> -mg_levels_esteig_ksp_type cg
>> -mg_levels_esteig_ksp_max_it 10
>> -mg_levels_ksp_chebyshev_esteig 0,.1,0,1.05
>> -gamg_est_ksp_type cg
>>
>>
>> On Fri, Oct 28, 2016 at 9:48 AM, Jeremy Theler 
>> wrote:
>>
>> On Fri, 2016-10-28 at 09:46 -0400, Mark Adams wrote:
>> > Please run with -info and grep on GAMG.
>> >
>> [0] PCSetUp_GAMG(): level 0) N=120726, n data rows=3, n data cols=6,
>> nnz/row (ave)=41, np=1
>> [0] PCGAMGFilterGraph(): 99.904% nnz after filtering, with
>> threshold 0., 13.7468 nnz ave. (N=40242)
>> [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
>> [0] PCGAMGProlongator_AGG(): New grid 1894 nodes
>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=5.726852e+00
>> min=1.330683e-01 PC=jacobi
>> [0] PCSetUp_GAMG(): 1) N=11364, n data cols=6, nnz/row (ave)=196, 1
>> active pes
>> [0] PCGAMGFilterGraph(): 99.9839% nnz after filtering, with
>> threshold 0., 32.7656 nnz ave. (N=1894)
>> [0] PCGAMGProlongator_AGG(): New grid 155 nodes
>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.09e+01
>> min=1.832878e-04 PC=jacobi
>> [0] PCSetUp_GAMG(): 2) N=930, n data cols=6, nnz/row (ave)=196, 1 active
>> pes
>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with
>> threshold 0., 32.7806 nnz ave. (N=155)
>> [0] PCGAMGProlongator_AGG(): New grid 9 nodes
>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.116373e+00
>> min=6.337173e-03 PC=jacobi
>> [0] PCSetUp_GAMG(): 3) N=54, n data cols=6, nnz/row (ave)=34, 1 active
>> pes
>> [0] PCGAMGFilterGraph(): 100.% nnz after filtering, with
>> threshold 0., 5.7 nnz ave. (N=9)
>> [0] PCGAMGProlongator_AGG(): New grid 2 nodes
>> [0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.984549e+00
>> min=8.582767e-03 PC=jacobi
>> [0] PCSetUp_GAMG(): 4) N=12, n data cols=6, nnz/row (ave)=12, 1 active
>> pes
>> [0] PCSetUp_GAMG(): 5 levels, grid complexity = 1.48586
>> error: PETSc error 77-0 'Eigen estimator failed: DIVERGED_NANORINF at
>> iteration 0'
>> in /home/gtheler/libs/petsc-3.7.4/src/ksp/ksp/impls/cheby/cheby.c
>> KSPSolve_Chebyshev:440
>>
>>
>>
>> >
>> >
>> >
>> >
>>
>>
>>
>>
>>

