Mark,
I redid the -pc_type hypre experiment without OpenMP. Now the job finishes instead of running out of time. I have results with 216 processors (see below). The 1728-processor job is still in the queue, so I don't yet know how it scales. For the 216-processor run, the execution time is 245 seconds; with -pc_type gamg, the time is 107 seconds. My options are
-ksp_norm_type unpreconditioned
-ksp_rtol 1E-6
-ksp_type cg
-log_view
-mesh_size 1E-4
-mg_levels_esteig_ksp_max_it 10
-mg_levels_esteig_ksp_type cg
-mg_levels_ksp_max_it 1
-mg_levels_ksp_norm_type none
-mg_levels_ksp_type richardson
-mg_levels_pc_sor_its 1
-mg_levels_pc_type sor
-nodes_per_proc 30
-pc_type hypre

It is a 7-point stencil code. Do you know other hypre options that I can try to improve it? Thanks.

--- Event Stage 2: Remaining Solves

KSPSolve            1000 1.0 2.4574e+02 1.0 4.48e+09 1.0 7.6e+06 7.2e+03 2.0e+04 97100100100100 100100100100100   3928
VecTDot            12000 1.0 6.5646e+00 2.2 6.48e+08 1.0 0.0e+00 0.0e+00 1.2e+04  2 14  0  0 60   2 14  0  0 60  21321
VecNorm             8000 1.0 9.7144e-01 1.2 4.32e+08 1.0 0.0e+00 0.0e+00 8.0e+03  0 10  0  0 40   0 10  0  0 40  96055
VecCopy             1000 1.0 7.9706e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
VecSet              6000 1.0 1.7941e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
VecAXPY            12000 1.0 7.5738e-01 1.2 6.48e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0 14  0  0  0   0 14  0  0  0 184806
VecAYPX             6000 1.0 4.6802e-01 1.3 2.97e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  7  0  0  0   0  7  0  0  0 137071
VecScatterBegin     7000 1.0 4.7924e-01 2.3 0.00e+00 0.0 7.6e+06 7.2e+03 0.0e+00  0  0100100  0   0  0100100  0      0
VecScatterEnd       7000 1.0 7.9303e-01 2.8 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0      0
MatMult             7000 1.0 6.0762e+00 1.1 2.46e+09 1.0 7.6e+06 7.2e+03 0.0e+00  2 55100100  0   2 55100100  0  86894
PCApply             6000 1.0 2.3429e+02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 92  0  0  0  0  95  0  0  0  0      0

--Junchao Zhang

On Thu, Jun 14, 2018 at 5:45 PM, Junchao Zhang <[email protected]> wrote:

> I tested -pc_gamg_repartition with 216 processors again.
> First I tested with these options
>
> -log_view \
> -ksp_rtol 1E-6 \
> -ksp_type cg \
> -ksp_norm_type unpreconditioned \
> -mg_levels_ksp_type richardson \
> -mg_levels_ksp_norm_type none \
> -mg_levels_pc_type sor \
> -mg_levels_ksp_max_it 1 \
> -mg_levels_pc_sor_its 1 \
> -mg_levels_esteig_ksp_type cg \
> -mg_levels_esteig_ksp_max_it 10 \
> -pc_type gamg \
> -pc_gamg_type agg \
> -pc_gamg_threshold 0.05 \
> -pc_gamg_type classical \
> -gamg_est_ksp_type cg \
> -pc_gamg_square_graph 10 \
> -pc_gamg_threshold 0.0
>
> then I tested with an extra -pc_gamg_repartition. With repartition, the
> time increased from 120s to 140s. The code measures the first KSPSolve and
> the remaining ones in separate stages, so the repartition time was not
> counted in the stage of interest. Actually, -log_view says the GAMG:
> repartition time (in the first event stage) is about 1.5 sec., so it is
> not a big deal. I also tested -pc_gamg_square_graph 4. It did not change
> the time.
> I tested hypre with options "-log_view -ksp_rtol 1E-6 -ksp_type cg
> -ksp_norm_type unpreconditioned -pc_type hypre" and nothing else. The code
> ran out of time. In old tests, a job (1000 KSPSolve with 7 KSP iterations
> each) took 4 minutes. With hypre, 1 KSPSolve + 6 KSP iterations each, takes
> 6 minutes.
> I will test and profile the code on a single node, and apply some
> vecscatter optimizations I recently did to see what happens.
>
> --Junchao Zhang
>
> On Thu, Jun 14, 2018 at 11:03 AM, Mark Adams <[email protected]> wrote:
>
>> And with 7-point stencils and no large material discontinuities you
>> probably want -pc_gamg_square_graph 10 -pc_gamg_threshold 0.0 and you
>> could test the square graph parameter (e.g., 1,2,3,4).
>>
>> And I would definitely test hypre.
>>
>> On Thu, Jun 14, 2018 at 8:54 AM Mark Adams <[email protected]> wrote:
>>
>>>> Just -pc_type hypre instead of -pc_type gamg.
>>>
>>> And you need to have configured PETSc with hypre.
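
A quick sanity check on the hypre log in the newest message of this thread, written as a minimal Python sketch. It only redoes arithmetic on the KSPSolve and PCApply rows of "Event Stage 2: Remaining Solves" and on the 107-second gamg figure quoted in the text. The helper name summarize is made up for illustration, and the last ratio assumes the 107 s gamg time was measured the same way as the 245 s hypre time, which the message does not state.

```python
# Numbers copied from the "Event Stage 2: Remaining Solves" rows of the hypre log.
KSP_SOLVE_COUNT = 1000
KSP_SOLVE_TIME = 2.4574e+02   # seconds, max over ranks
PC_APPLY_COUNT = 6000
PC_APPLY_TIME = 2.3429e+02    # seconds, max over ranks
GAMG_TIME = 107.0             # seconds, as quoted in the message (assumed comparable)


def summarize(ksp_time, ksp_count, pc_time, pc_count):
    """Return simple ratios derived from two -log_view rows (hypothetical helper)."""
    return {
        "pc_share": pc_time / ksp_time,                # fraction of KSPSolve spent in PCApply
        "pc_applies_per_solve": pc_count / ksp_count,  # preconditioner applications per solve
        "sec_per_pc_apply": pc_time / pc_count,        # average cost of one PCApply
        "sec_outside_pc": ksp_time - pc_time,          # everything else inside KSPSolve
    }


stats = summarize(KSP_SOLVE_TIME, KSP_SOLVE_COUNT, PC_APPLY_TIME, PC_APPLY_COUNT)
for name, value in stats.items():
    print(f"{name:22s} {value:.4g}")

# Ratio of the hypre stage time to the quoted gamg time (assumption: same measurement).
print(f"{'hypre_over_gamg':22s} {KSP_SOLVE_TIME / GAMG_TIME:.3g}")
```

With these numbers, about 95% of the KSPSolve time is in PCApply, matching the stage percentage column of the log, so the CG vector operations and MatMult account for only about 11 seconds of the hypre run.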
