Mark, Fixed https://bitbucket.org/petsc/petsc/commits/68eacb73b84ae7f3fd7363217d47f23a8f967155
Running ex56 gives

mpiexec -n 8 ./ex56 -ne 13 ... -h |grep via
  -mattransposematmult_via <scalable> Algorithmic approach (choose one of) scalable nonscalable matmatmult (MatTransposeMatMult)
  -matmatmult_via <nonscalable> Algorithmic approach (choose one of) scalable nonscalable hypre (MatMatMult)
  -matptap_via <nonscalable> Algorithmic approach (choose one of) scalable nonscalable hypre (MatPtAP)
...

I'll merge it to master after regression tests.

Hong

On Thu, May 4, 2017 at 10:33 AM, Hong <[email protected]> wrote:

> Mark:
>>
>> I am not seeing these options with -help ...
>>
> Hmm, this might be a bug - I'll check it.
> Hong
>
>
>>
>> On Wed, May 3, 2017 at 10:05 PM, Hong <[email protected]> wrote:
>>
>>> I basically used 'runex56' and set '-ne' to be compatible with np.
>>> Then I used the options
>>> '-matptap_via scalable'
>>> '-matptap_via hypre'
>>> '-matptap_via nonscalable'
>>>
>>> I attached a job script below.
>>>
>>> In master branch, I set default as 'nonscalable' for small - medium size
>>> matrices, and automatically switch to 'scalable' when matrix size gets
>>> larger.
>>>
>>> Petsc solver uses MatPtAP, which does local RAP to reduce communication
>>> and accelerate computation.
>>> I suggest you simply use the default setting. Let me know if you encounter
>>> trouble.
>>>
>>> Hong
>>>
>>> job.ne174.n8.np125.sh:
>>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56
>>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
>>> -pc_gamg_reuse_interpolation true -ksp_converged_reason
>>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
>>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
>>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
>>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
>>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
>>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
>>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
>>> -pc_gamg_repartition false -pc_mg_cycle_type v
>>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
>>> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via scalable >
>>> log.ne174.n8.np125.scalable
>>>
>>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56
>>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
>>> -pc_gamg_reuse_interpolation true -ksp_converged_reason
>>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
>>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
>>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
>>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
>>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
>>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
>>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
>>> -pc_gamg_repartition false -pc_mg_cycle_type v
>>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
>>> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via hypre >
>>> log.ne174.n8.np125.hypre
>>>
>>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56
>>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
>>> -pc_gamg_reuse_interpolation true -ksp_converged_reason
>>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
>>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
>>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
>>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
>>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
>>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
>>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
>>> -pc_gamg_repartition false -pc_mg_cycle_type v
>>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
>>> -mg_coarse_ksp_type cg -ksp_monitor -log_view -matptap_via nonscalable >
>>> log.ne174.n8.np125.nonscalable
>>>
>>> runjob --np 125 -p 16 --block $COBALT_PARTNAME --verbose=INFO : ./ex56
>>> -ne 174 -alpha 1.e-3 -ksp_type cg -pc_type gamg -pc_gamg_agg_nsmooths 1
>>> -pc_gamg_reuse_interpolation true -ksp_converged_reason
>>> -use_mat_nearnullspace -mg_levels_esteig_ksp_type cg
>>> -mg_levels_esteig_ksp_max_it 10 -pc_gamg_square_graph 1
>>> -mg_levels_ksp_max_it 1 -mg_levels_ksp_type chebyshev
>>> -mg_levels_ksp_chebyshev_esteig 0,0.2,0,1.05 -gamg_est_ksp_type cg
>>> -gamg_est_ksp_max_it 10 -pc_gamg_asm_use_agg true -mg_levels_sub_pc_type lu
>>> -mg_levels_pc_asm_overlap 0 -pc_gamg_threshold -0.01
>>> -pc_gamg_coarse_eq_limit 200 -pc_gamg_process_eq_limit 30
>>> -pc_gamg_repartition false -pc_mg_cycle_type v
>>> -pc_gamg_use_parallel_coarse_grid_solver -mg_coarse_pc_type jacobi
>>> -mg_coarse_ksp_type cg -ksp_monitor -log_view > log.ne174.n8.np125
>>>
>>> On Wed, May 3, 2017 at 2:08 PM, Mark Adams <[email protected]> wrote:
>>>
>>>> Hong, the input files do not seem to be accessible. What are the command
>>>> line options? (I don't see a "rap" or "scale" in the source).
>>>>
>>>> On Wed, May 3, 2017 at 12:17 PM, Hong <[email protected]> wrote:
>>>>
>>>>> Mark,
>>>>> Below is the copy of my email sent to you on Feb 27:
>>>>>
>>>>> I implemented scalable MatPtAP and did comparisons of three
>>>>> implementations using ex56.c on alcf cetus machine (this machine has
>>>>> small memory, 1GB/core):
>>>>> - nonscalable PtAP: use an array of length PN to do dense axpy
>>>>> - scalable PtAP: do sparse axpy without use of PN array
>>>>> - hypre PtAP.
>>>>>
>>>>> The results are attached. Summary:
>>>>> - nonscalable PtAP is 2x faster than scalable, 8x faster than hypre
>>>>>   PtAP
>>>>> - scalable PtAP is 4x faster than hypre PtAP
>>>>> - hypre uses less memory (see job.ne399.n63.np1000.sh)
>>>>>
>>>>> Based on the above observations, I set the default PtAP algorithm as
>>>>> 'nonscalable'.
>>>>> When PN > local estimated nonzero of C=PtAP, then switch default to
>>>>> 'scalable'.
>>>>> User can overwrite default.
>>>>>
>>>>> For the case of np=8000, ne=599 (see job.ne599.n500.np8000.sh), I get
>>>>> MatPtAP            3.6224e+01 (nonscalable for small mats,
>>>>>                    scalable for larger ones)
>>>>> scalable MatPtAP   4.6129e+01
>>>>> hypre              1.9389e+02
>>>>>
>>>>> This work is on petsc-master. Give it a try. If you encounter any
>>>>> problem, let me know.
>>>>>
>>>>> Hong
>>>>>
>>>>> On Wed, May 3, 2017 at 10:01 AM, Mark Adams <[email protected]> wrote:
>>>>>
>>>>>> (Hong), what is the current state of optimizing RAP for scaling?
>>>>>>
>>>>>> Nate is driving 3D elasticity problems at scaling with GAMG and we
>>>>>> are working out performance problems. They are hitting problems at ~1.5B
>>>>>> dof problems on a basic Cray (XC30 I think).
>>>>>>
>>>>>> Thanks,
>>>>>> Mark
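
The trade-off in the Feb 27 summary (a dense work array of length PN versus a sparse accumulation, and the rule for switching between them) can be illustrated with a small Python sketch. This is not PETSc code. The function names `axpy_dense`, `axpy_sparse` and `choose_ptap_algorithm` are hypothetical, and the dictionary-based accumulator is a stand-in for whatever sparse merge PETSc actually uses. The thread does not define PN; the sketch simply treats it as the length of the dense work array.

```python
# Illustrative sketch only: hypothetical names, not the PETSc implementation.
# A sparse row is represented as {column_index: value}.

def axpy_dense(scaled_rows, pn):
    """Accumulate sum(alpha * row) using a dense work array of length pn.

    This mirrors the 'nonscalable' idea in the thread: the inner update is a
    direct array access, but memory grows with pn regardless of how many
    nonzeros the result has.
    """
    work = [0.0] * pn          # dense accumulator, length pn
    seen = [False] * pn        # marks columns that received a contribution
    cols = []                  # columns touched, in order of first touch
    for alpha, row in scaled_rows:
        for j, v in row.items():
            if not seen[j]:
                seen[j] = True
                cols.append(j)
            work[j] += alpha * v
    return {j: work[j] for j in sorted(cols)}


def axpy_sparse(scaled_rows):
    """Accumulate the same sum with storage proportional to the result.

    This mirrors the 'scalable' idea: no length-pn array is allocated.
    A dict is an assumption made for brevity.
    """
    acc = {}
    for alpha, row in scaled_rows:
        for j, v in row.items():
            acc[j] = acc.get(j, 0.0) + alpha * v
    return dict(sorted(acc.items()))


def choose_ptap_algorithm(pn, est_local_nnz):
    """Switching rule as stated in the thread: default to 'nonscalable',
    and switch to 'scalable' when PN exceeds the local estimated number of
    nonzeros of C = PtAP."""
    return "scalable" if pn > est_local_nnz else "nonscalable"


if __name__ == "__main__":
    rows = [(2.0, {0: 1.0, 3: 4.0}), (-1.0, {3: 2.0, 5: 1.0})]
    print(axpy_dense(rows, 6))
    print(axpy_sparse(rows))
    print(choose_ptap_algorithm(pn=1000, est_local_nnz=200))
```

Both accumulators return the same sparse row; they differ only in working memory, which is the reason given in the thread for switching the default once PN becomes large relative to the local size of C.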
