Hi Mark,

Sure, I will try a 3D lid-driven cavity case combining OpenFOAM, PETSc, and HYPRE; let's see what happens.
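For the pure-PETSc side of the comparison I am also considering a standalone 3D run, roughly along the line below (just a sketch: if I remember correctly, ex45 is the 3D Laplacian tutorial; the 200^3 grid is a placeholder that gives about 8M unknowns, close to the 9M of the 2D case; the hypre options are left at their defaults as you suggested; and the CUDA flags simply mirror my 2D run):

# sketch only: 3D Laplacian (ex45), hypre/BoomerAMG defaults, CUDA backend
mpiexec -n 1 ./ex45 -da_grid_x 200 -da_grid_y 200 -da_grid_z 200 -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg -vec_type cuda -mat_type aijcusparse -ksp_monitor -ksp_view -log_view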
Kind regards,
Qi

On Mon, Mar 28, 2022 at 11:04 PM Mark Adams <mfad...@lbl.gov> wrote:

> Hi Qi, these are good discussions and data and we like to share, so let's keep this on the list.
>
> * I would suggest you use a 3D test. This is more relevant to what HPC applications do.
> * In my experience, hypre's default parameters are tuned for 2D low-order problems like this, so I would start with the defaults. I think they should be fine for 3D as well.
> * As I think I said before, we have an AMGx interface under development, and I heard yesterday that it should not be long until it is available. It would be great if you could test that, and we can work with the NVIDIA developer to optimize it. We will let you know when it's available.
>
> Cheers,
> Mark
>
>
> On Mon, Mar 28, 2022 at 10:44 AM Qi Yang <qiy...@oakland.edu> wrote:
>
>> Hi Mark and Barry,
>>
>> I really appreciate your explanation of the setup process. Over the past few days I have tried the HYPRE AMG solver as a replacement for the original AMG solver in PETSc.
>>
>> The HYPRE solver settings are as follows:
>> mpiexec -n 1 ./ex50 -da_grid_x 3000 -da_grid_y 3000 -ksp_type cg
>> -pc_type hypre -pc_hypre_type boomeramg -pc_hypre_boomeramg_max_iter 1
>> -pc_hypre_boomeramg_strong_threshold 0.7
>> -pc_hypre_boomeramg_grid_sweeps_up 1 -pc_hypre_boomeramg_grid_sweeps_down 1
>> -pc_hypre_boomeramg_agg_nl 2 -pc_hypre_boomeramg_agg_num_paths 1
>> -pc_hypre_boomeramg_max_levels 25 -pc_hypre_boomeramg_coarsen_type PMIS
>> -pc_hypre_boomeramg_interp_type ext+i -pc_hypre_boomeramg_P_max 2
>> -pc_hypre_boomeramg_truncfactor 0.2 -vec_type cuda -mat_type aijcusparse
>> -ksp_monitor -ksp_view -log_view
>>
>> [image: PMIS.PNG]
>>
>> The interesting part is that I chose PMIS as the coarsening type; looking through the code, you can see that only PMIS has GPU code (host and device).
>> * HYPRE does reduce the solution time, from 20 s to 8 s.
>> * A memory-mapping process shows up inside the solve, which causes several gaps in the NVIDIA Nsight Systems profile below; I am not sure what it means.
>> [image: image.png]
>> I am really interested in doing some benchmarks with the hypre AMG solver. In fact, I have already connected OpenFOAM, PETSc, HYPRE, and AMGX together through the petsc4Foam API (https://develop.openfoam.com/modules/external-solver/-/tree/amgxwrapper/src/petsc4Foam). I prefer to use PETSc as the base matrix solver because of a possible HIP implementation in the future; that way I can compare NVIDIA and AMD GPUs. It seems there are many benchmark cases I can do in the future.
>>
>> Regards,
>> Qi
>>
>>
>> On Wed, Mar 23, 2022 at 9:39 AM Mark Adams <mfad...@lbl.gov> wrote:
>>
>>> A few points, but first, this is a nice start. If you are interested in working on benchmarking, that would be great. If so, read on.
>>>
>>> * Barry pointed out the SOR issues that are thrashing the memory system. This solve would run faster on the CPU (maybe; 9M equations is a lot).
>>> * Most applications run for some time, doing 100-1,000 or more solves with one configuration, and this amortizes what I call the "mesh setup" cost for each mesh.
>>> * Many applications are nonlinear and use a full Newton solver that does a "matrix setup" for each solve, but many applications can also amortize this matrix setup (the PtAP stuff in the output, which is small for 2D problems but can be large for 3D problems).
>>> * Now, hypre's mesh setup is definitely better than GAMG's, and AMGx's is out of this world.
>>>   - AMGx is the result of a serious development effort by NVIDIA about 15 years ago, with many tens of NVIDIA developer-years in it (I am guessing, but I know it was a serious effort for a few years).
>>>   + We are currently working with the current AMGx developer, Matt, to provide an AMGx interface in PETSc, like the hypre one (DOE does not like us working with non-portable solvers, but AMGx is very good).
>>> * Hypre and AMGx use "classic" AMG, which is like geometric multigrid (fast) for M-matrices (very low-order Laplacians, like ex50).
>>> * GAMG uses "smoothed aggregation" AMG because this algorithm has better theoretical properties for high-order and elasticity problems, and its implementations and default parameters have been optimized for these types of problems.
>>>
>>> It would be interesting to add hypre to your study (ex50) and to add a high-order 3D elasticity problem (e.g., snes/tests/ex13; Jed Brown also has some nice elasticity problems).
>>> If you are interested, we can give you hypre parameters for elasticity problems.
>>> I have no experience with AMGx on elasticity, but the NVIDIA developer is available and can be looped in.
>>> For that matter, we could bring the main hypre developer, Ruipeng, in as well.
>>> I would also suggest timing the setup (you can combine mesh and matrix setup if you like) and the solve phase separately. ex13 does this, and we should find another 5-point-stencil example that does if ex50 does not.
>>>
>>> BTW, I have been intending to write a benchmarking paper this year with Matt and Ruipeng, but I am just not getting around to it ...
>>> If you want to lead a paper and the experiments, we can help optimize and tune our solvers, set up tests, write background material, etc.
>>>
>>> Cheers,
>>> Mark
>>>
>>>
>>> On Tue, Mar 22, 2022 at 12:30 PM Barry Smith <bsm...@petsc.dev> wrote:
>>>
>>>> Indeed, PCSetUp is taking most of the time (79%). In the version of PETSc you are running, it is doing a great deal of the setup work on the CPU. You can see there is a lot of data movement between the CPU and GPU (in both directions) during the setup: 64 1.91e+03 54 1.21e+03 90.
>>>>
>>>> Clearly, we need help in porting all the parts of the GAMG setup that still occur on the CPU to the GPU.
>>>>
>>>> Barry
>>>>
>>>>
>>>> On Mar 22, 2022, at 12:07 PM, Qi Yang <qiy...@oakland.edu> wrote:
>>>>
>>>> Dear Barry,
>>>>
>>>> Your advice is helpful: the total time is now reduced from 30 s to 20 s (all matrix operations now run on the GPU). I have also tried other settings for the AMG preconditioner, such as -pc_gamg_threshold 0.05 -pc_gamg_threshold_scale 0.5, but they do not seem to help much.
>>>> It seems the key point is the PCSetUp process; from the log, it takes the most time. And in the new Nsight Systems analysis there is a big gap before the KSP solver starts, which seems to be the PCSetUp process. I am not sure; am I right?
>>>> <3.png>
>>>>
>>>> PCSetUp 2 1.0 1.5594e+01 1.0 3.06e+09 1.0 0.0e+00 0.0e+00 0.0e+00 79 78 0 0 0 79 78 0 0 0 196 8433 64 1.91e+03 54 1.21e+03 90
>>>>
>>>> Regards,
>>>> Qi
>>>>
>>>> On Tue, Mar 22, 2022 at 10:44 PM Barry Smith <bsm...@petsc.dev> wrote:
>>>>
>>>>> It is using
>>>>>
>>>>> MatSOR 369 1.0 9.1214e+00 1.0 7.32e+09 1.0 0.0e+00 0.0e+00 0.0e+00 29 27 0 0 0 29 27 0 0 0 803 0 0 0.00e+00 565 1.35e+03 0
>>>>>
>>>>> which runs on the CPU, not the GPU, hence the large amount of time in memory copies and the poor performance. We are switching the default to Chebyshev/Jacobi, which runs completely on the GPU (it may already be switched in the main branch).
>>>>>
>>>>> You can run with -mg_levels_pc_type jacobi. You should then see almost the entire solver running on the GPU.
>>>>>
>>>>> You may need to tune the number of smoothing steps or other GAMG parameters to get a faster solution time.
>>>>>
>>>>> Barry
>>>>>
>>>>>
>>>>> On Mar 22, 2022, at 10:30 AM, Qi Yang <qiy...@oakland.edu> wrote:
>>>>>
>>>>> To whom it may concern,
>>>>>
>>>>> I have tried PETSc ex50 (Poisson) with CUDA, the KSP CG solver, and the GAMG preconditioner; however, it ran for about 30 s. I also tried NVIDIA AMGX with the same solver and the same grid (3000*3000), and it took only 2 s. I used Nsight Systems to analyze the two cases and found that PETSc spent much of its time on memory operations (63% of the total time, whereas AMGX spent only 19%). Attached are screenshots of both.
>>>>>
>>>>> The PETSc command is: mpiexec -n 1 ./ex50 -da_grid_x 3000 -da_grid_y 3000 -ksp_type cg -pc_type gamg -pc_gamg_type agg -pc_gamg_agg_nsmooths 1 -vec_type cuda -mat_type aijcusparse -ksp_monitor -ksp_view -log_view
>>>>>
>>>>> The log file is also attached.
>>>>>
>>>>> Regards,
>>>>> Qi
>>>>>
>>>>> <1.png>
>>>>> <2.png>
>>>>> <log.PETSc_cg_amg_ex50_gpu_cuda>
>>>>>
>>>> <log.PETSc_cg_amg_jacobi_ex50_gpu_cuda>