On Wed, Jan 18, 2023 at 6:03 PM Mark Lohry <[email protected]> wrote:

> Thanks Mark, I'll try the Kokkos bit. Any other suggestions for minimizing
> memory besides the obvious one of using fewer levels?
>
> Unfortunately Jacobi does poorly compared to ILU on these systems.
>
> I'm seeing grid complexity 1.48 and operator complexity 1.75 with
> -pc_gamg_square_graph 0, and 1.15/1.25 with it at 1.
>

That looks good. Use 1.
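
For concreteness, the pieces discussed above would combine roughly like this
on the command line (a sketch, not a tested recipe; the 5 smoother iterations
and the ASM smoother are just what was reported above):

  -ksp_type fgmres -pc_type gamg -pc_gamg_square_graph 1 \
  -mg_levels_ksp_type gmres -mg_levels_ksp_max_it 5 -mg_levels_pc_type asm \
  -ksp_view -memory_view

-mg_levels_pc_type jacobi would be the cheaper option memory-wise if its
convergence were acceptable, and -ksp_view will report the resulting
grid/operator complexity.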

> Additionally the convergence rate is pretty healthy with 5 gmres+asm
> smooths but very bad with 5 Richardson+asm.
>

Yea, it needs to be damped, and GMRES does that automatically.

> On Wed, Jan 18, 2023, 4:48 PM Mark Adams <[email protected]> wrote:
>
>> The cusparse matrix triple product takes a lot of memory. We usually use
>> Kokkos, configured with the TPLs turned off.
>>
>> If you have a complex problem, different parts of the domain can coarsen
>> at different rates.
>> Jacobi instead of asm will save a fair amount of memory.
>> If you run with -ksp_view you will see the grid/operator complexity from
>> GAMG. These should be < 1.5.
>>
>> Mark
>>
>> On Wed, Jan 18, 2023 at 3:42 PM Mark Lohry <[email protected]> wrote:
>>
>>> With asm I see a range of 8GB-13GB, a slightly smaller ratio, but that
>>> probably explains it (does this still seem like a lot of memory to you
>>> for the problem size?)
>>>
>>> In general I don't have the same number of blocks per row, so I suppose
>>> it makes sense that there's some memory imbalance.
>>>
>>> On Wed, Jan 18, 2023 at 3:35 PM Mark Adams <[email protected]> wrote:
>>>
>>>> Can your problem have load imbalance?
>>>>
>>>> You might try '-pc_type asm' (and/or jacobi) to see your baseline load
>>>> imbalance.
>>>> GAMG can add some load imbalance, but start by getting a baseline.
>>>>
>>>> Mark
>>>>
>>>> On Wed, Jan 18, 2023 at 2:54 PM Mark Lohry <[email protected]> wrote:
>>>>
>>>>> Q0) Does -memory_view trace GPU memory as well, or is there another
>>>>> method to query the peak device memory allocation?
>>>>>
>>>>> Q1) I'm loading an aijcusparse matrix with MatLoad and running with
>>>>> -ksp_type fgmres -pc_type gamg -mg_levels_pc_type asm. The matrix has
>>>>> 27,142,948 rows and columns, bs=4, and 759,709,392 total nonzeros.
>>>>> Using 8 ranks on 8x80GB GPUs, during the setup phase, before crashing
>>>>> with CUSPARSE_STATUS_INSUFFICIENT_RESOURCES, nvidia-smi shows the
>>>>> content pasted below.
>>>>>
>>>>> GPU memory usage spans 36GB-50GB, but one rank is at 77GB. Is this
>>>>> expected? Do I need to manually repartition this somehow?
>>>>>
>>>>> Thanks,
>>>>> Mark
>>>>>
>>>>> +-----------------------------------------------------------------------------+
>>>>> | Processes:                                                                  |
>>>>> |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
>>>>> |        ID   ID                                                   Usage      |
>>>>> |=============================================================================|
>>>>> |    0   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>>>> |    0   N/A  N/A   1696543      C   ./petsc_solver_test             38407MiB |
>>>>> |    0   N/A  N/A   1696544      C   ./petsc_solver_test               467MiB |
>>>>> |    0   N/A  N/A   1696545      C   ./petsc_solver_test               467MiB |
>>>>> |    0   N/A  N/A   1696546      C   ./petsc_solver_test               467MiB |
>>>>> |    0   N/A  N/A   1696548      C   ./petsc_solver_test               467MiB |
>>>>> |    0   N/A  N/A   1696550      C   ./petsc_solver_test               471MiB |
>>>>> |    0   N/A  N/A   1696551      C   ./petsc_solver_test               467MiB |
>>>>> |    0   N/A  N/A   1696552      C   ./petsc_solver_test               467MiB |
>>>>> |    1   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>>>> |    1   N/A  N/A   1696544      C   ./petsc_solver_test             35849MiB |
>>>>> |    2   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>>>> |    2   N/A  N/A   1696545      C   ./petsc_solver_test             36719MiB |
>>>>> |    3   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>>>> |    3   N/A  N/A   1696546      C   ./petsc_solver_test             37343MiB |
>>>>> |    4   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>>>> |    4   N/A  N/A   1696548      C   ./petsc_solver_test             36935MiB |
>>>>> |    5   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>>>> |    5   N/A  N/A   1696550      C   ./petsc_solver_test             49953MiB |
>>>>> |    6   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>>>> |    6   N/A  N/A   1696551      C   ./petsc_solver_test             47693MiB |
>>>>> |    7   N/A  N/A   1630309      C   nvidia-cuda-mps-server             27MiB |
>>>>> |    7   N/A  N/A   1696552      C   ./petsc_solver_test             77331MiB |
>>>>> +-----------------------------------------------------------------------------+
>>>>>
>>>>
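
Regarding Q1 above, for anyone trying to reproduce this: a minimal sketch of
the kind of driver described there, assuming a matrix written in PETSc binary
format. The file name "A.bin" and the all-ones right-hand side are
placeholders, not taken from the actual run.

/* Load a binary matrix as MATAIJCUSPARSE and solve with runtime options such
 * as: -ksp_type fgmres -pc_type gamg -mg_levels_pc_type asm -ksp_view
 *     -memory_view
 * Sketch only; error handling beyond PetscCall() is omitted. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat         A;
  Vec         b, x;
  KSP         ksp;
  PetscViewer viewer;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Read the matrix and place it on the GPU (could also use -mat_type
     aijcusparse together with MatSetFromOptions()) */
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "A.bin", FILE_MODE_READ, &viewer));
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetType(A, MATAIJCUSPARSE));
  PetscCall(MatLoad(A, viewer));
  PetscCall(PetscViewerDestroy(&viewer));

  /* Placeholder right-hand side */
  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0));

  /* Solver configuration comes from the command line via KSPSetFromOptions() */
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSolve(ksp, b, x));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}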

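Regarding Q0: as far as I know, -memory_view reports host-side memory, not
device allocations. One rough way to bracket device usage from inside the
code, rather than watching nvidia-smi, is to query the CUDA runtime directly
at points of interest, e.g. before and after KSPSetUp(). This is only a
sketch, not a PETSc feature; cudaMemGetInfo() reports free/total bytes for
the whole device, so with MPS every rank sharing that GPU is lumped together.

/* Hypothetical helper: print device-wide memory usage from each rank.
 * Compile and link against the CUDA runtime in addition to PETSc. */
#include <petscsys.h>
#include <cuda_runtime.h>

static PetscErrorCode ReportDeviceMemory(const char *label)
{
  size_t      free_b = 0, total_b = 0;
  PetscMPIInt rank;

  PetscFunctionBeginUser;
  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
  if (cudaMemGetInfo(&free_b, &total_b) != cudaSuccess) free_b = total_b = 0; /* report zeros on failure */
  PetscCall(PetscSynchronizedPrintf(PETSC_COMM_WORLD,
            "[rank %d] %s: %.1f GiB in use of %.1f GiB on this device\n",
            rank, label,
            (double)(total_b - free_b) / 1073741824.0,
            (double)total_b / 1073741824.0));
  PetscCall(PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT));
  PetscFunctionReturn(0);
}

Calling this before and after KSPSetUp() (or around KSPSolve()) should show
roughly where the 77GB peak on the one rank appears.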