Re: [petsc-users] Help with input construction hang on 2-GPU CG Solve

2022-12-17 Thread Rohan Yadav
Thanks Jed. I tried just over-preallocating the matrix (using 10 nnz
per row), and that solved the problem. I'm not sure what was wrong with my
initial preallocation, but it's likely that things weren't actually hanging,
just moving very slowly.
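
For reference, a minimal sketch of the over-preallocation described above
(the calls are standard PETSc API, but nx, ny, and the matrix A are
assumptions mirroring the linked main.cpp rather than code copied from it):

```
#include <petscmat.h>

/* inside main(), after PetscInitialize() */
Mat      A;
PetscInt nx = 8485, ny = 8485; /* mirrors -nx/-ny from the run line below */
PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, nx * ny, nx * ny));
PetscCall(MatSetFromOptions(A)); /* honors -mat_type aijcusparse */
/* Reserve 10 nonzeros per row in both the diagonal and off-diagonal
   blocks. A 5-point stencil needs at most 5 per row, so this is a loose
   upper bound, but a uniform over-estimate avoids the repeated mallocs
   that an under-preallocated assembly triggers. */
PetscCall(MatSeqAIJSetPreallocation(A, 10, NULL));
PetscCall(MatMPIAIJSetPreallocation(A, 10, NULL, 10, NULL));
```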

Rohan

On Sat, Dec 17, 2022 at 9:57 PM Jed Brown  wrote:

> I ran your code successfully with and without GPU-aware MPI. I see a bit
> of time in MatSetValue -- you can make it a bit faster using one
> MatSetValues call per row, but it's typical that assembling a matrix like
> this (sequentially on the host) will be more expensive than some
> unpreconditioned CG iterations (that don't come close to solving the
> problem -- use multigrid if you want to actually solve this problem).
>
> Rohan Yadav  writes:
>
> > Hi,
> >
> > I'm developing a microbenchmark that runs a CG solve using PETSc on a
> > mesh using a 5-point stencil matrix. My code (linked here:
> > https://github.com/rohany/petsc-pde-benchmark/blob/main/main.cpp, only
> > 120 lines) works on 1 GPU and has great performance. When I move to 2
> > GPUs, the program appears to get stuck in the input generation. I've
> > littered the code with print statements and found the following clues:
> >
> > * The first rank progresses through this loop:
> > https://github.com/rohany/petsc-pde-benchmark/blob/main/main.cpp#L44,
> > but then does not exit (it seems to get stuck right before rowStart ==
> > rowEnd).
> > * The second rank makes very few iterations through the loop for its
> > allotted rows.
> >
> > Therefore, neither rank makes it to the call to MatAssemblyBegin.
> >
> > I'm running the code using the following command line on the Summit
> > supercomputer:
> > ```
> > jsrun -n 2 -g 1 -c 1 -b rs -r 2
> > /gpfs/alpine/scratch/rohany/csc335/petsc-pde-benchmark/main -ksp_max_it
> > 200
> > -ksp_type cg -pc_type none -ksp_atol 1e-10 -ksp_rtol 1e-10 -vec_type cuda
> > -mat_type aijcusparse -use_gpu_aware_mpi 0 -nx 8485 -ny 8485
> > ```
> >
> > Any suggestions will be appreciated! I feel that I have applied many of
> > the common PETSc optimizations, such as preallocating my matrix row
> > counts, so I'm not sure what's going on with this input generation.
> >
> > Thanks,
> >
> > Rohan Yadav
>


Re: [petsc-users] Help with input construction hang on 2-GPU CG Solve

2022-12-17 Thread Jed Brown
I ran your code successfully with and without GPU-aware MPI. I see a bit of 
time in MatSetValue -- you can make it a bit faster using one MatSetValues call 
per row, but it's typical that assembling a matrix like this (sequentially on 
the host) will be more expensive than some unpreconditioned CG iterations (that 
don't come close to solving the problem -- use multigrid if you want to 
actually solve this problem).
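
For concreteness, a sketch of the one-MatSetValues-call-per-row pattern
(the stencil coefficients and the row-major row-to-grid mapping are
illustrative assumptions, not taken from the linked code):

```
PetscInt rowStart, rowEnd;
PetscCall(MatGetOwnershipRange(A, &rowStart, &rowEnd));
for (PetscInt i = rowStart; i < rowEnd; i++) {
  PetscInt    cols[5], n = 0;
  PetscScalar vals[5];
  PetscInt    x = i / ny, y = i % ny; /* assumed row-major grid layout */
  if (x > 0)      { cols[n] = i - ny; vals[n++] = -1.0; }
  if (y > 0)      { cols[n] = i - 1;  vals[n++] = -1.0; }
  cols[n] = i; vals[n++] = 4.0;
  if (y < ny - 1) { cols[n] = i + 1;  vals[n++] = -1.0; }
  if (x < nx - 1) { cols[n] = i + ny; vals[n++] = -1.0; }
  /* one call per row instead of up to five MatSetValue calls */
  PetscCall(MatSetValues(A, 1, &i, n, cols, vals, INSERT_VALUES));
}
PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
```

And if the goal is to actually solve the system rather than benchmark the
iterations, replacing -pc_type none with PETSc's algebraic multigrid
(-pc_type gamg) in the run line quoted below is one place to start.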

Rohan Yadav  writes:

> Hi,
>
> I'm developing a microbenchmark that runs a CG solve using PETSc on a mesh
> using a 5-point stencil matrix. My code (linked here:
> https://github.com/rohany/petsc-pde-benchmark/blob/main/main.cpp, only 120
> lines) works on 1 GPU and has great performance. When I move to 2 GPUs, the
> program appears to get stuck in the input generation. I've littered the
> code with print statements and found the following clues:
>
> * The first rank progresses through this loop:
> https://github.com/rohany/petsc-pde-benchmark/blob/main/main.cpp#L44, but
> then does not exit (it seems to get stuck right before rowStart == rowEnd).
> * The second rank makes very few iterations through the loop for its
> allotted rows.
>
> Therefore, neither rank makes it to the call to MatAssemblyBegin.
>
> I'm running the code using the following command line on the Summit
> supercomputer:
> ```
> jsrun -n 2 -g 1 -c 1 -b rs -r 2
> /gpfs/alpine/scratch/rohany/csc335/petsc-pde-benchmark/main -ksp_max_it 200
> -ksp_type cg -pc_type none -ksp_atol 1e-10 -ksp_rtol 1e-10 -vec_type cuda
> -mat_type aijcusparse -use_gpu_aware_mpi 0 -nx 8485 -ny 8485
> ```
>
> Any suggestions will be appreciated! I feel that I have applied many of the
> common PETSc optimizations, such as preallocating my matrix row counts, so
> I'm not sure what's going on with this input generation.
>
> Thanks,
>
> Rohan Yadav