Re: [petsc-dev] Kokkos/Crusher performance

2022-01-25 Thread Barry Smith
So the MPI is killing you in going from 8 to 64. (The GPU flop rate scales almost perfectly, but the overall flop rate is only half of what it should be at 64). > On Jan 25, 2022, at 9:24 PM, Mark Adams wrote: > > It looks like we have our instrumentation and job configuration in decent >
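
For context on the "half of what it should be" remark: going from 8 to 64 ranks/GCDs is an 8x increase in resources, so with per-GPU kernel rates holding steady the aggregate flop rate should also grow roughly 8x. Seeing only about half of that (roughly 4x, i.e. ~50% parallel efficiency) while the kernels themselves scale points at MPI communication and the host-side work between kernel launches.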

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-25 Thread Mark Adams
> > > Note that Mark's logs have been switching back and forth between > -use_gpu_aware_mpi and changing number of ranks -- we won't have that > information if we do manual timing hacks. This is going to be a routine > thing we'll need on the mailing list and we need the provenance to go with > it.

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-25 Thread Mark Adams
Here are two runs, without and with -log_view, respectively. My new timer is "Solve time = ". About a 10% difference. On Tue, Jan 25, 2022 at 12:53 PM Mark Adams wrote: > BTW, a -device_view would be great. > > On Tue, Jan 25, 2022 at 12:30 PM Mark Adams wrote: > >> >> >> On Tue, Jan 25, 2022 a
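
The "Solve time =" number comes from a hand-rolled wall-clock timer in the test code; the preview does not show it, but a minimal sketch looks like the following, assuming a configured KSP named ksp, petscksp.h included, and the recent PetscCall error-checking macro (older releases use ierr/CHKERRQ):

  PetscLogDouble t0, t1;
  MPI_Barrier(PETSC_COMM_WORLD);   /* line the ranks up so the measurement is comparable */
  PetscCall(PetscTime(&t0));
  PetscCall(KSPSolve(ksp, b, x));
  PetscCall(PetscTime(&t1));
  PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Solve time = %g s\n", (double)(t1 - t0)));

Because this takes only two host timestamps it adds essentially no overhead, which is why it can sit ~10% away from the same solve run under -log_view, where every event is instrumented.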

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-25 Thread Jed Brown
Barry Smith writes: >> What is the command line option to turn >> PetscLogGpuTimeBegin/PetscLogGpuTimeEnd into a no-op even when -log_view is >> on? I know it'll mess up attribution, but it'll still tell us how long the >> solve took. > > We don't have an API for this yet. It is slightly tri
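
The overhead being discussed comes from PetscLogGpuTimeBegin/PetscLogGpuTimeEnd synchronizing with the device to time each logged event. A purely illustrative sketch of the kind of guard being asked for, using a hypothetical flag (the real option did not exist yet at this point in the thread):

  static PetscBool log_gpu_time = PETSC_TRUE;  /* hypothetical runtime switch, not a PETSc variable */

  if (log_gpu_time) PetscCall(PetscLogGpuTimeBegin());
  /* ... launch the device kernel for this logged event ... */
  if (log_gpu_time) PetscCall(PetscLogGpuTimeEnd());

With the switch off, -log_view still reports host-side event times and counts, but per-event GPU time attribution is lost, which is exactly the "it'll mess up attribution" trade-off mentioned above.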

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-25 Thread Mark Adams
BTW, a -device_view would be great. On Tue, Jan 25, 2022 at 12:30 PM Mark Adams wrote: > > > On Tue, Jan 25, 2022 at 11:56 AM Jed Brown wrote: > >> Barry Smith writes: >> >> > Thanks Mark, far more interesting. I've improved the formatting to >> make it easier to read (and fixed width font f
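
No such option existed at the time (hence the request); the point is a report of which device each rank is actually using. A standalone sketch of what such a report could print, going straight to the HIP runtime API (illustrative only, not a PETSc interface):

  #include <stdio.h>
  #include <mpi.h>
  #include <hip/hip_runtime_api.h>

  int main(int argc, char **argv)
  {
    int rank, dev = -1, ndev = 0;
    hipDeviceProp_t prop;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    (void)hipGetDeviceCount(&ndev);
    (void)hipGetDevice(&dev);                  /* the GCD this rank is currently bound to */
    (void)hipGetDeviceProperties(&prop, dev);
    printf("rank %d: device %d of %d (%s)\n", rank, dev, ndev, prop.name);
    MPI_Finalize();
    return 0;
  }

On Crusher this kind of per-rank report makes the binding question raised elsewhere in the thread (do all local ranks land on device 0?) easy to answer.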

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-25 Thread Barry Smith
> On Jan 25, 2022, at 12:25 PM, Jed Brown wrote: > > Barry Smith writes: > >>> On Jan 25, 2022, at 11:55 AM, Jed Brown wrote: >>> >>> Barry Smith writes: >>> Thanks Mark, far more interesting. I've improved the formatting to make it easier to read (and fixed width font for ema

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-25 Thread Mark Adams
On Tue, Jan 25, 2022 at 11:56 AM Jed Brown wrote: > Barry Smith writes: > > > Thanks Mark, far more interesting. I've improved the formatting to > make it easier to read (and fixed width font for email reading) > > > > * Can you do same run with say 10 iterations of Jacobi PC? > > > > * PC

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-25 Thread Jed Brown
Barry Smith writes: >> On Jan 25, 2022, at 11:55 AM, Jed Brown wrote: >> >> Barry Smith writes: >> >>> Thanks Mark, far more interesting. I've improved the formatting to make it >>> easier to read (and fixed width font for email reading) >>> >>> * Can you do same run with say 10 iteration

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-25 Thread Barry Smith
> On Jan 25, 2022, at 11:55 AM, Jed Brown wrote: > > Barry Smith writes: > >> Thanks Mark, far more interesting. I've improved the formatting to make it >> easier to read (and fixed width font for email reading) >> >> * Can you do same run with say 10 iterations of Jacobi PC? >> >> * P

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-25 Thread Jed Brown
Barry Smith writes: > Thanks Mark, far more interesting. I've improved the formatting to make it > easier to read (and fixed width font for email reading) > > * Can you do same run with say 10 iterations of Jacobi PC? > > * PCApply performance (looks like GAMG) is terrible! Problems too sm

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-25 Thread Barry Smith
Thanks Mark, far more interesting. I've improved the formatting to make it easier to read (and fixed width font for email reading) * Can you do same run with say 10 iterations of Jacobi PC? * PCApply performance (looks like GAMG) is terrible! Problems too small? * VecScatter time is com
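
The Jacobi suggestion is a way to separate raw solver kernels from GAMG setup and V-cycle costs. A sketch of an equivalent fixed-iteration configuration, assuming an existing KSP named ksp (on the command line this is roughly -pc_type jacobi -ksp_max_it 10):

  PC pc;
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCJACOBI));
  /* run exactly 10 iterations: skip the convergence test and cap the count */
  PetscCall(KSPSetConvergenceTest(ksp, KSPConvergedSkip, NULL, NULL));
  PetscCall(KSPSetTolerances(ksp, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT, 10));

Ten fixed Jacobi-preconditioned iterations make the per-iteration cost essentially MatMult, VecPointwiseMult, and the reductions, so the MPI-versus-GPU split in the log is much easier to read than a full GAMG cycle.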

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-25 Thread Mark Adams
> VecPointwiseMult     201 1.0 1.0471e-02 1.1 3.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   1  1  0  0  0 235882  290088      0 0.00e+00    0 0.00e+00 100
> VecScatterBegin      200 1.0 1.8458e-01 1.1 0.00e+00 0.0 1.1e+04 6.6e+04 1.0e+00  2  0 99 79  0  19  0 100 100  0      0
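
Reading these rows: after the count, time, and flop columns of a GPU-enabled -log_view, the trailing fields are (roughly) the aggregate MFlop/s, the GPU MFlop/s, the counts and sizes of CPU-to-GPU and GPU-to-CPU transfers, and the percentage of flops executed on the GPU. So VecPointwiseMult runs entirely on the device (100, with no host-device traffic), while VecScatterBegin does no flops but carries nearly all of the messages and most of the message volume.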

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-25 Thread Jed Brown
Mark Adams writes: > adding Suyash, > > I found the/a problem. Using ex56, which has a crappy decomposition, using > one MPI process/GPU is much faster than using 8 (64 total). (I am looking > at ex13 to see how much of this is due to the decomposition) > If you only use 8 processes it seems that

Re: [petsc-dev] Kokkos/Crusher performance

2022-01-25 Thread Mark Adams
adding Suyash, I found the/a problem. Using ex56, which has a crappy decomposition, using one MPI process/GPU is much faster than using 8 (64 total). (I am looking at ex13 to see how much of this is due to the decomposition) If you only use 8 processes it seems that all 8 are put on the first GPU,
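
If all eight node-local ranks really do end up on the first GCD, the generic workaround (independent of whatever the launcher or PETSc provide) is to bind each rank to a device keyed on its node-local rank before the device is first used. A sketch of that idea, not the mechanism Crusher's srun wrappers or PETSc actually use:

  #include <mpi.h>
  #include <hip/hip_runtime_api.h>

  /* Call after MPI_Init but before PETSc/Kokkos initialize the device. */
  static void bind_rank_to_device(void)
  {
    int world_rank, local_rank, ndev = 0;
    MPI_Comm local_comm;

    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, world_rank,
                        MPI_INFO_NULL, &local_comm);       /* ranks sharing this node */
    MPI_Comm_rank(local_comm, &local_rank);
    (void)hipGetDeviceCount(&ndev);
    if (ndev > 0) (void)hipSetDevice(local_rank % ndev);   /* round-robin over visible GCDs */
    MPI_Comm_free(&local_comm);
  }

In practice the same effect is usually achieved through the job launcher (GPU-binding options or ROCR_VISIBLE_DEVICES) rather than in application code.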