So the MPI is killing you in going from 8 to 64 ranks. (The GPU flop rate
scales almost perfectly, but the overall flop rate is only half of what it
should be at 64.)
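(Back of the envelope, assuming the flop count per run is fixed: perfect
scaling from 8 to 64 ranks would give 8x the aggregate flop rate, and "half
of what it should be" is ~4x, i.e.

    efficiency ~ 4x / 8x = 50%

so roughly half the wall clock at 64 ranks is going to MPI communication
rather than flops.)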
> On Jan 25, 2022, at 9:24 PM, Mark Adams wrote:
>
> It looks like we have our instrumentation and job configuration in decent
> shape.
>
> Note that Mark's logs have been switching back and forth between
> -use_gpu_aware_mpi and changing the number of ranks -- we won't have that
> information if we do manual timing hacks. This is going to be a routine
> thing we'll need on the mailing list, and we need the provenance to go with
> it.
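(Concretely, that just means each posted log should lead with its exact
command line, e.g. something like "mpiexec -n 64 ./ex56 -log_view
-use_gpu_aware_mpi 0" vs. the same run with "-use_gpu_aware_mpi 1". The
executable and rank count there are placeholders; -log_view and
-use_gpu_aware_mpi are the actual options being toggled.)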
Here are two runs, without and with -log_view, respectively.
My new timer is "Solve time = "
The difference is about 10%.
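(For reference, the timer is nothing fancy; a sketch of the idea, assuming a
KSP solve with ksp, b, and x already set up elsewhere:

    #include <petscksp.h>

    PetscErrorCode TimedSolve(KSP ksp, Vec b, Vec x)
    {
      PetscErrorCode ierr;
      PetscLogDouble t0, t1;

      PetscFunctionBeginUser;
      ierr = MPI_Barrier(PETSC_COMM_WORLD);CHKERRMPI(ierr); /* line up ranks so t0 is comparable */
      ierr = PetscTime(&t0);CHKERRQ(ierr);
      ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
      ierr = PetscTime(&t1);CHKERRQ(ierr);
      ierr = PetscPrintf(PETSC_COMM_WORLD, "Solve time = %g\n", (double)(t1 - t0));CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

The point is that PetscTime is essentially free, so this number stays clean
even when the per-event GPU timers are distorting the log.)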
On Tue, Jan 25, 2022 at 12:53 PM Mark Adams wrote:
> BTW, a -device_view would be great.
Barry Smith writes:
>> What is the command line option to turn
>> PetscLogGpuTimeBegin/PetscLogGpuTimeEnd into a no-op even when -log_view is
>> on? I know it'll mess up attribution, but it'll still tell us how long the
>> solve took.
>
> We don't have an API for this yet. It is slightly tricky ...
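(Context for why those calls cost anything: they bracket each kernel launch
and synchronize so the measured interval is real. Roughly how they sit
around a GPU operation -- a sketch, not actual PETSc source:

    ierr = PetscLogGpuTimeBegin();CHKERRQ(ierr);
    /* ... launch the CUDA kernel or cuBLAS call for this op ... */
    ierr = PetscLogGpuTimeEnd();CHKERRQ(ierr);   /* waits on the device so the elapsed time is meaningful */
    ierr = PetscLogGpuFlops(flops);CHKERRQ(ierr); /* feeds the GPU Mflop/s column in -log_view */

That per-kernel synchronization is what a no-op switch would be buying back.)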
BTW, a -device_view would be great.
Thanks Mark, far more interesting. I've improved the formatting to make it
easier to read (and used a fixed-width font for email reading).

* Can you do the same run with, say, 10 iterations of a Jacobi PC? (Example
  options below.)
* PCApply performance (looks like GAMG) is terrible! Problems too small?
* VecScatter time is com...
>
> > VecPointwiseMult     201 1.0 1.0471e-02 1.1 3.09e+08 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   1  1  0  0  0 235882  290088     0 0.00e+00    0 0.00e+00 100
> > VecScatterBegin      200 1.0 1.8458e-01 1.1 0.00e+00 0.0 1.1e+04 6.6e+04 1.0e+00  2  0 99 79  0  19  0 100 100  0      0 ...
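(For the Jacobi comparison in the first bullet, the standard options would
be something like adding "-pc_type jacobi -ksp_max_it 10" to the same
command line, leaving everything else in the run unchanged.)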
Adding Suyash.

I found the (or at least a) problem. Using ex56, which has a crappy
decomposition, one MPI process per GPU is much faster than using 8 (64
total). (I am looking at ex13 to see how much of this is due to the
decomposition.) If you only use 8 processes, it seems that all 8 are put on
the first GPU ...
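(The usual workaround when all ranks pile onto device 0 is to bind each rank
by its node-local rank before any CUDA context is created. A generic sketch
of that idiom -- not PETSc's own device-selection logic:

    #include <mpi.h>
    #include <cuda_runtime.h>

    /* Call after MPI_Init() and before PetscInitialize()/first CUDA use:
       round-robins the ranks on each node across that node's GPUs. */
    static void BindRankToGpu(void)
    {
      MPI_Comm node;
      int      lrank, ndev;

      MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node);
      MPI_Comm_rank(node, &lrank);
      cudaGetDeviceCount(&ndev);
      cudaSetDevice(lrank % ndev);  /* rank i on a node with ndev GPUs gets device i % ndev */
      MPI_Comm_free(&node);
    }

Job launchers can do the same thing via their own GPU-binding options, which
is usually preferable when available.)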