Re: [petsc-dev] Kokkos/Crusher performance

Justin Chang Mon, 24 Jan 2022 11:57:35 -0800

My name has been called.

Mark, if you're having issues with Crusher, please contact Veronica Vergara
(vergar...@ornl.gov). You can cc me (justin.ch...@amd.com) in those emails.


On Mon, Jan 24, 2022 at 1:49 PM Barry Smith <bsm...@petsc.dev> wrote:

>
>
> On Jan 24, 2022, at 2:46 PM, Mark Adams <mfad...@lbl.gov> wrote:
>
> Yea, CG/Jacobi is as close to a benchmark code as we could want. I could
> run this on one processor to get cleaner numbers.
>
> Is there a designated ECP technical support contact?
>
>
>    Mark, you've forgotten you work for DOE. There isn't a non-ECP
> technical support contact.
>
>    But if this is an AMD machine then maybe contact Matt's student Justin
> Chang?
>
>
>
>
>
> On Mon, Jan 24, 2022 at 2:18 PM Barry Smith <bsm...@petsc.dev> wrote:
>
>>
>>   I think you should contact the Crusher ECP technical support team and
>> tell them you are getting dismal performance and ask if you should expect
>> better. Don't waste time flogging a dead horse.
>>
>> On Jan 24, 2022, at 2:16 PM, Matthew Knepley <knep...@gmail.com> wrote:
>>
>> On Mon, Jan 24, 2022 at 2:11 PM Junchao Zhang <junchao.zh...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Mon, Jan 24, 2022 at 12:55 PM Mark Adams <mfad...@lbl.gov> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Jan 24, 2022 at 1:38 PM Junchao Zhang <junchao.zh...@gmail.com>
>>>> wrote:
>>>>
>>>>> Mark, I think you can benchmark individual vector operations, and once
>>>>> we get reasonable profiling results, we can move to solvers etc.
>>>>>
>>>>
>>>> Can you suggest a code to run or are you suggesting making a vector
>>>> benchmark code?
>>>>
>>> Make a vector benchmark code, testing vector operations that would be
>>> used in your solver.
>>> Also, we can run MatMult() to see if the profiling result is reasonable.
>>> Only once we get some solid results on basic operations is it useful to
>>> run big codes.
>>>
>>
>> So we have to make another throw-away code? Why not just look at the
>> vector ops in Mark's actual code?
>>
>>    Matt
>>
>>
>>>
>>>>
>>>>>
>>>>> --Junchao Zhang
>>>>>
>>>>>
>>>>> On Mon, Jan 24, 2022 at 12:09 PM Mark Adams <mfad...@lbl.gov> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 24, 2022 at 12:44 PM Barry Smith <bsm...@petsc.dev>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>   Here, except for VecNorm, the GPU is used effectively in that most
>>>>>>> of the time is spent doing real work on the GPU
>>>>>>>
>>>>>>> VecNorm              402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 0.0e+00 4.0e+02  0  1  0  0 20   9  1  0  0 33 30230   225393      0 0.00e+00    0 0.00e+00 100
>>>>>>>
>>>>>>> Even the dots are very effective; only the VecNorm flop rate over
>>>>>>> the full time is much, much lower than the VecDot rate. Is that somehow
>>>>>>> due to the use of the GPU or CPU MPI in the allreduce?
>>>>>>>
>>>>>>
>>>>>> The VecNorm GPU rate is relatively high on Crusher and the CPU rate
>>>>>> is about the same as the other vec ops. I don't know what to make of
>>>>>> that.
>>>>>>
>>>>>> But Crusher is clearly not crushing it.
>>>>>>
>>>>>> Junchao: Perhaps we should ask Kokkos if they have any experience
>>>>>> with Crusher that they can share. They could very well find some
>>>>>> low-level magic.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Jan 24, 2022, at 12:14 PM, Mark Adams <mfad...@lbl.gov> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Mark, can we compare with Spock?
>>>>>>>>
>>>>>>>
>>>>>>>  Looks much better. This puts two processes/GPU because there are
>>>>>>> only 4.
>>>>>>> <jac_out_001_kokkos_Spock_6_1_notpl.txt>
>>>>>>>
>>>>>>>
>>>>>>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> <http://www.cse.buffalo.edu/~knepley/>
>>
>>
>>
>
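
Junchao's suggestion in the quoted thread (time the individual vector operations the solver uses before running big codes) could be sketched roughly as below. This is only an illustration of the harness shape in plain Python, not PETSc code: vec_dot, vec_norm, vec_axpy and time_op are hypothetical stand-ins for the PETSc vector operations, and the flop counts of roughly 2n per operation are an assumption.

```python
import math
import time


def vec_dot(x, y):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(x, y))


def vec_norm(x):
    """Euclidean 2-norm."""
    return math.sqrt(vec_dot(x, x))


def vec_axpy(alpha, x, y):
    """In-place update y <- alpha * x + y."""
    for i, a in enumerate(x):
        y[i] += alpha * a


def time_op(op, flops, repeats=20):
    """Return (best seconds per call, Mflop/s) for a zero-argument callable."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        op()
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-12)  # guard against a zero reading from the timer
    return best, flops / best / 1e6


if __name__ == "__main__":
    n = 100_000
    x = [1.0] * n
    y = [2.0] * n
    # Assumed flop counts: about n multiplies plus n adds per operation.
    ops = {
        "dot": (lambda: vec_dot(x, y), 2 * n),
        "norm": (lambda: vec_norm(x), 2 * n),
        "axpy": (lambda: vec_axpy(0.5, x, y), 2 * n),
    }
    for name, (op, flops) in ops.items():
        seconds, mflops = time_op(op, flops)
        print(f"{name:5s} {seconds:.3e} s  {mflops:10.1f} Mflop/s")
```

Taking the best of several repeats reduces noise from one-off startup costs. A real version would call the library's own vector routines on the target device so that the rates are comparable with the profiling output.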

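The VecNorm profiling row quoted in the thread is easier to discuss once it is split into named fields. The sketch below does that in Python. The column names in COLUMNS are an assumption based on the usual layout of a GPU-enabled PETSc -log_view event row; the thread itself does not label the columns, and parse_event_row is a hypothetical helper.

```python
# The VecNorm row from the thread, joined back into a single line.
LINE = (
    "VecNorm              402 1.0 4.4100e-01 6.1 1.69e+09 1.0 0.0e+00 "
    "0.0e+00 4.0e+02  0  1  0  0 20   9  1  0  0 33 30230   225393      0 "
    "0.00e+00    0 0.00e+00 100"
)

# Assumed column order (not stated in the thread).
COLUMNS = [
    "count_max", "count_ratio", "time_max", "time_ratio",
    "flop_max", "flop_ratio",
    "mess", "avg_len", "reduct",
    "global_pct_T", "global_pct_F", "global_pct_M", "global_pct_L", "global_pct_R",
    "stage_pct_T", "stage_pct_F", "stage_pct_M", "stage_pct_L", "stage_pct_R",
    "total_mflops", "gpu_mflops",
    "cpu_to_gpu_count", "cpu_to_gpu_size",
    "gpu_to_cpu_count", "gpu_to_cpu_size",
    "gpu_pct_F",
]


def parse_event_row(line):
    """Split one event row into a dict keyed by the assumed column names."""
    name, *values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} values, got {len(values)}")
    row = {col: float(val) for col, val in zip(COLUMNS, values)}
    row["name"] = name
    return row


row = parse_event_row(LINE)
print(row["name"], "total Mflop/s:", row["total_mflops"],
      "GPU Mflop/s:", row["gpu_mflops"], "time ratio:", row["time_ratio"])
```

Under that reading, the GPU rate (225393) is roughly 7.5 times the rate over the full event time (30230), and the time ratio of 6.1 would indicate a large spread in time across ranks, which is consistent with Barry's question about the allreduce.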