On Fri, Feb 21, 2020 at 4:38 PM Mark Adams <mfad...@lbl.gov> wrote:

>
>
> On Fri, Feb 21, 2020 at 4:51 PM Junchao Zhang via petsc-dev <
> petsc-dev@mcs.anl.gov> wrote:
>
>> Hello,
>>
>> I want to evaluate MatMult on GPU.  I took a 2M x 2M matrix and ran with
>> 6 MPI ranks and 6 GPUs.  It took about 0.9 seconds.  A kernel launch or a
>> stream synchronization took about 10 us.
>>
>
> Your call, but you should run the code once and then run it again under a
> new timer. I've seen some big "warm up costs" on GPUs today.
>

Yes, I usually run hundreds of iterations and skip the first few. Thanks.
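
For reference, a minimal sketch of that warm-up-then-time pattern using
PETSc log stages; the stage names, iteration counts, and the assumption
that A, x, y are already assembled (e.g. as aijcusparse/cuda objects) are
illustrative, not taken from this thread:

  /* Sketch: the warm-up stage absorbs one-time GPU costs (context creation,
     first host-to-device copies of A and x); the timed stage then shows up
     separately in -log_view output. */
  PetscErrorCode TimeMatMult(Mat A, Vec x, Vec y)
  {
    PetscErrorCode ierr;
    PetscLogStage  warmup, timed;
    PetscInt       i;

    PetscFunctionBeginUser;
    ierr = PetscLogStageRegister("MatMult warm-up", &warmup);CHKERRQ(ierr);
    ierr = PetscLogStageRegister("MatMult timed", &timed);CHKERRQ(ierr);

    /* Warm-up iterations: excluded from the measurement of interest. */
    ierr = PetscLogStagePush(warmup);CHKERRQ(ierr);
    for (i = 0; i < 10; i++) { ierr = MatMult(A, x, y);CHKERRQ(ierr); }
    ierr = PetscLogStagePop();CHKERRQ(ierr);

    /* Timed iterations: read the "MatMult timed" stage in -log_view. */
    ierr = PetscLogStagePush(timed);CHKERRQ(ierr);
    for (i = 0; i < 100; i++) { ierr = MatMult(A, x, y);CHKERRQ(ierr); }
    ierr = PetscLogStagePop();CHKERRQ(ierr);
    PetscFunctionReturn(0);
  }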

>
>
>> Compared with MatMult, they are tiny. Does that mean we can ignore them?
>> What is a proper problem size for evaluating MatMult?
>>
>
> It depends on the purpose/audience for the study. There is no right size
> other than being much larger than the launch cost, perhaps.
>
>
>> I heard it should be a few thousand rows per MPI rank.  Why?
>> Thanks.
>> --Junchao Zhang
>>
>