On Sat, Feb 22, 2020 at 11:05 PM Karl Rupp wrote:
> Hi Junchao,
>
> > I want to evaluate MatMult on GPU. I took a 2M x 2M matrix and ran with
> > 6 mpi ranks and 6 GPUs. It took about 0.9 seconds.
>
> How many nonzeros per row? With 0.9 seconds you should either have many
> runs of MatMult, or a fairly dense matrix; or a really slow MatMult
> kernel ;-)
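Karl's remark rests on a bandwidth argument. A sparse MatMult is usually memory-bound, so its run time is roughly the number of bytes it must stream divided by the memory bandwidth. The sketch below makes that arithmetic explicit under assumptions that are not from this thread: CSR storage with 8-byte values and 4-byte indices, rows split evenly over the 6 GPUs, no cost for exchanging off-process vector entries, and a hypothetical sustained bandwidth of 800 GB/s per GPU. The helper names (`spmv_bytes_per_row`, `spmv_time`, `nnz_per_row_for`) and the sample nonzeros-per-row values are illustrative only.

```python
"""Back-of-envelope, bandwidth-bound estimate of one CSR MatMult (SpMV) per GPU.

All hardware numbers and storage sizes are illustrative assumptions,
not measurements from this thread.
"""

BYTES_VAL = 8   # assumed: double-precision matrix values
BYTES_IDX = 4   # assumed: 32-bit column indices and row pointers
BYTES_VEC = 8   # assumed: double-precision x and y entries


def spmv_bytes_per_row(nnz_per_row: float) -> float:
    """Minimum bytes moved per local row: value + column index for each
    nonzero, one row pointer, one y write and (optimistically) one x read."""
    return nnz_per_row * (BYTES_VAL + BYTES_IDX) + BYTES_IDX + 2 * BYTES_VEC


def spmv_time(n_global: int, n_gpus: int, nnz_per_row: float,
              bandwidth: float) -> float:
    """Seconds for one MatMult on each GPU, assuming an even row split,
    a purely memory-bound kernel and no halo-exchange cost."""
    rows_local = n_global / n_gpus
    return rows_local * spmv_bytes_per_row(nnz_per_row) / bandwidth


def nnz_per_row_for(seconds: float, n_global: int, n_gpus: int,
                    bandwidth: float) -> float:
    """Invert spmv_time: nonzeros per row needed for ONE MatMult to take `seconds`."""
    rows_local = n_global / n_gpus
    per_row = seconds * bandwidth / rows_local
    return (per_row - BYTES_IDX - 2 * BYTES_VEC) / (BYTES_VAL + BYTES_IDX)


if __name__ == "__main__":
    N, GPUS = 2_000_000, 6      # matrix size and GPU count from the thread
    BW = 800e9                  # assumed sustained bytes/s per GPU (hypothetical)
    LAUNCH = 10e-6              # kernel launch / sync cost reported in the thread

    for nnz in (7, 27, 100):    # hypothetical stencil-like row densities
        t = spmv_time(N, GPUS, nnz, BW)
        print(f"nnz/row={nnz:4d}: one MatMult ~ {t * 1e6:8.1f} us, "
              f"10 us overhead = {LAUNCH / t:6.1%}, 0.9 s ~ {0.9 / t:8.0f} MatMults")

    print(f"nnz/row for a single 0.9 s MatMult: ~{nnz_per_row_for(0.9, N, GPUS, BW):,.0f}")
```

Under these assumptions a matrix with 7 nonzeros per row gives about 43 us per MatMult per GPU, so a 10us launch or synchronization would be about 23% of the run time rather than negligible, and a single MatMult lasting 0.9 s would need on the order of 180,000 nonzeros per row. Replacing `BW` with bandwidth measured on the actual GPUs would make the estimate more trustworthy.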
On Fri, Feb 21, 2020 at 6:41 PM Matthew Knepley wrote:
> I think Karl goes into these issues here:
> https://arxiv.org/pdf/1410.4054.pdf
>
Wonderful, thanks.
>
> Thanks,
>
> Matt
>
> On Fri, Feb 21, 2020 at 5:58 PM Junchao Zhang via petsc-dev <
> petsc-dev@mcs.anl.gov> wrote:
On Fri, Feb 21, 2020 at 4:38 PM Mark Adams wrote:
On Fri, Feb 21, 2020 at 4:51 PM Junchao Zhang via petsc-dev <
petsc-dev@mcs.anl.gov> wrote:
Hello,
I want to evaluate MatMult on GPU. I took a 2M x 2M matrix and ran with 6
mpi ranks and 6 GPUs. It took about 0.9 seconds. A kernel launch or a
stream synchronization took about 10us. Compared with MatMult, they are
tiny. Does it mean we can ignore them? What is a proper size to