On Fri, Feb 21, 2020 at 4:38 PM Mark Adams <mfad...@lbl.gov> wrote:

> On Fri, Feb 21, 2020 at 4:51 PM Junchao Zhang via petsc-dev <
> petsc-dev@mcs.anl.gov> wrote:
>
>> Hello,
>>
>> I want to evaluate MatMult on GPU. I took a 2M x 2M matrix and ran it
>> with 6 MPI ranks and 6 GPUs. It took about 0.9 seconds. A kernel launch
>> or a stream synchronization took about 10 us.
>>
>
> Your call, but you should run the code once and then run it again in a
> new timer. I've seen some big "warm-up costs" on GPUs today.
>

Yes, I usually run hundreds of iterations and skip the first few. Thanks.

>
>> Compared with MatMult, they are tiny. Does that mean we can ignore them?
>> What is a proper size to evaluate MatMult?
>>
>
> It depends on the purpose/audience for the study. There is no right size,
> other than being much larger than the launch cost, perhaps.
>
>
>> I heard it is a few thousand rows per MPI rank. Why?
>> Thanks.
>> --Junchao Zhang
>>
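For reference, below is a minimal sketch of the timing pattern described above: a few warm-up MatMult calls to absorb one-time GPU costs, then a timed loop in its own log stage. This is not code from the thread; the function name TimeMatMult and the counts nwarm/niter are made up for illustration, and the matrix A and vectors x, y are assumed to be already assembled (e.g. as MATAIJCUSPARSE/VECCUDA objects).

/* Minimal timing sketch (illustrative only): warm up, then time MatMult
   in its own log stage. A, x, y are assumed to be already assembled;
   nwarm and niter are arbitrary placeholders. */
#include <petscmat.h>
#include <petsctime.h>

PetscErrorCode TimeMatMult(Mat A, Vec x, Vec y)
{
  PetscLogStage  stage;
  PetscLogDouble t0, t1;
  PetscReal      nrm;
  PetscInt       i, nwarm = 10, niter = 100;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  /* Warm-up iterations: absorb one-time GPU costs (context setup, first
     host-to-device copies of the matrix and vectors, etc.). */
  for (i = 0; i < nwarm; i++) { ierr = MatMult(A, x, y);CHKERRQ(ierr); }

  /* Timed iterations in a separate stage so -log_view reports them apart
     from the warm-up runs. */
  ierr = PetscLogStageRegister("MatMult timed", &stage);CHKERRQ(ierr);
  ierr = PetscLogStagePush(stage);CHKERRQ(ierr);
  ierr = PetscTime(&t0);CHKERRQ(ierr);
  for (i = 0; i < niter; i++) { ierr = MatMult(A, x, y);CHKERRQ(ierr); }
  /* GPU kernels may still be in flight; a reduction forces completion
     before taking the final time stamp. */
  ierr = VecNorm(y, NORM_2, &nrm);CHKERRQ(ierr);
  ierr = PetscTime(&t1);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = PetscPrintf(PETSC_COMM_WORLD, "Average MatMult time: %g s\n",
                     (double)((t1 - t0) / niter));CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

Running with -log_view then also reports the MatMult event inside the "MatMult timed" stage, separated from the warm-up calls.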