Hi Junchao,

I want to evaluate MatMult on GPU.  I took a 2M x 2M matrix and ran it with 6 MPI ranks and 6 GPUs.  It took about 0.9 seconds.

How many nonzeros per row? With 0.9 seconds you should either have many runs of MatMult, a fairly dense matrix, or a really slow MatMult kernel ;-)
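
For example, here is a minimal timing sketch (the 1D Laplacian and the repetition count are just placeholders for whatever matrix you actually care about; the GPU back end would be selected at run time, e.g. with -mat_type aijcusparse -vec_type cuda, and the timings inspected with -log_view):

#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat           A;
  Vec           x, y;
  PetscInt      i, rstart, rend, N = 1000000, nreps = 100;
  PetscLogStage stage;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  PetscCall(PetscOptionsGetInt(NULL, NULL, "-n", &N, NULL));

  /* Placeholder operator: a 1D Laplacian, assembled row by row. */
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N));
  PetscCall(MatSetFromOptions(A));
  PetscCall(MatSetUp(A));
  PetscCall(MatGetOwnershipRange(A, &rstart, &rend));
  for (i = rstart; i < rend; i++) {
    if (i > 0)     PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
    if (i < N - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
    PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  PetscCall(MatCreateVecs(A, &x, &y));
  PetscCall(VecSet(x, 1.0));
  PetscCall(MatMult(A, x, y));  /* warm-up: the first call pays one-time setup costs */

  /* Time many repetitions in a dedicated log stage; look at the
     "MatMult benchmark" stage in the -log_view output. */
  PetscCall(PetscLogStageRegister("MatMult benchmark", &stage));
  PetscCall(PetscLogStagePush(stage));
  for (i = 0; i < nreps; i++) PetscCall(MatMult(A, x, y));
  PetscCall(PetscLogStagePop());

  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&y));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}

That way a single launch latency is amortized over many calls and -log_view reports a meaningful per-call time.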

A 2M-by-2M matrix for a 5-point stencil is probably still on the small side (I'm assuming that you run 2M-by-2M for *each* GPU), but it should suffice. Expect communication costs to be significant (i.e. the bookkeeping and data exchange between GPUs are on the order of the cost of running the MatMult kernel for the respective diagonal block).
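
As a rough, hedged estimate (every number below is an assumption, not a measurement: ~5 nonzeros per row, ~12 bytes of matrix data per nonzero, ~600 GB/s effective GPU memory bandwidth, ~10 GB/s effective link bandwidth, ~10 us per launch/sync/message on the halo path):

/* Rough cost split for one 2M-unknown 5-point-stencil block per GPU (assumed numbers). */
#include <math.h>
#include <stdio.h>

int main(void)
{
  double n_local       = 2.0e6;                  /* unknowns per GPU (assumed) */
  double nnz           = 5.0 * n_local;          /* 5-point stencil */
  double kernel_time   = nnz * 12.0 / 600.0e9;   /* bandwidth-bound diagonal block */
  double halo_values   = 4.0 * sqrt(n_local);    /* 2D decomposition, four edges */
  double halo_transfer = 8.0 * halo_values / 10.0e9;  /* ghost data over the link */
  double halo_latency  = 6.0 * 10.0e-6;          /* pack, D2H, MPI, H2D, off-diag kernel, sync */

  printf("diagonal-block kernel: %6.1f us\n", kernel_time * 1e6);
  printf("halo data transfer   : %6.1f us\n", halo_transfer * 1e6);
  printf("halo latency terms   : %6.1f us\n", halo_latency * 1e6);
  return 0;
}

With these assumed numbers the diagonal-block kernel is a few hundred microseconds, and the halo path is a non-negligible fraction of that, dominated by the fixed per-step latencies rather than the data volume.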


A kernel launch or a stream synchronization took about 10 us.  Compared with MatMult, they are tiny.  Does that mean we can ignore them?  What is a proper problem size for evaluating MatMult?  I heard it is a few thousand rows per MPI rank.  Why?

That would be a typical strong scaling limit for a CPU-based run on a well-tuned BlueGene-type system. With GPUs you will probably need at least 100k unknowns (or ~1M nonzeros) per rank in the strong scaling limit. Add a factor of ~10 to make latency costs small in comparison.
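
Back-of-envelope version of that rule of thumb (the byte count and bandwidth are assumptions; the 10 us is your measured launch/sync latency):

#include <stdio.h>

int main(void)
{
  double nnz_per_rank   = 1.0e6;     /* ~1M nonzeros per rank */
  double bytes_per_nnz  = 12.0;      /* 8-byte value + 4-byte column index (assumed CSR) */
  double bandwidth      = 600.0e9;   /* assumed effective GPU bandwidth in bytes/s */
  double launch_latency = 10.0e-6;   /* measured launch/sync latency in seconds */

  double kernel_time = nnz_per_rank * bytes_per_nnz / bandwidth;
  printf("bandwidth-bound MatMult time : %5.1f us\n", kernel_time * 1e6);
  printf("latency / kernel time        : %5.0f %%\n", 100.0 * launch_latency / kernel_time);
  printf("with ~10x more nonzeros      : %5.0f %%\n", 100.0 * launch_latency / (10.0 * kernel_time));
  return 0;
}

At ~1M nonzeros per rank the kernel time is only a couple of launch latencies, so the ~10 us overheads are still visible; ten times more work per rank pushes them down to a few percent.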

Best regards,
Karli
