Hi,

Table 2 reports negative latencies. This doesn't look right to me ;-)
If these values are the outcome of a parameter fit to the performance model, then use a parameter name (e.g. alpha) instead of the term 'latency'.

Figure 11 uses a very narrow range on the y-axis and thus greatly exaggerates the variation. The label "GPU performance" should be changed to something like "execution time" to clarify what the y-axis actually shows.

Page 12: The latency for VecDot is higher than for VecAXPY because VecDot requires the result to be copied back to the host, which is an additional operation.

Regarding performance measurements: Did you synchronize after each kernel launch? I.e. did you run (approach A)
 for (many times) {
   synchronize();
   start_timer();
   kernel_launch();
   synchronize();
   stop_timer();
 }
and then take averages over the timings obtained, or did you (approach B)
 synchronize();
 start_timer();
 for (many times) {
   kernel_launch();
 }
 synchronize();
 stop_timer();
and then divide the obtained time by the number of runs?

Approach A will report a much higher latency than approach B, because synchronizations are expensive (i.e. the measured latency consists of the kernel launch latency plus the device synchronization latency). Approach B is slightly over-optimistic, but I've found it to better match what one observes for an algorithm involving several kernel launches.

Best regards,
Karli



On 10/10/19 12:34 AM, Smith, Barry F. via petsc-dev wrote:

   We've prepared a short report on the performance of vector operations on Summit and would appreciate any feedback including: inconsistencies, lack of clarity, incorrect notation or terminology, etc.

    Thanks

     Barry, Hannah, and Richard




