[petsc-dev] VecMDot_SeqCUSP improved

Karl Rupp Fri, 29 Mar 2013 06:51:47 -0500

Hi Jose,

> Here are my numbers for this size. They are similar to yours (a bit
> worse, though). Also, I tried with ViennaCL which gave very poor
> performance (is this normal?).


Ok, good, this is consistent then. The ViennaCL performance is indeed
rather poor in this setting. This is partly because the mdot()
operation is not optimized yet (it just applies dot() repeatedly).
Nevertheless, the factor of ~2 compared to CUSP is due to the
additional latency overhead of OpenCL vs. CUDA for CPU<->GPU transfers.
I also see this in full CG benchmarks, cf. the two green curves here:
http://viennacl.sourceforge.net/uploads/pics/cg-timings.png
The CUDA and OpenCL curves finally meet only at sizes of 10^6.
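
To make the two effects above concrete, here is a minimal sketch in
plain Python. It is not PETSc, CUSP, or ViennaCL code: the function
names (mdot_iterated, mdot_fused, model_time) and the cost model are
assumptions made up for illustration. The model charges one fixed
latency per call plus the bytes read divided by the bandwidth.

```python
# Illustrative sketch only: pure Python, not PETSc, CUSP, or ViennaCL code.

def dot(x, y):
    """Plain dot product: one pass over x and one over y."""
    return sum(a * b for a, b in zip(x, y))


def mdot_iterated(x, ys):
    """mdot as k independent dot() calls: x is traversed k times."""
    return [dot(x, y) for y in ys]


def mdot_fused(x, ys):
    """mdot in a single pass: each x[i] is loaded once and reused for all k vectors."""
    acc = [0.0] * len(ys)
    for i, xi in enumerate(x):
        for j, y in enumerate(ys):
            acc[j] += xi * y[i]
    return acc


def model_time(n, k, latency_s, bandwidth_bytes_per_s, fused):
    """Hypothetical cost model: (number of calls) * latency + bytes read / bandwidth."""
    bytes_per_value = 8  # assuming double precision
    if fused:
        calls, values = 1, (k + 1) * n   # x once, plus k vectors
    else:
        calls, values = k, 2 * k * n     # x and one y per dot() call
    return calls * latency_s + values * bytes_per_value / bandwidth_bytes_per_s
```

In this model the latency term dominates for small n, so a backend
with a higher per-call latency looks much slower there, while for large
n only the bandwidth term matters and two backends with the same
bandwidth converge. The iterated variant also pays the latency k times
and reads roughly 2k/(k+1) times as much data as the fused one.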

I assume that NVIDIA could do better with their OpenCL driver if they 
wanted, but it's more beneficial for them to drive their customers 
towards CUDA - just speculation, of course...

> I tried a full SLEPc computation with a matrix of order 256,000, making
> VecMDot operate on 40 vectors. The run time drops from 91 seconds on 'master'
> to 53 seconds on 'next'. So yes, it is a good improvement. Thanks.

Cool, glad to see that this pays off well :-)

> However, I still see only a modest speedup (about 4) with respect to CPU 
> (since we do some optimizations for the CPU). Also, performance depends a lot 
> on the different matrix dimensions. I have to figure out how to optimize it 
> more for the GPU as well.

I think that the current mdot() implementation can be tweaked by a few
additional percent for larger values of k. On the other hand, I would
not expect more than a factor of ~5 performance gain over the CPU,
particularly if you really compare against a well-optimized CPU
implementation. The memory bandwidth of GPUs is (usually) less than a
factor of 10 higher than that of the CPU, particularly if the CPU has
three or four DRAM channels, and it is harder to reach full bandwidth
utilization on the GPU.
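
The bandwidth argument above can be written down as a rough
back-of-the-envelope estimate. The sketch below is only that: the
function names are made up for illustration, and the peak bandwidths
and utilization fractions in the usage note are hypothetical
placeholders, not measurements from this thread. It assumes mdot is
purely memory-bound, so the achievable speedup is the ratio of
effective bandwidths.

```python
# Rough estimate only; all hardware numbers passed in are assumptions.

def mdot_bytes(n, k, bytes_per_value=8):
    """Bytes read by a single-pass mdot: x plus k vectors, each of length n."""
    return (k + 1) * n * bytes_per_value


def bandwidth_bound_time(n, k, peak_bw, utilization):
    """Lower bound on run time if mdot is limited by memory bandwidth alone."""
    return mdot_bytes(n, k) / (peak_bw * utilization)


def speedup_bound(cpu_peak_bw, cpu_util, gpu_peak_bw, gpu_util):
    """Expected GPU-over-CPU speedup for a purely bandwidth-bound kernel."""
    return (gpu_peak_bw * gpu_util) / (cpu_peak_bw * cpu_util)
```

For example, with a hypothetical 40 GB/s CPU at 80% utilization and a
hypothetical 200 GB/s GPU at 70% utilization,
speedup_bound(40e9, 0.8, 200e9, 0.7) comes out at about 4.4, even
though the raw bandwidth ratio is 5.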

Best regards,
Karli
