Dear Hande,

Although not specifically related to QE, you may find the following information useful.
I recently attended a workshop on high-performance computing, including a series of lectures and hands-on sessions co-organized by NVIDIA. The good news is that CUDA (the NVIDIA accelerated GPU computing "platform", i.e. modified compilers etc.) now has a LAPACK/BLAS implementation that can be used "out of the box", like Intel MKL: in principle no modification to QE is needed, you just download and link as usual. The bad news is that the current crop of GPUs is not so fantastic at general operations, or whenever double precision is required.

My -personal- opinion, based on -very limited- experience with the matter, is that I do not find the prospect particularly promising at the moment, at least until the new "Fermi" GPU and the accompanying general-computing revision of CUDA are released. My -personal- reasoning goes like this:

- There is a huge bottleneck when moving data across the PCI-Express bridge between card memory and system memory, especially in run-of-the-mill servers. Unless your code uses low-level tricks to optimize these copies, it runs slower, not faster, when the GPU is involved. Sadly, the Fortran wrappers CUDA provides do not expose these low-level optimizations, and implementing C wrappers that do the optimization inside QE does not look like a straightforward problem (see sketches 1 and 2, appended after the quoted message below).

- The NVIDIA Linux drivers are not open source and taint the kernel. Their insistence on not open-sourcing the driver, unlike the other vendors, has some nasty performance consequences in some HPC environments; you can browse the HPC forums for details.

- The GPU is stupidly fast for a select set of single-precision float operations (the brain does not need much precision when interpreting a scene on a screen), but not so fast for operations requiring doubles. Most, if not all, of the variables in QE are double precision.

- In the same vein, the GPU is stupidly fast at things like multiplication and addition, but not so fast at division. You also have to use highly GPU-specific tricks like fused multiply-add to reach the headline numbers they show around (sketch 3).

- Since the third-party companies do nasty things like overclocking the cards they supply, and since, as above, precision is not mandatory for visual applications, the platform is highly volatile for scientific use. Personally I would not trust the outcome of a run on a specific machine unless proven otherwise; it should be tested very well (sketch 4).

I will start modifying my code if I get access to a Fermi GPU, just to give it a try, but otherwise I prefer to spend my time on more pressing topics.

Best,
Baris

2010/3/8 Hande Ustunel <hande at newton.physics.metu.edu.tr>:
> Dear Quantum-Espresso community,
>
> Following the acquisition of a couple of GPU servers by our national
> computer cluster, I decided to try and see if I can compile QE on
> them. Through some web searching I found some promising mention of
> current progress on the porting of QE to GPUs. I was wondering if I
> could get from any of you perhaps a more up-to-date idea of the
> current status, if it's not too much trouble.
>
> Thank you very much in advance.
> Hande
>
> --
> Hande Toffoli
> Department of Physics
> Office 439
> Middle East Technical University
> Ankara 06531, Turkey
> Tel : +90 312 210 3264
> http://www.physics.metu.edu.tr/~hande
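
P.S. To make some of the points above concrete, I append a few rough sketches. They are illustrative, not tested recipes. Sketch 1: what "download and link" means in practice, assuming the legacy CUBLAS C API that ships with the current toolkit (I show only the BLAS side; the matrix size and the fill values are made up):

/* Minimal DGEMM through CUBLAS (legacy API).
   Compile roughly as: nvcc dgemm_test.c -lcublas -o dgemm_test */
#include <stdio.h>
#include <stdlib.h>
#include <cublas.h>

int main(void)
{
    const int n = 1024;                   /* illustrative size */
    size_t bytes = (size_t)n * n * sizeof(double);
    double *A = malloc(bytes), *B = malloc(bytes), *C = malloc(bytes);
    double *dA, *dB, *dC;
    int i;

    for (i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; }

    cublasInit();
    cublasAlloc(n * n, sizeof(double), (void **)&dA);
    cublasAlloc(n * n, sizeof(double), (void **)&dB);
    cublasAlloc(n * n, sizeof(double), (void **)&dC);

    /* Host -> device copies: this is the PCI-Express traffic
       I complain about above. */
    cublasSetMatrix(n, n, sizeof(double), A, n, dA, n);
    cublasSetMatrix(n, n, sizeof(double), B, n, dB, n);

    /* C = 1.0*A*B + 0.0*C, column-major, no transposes */
    cublasDgemm('N', 'N', n, n, n, 1.0, dA, n, dB, n, 0.0, dC, n);

    /* Device -> host copy of the result */
    cublasGetMatrix(n, n, sizeof(double), dC, n, C, n);
    printf("C[0] = %f (expect %f)\n", C[0], 2.0 * n);

    cublasFree(dA); cublasFree(dB); cublasFree(dC);
    cublasShutdown();
    free(A); free(B); free(C);
    return 0;
}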
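
Sketch 2: how to see the PCI-Express bottleneck for yourself; a minimal copy-bandwidth timing with CUDA events (the 256 MB buffer size is arbitrary). On a run-of-the-mill server you will see a few GB/s here, which for a single BLAS call can easily dominate the compute time:

/* Time the host -> device copy alone, using CUDA events. */
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void)
{
    size_t bytes = (size_t)256 << 20;     /* 256 MB, illustrative */
    float ms;
    double *host = malloc(bytes);
    double *dev;
    cudaEvent_t t0, t1;

    cudaMalloc((void **)&dev, bytes);
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);

    cudaEventRecord(t0, 0);
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(t1, 0);
    cudaEventSynchronize(t1);
    cudaEventElapsedTime(&ms, t0, t1);

    printf("H->D: %.1f MB in %.2f ms = %.2f GB/s\n",
           bytes / 1048576.0, ms, bytes / (ms * 1e6));

    /* Pinned (page-locked) host memory via cudaMallocHost() is one of
       the low-level tricks that raises this number; the stock Fortran
       wrappers do not do it for you. */
    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    cudaFree(dev);
    free(host);
    return 0;
}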
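
Sketch 3: the flavor of GPU-specific coding I mean; an explicit fused multiply-add in a kernel. This assumes a double-capable card and compilation as a .cu file with nvcc; the kernel name and launch configuration are made up:

/* __fma_rn() computes a*b+c as one fused instruction with a single
   rounding; this is the kind of trick behind the advertised numbers. */
#include <cuda_runtime.h>

__global__ void axpy_fma(int n, double a, const double *x, double *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = __fma_rn(a, x[i], y[i]);   /* y = a*x + y, fused */
}

/* Launched e.g. as: axpy_fma<<<(n + 255) / 256, 256>>>(n, a, dx, dy);
   Division, by contrast, has no such fast path; the usual trick is to
   hoist a reciprocal out of the loop and multiply instead:
   inv = 1.0 / d;  ...  v[i] *= inv;  */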
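
Sketch 4: the kind of acceptance test I would insist on before trusting a machine; run the same DGEMM on the CPU (reference BLAS) and on the GPU, then compare element by element. The function name and the tolerance are made up; cpu_C and gpu_C stand for the two result matrices:

#include <math.h>
#include <stdio.h>

/* Returns 1 if the GPU result agrees with the CPU reference to within
   an illustrative relative tolerance for double precision. */
int check_dgemm(int n, const double *cpu_C, const double *gpu_C)
{
    double worst = 0.0;
    int i;
    for (i = 0; i < n * n; i++) {
        double ref = fabs(cpu_C[i]) > 1.0 ? fabs(cpu_C[i]) : 1.0;
        double err = fabs(cpu_C[i] - gpu_C[i]) / ref;
        if (err > worst) worst = err;
    }
    printf("max relative error: %g\n", worst);
    return worst < 1e-12;
}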
