Dear Ivan,

Could you please provide any benchmarks comparing hybrid CPU/GPU and pure CPU computation?

With best regards,
Victor.
Ivan Girotto <ivan.girotto at ichec.ie> wrote on Thursday 05 May 2011:
> Dear QE users & developers,
>
> We are happy to announce that the first beta GPU-enabled release of
> Quantum ESPRESSO (QE) has been committed today to the official repository.
>
> You can download the new version of the code using the following command:
>
> $ svn checkout svn://scm.qe-forge.org/scmrepos/svn/q-e/branches/espresso-PRACE
>
> The Irish Centre for High-End Computing (ICHEC, www.ichec.ie) has been
> mainly responsible for extending the QE suite to enhance calculations on
> NVIDIA GPUs. The porting activity has been supported within the PRACE
> 1st Implementation Phase project. It is currently carried out through
> the Sub-task "Accelerator", led by ICHEC, within the Work-Package
> "Programming Techniques for High-Performance Applications" in
> collaboration with CINECA.
>
> The porting activity mainly concerns the PWscf package, but ICHEC and
> the Irish QE user community are interested in exploring any other
> initiatives from QE users or developers who wish to port other codes of
> the QE suite to GPGPU architectures.
>
> We have successfully accelerated the linear algebra part of the QE suite
> (using a library called phiGEMM), some explicit computational kernels
> (newd, addusdense, vloc_psi) and, for the single CPU/GPU version, the
> 3D FFT. Both the accelerated linear algebra (matrix multiplication) and
> the accelerated FFT make use of CUDA libraries. The porting is mainly
> based on wrappers that enable the use of accelerator libraries. The
> distributed 3D FFT version is still in progress, since porting it
> requires significant changes to the current code structure and data
> distribution. The parallel, distributed multi-GPU version therefore
> still uses the original 3D FFT implementation.
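The remark that a distributed 3D FFT needs changes to the data distribution follows from how a 3D transform decomposes: it is separable into passes of 1D transforms along each axis, and each pass needs complete lines of data along that axis. The sketch below is a minimal Python illustration under that assumption, not code from QE: it uses naive O(n^2) DFTs in place of real FFTs, and the helper names (`dft1d`, `transform_axis`, `dft3d_by_axes`, `dft3d_direct`) are hypothetical. It checks the axis-by-axis result against the direct triple-sum definition.

```python
import cmath
import itertools


def dft1d(x):
    """Naive O(n^2) 1D DFT (stands in for a 1D FFT)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def _key(fixed, axis, t):
    """Rebuild a 3D index by inserting position t on `axis` into the fixed indices."""
    key = list(fixed)
    key.insert(axis, t)
    return tuple(key)


def transform_axis(a, shape, axis):
    """Apply a 1D DFT to every complete line of grid `a` along `axis`."""
    out = dict(a)
    other = [range(n) for d, n in enumerate(shape) if d != axis]
    for fixed in itertools.product(*other):
        # Each 1D transform needs the whole line along `axis`.
        line = dft1d([a[_key(fixed, axis, t)] for t in range(shape[axis])])
        for t, value in enumerate(line):
            out[_key(fixed, axis, t)] = value
    return out


def dft3d_by_axes(a, shape):
    """3D DFT as three successive passes of 1D DFTs (z, then y, then x)."""
    for axis in (2, 1, 0):
        a = transform_axis(a, shape, axis)
    return a


def dft3d_direct(a, shape):
    """Reference 3D DFT computed from the triple-sum definition."""
    nx, ny, nz = shape
    out = {}
    for p, q, r in itertools.product(range(nx), range(ny), range(nz)):
        out[p, q, r] = sum(
            a[i, j, k] * cmath.exp(-2j * cmath.pi * (p * i / nx + q * j / ny + r * k / nz))
            for i, j, k in itertools.product(range(nx), range(ny), range(nz)))
    return out


if __name__ == "__main__":
    shape = (2, 3, 4)
    data = {idx: complex(sum(idx) % 5, idx[0] - idx[2])
            for idx in itertools.product(*(range(n) for n in shape))}
    fast = dft3d_by_axes(data, shape)
    ref = dft3d_direct(data, shape)
    print(max(abs(fast[key] - ref[key]) for key in ref))
```

When the grid is split across MPI processes (typically in planes or columns), at least one of these passes runs along an axis whose lines are spread over several processes, which is why a distributed 3D FFT generally needs an all-to-all data transposition between passes.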
>
> The phiGEMM library is distributed as an independent open-source
> external package together with the Quantum ESPRESSO suite. It aims to
> perform matrix multiplication ([SDZ]GEMM) using the underlying BLAS
> kernel functions on both the CPU and NVIDIA CUDA-based GPUs, splitting
> and overlapping the computation between the host (CPU) and the
> accelerator (GPU). Any code that makes intensive use of GEMM can benefit
> significantly from linking this library when running on a hybrid
> CPU/GPU system.
>
> Although the 3D FFT is accelerated only for a single CPU process (not
> when using MPI), other parts of the code can already take advantage of
> a distributed parallel hybrid system. This allows PWscf to combine
> distributed message-passing parallelization (MPI), multi-threading
> (OpenMP) and accelerators (NVIDIA GPUs), potentially achieving good
> performance improvements on the latest NVIDIA GPUs (high-performance
> double precision is required). This porting activity is still in
> progress, especially the parallel 3D FFT component, which is a
> bottleneck for large calculations. We have been testing this beta
> release with some small/medium benchmarks from the official DEISA
> benchmark suite on several kinds of GPU hardware (Tesla and Fermi
> architectures). Special thanks go to both E4 Computer Engineering and
> CEA for providing access to hybrid GPU systems with configurations
> different from those available at ICHEC.
>
> We look forward to receiving suggestions for improvement, feedback or
> requests for collaboration from anyone interested in trying and
> validating the PWscf CUDA version on different platforms with different
> scientific cases. For additional information, please contact
> qe-gpu at ichec.ie or reply to this mail. A dedicated mailing list,
> q-e-gpgpu at qe-forge.org <http://qe-forge.org/mail/?group_id=10>, will
> be available shortly. Please use this list for bug reports and any
> other issues related to the use of the PWscf GPU version.
>
> Although compiling the GPU implementation is fairly straightforward, we
> kindly suggest that users carefully read the included README.GPU. The
> intrinsic characteristics of hybrid multi- and many-core systems require
> careful consideration to make the best use of the available computing
> power.
>
> Any and all suggestions are welcome and will be very much appreciated.
>
> Ivan Girotto & Filippo Spiga
>
> ---
>
> ICHEC GPU developer team
>
> The Tower - 7th floor
> Trinity Technology & Enterprise Campus
> Grand Canal Quay - Dublin 2 - Ireland
>
> +353-1-5241608 (ph) / +353-1-7645845 (fax)
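The hybrid GEMM idea described for phiGEMM (one matrix multiplication split between host and accelerator, with the two parts overlapping) can be sketched as follows. This is a hedged illustration, not phiGEMM's API or its actual splitting strategy: it partitions the columns of B, uses a hypothetical `gpu_fraction` parameter for the share sent to the "device", and runs both shares on the CPU, with a Python thread standing in for the GPU. Because column j of C = A*B depends only on column j of B, the two partial products are independent and can simply be concatenated.

```python
import threading


def gemm(a, b):
    """Reference C = A * B on lists of lists (stands in for a BLAS GEMM call)."""
    k = len(b)
    m = len(b[0]) if b else 0
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(m)] for row in a]


def split_gemm(a, b, gpu_fraction=0.7):
    """Hypothetical hybrid GEMM: split B by columns between "device" and host.

    Both shares actually run on the CPU; a thread stands in for the GPU.
    """
    m = len(b[0])
    split = int(round(gpu_fraction * m))
    b_device = [row[:split] for row in b]   # columns [0, split) -> "device"
    b_host = [row[split:] for row in b]     # columns [split, m) -> host
    results = {}

    def worker(name, b_part):
        results[name] = gemm(a, b_part)

    device = threading.Thread(target=worker, args=("device", b_device))
    device.start()                 # launch the "device" share ...
    worker("host", b_host)         # ... and compute the host share meanwhile
    device.join()
    # C[:, j] depends only on B[:, j], so the column blocks are concatenated.
    return [d + h for d, h in zip(results["device"], results["host"])]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6, 7], [8, 9, 10]]
    print(split_gemm(a, b))  # prints [[21, 24, 27], [47, 54, 61]]
```

In a real hybrid setup the split ratio would be tuned to the relative throughput of the CPU and the GPU so that both shares finish at about the same time; because of Python's global interpreter lock, this sketch only shows the decomposition, not a speedup.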
