On 05/04/13 21:18, Geoffrey Anderson wrote:
> Hello,
>
> I have a question about multiple GPU devices. I finished my first
> PyCUDA application today. PyCUDA's ElementwiseKernel is excellent for
> simplifying the programming: it is much, much better than fiddling
> with the memory hierarchies within the GPU device, and it lets me
> focus more of my development effort on my application's logic, its
> parallel decomposition of work, and its internal synchronization.
>
> [SNIP]
The CUDA API is not particularly well suited to using multiple devices concurrently. It is doable (just about!), but it is not pleasant.

Without a doubt, the best way to use multiple GPU devices is indirectly, by parallelising your application with MPI (for example, using the excellent mpi4py library). Doing so will not only allow you to take advantage of multiple GPUs inside a single system, but will also allow you to split your work across multiple *physical* systems connected via Ethernet/IB/etc. More recent MPI implementations have near-complete support for GPU Direct, allowing CUDA device pointers to be passed directly to the MPI_Send/MPI_Recv functions (and, where GPU Direct is not applicable, falling back to a CUDA memcpy followed by a regular MPI_Send/MPI_Recv).

While it is indeed possible to achieve a similar result using multiple threads (with all of the caveats that entails), I would recommend against any such approach. Not only is it more limited than the MPI methodology described above, but it often results in inferior real-world performance. (Welcome to the world of NUMA, where threaded applications come to die.)

Regards,
Freddie.
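The one-rank-per-GPU pattern described above can be sketched with mpi4py and PyCUDA. This is a minimal illustration, not anyone's production code: the names `device_for_rank` and `run` are made up for the example, it assumes mpi4py, pycuda, and numpy are installed, and it stages data through the host with `x.get()` rather than assuming a CUDA-aware (GPU Direct) MPI build.

```python
# Sketch: one MPI rank per GPU, each running an ElementwiseKernel,
# with results sent back to rank 0 over MPI.
# Launch with, e.g.:  mpiexec -n 2 python multi_gpu_sketch.py
# (device_for_rank and run are illustrative names, not library API.)

def device_for_rank(rank, num_devices):
    """Map an MPI rank to a CUDA device ordinal, round-robin."""
    return rank % num_devices

def run():
    # Imports are local so the pure-Python helper above stays usable
    # on machines without CUDA or MPI installed.
    import numpy as np
    import pycuda.driver as cuda
    from pycuda import gpuarray
    from pycuda.elementwise import ElementwiseKernel
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.rank, comm.size

    cuda.init()
    dev = cuda.Device(device_for_rank(rank, cuda.Device.count()))
    ctx = dev.make_context()
    try:
        # Each rank works on its own piece of the problem.
        x = gpuarray.to_gpu(np.full(8, float(rank), dtype=np.float32))
        double = ElementwiseKernel("float *x", "x[i] = 2.0f * x[i]",
                                   "double_it")
        double(x)

        # Host-staged exchange: x.get() copies device -> host, and the
        # resulting NumPy array is handed to MPI.  With a CUDA-aware
        # (GPU Direct) MPI build the device buffer could be passed to
        # Send/Recv directly, skipping this copy.
        if rank != 0:
            comm.Send(x.get(), dest=0, tag=77)
        else:
            results = [x.get()]
            buf = np.empty(8, dtype=np.float32)
            for src in range(1, size):
                comm.Recv(buf, source=src, tag=77)
                results.append(buf.copy())
            print("per-rank results:", [float(r[0]) for r in results])
    finally:
        ctx.pop()

# run() is deliberately not invoked at import time, so the module stays
# loadable on machines without MPI/CUDA; drive it under mpiexec.
```

Because each rank owns its own CUDA context, no locking or context juggling is needed, and the same script scales from several GPUs in one box to several boxes over Ethernet/IB unchanged.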
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
