On 05/04/13 21:18, Geoffrey Anderson wrote:
> Hello,
> 
> I have a question about multiple GPU devices.  I finished my first 
> original pycuda application today.  Pycuda is excellent for the 
> simplicity improvement of the programming as provided by the 
> ElementwiseKernel.  The ElementwiseKernel is much, much better than 
> fiddling with the memory hierarchies within the GPU device.
> Elementwise is excellent because I prefer to focus more of my
> development effort on my application's logic and its parallel
> decomposition of work and internal synchronization.
> 
> [SNIP]

The CUDA API is not particularly well suited to using multiple devices
concurrently.  It is doable (just about!) but is not pleasant.  Without
a doubt the best way to use multiple GPU devices is indirectly by
parallelising your application with MPI (for example, by using the
excellent mpi4py library).
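To make that concrete, here is a minimal sketch of the usual one-rank-per-GPU pattern.  (Hedged: it assumes mpi4py and pycuda are installed; the round-robin mapping is just one common convention, and main() is shown but not invoked.)

```python
# Sketch: one MPI rank per GPU.  The mapping helper is pure Python;
# main() shows how it would be wired up with mpi4py + pycuda (both
# assumed installed) but is not called here.

def device_for_rank(local_rank, num_devices):
    """Round-robin a node-local MPI rank onto a CUDA device ordinal."""
    if num_devices < 1:
        raise ValueError("no CUDA devices available")
    return local_rank % num_devices


def main():
    from mpi4py import MPI
    import pycuda.driver as cuda

    cuda.init()
    comm = MPI.COMM_WORLD
    # Ranks on the same node share that node's device pool; splitting
    # the communicator by shared-memory domain gives a node-local rank.
    local = comm.Split_type(MPI.COMM_TYPE_SHARED)
    dev_id = device_for_rank(local.rank, cuda.Device.count())
    ctx = cuda.Device(dev_id).make_context()
    try:
        pass  # per-rank ElementwiseKernel work goes here
    finally:
        ctx.pop()
```

Run it with, e.g., `mpiexec -n 4 python script.py` and each rank gets its own device and context.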

Doing so will not only allow you to take advantage of multiple GPUs
inside a single system but will also allow you to split your work
across multiple *physical* systems connected via Ethernet/IB/etc.  More
recent MPI implementations have near-complete support for GPUDirect,
allowing CUDA device pointers to be passed directly to the MPI_Send/Recv
functions.  (And if GPUDirect is not available, you can fall back to a
CUDA memcpy followed by a regular MPI_Send/Recv on the host copy.)
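As an illustration of that fallback (hedged: how you detect a CUDA-aware build varies by MPI implementation, so the `cuda_aware` flag is taken as a given here, and send_array() is shown but not invoked):

```python
# Sketch of the two transfer paths.  choose_transfer_path() is pure
# logic; send_array() shows how it would be used with mpi4py and a
# pycuda.gpuarray.GPUArray (both assumed installed) but is not called.

def choose_transfer_path(cuda_aware):
    """Pick the MPI transfer strategy for a GPU-resident buffer."""
    return "device-direct" if cuda_aware else "staged-host"


def send_array(comm, dest, gpu_ary, cuda_aware):
    if choose_transfer_path(cuda_aware) == "device-direct":
        # A CUDA-aware (GPUDirect) MPI can take the device buffer as-is.
        comm.Send(gpu_ary, dest=dest)
    else:
        # Otherwise stage through the host: a device-to-host copy
        # (gpu_ary.get()), then a regular MPI_Send on the host array.
        comm.Send(gpu_ary.get(), dest=dest)
```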

While it is indeed possible to achieve a similar result using multiple
threads (with all of the caveats that entails) I would recommend against
any such approach.  Not only is it more limited than the MPI methodology
described above, but it often results in inferior real-world performance.
(Welcome to the world of NUMA, where threaded applications come to die.)

Regards, Freddie.


_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
