Hello, I have a question about using multiple GPU devices. I finished my first original PyCUDA application today. PyCUDA's ElementwiseKernel is a huge simplification: it is much, much nicer than fiddling with the memory hierarchy inside the GPU device, and it lets me focus my development effort on my application's logic, its parallel decomposition of work, and its internal synchronization.
That said, how could my application use *both* of the GPU devices installed in my workstation (or server)? I am using a collection of GPU devices in the larger context of research on parallelizing application software for the CPU as well as for the GPU, and I already have an inventory of applications that parallelize well on CPUs. Naturally, then, I would like to partition the independent parts of the computation in my applications and send one partition of the work to each GPU device. I envision the partial results being computed at the same time on my two CUDA devices and then returned to the host application for final aggregation, after a parallel barrier or after a sequence of blocking join calls, one per CUDA device.

Here's the thing: the current PyCUDA API seems designed to use just one CUDA device at a time per application. It consults the CUDA_DEVICE environment variable, or the special disk file, to decide which GPU device to send the work to. I am having a hard time seeing how to use such an API to drive both GPU devices in a reliable or predictable manner. If anyone has existing work in this direction, I would love to see it shared on the mailing list.

However, I could also imagine, in the near future, an extension of the PyCUDA API where the application code might resemble this fictional sketch:

    ...
    d0 = pycuda.autoinit.context.get_device(0)
    d1 = pycuda.autoinit.context.get_device(1)

    ga0 = gpuarray.to_gpu_async(d0, a[first[0]:last[0]])  # start copying data to d0
    ga1 = gpuarray.to_gpu_async(d1, a[first[1]:last[1]])  # start copying other data to d1

    fn = ElementwiseKernel(...)               # compile the kernel into GPU code

    ga0.waitcomplete()                        # wait for the async copy to d0 to complete
    fn.start(d0, colsb, rowsb, ga0, gb, gc0)  # start the computation on d0
    ga1.waitcomplete()                        # wait for the async copy to d1 to complete
    fn.start(d1, colsb, rowsb, ga1, gb, gc1)  # start the computation on d1

    fn.join(d0)                               # wait for d0 to complete its computation
    fn.join(d1)                               # wait for d1 to complete its computation

    CombineResults(gc0, gc1)
    ...

As you may have noticed, the pseudocode above resembles the pthreads API: the start() calls are nonblocking and the join() calls are blocking. This particular syntax is not essential; it is just convenient for the example. Other concurrency models could of course work fine too, such as OpenMP/OpenACC-style directives or message passing (send/receive).

To anyone who may have ideas on this topic: what are the prospects of getting PyCUDA to use multiple GPU devices concurrently from a single Python application? Is it possible to achieve similar results with the existing API, that is, has anyone *actually tried to do it already*?

Again, PyCUDA is very nice to use. I expect it could draw many more people into GPU programming than the bone-stock Nvidia CUDA API (it did for me!). Many parallel programmers probably do not want to fiddle with vendor-specific memory hierarchies, because concurrent application programming is difficult enough without all that additional proprietary hardware code to write on top. That is true for me, at least. Pthreads is well known to concurrent programmers, so that is one natural way to do this.

Thank you for reading.

Geoffrey
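P.S. The host-side partition/start/join pattern I have in mind can be sketched today with ordinary Python threads, no GPU required. Below is a minimal, hardware-free sketch; the names partition and worker are my own invention, and a NumPy expression stands in for the per-device kernel launch. My understanding is that with PyCUDA's lower-level driver API each worker thread would create its own context for its device (pycuda.driver.Device(i).make_context()) and pop it before exiting, but whether the higher-level pieces like gpuarray cooperate with several contexts at once is exactly what I am asking about.

```python
import threading
import numpy as np

def worker(dev_id, chunk, results):
    # On a real system, pycuda.driver.Device(dev_id).make_context()
    # would go here, followed by the kernel launch on this device.
    results[dev_id] = 2.0 * chunk + 1.0   # placeholder for the GPU kernel
    # ...and context.pop() would go here before the thread exits.

def run_on_devices(a, num_devices=2):
    chunks = np.array_split(a, num_devices)       # partition the input
    results = [None] * num_devices
    threads = [threading.Thread(target=worker, args=(i, chunks[i], results))
               for i in range(num_devices)]
    for t in threads:
        t.start()                                 # nonblocking, like pthread_create
    for t in threads:
        t.join()                                  # blocking, like pthread_join
    return np.concatenate(results)                # combine the partial results

a = np.arange(8, dtype=np.float32)
print(run_on_devices(a))   # same answer as computing 2*a + 1 serially
```

The shape mirrors the fictional sketch above: partition, start one unit of work per device, join them all, then combine.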
_______________________________________________ PyCUDA mailing list [email protected] http://lists.tiker.net/listinfo/pycuda
