Thanks Andreas for the hint. Actually, what I am trying to do is a little more
complex than that. I have two Python processes running on two GPUs. In a
simpler setting, I have an array x in gpu0's Python process that needs to be
transferred to gpu1's process, and vice versa.
I solved it with this scheme:
* allocate page-locked host memory
* memcpy from device to host (gpu0 to host; gpu1 to host)
* send/receive the objects in host memory to the Python process driving the
other GPU
* memcpy from host to device within the respective GPU
The solution and the output from a sample run follow. Now, I wonder whether it
is possible to improve this further. One possibility is whether the
device-to-host copy can be eliminated, because I need to transfer several
Theano tensors between multiple (up to 4) GPUs, and I need to do this quite
frequently (say, every n-th mini-batch) during training. (One possible
refinement of the host-side handling, short of eliminating the copy entirely,
is sketched after the sample output below.)
Note: not all of the GPUs are P2P capable, so memcpy_peer wouldn't work for
every pair; a quick way to check which pairs do support peer access is
sketched just below.
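For reference, here is a minimal, untested sketch of how one could check which
device pairs actually support P2P, assuming PyCUDA's Device.can_access_peer
wrapper (so that memcpy_peer could at least be used for the pairs that do):

import pycuda.driver as drv

drv.init()
ndev = drv.Device.count()
for i in range(ndev):
    for j in range(ndev):
        if i != j:
            # True if device i can directly access device j's memory (P2P)
            ok = drv.Device(i).can_access_peer(drv.Device(j))
            print "GPU %d -> GPU %d : P2P %s" % (i, j, "yes" if ok else "no")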
import multiprocessing as mp
import numpy as np
import zmq
import time

import pycuda
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray


def proc1():
    # Importing theano inside the process sets up the CUDA context; per-process
    # device selection (e.g. via THEANO_FLAGS) is assumed to happen outside
    # this snippet.
    import theano

    sock = zmq.Context().socket(zmq.PAIR)
    sock.connect('tcp://localhost:5003')

    drv.init()
    ctx = drv.Context.attach()

    x_gpu = gpuarray.to_gpu(np.random.rand(8))
    y_gpu_copy = gpuarray.zeros_like(x_gpu)

    # Stage the device array in page-locked host memory, then ship it to the
    # other process over the ZMQ socket.
    x_host = drv.pagelocked_zeros_like(x_gpu)
    drv.memcpy_dtoh_async(x_host, x_gpu.ptr)
    drv.Context.synchronize()  # make sure the async copy is done before sending
    sock.send_pyobj(x_host)

    # Receive the other GPU's array and copy it onto this device.
    y_host_copy = sock.recv_pyobj()
    drv.memcpy_htod_async(y_gpu_copy.ptr, y_host_copy)

    print "Proc-1: value before transfer\n", x_gpu
    print "Proc-1: value after transfer\n", y_gpu_copy
    print "Proc-1: sum after transfer\n", x_gpu + y_gpu_copy

    ctx.detach()


def proc2():
    import theano

    sock = zmq.Context().socket(zmq.PAIR)
    sock.bind('tcp://*:5003')

    drv.init()
    ctx = drv.Context.attach()

    y_gpu = gpuarray.to_gpu(np.random.rand(8) * 0.9)
    x_gpu_copy = gpuarray.zeros_like(y_gpu)

    y_host = drv.pagelocked_zeros_like(y_gpu)
    drv.memcpy_dtoh_async(y_host, y_gpu.ptr)
    drv.Context.synchronize()  # make sure the async copy is done before sending
    sock.send_pyobj(y_host)

    x_host_copy = sock.recv_pyobj()
    drv.memcpy_htod_async(x_gpu_copy.ptr, x_host_copy)

    # Delay the prints so the two processes' output does not interleave.
    time.sleep(10)
    print "\nProc-2: value before transfer\n", y_gpu
    print "Proc-2: value after transfer\n", x_gpu_copy
    print "Proc-2: sum after transfer\n", y_gpu + x_gpu_copy

    ctx.detach()


if __name__ == '__main__':
    p1 = mp.Process(target=proc1)
    p2 = mp.Process(target=proc2)
    p1.start()
    p2.start()
    p1.join()
    p2.join()
Here is the output from a sample run. As expected, the sum values in both
processes are the same in the end.
[dccxc090] ~/multi-GPUs $ /opt/share/Python-2.7.9/bin/python
multi_pycuda_d2d_demo.py
Using gpu device 0: Tesla K40m (CNMeM is disabled)
Using gpu device 1: Tesla K40m (CNMeM is disabled)
Proc-1: value before transfer
[ 0.64424104 0.98413032 0.46654151 0.40943486 0.6895878 0.81006672
0.00907435 0.88727554]
Proc-1: value after transfer
[ 0.57981693 0.88571729 0.41988736 0.36849138 0.62062902 0.72906005
0.00816691 0.79854798]
Proc-1: sum after transfer
[ 1.22405797 1.86984761 0.88642887 0.77792624 1.31021682 1.53912676
0.01724126 1.68582352]
Proc-2: value before transfer
[ 0.57981693 0.88571729 0.41988736 0.36849138 0.62062902 0.72906005
0.00816691 0.79854798]
Proc-2: value after transfer
[ 0.64424104 0.98413032 0.46654151 0.40943486 0.6895878 0.81006672
0.00907435 0.88727554]
Proc-2: sum after transfer
[ 1.22405797 1.86984761 0.88642887 0.77792624 1.31021682 1.53912676
0.01724126 1.68582352]
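
Regarding the possible improvement: as long as the GPUs are not P2P capable,
the staging through host memory itself presumably cannot be avoided, but
send_pyobj/recv_pyobj pickle the arrays and so add an extra host-side copy per
transfer. Below is a rough, untested sketch of what I have in mind instead,
reusing one page-locked buffer per tensor and shipping the raw bytes with
pyzmq's copy=False; the helper names, buffers and shapes here are just
placeholders.

import numpy as np
import zmq
import pycuda.driver as drv

def send_gpu_array(sock, gpu_arr, host_buf):
    # host_buf: a drv.pagelocked_empty(...) buffer allocated once and reused.
    # Device -> page-locked host, then ship the raw bytes without pickling.
    drv.memcpy_dtoh_async(host_buf, gpu_arr.ptr)
    drv.Context.synchronize()
    # copy=False avoids another host-side copy, but the buffer must not be
    # touched again until zmq has actually sent the message.
    sock.send(host_buf, copy=False)

def recv_gpu_array(sock, gpu_arr, dtype, shape):
    # Receive the raw bytes and copy them straight onto the device.
    msg = sock.recv()
    host_view = np.frombuffer(msg, dtype=dtype).reshape(shape)
    drv.memcpy_htod(gpu_arr.ptr, host_view)

The page-locked buffers would be allocated once per tensor at startup, so the
only per-transfer host-side work left is the single staging copy.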
- Baskaran
On Wed, Nov 11, 2015 at 2:40 AM, Andreas Kloeckner <[email protected]>
wrote:
> Baskaran Sankaran <[email protected]> writes:
>
> > Hi all,
> >
> > I am looking for a solution for exchanging some tensors between two gpus,
> > that do not have P2P enabled. Assuming two GPUs on the same node, I
> guess I
> > have to do it in two steps; first copy to host memory from GPU (gpu-0)
> and
> > then copy from host memory to the other GPU (gpu-1). However it is not
> > exactly clear to me as to how I can go about this.
>
> (1) Allocate memory on host
> (2) memcpy(host mem, gpu0_mem)
> (3) memcpy(gpu1_mem, host_mem)
> (4) (Optionally) free host mem
>
> Not sure what you're asking...
>
> Andreas
>