Received from Baskaran Sankaran on Tue, Nov 17, 2015 at 03:08:10PM EST:
> @Lev, thanks for the tip; I will look into it.
>
> In the meanwhile, I am running into some speed issues. I notice that it
> slows down progressively almost by a factor of 0.5, in just 7000 updates.
> It starts with about 2.6 sec/ mini-batch (average speed), but after 7000
> mini-batches, the time increases to 3.7 secs/ mini-batch.
>
> I suspect that I may not be sending the host memory pointers but the actual
> arrays, serialized by zmq's send_pyobj (see below in the code). Could
> someone confirm whether I am doing it correctly? Should I just be sending/
> receiving host memory pointers?

You are transmitting the array contents. If you use IPC to send the GPU
array pointers to both processes [1], you should be able to perform a
device-to-device copy between the two memory locations even if you can't
use P2P [2] (assuming that UVA is supported on both devices).

[1] https://gist.github.com/e554b3985e196b07f93b
[2] https://gist.github.com/3078644
-- 
Lev Givon
Bionet Group | Neurokernel Project
http://lebedov.github.io/
http://neurokernel.github.io/

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
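
A rough sketch of the handle-passing approach described in the reply, not the code from the linked gists. It assumes PyCUDA built against CUDA 4.1 or later on Linux, that the exporting and importing sides are different processes, and that the exporting process keeps its array alive while the other side uses it. The helper names (pack_ipc_message, unpack_ipc_message, export_gpu_array, import_gpu_array, copy_into) are hypothetical; the PyCUDA calls they wrap are mem_get_ipc_handle, IPCMemoryHandle, GPUArray(..., gpudata=...) and memcpy_dtod. Whether the device-to-device copy works across two GPUs depends on UVA support, as noted above.

```python
import pickle


def pack_ipc_message(handle, shape, dtype_str):
    """Pickle a small description of a GPU array instead of its contents."""
    return pickle.dumps({"handle": handle, "shape": tuple(shape), "dtype": dtype_str})


def unpack_ipc_message(payload):
    """Inverse of pack_ipc_message; returns a dict with handle, shape, dtype."""
    return pickle.loads(payload)


def export_gpu_array(x_gpu):
    """Owner process: build the message for an existing GPUArray."""
    import pycuda.driver as drv  # imported lazily; requires a CUDA context

    # The IPC handle is a short, fixed-size byte string that identifies the
    # device allocation; no array data is copied to the host here.
    handle = drv.mem_get_ipc_handle(x_gpu.ptr)
    return pack_ipc_message(handle, x_gpu.shape, x_gpu.dtype.str)


def import_gpu_array(payload):
    """Other process: map the remote allocation as a GPUArray view."""
    import numpy as np
    import pycuda.driver as drv
    import pycuda.gpuarray as gpuarray

    msg = unpack_ipc_message(payload)
    mem = drv.IPCMemoryHandle(msg["handle"])  # opens the exported allocation
    return gpuarray.GPUArray(msg["shape"], np.dtype(msg["dtype"]), gpudata=mem)


def copy_into(dst_gpu, src_gpu):
    """Device-to-device copy of src_gpu's bytes into dst_gpu (same size assumed)."""
    import pycuda.driver as drv

    drv.memcpy_dtod(dst_gpu.gpudata, src_gpu.gpudata, src_gpu.nbytes)
```

With zmq, the payload would go out through socket.send(export_gpu_array(x_gpu)) and come back through import_gpu_array(socket.recv()), so the message stays well under a kilobyte however large the array is, whereas send_pyobj on the array itself pickles every element.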
