On 30/01/14 22:40, Andreas Kloeckner wrote:
> Freddie Witherden <[email protected]> writes:
>> This came up a while ago on the mpi4py mailing list [1] and with CUDA 6
>> bringing unified virtual memory, it may become more important in the future.
>>
>> It would be nice if PyCUDA device allocations provided a method for
>> creating a suitable Python buffer object: e.g., myalloc.asbuf(offset=0,
>> sz=...).  This would provide a clean way for PyCUDA and mpi4py to
>> interoperate (as the wrapped MPI functions expect Python buffer objects
>> as opposed to pointers).
>>
>> Currently one must resort either to writing a C extension or to hacks such as:
>>
>>     import ctypes as ct
>>
>>     make_buf = ct.pythonapi.PyBuffer_FromMemory
>>     make_buf.argtypes = [ct.c_void_p, ct.c_ssize_t]
>>     make_buf.restype = ct.py_object
>>
>>     cubuf = cuda.mem_alloc(SZ*8)
>>     pybuf = make_buf(long(cubuf), SZ*8)
> 
> Done:
> 
> http://documen.tician.de/pycuda/driver.html#pycuda.driver.DeviceAllocation.as_buffer

Thank you for this.  However, toying around with the following example:

from mpi4py import MPI
import numpy as np

import ctypes as ct

make_buf = ct.pythonapi.PyBuffer_FromMemory
make_buf.argtypes = [ct.c_void_p, ct.c_ssize_t]
make_buf.restype = ct.py_object

comm = MPI.COMM_WORLD

SZ = 100

if comm.rank == 0:
    import pycuda.autoinit
    import pycuda.driver as cuda

    cubuf = cuda.mem_alloc(SZ*8)
    pybuf = cubuf.as_buffer(SZ*8)
    #pybuf = make_buf(long(cubuf), SZ*8)

    cuda.memcpy_htod(cubuf, np.arange(SZ, dtype=np.float64))

    comm.Send([pybuf, MPI.BYTE], dest=1, tag=1)
else:
    npbuf = np.empty(SZ)
    comm.Recv(npbuf, source=0, tag=1)

    print npbuf

with OpenMPI 1.7.3 (running as mpirun -n 2 python file.py), I find that
the version using cubuf.as_buffer fails with a segmentation fault due to
invalid permissions, whereas the variant using my make_buf hack works as
expected.

I believe the problem is on L1450 of cuda.hpp where:

  PyBuffer_FromMemory((void *) (get_pointer() + size), size)));

which should be:

  PyBuffer_FromMemory((void *) get_pointer(), size)));

(and analogously on L1498), which fixes the above issue. It may also be
better to use PyBuffer_FromReadWriteMemory, as this would permit CUDA
allocations to be on the receiving end of MPI communications.
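
For what it's worth, with a writable buffer the roles in the example
above could be reversed, so that the device allocation sits on the
receiving end. A rough, untested sketch of what I have in mind (Python 2,
and assuming a CUDA-aware Open MPI build as before):

from mpi4py import MPI
import numpy as np

import ctypes as ct

# Untested: wrap a CUDA allocation in a *writable* buffer so that it can
# be the destination of a Recv (Python 2 only; needs a CUDA-aware MPI)
make_rwbuf = ct.pythonapi.PyBuffer_FromReadWriteMemory
make_rwbuf.argtypes = [ct.c_void_p, ct.c_ssize_t]
make_rwbuf.restype = ct.py_object

comm = MPI.COMM_WORLD

SZ = 100

if comm.rank == 0:
    comm.Send([np.arange(SZ, dtype=np.float64), MPI.BYTE], dest=1, tag=1)
else:
    import pycuda.autoinit
    import pycuda.driver as cuda

    cubuf = cuda.mem_alloc(SZ*8)
    pybuf = make_rwbuf(long(cubuf), SZ*8)

    # Receive straight into device memory via the writable buffer
    comm.Recv([pybuf, MPI.BYTE], source=0, tag=1)

    # Copy back to the host to check the transfer
    npbuf = np.empty(SZ)
    cuda.memcpy_dtoh(npbuf, cubuf)
    print npbuf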

Regards, Freddie.
