I have four 54 MB buffers on which I want to perform byte-by-byte
analysis. I can copy the data in roughly 100 ms, which seems like a
decent transfer rate (~2 GB/s). However, when I go to execute my
kernel, the overhead of passing in the device pointers is huge:
something like 500 ms, even on a no-op kernel.
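(For reference, a quick sanity check on the quoted rate, assuming the
buffers are 54 MiB each and the copy takes the 100 ms mentioned above:)

```python
# Rough bandwidth check for four 54 MiB buffers copied in ~100 ms.
total_bytes = 4 * 54 * 1024 * 1024   # four 54 MiB buffers
seconds = 0.100                      # ~100 ms copy time
bandwidth_gb_s = total_bytes / seconds / 1e9
print(round(bandwidth_gb_s, 2))      # prints 2.26, i.e. ~2 GB/s
```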
Data transfer and call code:
ueblock_d = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                      hostbuf=ueb)
leblock_d = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                      hostbuf=leb)
urblock_d = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                      hostbuf=urb)
lrblock_d = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                      hostbuf=lrb)
eb_errors_d = cl.Buffer(self.ctx, mf.WRITE_ONLY | mf.COPY_HOST_PTR,
                        hostbuf=eb_errors)
num_bytes = self.geom.block_max_bytes / 2
self.cl_funcs.calc_charge_lvl(self.queue, (num_bytes,), None,
                              ueblock_d, leblock_d, urblock_d,
                              lrblock_d, eb_errors_d)
No-op kernel:

__kernel void calc_charge_lvl(__global uchar* ueblock,
                              __global uchar* leblock,
                              __global uchar* urblock,
                              __global uchar* lrblock,
                              __global uint* errors)
{
    uint i = get_global_id(0);
}
Am I doing something absurdly broken? If I pass in only errors, which
is a 393 kB buffer, the overhead is almost nil.
Thanks,
Ryan
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl