I have four 54 MB buffers that I want to run byte-by-byte analysis
on.  I can copy the data in roughly 100 ms, which seems like a decent
transfer rate, ~2 GB/second.  However, when I go to execute my kernel,
the overhead of passing in my device pointers is huge: something like
500 ms even on a no-op kernel.


Data transfer and call code:


        ueblock_d = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                              hostbuf=ueb)
        leblock_d = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                              hostbuf=leb)
        urblock_d = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                              hostbuf=urb)
        lrblock_d = cl.Buffer(self.ctx, mf.READ_ONLY | mf.COPY_HOST_PTR,
                              hostbuf=lrb)
        eb_errors_d = cl.Buffer(self.ctx, mf.WRITE_ONLY | mf.COPY_HOST_PTR,
                                hostbuf=eb_errors)
        # integer division: the global size must be an int
        num_bytes = self.geom.block_max_bytes // 2
        self.cl_funcs.calc_charge_lvl(self.queue, (num_bytes,), None,
                                      ueblock_d, leblock_d, urblock_d,
                                      lrblock_d, eb_errors_d)
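
For reference, here is a profiling variant of the same call I could use
to check whether the COPY_HOST_PTR transfers are being deferred until
first use, so that four ~54 MB copies get charged to the first kernel
launch.  This is a rough sketch, not my actual code: `prog`,
`host_bufs`, `errors_host`, and `global_size` are placeholder names,
and it swaps COPY_HOST_PTR for explicit enqueue_copy calls so the
copies and the kernel show up as separately timed events:

```python
def event_ms(evt):
    """Elapsed time of a finished OpenCL event, in milliseconds,
    computed from its nanosecond profiling timestamps."""
    return (evt.profile.end - evt.profile.start) * 1e-6

def profile_run(ctx, prog, host_bufs, errors_host, global_size):
    """Enqueue explicit host->device copies plus the kernel on a
    profiling-enabled queue, then report the two times separately."""
    import pyopencl as cl  # imported here so event_ms stays importable
    mf = cl.mem_flags
    queue = cl.CommandQueue(
        ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)
    # Explicit enqueue_copy instead of COPY_HOST_PTR: the transfers
    # become real, timeable commands rather than lazy copies that can
    # get folded into the first kernel launch.
    dev_bufs = [cl.Buffer(ctx, mf.READ_ONLY, b.nbytes) for b in host_bufs]
    copy_evts = [cl.enqueue_copy(queue, d, h)
                 for d, h in zip(dev_bufs, host_bufs)]
    errors_d = cl.Buffer(ctx, mf.WRITE_ONLY, errors_host.nbytes)
    knl_evt = prog.calc_charge_lvl(queue, (global_size,), None,
                                   *dev_bufs, errors_d)
    knl_evt.wait()
    print("copies: %.1f ms" % sum(event_ms(e) for e in copy_evts))
    print("kernel: %.1f ms" % event_ms(knl_evt))
```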


no-op kernel:


        __kernel void calc_charge_lvl(__global uchar* ueblock,
                                      __global uchar* leblock,
                                      __global uchar* urblock,
                                      __global uchar* lrblock,
                                      __global uint* errors)
        {
            uint i = get_global_id(0);
        }


Am I doing something absurdly broken?  If I only pass in errors, which
is a 393 kB buffer, the overhead is almost nil.

Thanks,
Ryan

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
