On Sun, 12 Feb 2012 13:28:52 +0100, Daniele Pianu <[email protected]> wrote:
> Hi all,
>
> I'm trying to speed up my code for AES encryption overlapping the
> encryption phase with the reading/writing into device memory as
> explained here
> http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/OpenCL_Best_Practices_Guide.pdf,
> paragraph 3.1.3. Basically, I use two separate queues with the same
> context and split my data in two halves: while the first half is being
> encrypted by the first queue, the second one is moved into device
> memory. After the first half gets encrypted, I start reading the
> result on the first queue, and launch the encryption of the second
> half on the second queue. Finally, I read the encrypted second half
> from the second queue. Here's the code:

I know that to get true overlapping on Nvidia hardware, those buffers
have to be what Nvidia calls "page-locked" (pinned) memory. In OpenCL,
this requires allocating them with CL_MEM_ALLOC_HOST_PTR (which
nominally has a different meaning, as you may know).

Also, it seems you're using CUDA 3.2? The Nvidia CL drivers have
matured significantly since 3.2, so I'd advise you to use something
newer.

HTH,
Andreas
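
For reference, the page-locked allocation mentioned above could be sketched in PyOpenCL roughly as follows. This is only an illustration under assumptions, not the original poster's code: the helper names (chunk_ranges, alloc_pinned, upload_overlapped) are made up for this example, the 16-byte block size is the AES block size, and whether ALLOC_HOST_PTR actually yields pinned memory (and whether the copies really overlap) depends on the driver. The pattern is to allocate a buffer with ALLOC_HOST_PTR, map it to obtain a host-side array, and use that array as the source of non-blocking copies on two queues.

```python
def chunk_ranges(nbytes, nchunks=2, block=16):
    """Return (offset, size) pairs covering nbytes; each size is a multiple of block."""
    if nbytes % block:
        raise ValueError("nbytes must be a multiple of the block size")
    base, extra = divmod(nbytes // block, nchunks)
    ranges, offset = [], 0
    for i in range(nchunks):
        size = (base + (1 if i < extra else 0)) * block
        ranges.append((offset, size))
        offset += size
    return ranges


def alloc_pinned(ctx, queue, nbytes):
    """Allocate a buffer with ALLOC_HOST_PTR and map it into host memory.

    Assumption: on Nvidia's implementation the mapped array is backed by
    page-locked memory. Keep `pinned` alive for as long as `host` is used.
    """
    import numpy as np
    import pyopencl as cl

    mf = cl.mem_flags
    pinned = cl.Buffer(ctx, mf.READ_WRITE | mf.ALLOC_HOST_PTR, size=nbytes)
    host, _event = cl.enqueue_map_buffer(
        queue, pinned, cl.map_flags.READ | cl.map_flags.WRITE,
        0, (nbytes,), np.uint8)
    return pinned, host


def upload_overlapped(ctx, host, nbytes):
    """Copy each half of `host` to its own device buffer on its own queue."""
    import pyopencl as cl

    queues = [cl.CommandQueue(ctx), cl.CommandQueue(ctx)]
    dev_bufs, events = [], []
    for q, (off, size) in zip(queues, chunk_ranges(nbytes)):
        dev = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=size)
        # Non-blocking copy: returns immediately with an event to wait on.
        events.append(cl.enqueue_copy(q, dev, host[off:off + size],
                                      is_blocking=False))
        dev_bufs.append(dev)
    cl.wait_for_events(events)
    return dev_bufs
```

A possible usage, given an existing context ctx and queue: call pinned, host = alloc_pinned(ctx, queue, n), fill host[:] with the plaintext, then call upload_overlapped(ctx, host, n). Kernel launches would be enqueued on the same two queues between the upload and the read-back.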
_______________________________________________ PyOpenCL mailing list [email protected] http://lists.tiker.net/listinfo/pyopencl
