On Sun, 12 Feb 2012 13:28:52 +0100, Daniele Pianu <[email protected]> wrote:
> Hi all,
> 
> I'm trying to speed up my code for AES encryption overlapping the
> encryption phase with the reading/writing into device memory as
> explained here
> http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/OpenCL_Best_Practices_Guide.pdf,
> paragraph 3.1.3. Basically, I use two separate queues with the same
> context and split my data in two halves: while the first half is being
> encrypted by the first queue, the second one is moved into device
> memory. After the first half gets encrypted, I start reading the
> result on the first queue, and launch the encryption of the second
> half on the second queue. Finally, I read the encrypted second half
> from the second queue. Here's the code:

As far as I know, to get true overlap of transfers and kernel
execution on Nvidia hardware, the host buffers involved have to be
what Nvidia calls "page-locked" (pinned) memory. With Nvidia's CL
implementation, you get that by allocating the buffer with
CL_MEM_ALLOC_HOST_PTR (whose meaning in the OpenCL spec is somewhat
different, as you may know). Also, it seems you're using CUDA 3.2?
The Nvidia CL drivers have matured significantly since 3.2, so I'd
advise you to use something newer.
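
A rough, untested sketch of how this could look in PyOpenCL, in case it
helps. It follows the pattern from the Nvidia guide: one pinned staging
buffer created with ALLOC_HOST_PTR and mapped into host memory, plus
non-blocking copies on two queues. The names split_aligned,
overlapped_encrypt and launch_aes are made up for illustration;
launch_aes stands in for whatever enqueues your AES kernel and is
assumed to return an event. It also assumes the input length is a
multiple of the 16-byte AES block size.

```python
AES_BLOCK = 16  # bytes per AES block


def split_aligned(nbytes, parts=2, align=AES_BLOCK):
    """Split nbytes into (offset, size) chunks that start and end on block boundaries."""
    if nbytes % align:
        raise ValueError("length must be a multiple of the block size")
    blocks = nbytes // align
    chunks, offset = [], 0
    for i in range(parts):
        # Spread any leftover blocks over the first chunks.
        size = (blocks // parts + (1 if i < blocks % parts else 0)) * align
        chunks.append((offset, size))
        offset += size
    return chunks


def overlapped_encrypt(ctx, data, launch_aes):
    """Sketch only: launch_aes(queue, dev_buf, size, wait_for) is a hypothetical
    callable that enqueues the user's AES kernel and returns an event."""
    import numpy as np
    import pyopencl as cl

    mf = cl.mem_flags
    queues = [cl.CommandQueue(ctx), cl.CommandQueue(ctx)]
    nbytes = len(data)

    # Pinned staging buffer: ALLOC_HOST_PTR, then map it to get a host array.
    pinned = cl.Buffer(ctx, mf.READ_WRITE | mf.ALLOC_HOST_PTR, size=nbytes)
    host, _ = cl.enqueue_map_buffer(
        queues[0], pinned, cl.map_flags.READ | cl.map_flags.WRITE,
        0, (nbytes,), np.uint8)
    host[:] = np.frombuffer(data, dtype=np.uint8)

    done = []
    for queue, (off, size) in zip(queues, split_aligned(nbytes)):
        if size == 0:
            continue
        dev = cl.Buffer(ctx, mf.READ_WRITE, size=size)
        chunk = host[off:off + size]
        # Upload, encrypt and download one half per queue, all non-blocking.
        up = cl.enqueue_copy(queue, dev, chunk, is_blocking=False)
        run = launch_aes(queue, dev, size, wait_for=[up])
        down = cl.enqueue_copy(queue, chunk, dev, is_blocking=False,
                               wait_for=[run])
        queue.flush()  # submit now so work on this queue can start
        done.append(down)

    cl.wait_for_events(done)
    return host.tobytes()
```

Whether the copies and the kernel actually run concurrently still
depends on the device and driver, so it is worth checking with
profiling events rather than taking the overlap for granted.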

HTH,
Andreas

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
