Hi all,

I'm trying to speed up my AES encryption code by overlapping the
encryption phase with the transfers to and from device memory, as
explained in section 3.1.3 of
http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/OpenCL_Best_Practices_Guide.pdf
Basically, I use two separate queues in the same context and split my
data into two halves: while the first half is being encrypted on the
first queue, the second half is copied into device memory. Once the
first half has been encrypted, I start reading its result on the first
queue and launch the encryption of the second half on the second queue.
Finally, I read the encrypted second half from the second queue. Here's
the code:
            
halveSize = len(data) // 2
remSize = len(data) - halveSize

T0buff = cl.Buffer(self.context, clmem.READ_ONLY|clmem.COPY_HOST_PTR, 
                   hostbuf=T0)
T1buff = cl.Buffer(self.context, clmem.READ_ONLY|clmem.COPY_HOST_PTR,
                   hostbuf=T1)
T2buff = cl.Buffer(self.context, clmem.READ_ONLY|clmem.COPY_HOST_PTR, 
                   hostbuf=T2)
T3buff = cl.Buffer(self.context, clmem.READ_ONLY|clmem.COPY_HOST_PTR, 
                   hostbuf=T3)

            
# Create input and output buffers
inBuffers = [cl.Buffer(self.context, clmem.READ_ONLY, halveSize),
             cl.Buffer(self.context, clmem.READ_ONLY, remSize)]
outBuffers = [cl.Buffer(self.context, clmem.WRITE_ONLY, halveSize),
              cl.Buffer(self.context, clmem.WRITE_ONLY, remSize)]


# Non-blocking copy of the first half
cl.enqueue_copy(self.cmdQueues[0],
                inBuffers[0], np.fromstring(data[0:halveSize],
                                            dtype=np.uint8),
                is_blocking=False)
self.cmdQueues[0].flush()

# Launch kernel on the first half
program.aes_ecb(self.cmdQueues[0], (halveSize>>4,), (256,), 
                keyschedBuffer,
                inBuffers[0], outBuffers[0],
                T0buff, T1buff, T2buff, T3buff)
            
# Start copying the second half
cl.enqueue_copy(self.cmdQueues[1],
                inBuffers[1], np.fromstring(data[halveSize:], 
                                            dtype=np.uint8),
                is_blocking=False)
            
self.cmdQueues[0].flush()
self.cmdQueues[1].flush()
            
# Launch kernel on the second half
program.aes_ecb(self.cmdQueues[1], (remSize>>4,), (256,), 
                keyschedBuffer,
                inBuffers[1], outBuffers[1],
                T0buff, T1buff, T2buff, T3buff)

# Non-blocking read of the first half
result = np.empty(len(data), dtype=np.uint8)
cl.enqueue_copy(self.cmdQueues[0], result, outBuffers[0],       
                is_blocking=False)
            
self.cmdQueues[0].flush()
self.cmdQueues[1].flush()

# Finally, read the second half
cl.enqueue_copy(self.cmdQueues[1], result, outBuffers[1], 
                device_offset=halveSize)
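
The size bookkeeping in the listing above can be checked on its own, without a device. The following is only a sketch under a few assumptions: that the kernel processes one 16-byte AES block per work-item (which the `halveSize>>4` global size suggests), that the input length is a multiple of 16, and that the global size has to be a multiple of the local size 256, as OpenCL 1.x requires. The helper names `split_sizes` and `padded_global_size` are made up for illustration and are not part of PyOpenCL; a kernel launched with a padded global size would also need its own bounds check.

```python
AES_BLOCK = 16     # bytes per AES block
LOCAL_SIZE = 256   # work-group size used in the kernel launches above


def split_sizes(total_bytes, block=AES_BLOCK):
    """Split total_bytes in two parts, each a whole number of blocks.

    Hypothetical helper: the first part gets half of the blocks (rounded
    down), the second part gets the rest.
    """
    if total_bytes % block:
        raise ValueError("input length must be a multiple of the block size")
    first = (total_bytes // block // 2) * block
    return first, total_bytes - first


def padded_global_size(n_bytes, block=AES_BLOCK, local=LOCAL_SIZE):
    """One work-item per block, rounded up to a multiple of the local size."""
    n_blocks = n_bytes // block
    return ((n_blocks + local - 1) // local) * local
```

With these, `halveSize, remSize = split_sizes(len(data))` keeps both parts block-aligned, and `(padded_global_size(halveSize),)` could replace `(halveSize>>4,)` if the number of blocks is not already a multiple of 256.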

When I try to read the result of the first encryption (that is, when I
execute the enqueue_copy of outBuffers[0]), I get this error:

Traceback (most recent call last):
  File "./pyclaes.py", line 40, in <module>
    exit(main())
  File "./pyclaes.py", line 31, in main
    cipherText = enc.encrypt(args.key, data)
  File "/home/muogoro/Univ/sicurezza/tesina_gpu/pyclaes/pyclaes/pyclaes_ecb.py", line 401, in encrypt
    cl.enqueue_copy(self.cmdQueues[1], result, outBuffers[1], device_offset=halveSize)
  File "/usr/lib64/python2.7/site-packages/pyopencl-2011.2-py2.7-linux-x86_64.egg/pyopencl/__init__.py", line 780, in enqueue_copy
    return _cl._enqueue_read_buffer(queue, src, dest, **kwargs)
pyopencl.LogicError: clEnqueueReadBuffer failed: invalid value
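
One thing that may be worth checking, offered as a guess rather than a confirmed diagnosis: the OpenCL specification says clEnqueueReadBuffer returns CL_INVALID_VALUE when the region given by (offset, size) falls outside the source buffer. The sketch below models only that rule in plain Python. It assumes that enqueue_copy takes the number of bytes to transfer from the size of the host array and that device_offset is an offset into the device buffer; `read_in_bounds` is a hypothetical helper, and the 16000-byte input is an arbitrary example size.

```python
def read_in_bounds(buffer_size, device_offset, host_bytes):
    """Model of the bounds rule: the read region must lie inside the buffer."""
    return device_offset >= 0 and device_offset + host_bytes <= buffer_size


total = 16 * 1000              # example len(data), chosen arbitrarily
halveSize = total // 2
remSize = total - halveSize

# Final read in the listing: host array of len(data) bytes, device offset
# halveSize, source buffer outBuffers[1] of remSize bytes.
final_read_ok = read_in_bounds(remSize, halveSize, total)

# Hypothetical alternative: a host target of remSize bytes and no device offset.
sliced_read_ok = read_in_bounds(remSize, 0, remSize)

print(final_read_ok, sliced_read_ok)
```

If those assumptions hold, the same check applied to the first read (a len(data)-byte host array against the halveSize-byte outBuffers[0]) would also be out of bounds.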


I cannot figure out what I'm doing wrong. Any hint would be really
appreciated :)

Daniele



