Hi Pekka,

On Mon, Mar 18, 2013 at 10:07:26PM +0200, Pekka Jääskeläinen wrote:
> It should be doable with the CUDA API and the LLVM NVPTX backend.
> I took a look at the CUDA API some time ago with this exact idea in
> mind, but didn't have the time to move forward with it.
> 
> How much work it is, I'm not sure, as I haven't tested the LLVM NVPTX
> backend nor the API. But my guess is it shouldn't be too hard to get
> something running because we have the previous drivers for
> heterogeneous device setups as examples.
> 
> If you are up for the task, take a look at the pocl device
> drivers for cellspu, TCE (ttasim), or Tom Stellard's
> unfinished Gallium compute / AMD R600 driver.

Thanks for your quick response! I took a tour of the code in lib/CL
and lib/CL/devices, and adding a CUDA device driver seems feasible.

If I read the clFinish code correctly, the queue is processed
synchronously: commands are executed one at a time, and the host
thread blocks while each one runs.
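
For illustration, here is roughly what I mean, as a minimal sketch
rather than pocl's actual clFinish code (the command type and run hook
below are hypothetical stand-ins):

    struct command
    {
      struct command *next;
      void (*run) (struct command *cmd);  /* device-specific execution hook */
    };

    static void
    finish_queue (struct command *head)
    {
      /* Commands run one at a time; the host thread does not return
         from run() until the current command has completed.  */
      for (struct command *cmd = head; cmd != NULL; cmd = cmd->next)
        cmd->run (cmd);
    }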

Have you already thought about asynchronous/non-blocking processing of
the queue? This would be useful for overlapping computation with memory
transfers, and for CPU+GPU or multi-GPU computation.
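
To make the overlap case concrete, here is a typical host-side pattern
(standard OpenCL calls only; the kernel, buffer, and function names are
placeholders, not pocl identifiers) whose performance depends on the
queue being processed asynchronously:

    #include <CL/cl.h>

    /* All handles and kernel arguments are assumed to be set up elsewhere.  */
    void
    run_overlapped (cl_command_queue queue, cl_kernel kernel,
                    cl_mem input, cl_mem output,
                    const float *host_in, float *host_out, size_t n)
    {
      cl_event write_done, kernel_done;

      /* Non-blocking transfer: with asynchronous queue processing this
         returns immediately and the copy can overlap with other work.  */
      clEnqueueWriteBuffer (queue, input, CL_FALSE, 0, n * sizeof (float),
                            host_in, 0, NULL, &write_done);

      /* The kernel is ordered after the transfer via the event, without
         blocking the host thread.  */
      clEnqueueNDRangeKernel (queue, kernel, 1, NULL, &n, NULL,
                              1, &write_done, &kernel_done);

      /* A blocking read is the only host-side synchronization point.  */
      clEnqueueReadBuffer (queue, output, CL_TRUE, 0, n * sizeof (float),
                           host_out, 1, &kernel_done, NULL);
    }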

Regards,
Peter
