On Sat, Mar 23, 2013 at 12:57:47PM +0200, Pekka Jääskeläinen wrote:
> Thus, maybe we need to collect all the local allocations behind a single
> pointer, allocate that region once, and in the kernel reassign the variables
> to point to parts of it. That should not be a difficult addition to the LLVM
> pass we already use for processing the automatic locals.

If this transformation has to be done at launch time anyway, we
could also convert __local and __constant kernel arguments to automatic
arrays at that point. This means that the LLVM IR -> NVPTX compilation
would have to remain in pocl_cuda_run().
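
As a minimal sketch of the launch-time bookkeeping this would involve:
once the sizes of the __local arguments are known, each one can be given
an aligned offset into a single region. The helper names align_up() and
pack_local_args() are hypothetical and not part of pocl, and the
alignment value is an assumption chosen by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not pocl API: round `value` up to the next
   multiple of `align`. `align` must be a nonzero power of two. */
static size_t align_up(size_t value, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* Hypothetical helper, not pocl API: given the byte sizes of `n`
   __local arguments, store each argument's offset into one shared
   region in `offsets` and return the total size of that region. */
static size_t pack_local_args(const size_t *sizes, size_t n,
                              size_t align, size_t *offsets)
{
    size_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        total = align_up(total, align); /* start each buffer aligned */
        offsets[i] = total;
        total += sizes[i];
    }
    return total;
}
```

The returned total would be the size of the single allocation, and each
offset is where the corresponding variable would point inside it.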

Could you give an LLVM newbie some pointers on how best to achieve
these transformations? I would also need to substitute work_dim and
global_offset in the LLVM IR whenever get_work_dim() or
get_global_offset() is used in the OpenCL code.
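
To make explicit which values such a substitution would have to bake
in, here is a small host-side model in plain C. It is not LLVM code and
not pocl API: struct launch_constants and the fold_* functions are
hypothetical names, and the global ID formula assumes a uniform
work-group size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for values that are fixed once the kernel is
   enqueued. */
struct launch_constants {
    unsigned work_dim;       /* 1, 2 or 3 */
    size_t global_offset[3]; /* entries at or above work_dim are unused */
};

/* Constant that would replace a call to get_work_dim(). */
static unsigned fold_get_work_dim(const struct launch_constants *lc)
{
    assert(lc->work_dim >= 1 && lc->work_dim <= 3);
    return lc->work_dim;
}

/* Constant that would replace get_global_offset(dimindx). OpenCL
   defines the result as 0 when dimindx is not a valid dimension. */
static size_t fold_get_global_offset(const struct launch_constants *lc,
                                     unsigned dimindx)
{
    return dimindx < lc->work_dim ? lc->global_offset[dimindx] : 0;
}

/* Global ID of a work-item in one dimension, built from the group
   index, the work-group size and the local index, plus the offset. */
static size_t global_id(const struct launch_constants *lc,
                        unsigned dimindx, size_t group_id,
                        size_t local_size, size_t local_id)
{
    return group_id * local_size + local_id
           + fold_get_global_offset(lc, dimindx);
}
```

The last function shows why the offset matters even for kernels that
never call get_global_offset() directly: it is part of every global ID.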

Peter

_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel