On 08/07/2013 07:28 AM, Alun Evans wrote:
> i.e. I'm compiling pocl natively on x86_64, but trying to add a new
> device that is an ARM based platform.
>
> I've actually got a few more questions on that, but I think I'm making
> some progress. I just thought about checking whether this has been
> successfully attempted before?
Yes, we use the basic heterogeneous setup all the time: the host is an
x86_64 Linux system and the device is something else.

If you want to make it run, you need to write a device-layer
implementation that defines how to interact with your ARM device from
the host. See the existing device-layer implementations under
lib/CL/devices/. The current heterogeneous devices there are the
experimental cellspu and tce. The former offloads kernels on a Cell
processor to its SPUs, and the latter offloads kernels to a simulator of
TTA-based accelerators designed using TCE. There is a set of functions
you need to implement for this; it should be straightforward.

> So far I've managed to get a pocl binary (example1) to spit out some ARM
> .so's.

Then you need to use the device-layer implementation to get the
work-group function to the device, run it, read/write buffers on it, etc.

> Well, in fact the device is a bit space-limited, so holding a
> toolchain out there would have been a bit of a pain.

You do not need the toolchain on the device with the standalone setup.
Everything (the program + the kernels) will be (cross-)compiled offline
into a single binary. This is how it works in TCE now (currently using
its own host API stubs, though):

http://tce.cs.tut.fi/user_manual/TCE/node21.html

So, as Kalle wrote, in this setup you need to precompile the kernel for
the work-group size you will need. An alternative would be a kernel
compiler mode that creates work-item loops with variable iteration
counts (the WG dimensions), but then fine-grained parallelization of
multiple work-items (e.g., vectorization) becomes more challenging. Yet
another improvement would be the ability to compile the kernel for
multiple work-group sizes with the kernel compiler, without using the
attribute.

--
--Pekka
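To make the "set of functions" and the host-side sequence (get the work-group function to the device, run it, read/write buffers) more concrete, here is a minimal C sketch. Everything in it is hypothetical: `device_ops`, `mock_arm_device`, `vadd_wg` and `vadd_on_device` are illustrative names, not pocl's actual device-layer API (see lib/CL/devices/ for the real structures). The "device" is a mock that uses plain host memory and runs in-process; a real ARM device layer would ship buffers and launch commands over whatever link connects host and device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical work-group function signature: one call executes all
   work-items of one work-group. */
typedef void (*wg_func_t)(void **args, size_t group_id, size_t local_size);

/* Hypothetical, simplified device-layer interface (NOT pocl's real one). */
typedef struct device_ops {
  const char *name;
  void *(*alloc)(size_t size);
  void (*free_buf)(void *dev_ptr);
  void (*write)(void *dev_dst, const void *host_src, size_t size);
  void (*read)(void *host_dst, const void *dev_src, size_t size);
  void (*run)(wg_func_t wg, void **args, size_t num_groups, size_t local_size);
} device_ops;

/* Mock device: "device memory" is host memory, "running" is a local loop. */
static void *mock_alloc(size_t size) { return malloc(size); }
static void mock_free(void *p) { free(p); }
static void mock_write(void *dst, const void *src, size_t n) { memcpy(dst, src, n); }
static void mock_read(void *dst, const void *src, size_t n) { memcpy(dst, src, n); }
static void mock_run(wg_func_t wg, void **args, size_t num_groups,
                     size_t local_size) {
  /* Execute the work-groups one after another. */
  for (size_t g = 0; g < num_groups; ++g)
    wg(args, g, local_size);
}

static const device_ops mock_arm_device = {
    "mock-arm", mock_alloc, mock_free, mock_write, mock_read, mock_run};

/* Work-group function of a "vector add" kernel: c = a + b. */
static void vadd_wg(void **args, size_t group_id, size_t local_size) {
  const float *a = args[0];
  const float *b = args[1];
  float *c = args[2];
  for (size_t lid = 0; lid < local_size; ++lid) {
    size_t gid = group_id * local_size + lid;
    c[gid] = a[gid] + b[gid];
  }
}

/* Host side: allocate, upload, run, download, free. Returns 0 on success. */
static int vadd_on_device(const device_ops *dev, const float *a,
                          const float *b, float *c, size_t n,
                          size_t local_size) {
  /* As in OpenCL 1.x, the global size must be a multiple of the local size. */
  assert(local_size > 0 && n % local_size == 0);
  size_t bytes = n * sizeof(float);
  float *da = dev->alloc(bytes);
  float *db = dev->alloc(bytes);
  float *dc = dev->alloc(bytes);
  int status = -1;
  if (da && db && dc) {
    dev->write(da, a, bytes);
    dev->write(db, b, bytes);
    void *args[3] = {da, db, dc};
    dev->run(vadd_wg, args, n / local_size, local_size);
    dev->read(c, dc, bytes);
    status = 0;
  }
  dev->free_buf(da); /* free(NULL) is a no-op, so partial failure is safe */
  dev->free_buf(db);
  dev->free_buf(dc);
  return status;
}
```

Porting to real hardware would mean swapping the `mock_*` functions for ones that talk to the ARM board, while the host-side sequence in `vadd_on_device` stays the same; that separation is the point of having a device layer.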
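The trade-off between precompiling for one work-group size and generating loops with variable iteration counts can be illustrated with two hand-written work-group functions. This is a sketch, not what pocl's kernel compiler actually emits; it assumes that "the attribute" refers to OpenCL C's `reqd_work_group_size`, and the names `FIXED_WG_SIZE`, `scale_wg_fixed`, `scale_wg_dynamic`, `scale_fixed` and `scale_dynamic` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* (a) Specialized: the work-group size is a compile-time constant, as it
   would be if the kernel were compiled for one fixed local size (e.g. one
   declared with reqd_work_group_size(8, 1, 1) in OpenCL C). */
#define FIXED_WG_SIZE 8

static void scale_wg_fixed(const float *in, float *out, float k,
                           size_t group_id) {
  /* Constant trip count: the compiler can fully unroll this loop or map it
     onto whole SIMD vectors without any remainder handling. */
  for (size_t lid = 0; lid < FIXED_WG_SIZE; ++lid) {
    size_t gid = group_id * FIXED_WG_SIZE + lid;
    out[gid] = k * in[gid];
  }
}

/* (b) Generic: the work-group size is a runtime parameter. */
static void scale_wg_dynamic(const float *in, float *out, float k,
                             size_t group_id, size_t local_size) {
  /* Unknown trip count: vectorization is still possible in principle, but
     needs runtime checks and a remainder (epilogue) loop. */
  for (size_t lid = 0; lid < local_size; ++lid) {
    size_t gid = group_id * local_size + lid;
    out[gid] = k * in[gid];
  }
}

/* Drivers that run all work-groups of a 1-D NDRange of n work-items. */
static void scale_fixed(const float *in, float *out, float k, size_t n) {
  assert(n % FIXED_WG_SIZE == 0); /* only this one local size is supported */
  for (size_t g = 0; g < n / FIXED_WG_SIZE; ++g)
    scale_wg_fixed(in, out, k, g);
}

static void scale_dynamic(const float *in, float *out, float k, size_t n,
                          size_t local_size) {
  assert(local_size > 0 && n % local_size == 0);
  for (size_t g = 0; g < n / local_size; ++g)
    scale_wg_dynamic(in, out, k, g, local_size);
}
```

Both variants compute the same result. The fixed one only works for launches whose local size is 8, which is why the standalone setup needs the kernel precompiled for the work-group size actually used, while the dynamic one accepts any valid local size at the cost of harder optimization.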
_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
