Hi,

This has been prototyped with the initial AMD GPU work, and I've also played around with it locally, but nothing related has been committed to the master repo yet. Patches that add the required infra for "SPMD-optimized" hardware are warmly welcome; it should not be much work.
I propose the following approach:

1) Add a new hook function to the driver layer that allows overriding the work-group generation phase of the kernel compiler. In it you can skip the passes that generate the WG function (WorkitemHandlers) and just generate the single WI function. See pocl_device_ops in pocl_cl.h. Override this new function in your target's device layer implementation.

2) Add new fields for the local ids and local sizes to the pocl_context struct, which is passed to the function at run time (as the additional argument added by Workgroup.cc). This is needed only if your device does not have a specific mechanism for getting the thread ids etc. See include/pocl_device.h and how the basic/pthread drivers populate these structs when they launch the WG functions.

3) Add an overridden implementation of get_local_id() to your device's kernel built-in lib that uses the above struct indirectly, via global variables that are "privatized" in Workgroup.cc. This is how the other ids, such as the group ids, are handled now: their built-ins refer to global variables in the kernel library, the device driver populates a struct with their values, and the kernel compiler (Workgroup.cc) converts the magic global variables to point to the struct fields.

HTH,
Pekka

On 02/05/2014 01:07 PM, Panagiotis Apostolou wrote:
> Hello.
>
> I'm trying to get pocl to work on a custom multicore processor.
>
> As far as I understand, pocl expands all work items of a work group into
> a single binary (either by expanding the code or with loops) in
> call_pocl_workgroup (or alternatively in the pocl-workgroup script), and
> then it is executed serially. This is done by setting
> pocl::LocalSize.addValue(local_x), and respectively for y and z (or with
> the opt's -local-size parameter in the script).
>
> My intention is to be able to override this functionality, avoid
> expanding the code, and run a separate thread for each work item.
> My question is which is the cleanest way to do this, with minimal
> modifications to the code. Any guidelines or hints would be appreciated.
>
> Thanks for your time,
> Panayiotis Apostolou.

_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
