Hi,

This is a summary of the discussion about the problems with the bitcode
libs for the OpenCL C builtin library. The discussion mostly took
place in the pull request
https://github.com/pocl/pocl/pull/12#issuecomment-25010864
and on the IRC channel.

At least these problems with the current approach of shipping pre-built
bitcode libs have been identified:

The LLVM bitcode libs depend on the target flags in use. In
particular, instruction set extensions such as vector extensions
affect the bitcode: they change the calling convention, and accessing
the extensions requires target-specific LLVM builtin calls, which are
not portable to variants of the same CPU family that do not support
the extension.

In practice, a single generic x86_64 or ARM built-in bitcode library
does not work for all x86_64 or ARM variants unless it is built for a
safe (least common denominator) target, and such a build cannot
exploit the special features.
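
To make the "least common denominator" point concrete, here is a small
sketch. The variant names and their feature sets are made up for
illustration (they are not real CPU models, and pocl does not contain
this code); the idea is only that a lib usable everywhere can rely on
nothing but the intersection of the feature sets.

```python
def common_features(variants):
    """Return the features that every variant supports (the safe baseline)."""
    sets = [set(features) for features in variants.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical device variants of one CPU family and their extensions.
variants = {
    "variant_a": {"sse2", "sse4.2", "avx", "avx2"},
    "variant_b": {"sse2", "sse4.2", "avx"},
    "variant_c": {"sse2"},
}

baseline = common_features(variants)

# Extensions each variant has but a generic lib cannot use.
unused = {name: sorted(features - baseline)
          for name, features in variants.items()}
```

With these made-up sets the generic lib is limited to what the weakest
variant offers, and everything listed in "unused" goes unexploited.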

This affects at least the scenario of binary-distributed pocl. In
this case we have to build and distribute a generic bitcode lib, as we
do not know which device variants the end users actually have.

Moreover, once we support some more widely useful heterogeneous
devices (e.g. via libcuda or Gallium), the devices installed in the
system will be auto-probed and the device list populated dynamically.
Since we cannot know in advance which devices will be found, the
binary distributions have to ship a bitcode lib for every potential
device, just in case.

Then there is the use case of the customizable processors of TCE. There
we can have various combinations of operations supported by the
device at hand. The kernel lib implementations could exploit those
operations in different combinations, selected by preprocessor macros
set by the TCE compiler. This is currently not possible: we
ship a generic TCE lib that cannot explicitly exploit any special
operations.

One solution I proposed is to distribute and install the sources of
the kernel lib and build the optimized kernel bitcode libs on demand.
The results would be cached in the user's home dir, so only the first
use would take the performance hit (the installation of the binary
could also include a separate step that populates the cache). I think
this could work all right, especially if we still allow installing
bitcodes too as "sources" for the kernels: then we can decide case by
case whether to compile from sources or use prebuilt bitcodes.
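
A rough sketch of how the caching could work, just to illustrate the
idea. Everything here is an assumption on my part: the function names
(cache_key, kernel_lib_path), the file naming, and the "build" callback
are hypothetical, and the actual compile step (invoking the compiler on
the installed kernel lib sources) is left to the caller. The key point
is that the cache entry is keyed by everything that changes the
resulting bitcode, i.e. the target triple, the CPU, and the features.

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable


def cache_key(triple: str, cpu: str, features: Iterable[str]) -> str:
    """Hash the settings that affect the generated bitcode."""
    # Sort and de-duplicate so the feature order does not change the key.
    canonical = "\n".join([triple, cpu, ",".join(sorted(set(features)))])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def kernel_lib_path(cache_dir: Path, triple: str, cpu: str,
                    features: Iterable[str],
                    build: Callable[[Path], None]) -> Path:
    """Return the cached bitcode lib, building it on first use only."""
    key = cache_key(triple, cpu, features)
    path = Path(cache_dir) / f"kernel-{key}.bc"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        # Build into a temporary file first, then rename, so that an
        # interrupted build does not leave a truncated lib in the cache.
        tmp = path.with_suffix(".tmp")
        build(tmp)  # hypothetical: compiles the kernel lib sources to tmp
        tmp.replace(path)
    return path
```

A separate population step at install time would simply call
kernel_lib_path() once for each device found, so that later runs hit
the cache.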

What do you think?
-- 
Pekka

_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel