On Sun, Feb 3, 2013 at 12:25 PM, Pekka Jääskeläinen <[email protected]> wrote:
> On 02/03/2013 03:56 PM, Erik Schnetter wrote:
> > In my mind, the vectorizer would never look into sqrt() or any other
> > functions defined in the language standard, but would simply expect
> > efficient vector implementations of these. Instead of looking into the
> > language standard we could also add a respective attribute to the
> > function definitions. This attribute would then confirm that e.g.
> > double2 sqrt(double2) is equivalent to double sqrt(double).
> > __attribute__((__vector_equivalence__)) could be a name.
>
> OK. The "known" functions should not be inlined but the vectorizer should
> recognize them (if we do not go towards the intrinsics approach). In the
> end, the autovectorized work group function and an explicitly vectorized
> kernel will call the same vector-optimized function in this scheme.
>
> For starters we might just use a "white list" for the known vectorizable
> functions, and assume a trivial scalar to vector mapping for the arguments
> and the return value. Or use intrinsics for the known ones.
>
> Looking at the code of LLVM's LoopVectorize, it seems to be able to
> vectorize some intrinsics already:
>
> case Intrinsic::sqrt:
> case Intrinsic::sin:
> case Intrinsic::cos:
> case Intrinsic::exp:
> case Intrinsic::exp2:
> case Intrinsic::log:
> case Intrinsic::log10:
> case Intrinsic::log2:
> case Intrinsic::fabs:
> case Intrinsic::floor:
> case Intrinsic::ceil:
> case Intrinsic::trunc:
> case Intrinsic::rint:
> case Intrinsic::nearbyint:
> case Intrinsic::pow:
> case Intrinsic::fma:
> case Intrinsic::fmuladd:
>
> Are there any important ones missing? If not, then we could think of going
> the intrinsics route for these calls. I.e., call the intrinsics from
> the kernel lib and expand them to calls to your functions+inline after
> autovectorization.
>
"Important" probably depends on how frequently they are used in real-world
code, or in benchmarks. The actual list of intrinsics (as listed e.g. in
the OpenCL or C standard) is probably three or four times as long. I would
also add the various convert* and as* (i.e. cast) functions to the list.
I could create a longer list if that would be helpful.
These functions should still be inlined, but only after vectorization.
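
To make the "white list" with a trivial scalar-to-vector mapping a bit more
concrete, here is a rough sketch in C. Everything in it is made up for
illustration: the table, the arity field, and the names find_builtin() and
vector_signature() are hypothetical and are not pocl or LLVM code. It only
assumes that each listed function acts element-wise, so that
double sqrt(double) corresponds to double2 sqrt(double2).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical white-list entry: a builtin assumed to act element-wise. */
struct builtin {
    const char *name;
    int arity;          /* number of arguments, all of the same type */
};

/* Subset of the functions named in this thread; the list is an assumption. */
static const struct builtin whitelist[] = {
    {"sqrt", 1}, {"sin", 1},   {"cos", 1},   {"exp", 1},  {"exp2", 1},
    {"log", 1},  {"log10", 1}, {"log2", 1},  {"fabs", 1}, {"floor", 1},
    {"ceil", 1}, {"trunc", 1}, {"rint", 1},  {"pow", 2},  {"fma", 3},
};

/* Returns the white-list entry for `name`, or NULL if it is not listed. */
static const struct builtin *find_builtin(const char *name)
{
    size_t n = sizeof whitelist / sizeof whitelist[0];
    for (size_t i = 0; i < n; ++i)
        if (strcmp(whitelist[i].name, name) == 0)
            return &whitelist[i];
    return NULL;
}

/* Writes the assumed vector signature into buf, e.g.
   "double2 sqrt(double2)". Returns 0 on success, -1 if the function is
   not white-listed, the width is below 2, or the buffer is too small. */
static int vector_signature(const char *name, const char *elem, int width,
                            char *buf, size_t buflen)
{
    const struct builtin *b = find_builtin(name);
    if (b == NULL || width < 2)
        return -1;

    int n = snprintf(buf, buflen, "%s%d %s(", elem, width, name);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    size_t used = (size_t)n;

    for (int i = 0; i < b->arity; ++i) {
        n = snprintf(buf + used, buflen - used, "%s%s%d",
                     i ? ", " : "", elem, width);
        if (n < 0 || (size_t)n >= buflen - used)
            return -1;
        used += (size_t)n;
    }

    if (used + 2 > buflen)      /* room for ')' and the terminator */
        return -1;
    buf[used] = ')';
    buf[used + 1] = '\0';
    return 0;
}
```

Calling vector_signature("fma", "float", 4, buf, sizeof buf) would fill buf
with the float4 variant, while a name that is not in the table is rejected.
The convert* and as* functions would not fit this table as written, since
their argument and return types differ.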
-erik
--
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
AIM: eschnett247, Skype: eschnett, Google Talk: [email protected]
_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel