On Sun, Feb 3, 2013 at 12:25 PM, Pekka Jääskeläinen <
[email protected]> wrote:

> On 02/03/2013 03:56 PM, Erik Schnetter wrote:
> > In my mind, the vectorizer would never look into sqrt() or any other
> > functions defined in the language standard, but would simply expect
> > efficient vector implementations of these. Instead of looking into the
> > language standard we could also add a respective attribute to the
> > function definitions. This attribute would then confirm that e.g.
> > double2 sqrt(double2) is equivalent to double sqrt(double).
> > __attribute__((__vector_equivalence__)) could be a name.
>
> OK. The "known" functions should not be inlined but the vectorizer should
> recognize them (if we do not go towards the intrinsics approach). In the
> end, the autovectorized work group function and an explicitly vectorized
> kernel will call the same vector-optimized function in this scheme.
>
> For starters we might just use a "white list" for the known vectorizable
> functions, and assume a trivial scalar to vector mapping for the arguments
> and the return value. Or use intrinsics for the known ones.
>
> Looking at the code of LLVM's LoopVectorize, it seems to be able to
> vectorize some intrinsics already:
>
>    case Intrinsic::sqrt:
>    case Intrinsic::sin:
>    case Intrinsic::cos:
>    case Intrinsic::exp:
>    case Intrinsic::exp2:
>    case Intrinsic::log:
>    case Intrinsic::log10:
>    case Intrinsic::log2:
>    case Intrinsic::fabs:
>    case Intrinsic::floor:
>    case Intrinsic::ceil:
>    case Intrinsic::trunc:
>    case Intrinsic::rint:
>    case Intrinsic::nearbyint:
>    case Intrinsic::pow:
>    case Intrinsic::fma:
>    case Intrinsic::fmuladd:
>
> Are there some important ones missing? If not, then we could think of going
> the intrinsics route for these calls. I.e., call the intrinsics from
> the kernel lib and expand them to calls to your functions+inline after
> autovectorization.
>

"Important" probably depends on how frequently they are used in real-world
code, or in benchmarks. The actual list of intrinsics (as listed e.g. in
the OpenCL or C standard) is probably three or four times as long. I would
also add the various convert* and as* (i.e. cast) functions to the list.
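
To illustrate why the convert* and as* families fit the same element-wise
pattern, here is a minimal scalar sketch in C++. The names
convert_int_sketch and as_int_sketch are hypothetical stand-ins, not pocl
or OpenCL functions, and the code assumes a 32-bit IEEE 754 float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a scalar convert_int(float):
// converts the numeric value, truncating toward zero.
// Values outside the int32 range are not handled in this sketch.
inline std::int32_t convert_int_sketch(float x) {
    return static_cast<std::int32_t>(x);
}

// Hypothetical stand-in for a scalar as_int(float):
// reinterprets the 32 bits of the float without changing them.
inline std::int32_t as_int_sketch(float x) {
    static_assert(sizeof(float) == sizeof(std::int32_t),
                  "sketch assumes a 32-bit float");
    std::int32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return bits;
}
```

Both operate on one element at a time with no cross-lane dependence, so a
vector version would presumably just apply the same operation per lane.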

I could create a longer list if that would be helpful.

These functions should still be inlined, but only after vectorization.
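
As a rough illustration of the "white list" idea quoted above, the lookup
with a trivial scalar-to-vector mapping could look something like the
following C++ sketch. All names here (kVectorizable, vectorTypeName,
vectorSignature) are hypothetical and are not part of pocl or LLVM; the list
itself only repeats function names from the quoted intrinsics.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical white list of functions assumed to have vector versions.
static const std::set<std::string> kVectorizable = {
    "sqrt", "sin",   "cos",  "exp",   "exp2", "log",       "log10", "log2",
    "fabs", "floor", "ceil", "trunc", "rint", "nearbyint", "pow",   "fma"};

// Trivial scalar-to-vector type mapping: "double" at width 2 gives "double2".
std::string vectorTypeName(const std::string &scalarType, unsigned width) {
    assert(width >= 1);
    return width == 1 ? scalarType : scalarType + std::to_string(width);
}

// Builds the signature of the assumed vector equivalent, or returns an
// empty string if the function is not on the white list.
std::string vectorSignature(const std::string &name,
                            const std::string &scalarType,
                            unsigned numArgs, unsigned width) {
    if (kVectorizable.count(name) == 0)
        return "";
    const std::string vt = vectorTypeName(scalarType, width);
    std::string sig = vt + " " + name + "(";
    for (unsigned i = 0; i < numArgs; ++i) {
        if (i != 0)
            sig += ", ";
        sig += vt; // every argument is widened the same way
    }
    return sig + ")";
}
```

For example, vectorSignature("sqrt", "double", 1, 2) yields
"double2 sqrt(double2)". A real pass would work on LLVM IR types rather
than strings; the strings only keep the sketch self-contained.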

-erik

-- 
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
AIM: eschnett247, Skype: eschnett, Google Talk: [email protected]
_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel
