Ralf,
Much of vecmathlib comes from another project where I needed this
functionality. In particular, I am using finite differences on
multi-dimensional arrays, which can benefit greatly from vectorisation.
I have now extracted this code and added to vecmathlib intrinsics that
load and store numbers from/to memory, i.e. arrays. These functions are
mostly equivalent to vload* and vstore* in OpenCL. They provide two
important capabilities:
(1) The load/store functions accept a mask parameter, which allows
vectorising loops whose iteration count is not an even multiple of the
vector length.
(2) The load/store functions distinguish between aligned and unaligned
memory accesses, where aligned accesses are faster. This may require
adjusting the lower loop bound so that it starts at an aligned memory
location.
In general, the number of loop iterations is not a multiple of the
vector size, and falling back to scalar iterations for the left-over
elements is not an option, since it is much slower and increases code
size. A sketch of such a loop follows below.
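For illustration, such a vectorised loop might look like this (type and
function names are schematic, not the exact vecmathlib API):

  // Schematic use of masked, aligned loads and stores; realvec_t,
  // loada, storea, and the mask constructor stand in for whatever the
  // library actually provides.
  typedef vecmathlib::realvec<double, 4> realvec_t;
  const std::ptrdiff_t vs = realvec_t::size;
  // Round the lower bound down to an aligned element index (assuming
  // imin >= 0), so that all accesses in the loop can use the (faster)
  // aligned functions.
  for (std::ptrdiff_t i = imin - imin % vs; i < imax; i += vs) {
    // The mask switches off the elements below imin in the first
    // iteration and the elements at or above imax in the last one.
    realvec_t::mask_t mask(i, imin, imax);   // schematic constructor
    realvec_t x = loada(&a[i], mask);        // aligned, masked load
    storea(2.0 * x, &b[i], mask);            // aligned, masked store
  }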
-erik
On Tue, Feb 5, 2013 at 6:55 AM, Ralf Karrenberg <[email protected]> wrote:
> Hi,
>
> I haven't had a look at the code, but from what you are writing, this
> sounds like exactly what I would need to integrate into libWFV. The
> vectorizer has an API to specify mappings of functions to SIMD equivalents,
> which is all that you need if all the implementations are there already.
> So, WFV should be able to work with your library within a few hours of
> integration work. I'll look into that later.
>
> By the way, I recall a discussion on integrating such a library (possibly
> as a .bc file) into LLVM. You may want to have a look at the thread and
> respond:
> http://llvm.1065342.n5.nabble.com/SIMD-trigonometry-logarithms-tt54215.html#none
>
> Cheers,
> Ralf
>
>
> On 2/3/13 7:02 PM, Erik Schnetter wrote:
>
>> On Sun, Feb 3, 2013 at 12:25 PM, Pekka Jääskeläinen
>> <[email protected]> wrote:
>>
>> On 02/03/2013 03:56 PM, Erik Schnetter wrote:
>> > In my mind, the vectorizer would never look into sqrt() or any
>> > other functions defined in the language standard, but would simply
>> > expect efficient vector implementations of these. Instead of
>> > looking into the language standard we could also add a respective
>> > attribute to the function definitions. This attribute would then
>> > confirm that e.g. double2 sqrt(double2) is equivalent to double
>> > sqrt(double). __attribute__((__vector_equivalence__)) could be a
>> > name.
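>> > As a sketch (the attribute name here is only the proposal above,
>> > not implemented in any compiler; double2 as in OpenCL C):
>> >
>> >   /* Scalar version, defined by the language standard. */
>> >   double sqrt(double x);
>> >   /* Vector version, declared element-wise equivalent to the
>> >      scalar sqrt above. */
>> >   __attribute__((__vector_equivalence__))
>> >   double2 sqrt(double2 x);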
>>
>> OK. The "known" functions should not be inlined, but the vectorizer
>> should recognize them (if we do not go towards the intrinsics
>> approach). In the end, the autovectorized work group function and an
>> explicitly vectorized kernel will call the same vector-optimized
>> function in this scheme.
>>
>> For starters we might just use a "white list" for the known
>> vectorizable functions, and assume a trivial scalar to vector mapping
>> for the arguments and the return value. Or use intrinsics for the
>> known ones.
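>> A minimal sketch of such a white list (the table and names are
>> hypothetical, not pocl's actual data structures):
>>
>>   #include <map>
>>   #include <string>
>>
>>   // Maps a scalar function name to its 4-wide vector counterpart;
>>   // the trivial mapping turns each scalar argument and the return
>>   // value into a vector of the same element type.
>>   static const std::map<std::string, std::string> vector_equivalents = {
>>     {"sqrt", "_cl_sqrt_double4"},   // illustrative mangled names
>>     {"sin",  "_cl_sin_double4"},
>>     {"cos",  "_cl_cos_double4"},
>>   };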
>>
>> Looking at the code of LLVM's LoopVectorize, it seems to be able to
>> vectorize some intrinsics already:
>>
>> case Intrinsic::sqrt:
>> case Intrinsic::sin:
>> case Intrinsic::cos:
>> case Intrinsic::exp:
>> case Intrinsic::exp2:
>> case Intrinsic::log:
>> case Intrinsic::log10:
>> case Intrinsic::log2:
>> case Intrinsic::fabs:
>> case Intrinsic::floor:
>> case Intrinsic::ceil:
>> case Intrinsic::trunc:
>> case Intrinsic::rint:
>> case Intrinsic::nearbyint:
>> case Intrinsic::pow:
>> case Intrinsic::fma:
>> case Intrinsic::fmuladd:
>>
>> Are there any important ones missing? If not, then we could think of
>> going the intrinsics route for these calls, i.e., call the intrinsics
>> from the kernel lib and expand them to calls to your functions (and
>> inline them) after autovectorization.
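>> As a schematic example (assuming Clang, where the __builtin_sqrt
>> builtin can lower to the llvm.sqrt intrinsic; the _cl_ name is
>> illustrative), the kernel library entry would just forward to the
>> intrinsic:
>>
>>   // Scalar built-in forwarding to the intrinsic, so the vectorizer
>>   // sees a call it recognizes; after vectorization this becomes
>>   // e.g. llvm.sqrt.v4f64, which is then expanded to the vector
>>   // library implementation and inlined.
>>   double _cl_sqrt(double x) { return __builtin_sqrt(x); }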
>>
>>
>> "Important" probably depends on how frequently they are used in
>> real-world code, or in benchmarks. The actual list of intrinsics (as
>> listed e.g. in the OpenCL or C standard) is probably three of four times
>> as long. I would also add the various convert* and as* (i.e. cast)
>> functions to the list.
>>
>> I could create a longer list if that would be helpful.
>>
>> These functions should still be inlined, but only after vectorization.
>>
>> -erik
--
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
AIM: eschnett247, Skype: eschnett, Google Talk: [email protected]