Hi Erik,

Have you done any measurements, e.g. how does your implementation
compare against the code of Julien Pommier (google "SSE math fun")?
This is what I am currently using, but unfortunately its list of
implemented functions is even shorter than what Pekka posted...

Best,
Ralf

On 2/5/13 2:55 PM, Erik Schnetter wrote:
> Ralf
>
> Much of vecmathlib comes from another project where I needed this
> functionality. In particular, I am using finite differences on
> multi-dimensional arrays that can benefit greatly from vectorisation.
>
> I have now extracted this functionality and added intrinsics to
> vecmathlib to load and store numbers from/to memory, i.e. arrays. These
> functions are mostly equivalent to vload* and vstore* in OpenCL. This
> provides two important capabilities:
>
> (1) The load/store functions accept a mask parameter, allowing
> vectorising loops that are not an even multiple of the vector length.
> (2) The load/store functions distinguish between aligned and unaligned
> memory accesses, where aligned accesses are faster. This may require
> adjusting the lower loop bound to start on an aligned memory location.
>
> The number of loop iterations is in general not a multiple of the vector
> size. Also, falling back to scalar iterations for the left-over elements
> is not a good option, since it is much slower and increases code size.
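[Editorial note: the masked-tail idea described above can be sketched in plain C++. The vector width and per-lane loops here only emulate what real SIMD loads/stores with a mask would do; the names are illustrative and not vecmathlib's actual API.]

```cpp
#include <cstddef>

// Emulated vector width; a real implementation would use SIMD registers.
constexpr std::size_t VW = 4;

// out[i] = 2 * in[i] for 0 <= i < n, processed vector-wise: full vectors
// first, then one masked "tail" step instead of a scalar remainder loop.
void scale2(const double* in, double* out, std::size_t n) {
  std::size_t i = 0;
  for (; i + VW <= n; i += VW)            // full vectors
    for (std::size_t k = 0; k < VW; ++k)  // stands in for one SIMD op
      out[i + k] = 2.0 * in[i + k];
  if (i < n)                              // left-over elements
    for (std::size_t k = 0; k < VW; ++k)
      if (i + k < n)                      // the mask: lanes >= n inactive
        out[i + k] = 2.0 * in[i + k];
}
```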
>
> -erik
>
>
>
> On Tue, Feb 5, 2013 at 6:55 AM, Ralf Karrenberg
> <[email protected]> wrote:
>
>     Hi,
>
>     I haven't had a look at the code, but from what you are writing,
>     this sounds like exactly what I would need to integrate into libWFV.
>     The vectorizer has an API to specify mappings of functions to SIMD
>     equivalents, which is all that you need if all the implementations
>     are there already.
>     So, WFV should be able to work with your library within a few hours
>     of integration work. I'll look into that later.
>
>     By the way, I recall a discussion on integrating such a library
>     (possibly as a .bc file) into LLVM. You may want to have a look at
>     the thread and respond:
>     
>     http://llvm.1065342.n5.nabble.com/SIMD-trigonometry-logarithms-tt54215.html#none
>
>     Cheers,
>     Ralf
>
>
>     On 2/3/13 7:02 PM, Erik Schnetter wrote:
>
>         On Sun, Feb 3, 2013 at 12:25 PM, Pekka Jääskeläinen
>         <[email protected]> wrote:
>
>              On 02/03/2013 03:56 PM, Erik Schnetter wrote:
>              > In my mind, the vectorizer would never look into sqrt() or
>              > any other functions defined in the language standard, but
>              > would simply expect efficient vector implementations of
>              > these. Instead of looking into the language standard we
>              > could also add a respective attribute to the function
>              > definitions. This attribute would then confirm that e.g.
>              > double2 sqrt(double2) is equivalent to double sqrt(double).
>              > __attribute__((__vector_equivalence__)) could be a name.
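[Editorial note: the scalar/vector pairing described above can be sketched with the GCC/Clang vector extension standing in for OpenCL's double2. The attribute itself is only a proposal, so it is omitted here, and vsqrt is an illustrative name.]

```cpp
#include <cmath>

// Two-lane double vector, emulating OpenCL's double2 via the
// GCC/Clang vector_size extension.
typedef double double2 __attribute__((vector_size(2 * sizeof(double))));

// Lane-wise vector equivalent of double sqrt(double): exactly the
// property the proposed attribute would assert about this pair.
double2 vsqrt(double2 x) {
  double2 r;
  for (int i = 0; i < 2; ++i)
    r[i] = std::sqrt(x[i]);
  return r;
}
```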
>
>              OK. The "known" functions should not be inlined, but the
>              vectorizer should recognize them (if we do not go towards the
>              intrinsics approach). In the end, the autovectorized work
>              group function and an explicitly vectorized kernel will call
>              the same vector-optimized function in this scheme.
>
>              For starters we might just use a "white list" for the known
>              vectorizable functions, and assume a trivial scalar-to-vector
>              mapping for the arguments and the return value. Or use
>              intrinsics for the known ones.
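[Editorial note: such a white list might look like the following hypothetical sketch; the names are placeholders, not pocl's actual tables.]

```cpp
#include <map>
#include <string>

// Hypothetical white list: scalar functions known to have a trivial
// lane-wise vector mapping of arguments and return value.
const std::map<std::string, std::string> vector_equivalents = {
    {"sqrt", "vsqrt"}, {"sin", "vsin"}, {"cos", "vcos"}, {"exp", "vexp"},
};

// The vectorizer would consult the list before rewriting a call site.
bool is_known_vectorizable(const std::string& name) {
  return vector_equivalents.count(name) != 0;
}
```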
>
>              Looking at the code of LLVM's LoopVectorize, it seems to be
>              able to vectorize some intrinsics already:
>
>                  case Intrinsic::sqrt:
>                  case Intrinsic::sin:
>                  case Intrinsic::cos:
>                  case Intrinsic::exp:
>                  case Intrinsic::exp2:
>                  case Intrinsic::log:
>                  case Intrinsic::log10:
>                  case Intrinsic::log2:
>                  case Intrinsic::fabs:
>                  case Intrinsic::floor:
>                  case Intrinsic::ceil:
>                  case Intrinsic::trunc:
>                  case Intrinsic::rint:
>                  case Intrinsic::nearbyint:
>                  case Intrinsic::pow:
>                  case Intrinsic::fma:
>                  case Intrinsic::fmuladd:
>
>              Are there any important ones missing? If not, then we could
>              think of going the intrinsics route for these calls. I.e.,
>              call the intrinsics from the kernel lib and expand them to
>              calls to your functions + inline after autovectorization.
>
>
>         "Important" probably depends on how frequently they are used in
>         real-world code, or in benchmarks. The actual list of intrinsics
>         (as listed e.g. in the OpenCL or C standard) is probably three or
>         four times as long. I would also add the various convert* and as*
>         (i.e. cast) functions to the list.
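[Editorial note: for reference, the difference between the two function families, sketched on scalars: OpenCL's convert_* performs a numeric value conversion, while as_* reinterprets the same bits. The trailing underscores mark these as stand-ins, not the real OpenCL built-ins.]

```cpp
#include <cstdint>
#include <cstring>

// Mimics OpenCL convert_int(float): a value conversion (truncation).
int32_t convert_int_(float x) { return static_cast<int32_t>(x); }

// Mimics OpenCL as_int(float): reinterprets the same 32 bits unchanged.
int32_t as_int_(float x) {
  int32_t r;
  std::memcpy(&r, &x, sizeof r);  // bit-exact copy, no conversion
  return r;
}
```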
>
>         I could create a longer list if that would be helpful.
>
>         These functions should still be inlined, but only after
>         vectorization.
>
>         -erik
>
>         --
>         Erik Schnetter <[email protected]>
>         http://www.perimeterinstitute.ca/personal/eschnetter/
>         AIM: eschnett247, Skype: eschnett, Google Talk:
>         [email protected]
>
>
>
>
>         _______________________________________________
>         pocl-devel mailing list
>         [email protected]
>         https://lists.sourceforge.net/lists/listinfo/pocl-devel
>
>
>
>
> --
> Erik Schnetter <[email protected]>
> http://www.perimeterinstitute.ca/personal/eschnetter/
> AIM: eschnett247, Skype: eschnett, Google Talk: [email protected]
>
>
>

