Can you share more details, e.g. a small reproducer? Or post the OpenCL kernel source?
You can use Linux perf or VTune to read the hardware counters and get an idea of
what vector code is being generated.
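As a rough sketch of the perf route, assuming a Skylake-family Xeon W where perf exposes Intel's FP_ARITH_INST_RETIRED events (confirm the exact names with `perf list` on your machine), the helper below runs a command under `perf stat` and reports what fraction of retired floating-point instructions were packed SIMD rather than scalar. The event names are Intel's. The script and its function names are only illustrative and not part of pocl or perf.

```python
import subprocess
import sys
from typing import Dict, List

# Intel FP_ARITH_INST_RETIRED events (Skylake and later). Assumed to exist on
# the target CPU; check with `perf list`. Eight events may exceed the number
# of hardware counters, in which case perf multiplexes and scales the counts.
EVENTS = [
    "fp_arith_inst_retired.scalar_single",
    "fp_arith_inst_retired.scalar_double",
    "fp_arith_inst_retired.128b_packed_single",
    "fp_arith_inst_retired.128b_packed_double",
    "fp_arith_inst_retired.256b_packed_single",
    "fp_arith_inst_retired.256b_packed_double",
    "fp_arith_inst_retired.512b_packed_single",
    "fp_arith_inst_retired.512b_packed_double",
]


def parse_perf_csv(text: str) -> Dict[str, int]:
    """Parse `perf stat -x,` output, whose fields start value,unit,event."""
    counts = {}  # type: Dict[str, int]
    for line in text.splitlines():
        fields = line.split(",")
        # Skip comments, blank lines and "<not counted>"/"<not supported>".
        if len(fields) < 3 or not fields[0].strip().isdigit():
            continue
        counts[fields[2].strip()] = int(fields[0])
    return counts


def packed_fraction(counts: Dict[str, int]) -> float:
    """Fraction of counted FP arithmetic instructions that were packed SIMD."""
    scalar = sum(v for k, v in counts.items() if "scalar" in k)
    packed = sum(v for k, v in counts.items() if "packed" in k)
    total = scalar + packed
    return packed / total if total else 0.0


def run_perf(cmd: List[str]) -> Dict[str, int]:
    """Run `cmd` under `perf stat` and return the FP_ARITH counts."""
    perf = ["perf", "stat", "-x", ",", "-e", ",".join(EVENTS), "--"] + cmd
    # perf stat writes its counter report to stderr.
    result = subprocess.run(perf, stderr=subprocess.PIPE,
                            universal_newlines=True)
    return parse_perf_csv(result.stderr)


def main(argv: List[str]) -> None:
    counts = run_perf(argv)
    for name in EVENTS:
        print(name, counts.get(name, 0))
    print("packed fraction: {:.2%}".format(packed_fraction(counts)))


# Only invoke perf when a command to profile is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

You could run it as `python3 perf_vec.py python3 bench.py` (both file names hypothetical), once with pocl and once with the Intel runtime selected in PyOpenCL. A much lower packed fraction under pocl would suggest the kernel is not being vectorized across work-items. The counts cover the whole process, including the Python interpreter and the kernel compiler, so the benchmark should run long enough for the kernel to dominate.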
Jeff
(I work for Intel)
> On Feb 6, 2018, at 4:20 PM, Timo Betcke <[email protected]> wrote:
>
> Hi,
>
> we noticed for one of our OpenCL kernels that pocl is over 4 times slower
> than the Intel OpenCL runtime on a Xeon W processor. I am assuming it is the
> auto vectorizer. How can I debug this and figure out if vectorization across
> work items is being performed with pocl? The kernels are running under
> PyOpenCL on Ubuntu 16.04 with LLVM 4 and pocl 1.0.
>
> We are planning to distribute our software and would prefer to have good
> performance on pocl and not have to rely on the Intel environment.
>
> Best wishes
>
> Timo
>
> --
> Dr. Timo Betcke
> Reader in Mathematics
> University College London
> Department of Mathematics
> E-Mail: [email protected]
> Tel.: +44 (0) 20-3108-4068
> Fax.: +44 (0) 20-7383-5519
> ------------------------------------------------------------------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> pocl-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/pocl-devel