Hi,
we noticed that for one of our OpenCL kernels pocl is more than 4 times
slower than the Intel OpenCL runtime on a Xeon W processor. I suspect the
auto-vectorizer is the cause. How can I debug this and find out whether
pocl is vectorizing across work items? The kernels are run through
PyOpenCL on Ubuntu 16.04 with LLVM 4 and pocl 1.0.
We plan to distribute our software and would prefer to get good
performance with pocl rather than having to rely on the Intel runtime.
Best wishes
Timo
--
Dr. Timo Betcke
Reader in Mathematics
University College London
Department of Mathematics
E-Mail: [email protected]
Tel.: +44 (0) 20-3108-4068
Fax.: +44 (0) 20-7383-5519