Re: [pocl-devel] Debugging auto vectorizer

Pekka Jääskeläinen Tue, 06 Feb 2018 23:12:08 -0800

Hello Timo,

I'm glad to hear you are willing to contribute to the cause of
open and performance portable OpenCL.

Beware, though: parts of the kernel compiler need major rewrites for
clarity, and unfortunately only a few people are working on the kernel
compiler. But hopefully soon we can count you in as one :)

This reminds me that I should really write the "how to tune and hack the
pocl kernel compiler" document.

Maybe this is a starter for that:

There are several useful environment variables for debugging and analyzing
the kernel compiler optimizations:
http://portablecl.org/docs/html/env_variables.html

First, you can make pocl dump more debug output from LLVM and its vectorizer:

* POCL_DEBUG_LLVM_PASSES
When set to 1, enables debug output from LLVM passes during optimization.

* POCL_VECTORIZER_REMARKS
When set to 1, prints out remarks produced by the loop vectorizer of LLVM
during kernel compilation.
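
Since your kernels run under PyOpenCL, here is a minimal Python sketch of
how the two variables above could be switched on for a single run. The
helper names (pocl_debug_env, run_with_pocl_debug) and the script and log
paths are made up for illustration; only the two environment variable
names come from pocl.

```python
import os
import subprocess
import sys


def pocl_debug_env(base_env=None):
    """Return a copy of the environment with pocl's LLVM debug output enabled.

    base_env defaults to the current process environment; the input
    mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["POCL_DEBUG_LLVM_PASSES"] = "1"   # debug output from LLVM passes
    env["POCL_VECTORIZER_REMARKS"] = "1"  # remarks from LLVM's loop vectorizer
    return env


def run_with_pocl_debug(script, log_path):
    """Run a Python OpenCL script in a child process and log its output.

    Both stdout and stderr are captured, so it does not matter which
    stream the debug output is written to. Returns the exit code.
    """
    with open(log_path, "w") as log:
        proc = subprocess.run(
            [sys.executable, script],
            env=pocl_debug_env(),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return proc.returncode
```

For example, run_with_pocl_debug("my_kernel_test.py", "pocl_debug.log")
(both file names hypothetical) leaves the remarks in the log for reading
afterwards. Running the app in a child process guarantees the variables
are set before the OpenCL runtime is loaded.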



To debug and analyze the kernel compiler's intermediate results more closely,
you can instruct pocl to leave the temporary LLVM bitcode files in place
(normally it deletes them once they are no longer needed):

POCL_LEAVE_KERNEL_COMPILER_TEMP_FILES=1

It's also useful to set POCL_CACHE_DIR to a local temp dir which you can
clear between trials.

Then after executing your OpenCL app, under your temp dir, you will
find .bc files, the most interesting one being parallel.bc which is
the final IR produced by pocl and LLVM before codegen.  If you don't
see vector LLVM IR there, it is unlikely to appear in your final
binary either.
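
A rough Python sketch of that workflow follows. It is only an
illustration: the helper names are made up, the directory layout under the
cache dir is not assumed (the code just walks the tree looking for
parallel.bc), and it assumes an llvm-dis binary matching your LLVM version
is on PATH. The vector check is a simple text search for IR vector types
such as <4 x float>, so it will also count vectors that come from explicit
OpenCL vector types (float4 etc.) in the kernel source.

```python
import os
import re
import subprocess
import tempfile

# Matches LLVM IR vector types such as <4 x float> or <8 x i32>.
VECTOR_TYPE = re.compile(r"<\d+ x (?:float|double|half|i\d+)>")


def make_trial_env(base_env=None):
    """Return (env, cache_dir) with a fresh cache dir and temp files kept."""
    env = dict(os.environ if base_env is None else base_env)
    cache_dir = tempfile.mkdtemp(prefix="pocl-cache-")
    env["POCL_CACHE_DIR"] = cache_dir
    env["POCL_LEAVE_KERNEL_COMPILER_TEMP_FILES"] = "1"
    return env, cache_dir


def find_parallel_bc(cache_dir):
    """Return sorted paths of all parallel.bc files below cache_dir."""
    hits = []
    for root, _dirs, files in os.walk(cache_dir):
        if "parallel.bc" in files:
            hits.append(os.path.join(root, "parallel.bc"))
    return sorted(hits)


def disassemble(bc_path):
    """Return textual LLVM IR for a bitcode file (assumes llvm-dis on PATH)."""
    result = subprocess.run(
        ["llvm-dis", "-o", "-", bc_path],
        check=True,
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stdout


def count_vector_types(ir_text):
    """Count occurrences of vector types in textual LLVM IR."""
    return len(VECTOR_TYPE.findall(ir_text))
```

Run your app with the environment from make_trial_env(), then call
count_vector_types(disassemble(path)) for each path returned by
find_parallel_bc(cache_dir). A count of zero for a kernel you expected to
be vectorized is the cue to look at the vectorizer remarks.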

To start hacking:

http://portablecl.org/docs/html/kernel_compiler.html

Our pocl paper might also provide additional help, but the above link should
give a good overview, although it might be outdated (I've added updating it
to my task list).

The LLVM passes are under lib/llvmopencl. The layer between the OpenCL
runtime and the kernel compiler is in the files lib/CL/pocl_llvm*.c

Please don't hesitate to ask for further instructions here or in IRC.

BR,
Pekka


On 02/07/2018 02:20 AM, Timo Betcke wrote:

Hi,
we noticed for one of our OpenCL kernels that pocl is over 4 times slower
than the Intel OpenCL runtime on a Xeon W processor. I am assuming it is the
auto vectorizer. How can I debug this and figure out if vectorization across
work items is being performed with pocl? The kernels are running under
PyOpenCL on Ubuntu 16.04 with LLVM 4 and pocl 1.0.
We are planning to distribute our software and would prefer to have good
performance on pocl and not have to rely on the Intel environment.
Best wishes

Timo

--
Dr. Timo Betcke
Reader in Mathematics
University College London
Department of Mathematics
E-Mail: [email protected] <mailto:[email protected]>
Tel.: +44 (0) 20-3108-4068
Fax.: +44 (0) 20-7383-5519


------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot



_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel


--
Pekka
