Hi,
I just managed to integrate libWFV into pocl and got the first few results.
Be aware that these are just first measurements taken under non-reproducible
circumstances, so no final conclusions should be drawn from them!
The benchmarks I chose were simply those that worked immediately; I
haven't looked at the generated code at all. Note also that most of
them are not really well suited for vectorization. I don't recall which
of the ones below the Intel driver refuses to vectorize, but for many of
them it would clearly be better to disable vectorization.
Benchmark          | pocl-orig | pocl-wfv | Intel | AMD  | WFVOpenCL
-------------------|-----------|----------|-------|------|----------
BitonicSort        | 0.22      | 0.38     | 0.67  | 0.84 | 0.12
DCT                | 0.72      | 0.39     | 0.31  | 0.55 | 0.42
FastWalshTransform | 1.0       | 1.1      | 1.1   | 1.3  | 1.1
FloydWarshall      | 0.4       | 0.59     | 0.49  | 2.1  | 0.55
Histogram          | 0.31      | 0.26     | 0.29  | 0.33 | 0.36
There's one other bad thing I just noticed: these numbers are kernel
times with pocl reusing cached compilation results. If I measure only a
single run after deleting the temporary files, pocl is *really* slow
(roughly 1.5 to 3 times slower). This suggests that the implementation
suffers a lot from invoking scripts and command-line tools like opt,
and thus from the resulting disk I/O.
Still, the raw kernel performance looks really, really good.
Now I'm going to try to clean up the code and make the implementation
recognize llvm.fmuladd and the builtin intrinsics (e.g., for sqrt),
which currently cause crashes in benchmarks like Mandelbrot,
BlackScholes, NBody, etc. :p
On a side note: it was really pretty easy to integrate my stuff. I just
run a wrapper pass that invokes WFV on the kernel before any of your
custom transformations start, and then adjust the increment of the loop
induction variable in the WILoops. It's currently only a hack, but it
shouldn't be hard to make that code conditional on an environment
variable or a build flag.
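To illustrate the WILoop adjustment in plain terms, here is a minimal C sketch, assuming a WFV-vectorized kernel that handles `width` consecutive work-items per call. The names `wfv_width`, `kernel_vec`, `run_work_group`, and the `POCL_WFV_WIDTH` environment variable are hypothetical and not part of pocl or libWFV; the real change is made on LLVM IR inside pocl's passes, not in C source.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical switch: read the vectorization width from an environment
   variable. POCL_WFV_WIDTH is an illustrative name, not a real pocl option. */
static size_t wfv_width(void) {
    const char *s = getenv("POCL_WFV_WIDTH");
    if (s == NULL) return 1;                 /* WFV disabled: scalar work-items */
    long w = strtol(s, NULL, 10);
    return (w > 0) ? (size_t)w : 1;          /* ignore invalid values */
}

/* Stand-in for a kernel vectorized by WFV: one call processes `width`
   consecutive work-items starting at local id `lid`. */
static void kernel_vec(float *out, const float *in, size_t lid, size_t width) {
    for (size_t lane = 0; lane < width; ++lane)
        out[lid + lane] = in[lid + lane] * 2.0f;
}

/* Work-item loop ("WILoop"): the induction variable advances by the
   vector width instead of 1. Assumes local_size is a multiple of width. */
static void run_work_group(float *out, const float *in,
                           size_t local_size, size_t width) {
    assert(width > 0 && local_size % width == 0);
    for (size_t lid = 0; lid < local_size; lid += width)
        kernel_vec(out, in, lid, width);
}
```

With a width of 4 and a local size of 16, the work-item loop runs 4 iterations instead of 16; when the variable is unset, `wfv_width()` returns 1 and the loop falls back to the original scalar step.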
Cheers,
Ralf
_______________________________________________
pocl-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pocl-devel