On 11/08/15 22:36, Andreas Kloeckner wrote:
Henry Gomersall <[email protected]> writes:
>I've noticed that using e.g. clmath._atan2(out, in1, in2, queue) with a
>pre-allocated `out` array is nearly twice as fast as using
>clmath.atan2(in1, in2, queue), even when a memory pool is used to
>allocate the Array.
Oops. Thanks for reporting this.

It turns out that the Python reimplementation of the memory pool was
pretty broken--it didn't actually do much. With the fixed version now in
git, things look considerably better. In particular, when I try your
test code (on the AMD CPU implementation), the time difference between
the explicit out argument and the mempool version is now only about 10%.
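For anyone following along, the win from a memory pool is that freed buffers of a given size get recycled instead of going back to the allocator, so repeated temporary allocations (like the `out` array `clmath.atan2` creates on each call) become cheap. The toy sketch below illustrates only that idea; it is not PyOpenCL's `MemoryPool`, and all names in it are made up for illustration:

```python
# Toy sketch of the memory-pool idea: recycle freed buffers of the same
# size rather than allocating fresh ones. Hypothetical names throughout;
# this is not PyOpenCL's implementation.
class ToyMemoryPool:
    def __init__(self):
        self._free = {}   # size -> list of recycled buffers
        self.hits = 0     # requests served from the pool
        self.misses = 0   # requests that needed a fresh allocation

    def allocate(self, size):
        bucket = self._free.get(size)
        if bucket:
            self.hits += 1
            return bucket.pop()
        self.misses += 1
        return bytearray(size)  # stand-in for a real device allocation

    def free(self, buf):
        # Return the buffer to the pool for later reuse.
        self._free.setdefault(len(buf), []).append(buf)

pool = ToyMemoryPool()
a = pool.allocate(1024)
pool.free(a)
b = pool.allocate(1024)  # same-sized request, served from the pool
print(pool.hits, pool.misses)
```

If the pool is "broken" in the sense Andreas describes (i.e. it never actually recycles), every call pays the full allocation cost, which matches the roughly 2x gap Henry measured between the preallocated-`out` and mempool paths.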


I'm struggling to reproduce your improvements. Simply installing from master with `pip install -e .` (or similar) is actually substantially slower than the PyPI installation.

I noticed that the PyPI installation (`pip install pyopencl`) uses Boost, the configuration for which seems to have been stripped out of master. Am I missing something about the use of libffi when building from source that would explain the slowdown I've noticed?

(It's something like half the speed now, so not insignificant!)

Cheers,

Henry

_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
