On Monday 10 January 2011 19:29:33, Mark Wiebe wrote:
So, the new code is just 5% slower. I suppose that removing the
NPY_ITER_ALIGNED flag would give us a bit more performance, but
that's great as it is now. How did you do that? Your new_iter
branch in NumPy already deals with
On Tuesday 11 January 2011 06:45:28, Mark Wiebe wrote:
On Mon, Jan 10, 2011 at 11:35 AM, Mark Wiebe mwwi...@gmail.com
wrote:
I'm a bit curious why the jump from 1 to 2 threads is scaling so
poorly. Your timings have improvement factors of 1.85, 1.68, 1.64,
and 1.79. Since the
On Sunday 09 January 2011 23:45:02, Mark Wiebe wrote:
As a benchmark of C-based iterator usage and to make it work properly
in a multi-threaded context, I've updated numexpr to use the new
iterator. In addition to some performance improvements, this also
made it easy to add optional out= and
On Monday 10 January 2011 11:05:27, Francesc Alted wrote:
Also, I'd like to try out the new thread scheduling that you
suggested to me privately (i.e. T0T1T0T1... vs T0T0...T1T1...).
I've just implemented the new partition schema in numexpr
(T0T0...T1T1..., being the original T0T1T0T1...).
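The two partition schemas can be sketched in a few lines of pure Python (a made-up illustration, not numexpr's actual scheduling code): each element of the result is the list of block indices that one thread processes.

```python
# Interleaved schema (T0T1T0T1...): thread t takes every nthreads-th block.
def interleaved(nblocks, nthreads):
    return [list(range(t, nblocks, nthreads)) for t in range(nthreads)]

# Contiguous schema (T0T0...T1T1...): thread t takes one contiguous run.
def contiguous(nblocks, nthreads):
    bounds = [t * nblocks // nthreads for t in range(nthreads + 1)]
    return [list(range(bounds[t], bounds[t + 1])) for t in range(nthreads)]

print(interleaved(8, 2))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
print(contiguous(8, 2))   # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

With the contiguous schema each thread streams through one large region of memory instead of hopping over blocks owned by the other threads, which is the behavior being compared in the benchmarks above.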
On Mon, Jan 10, 2011 at 2:05 AM, Francesc Alted fal...@pytables.org wrote:
[snip]
Your patch looks mostly fine to my eyes; good job! Unfortunately, I've
been unable to compile your new_iterator branch of NumPy:
numpy/core/src/multiarray/multiarraymodule.c:45:33: fatal error:
On Monday 10 January 2011 17:54:16, Mark Wiebe wrote:
Apparently, you forgot to add the new_iterator_pywrap.h file.
Oops, that's added now.
Excellent. It works now.
The aligned case should just be a matter of conditionally removing
the NPY_ITER_ALIGNED flag in two places.
Wow, the
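For reference, the alignment request can be illustrated through np.nditer, the Python exposure of the same iterator from that branch (a generic sketch, not the numexpr patch itself). Requesting the 'aligned' per-operand flag asks the iterator to guarantee aligned data for the inner loop; dropping the flag, as suggested above, lets the inner loop receive the operand's raw pointers instead.

```python
import numpy as np

a = np.arange(6.0)

# Ask the iterator for aligned, chunked access to `a` and an
# allocated output operand.
it = np.nditer([a, None],
               flags=['external_loop'],
               op_flags=[['readonly', 'aligned'],
                         ['writeonly', 'allocate']])
with it:
    for x, out in it:
        out[...] = x * 2      # inner loop sees whole aligned chunks
    result = it.operands[1]

print(result.tolist())        # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```

For an operand that is actually unaligned, adding the 'buffered' iterator flag makes the copy through aligned buffers happen transparently; without the 'aligned' request, no such copy is needed.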
On Mon, Jan 10, 2011 at 9:47 AM, Francesc Alted fal...@pytables.org wrote:
[snip]
so, the new code is just 5% slower. I suppose that removing the
NPY_ITER_ALIGNED flag would give us a bit more performance, but that's
great as it is now. How did you do that? Your new_iter branch in NumPy
I'm a bit curious why the jump from 1 to 2 threads is scaling so
poorly. Your timings have improvement factors of 1.85, 1.68, 1.64,
and 1.79. Since the computation is trivial data parallelism, and I
believe it's still pretty far off the memory bandwidth limit, I
would expect a speedup of 1.95 or
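As a quick sanity check on those figures: with 2 threads, an ideal embarrassingly-parallel run would give a speedup near 2.0, so the parallel efficiency of each measured factor is simply speedup / 2.

```python
# Parallel efficiency of the quoted 1-to-2-thread speedup factors.
speedups = [1.85, 1.68, 1.64, 1.79]
efficiency = [s / 2.0 for s in speedups]
print([round(e, 3) for e in efficiency])  # [0.925, 0.84, 0.82, 0.895]
```

Efficiencies in the 0.82-0.92 range, rather than close to 1.0, are what prompts the question about scheduling overhead or memory effects.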
As a benchmark of C-based iterator usage and to make it work properly in a
multi-threaded context, I've updated numexpr to use the new iterator. In
addition to some performance improvements, this also made it easy to add
optional out= and order= parameters to the evaluate function. The numexpr
Is evaluate_iter basically numexpr but using your numpy branch, or
are there other changes?
On Sun, Jan 9, 2011 at 2:45 PM, Mark Wiebe mwwi...@gmail.com wrote:
As a benchmark of C-based iterator usage and to make it work properly in a
multi-threaded context, I've updated numexpr to use the new
That's right, essentially all I've done is replace the code that
handled preparing the arrays and producing blocks of values for the
inner loops. There are three new parameters to evaluate_iter as
well. It has an out= parameter just like ufuncs do, an order=
parameter which controls the layout
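Since out= follows the ufunc convention, its semantics can be sketched with a plain ufunc call (a generic NumPy illustration, not the evaluate_iter code itself): the result is written into a preallocated array and that same array is returned.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])
out = np.empty_like(a)

# Writes the result into `out` in place and returns it,
# avoiding a fresh output allocation.
ret = np.add(a, b, out=out)

print(ret is out)       # True
print(out.tolist())     # [11.0, 22.0, 33.0]
```

The out= parameter described in the thread behaves the same way for expressions evaluated by the patched numexpr.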