Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-27 Thread Stéfan van der Walt
Hey, Ondřej

2012/1/21 Ondřej Čertík :
> I read the Mandelbrot code using NumPy at this page:
>
> http://mentat.za.net/numpy/intro/intro.html

I wrote this as a tutorial for beginners, so the emphasis is on
simplicity.  Do you have any suggestions on how to improve the code
without obfuscating the tutorial?

Stéfan


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-25 Thread Ondřej Čertík
On Tue, Jan 24, 2012 at 4:33 PM, Mark Wiebe  wrote:
> 2012/1/21 Ondřej Čertík 
>>
>> 
>>
>>
>> Let me know if you figure out something. I think the "mask" thing is
>> quite slow, but the problem is that it needs to be there, to catch
>> overflows (and it is there in Fortran as well, see the
>> "where" statement, which does the same thing). Maybe there is some
>> other way to write the same thing in NumPy?
>
>
> In the current master, you can replace
>
>     z[mask] *= z[mask]
>     z[mask] += c[mask]
> with
>     np.multiply(z, z, out=z, where=mask)
>     np.add(z, c, out=z, where=mask)

I am getting:

Traceback (most recent call last):
  File "b.py", line 19, in 
np.multiply(z, z, out=z, where=mask)
TypeError: 'where' is an invalid keyword to ufunc 'multiply'

I assume it is a new feature in numpy?
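
For anyone following along: the where= ufunc argument presumably only exists
in development builds at this point, so a quick way to check a given NumPy
install -- just a sketch -- is:

    import numpy as np
    print np.__version__
    z = np.ones(2)
    try:
        np.multiply(z, z, out=z, where=np.array([True, False]))
        print "where= is supported"
    except TypeError:
        print "this NumPy does not support where= for ufuncs"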

>
> The performance of this alternate syntax is still not great, but it is
> significantly faster than what it replaces. For a particular choice of mask,
> I get
>
> In [40]: timeit z[mask] *= z[mask]
>
> 10 loops, best of 3: 29.1 ms per loop
>
> In [41]: timeit np.multiply(z, z, out=z, where=mask)
>
> 100 loops, best of 3: 4.2 ms per loop

That looks like 7x faster to me. If it works for you, can
you run the mandelbrot example with and without your patch?

That way we'll know the actual speedup.

Ondrej


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-24 Thread Mark Wiebe
2012/1/21 Ondřej Čertík 

> 
>
> Let me know if you figure out something. I think the "mask" thing is
> quite slow, but the problem is that it needs to be there, to catch
> overflows (and it is there in Fortran as well, see the
> "where" statement, which does the same thing). Maybe there is some
> other way to write the same thing in NumPy?
>

In the current master, you can replace

    z[mask] *= z[mask]
    z[mask] += c[mask]
with
    np.multiply(z, z, out=z, where=mask)
    np.add(z, c, out=z, where=mask)

The performance of this alternate syntax is still not great, but it is
significantly faster than what it replaces. For a particular choice of
mask, I get

In [40]: timeit z[mask] *= z[mask]

10 loops, best of 3: 29.1 ms per loop

In [41]: timeit np.multiply(z, z, out=z, where=mask)

100 loops, best of 3: 4.2 ms per loop
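
For concreteness, a sketch of how the whole iteration might look with the
where= form (variable names and the escape bookkeeping are illustrative, not
the benchmark above, and it needs a NumPy built from current master):

    import numpy as np

    def mandel_where(c, itermax=100):
        # c is the complex grid; count how long each point stays bounded
        z = np.zeros_like(c)
        counts = np.zeros(c.shape, dtype=int)
        mask = np.ones(c.shape, dtype=bool)
        for _ in range(itermax):
            # update in place, touching only points that have not escaped,
            # instead of building z[mask]/c[mask] copies every iteration
            np.multiply(z, z, out=z, where=mask)
            np.add(z, c, out=z, where=mask)
            mask &= np.abs(z) < 2.0   # freeze points that escaped
            counts[mask] += 1
        return counts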


-Mark


> Ondrej
>


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-23 Thread Samuel John
I'd like to add 
http://git.tiker.net/pyopencl.git/blob/HEAD:/examples/demo_mandelbrot.py to the 
discussion, since I use pyopencl  (http://mathema.tician.de/software/pyopencl) 
with great success in my daily scientific computing. Install with pip.

PyOpenCL understands numpy arrays. You write a kernel (a small C program)
directly into a Python triple-quoted string and get a pythonic way to program
GPUs and Core i5/i7 CPUs, with a Python exception raised if something goes wrong.
Whenever I hit a speed bottleneck that I cannot solve with pure numpy, I code a
little part of the computation for the GPU. The compilation is done just in time
when you run the Python code.

Especially for the Mandelbrot set this may be a _huge_ gain in speed, since it's
embarrassingly parallel.
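
To make that concrete, here is a minimal sketch in the spirit of the linked
demo_mandelbrot.py (kernel, grid size and names are illustrative, not the demo
verbatim):

    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    # The kernel is plain OpenCL C inside a Python triple-quoted string;
    # each work-item iterates one point of the complex plane.
    src = """
    __kernel void mandelbrot(__global const float2 *c,
                             __global ushort *out,
                             ushort const maxiter)
    {
        int gid = get_global_id(0);
        float x = 0.0f, y = 0.0f;
        ushort i;
        for (i = 0; i < maxiter; i++) {
            float xx = x*x - y*y + c[gid].x;
            y = 2.0f*x*y + c[gid].y;
            x = xx;
            if (x*x + y*y > 4.0f)
                break;
        }
        out[gid] = i;
    }
    """
    prg = cl.Program(ctx, src).build()

    xs = np.linspace(-2.0, 0.5, 1000)
    ys = np.linspace(-1.25, 1.25, 1000)
    c = (xs[:, None] + 1j * ys[None, :]).astype(np.complex64).ravel()
    counts = np.empty(c.shape, dtype=np.uint16)

    mf = cl.mem_flags
    c_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=c)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, counts.nbytes)
    prg.mandelbrot(queue, c.shape, None, c_buf, out_buf, np.uint16(100))
    cl.enqueue_copy(queue, counts, out_buf)
    counts = counts.reshape(1000, 1000)   # iteration counts per pixel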

Samuel


On 23.01.2012, at 14:02, Robert Cimrman wrote:

> On 01/23/12 13:51, Sturla Molden wrote:
>> On 23.01.2012 13:09, Sebastian Haase wrote:
>>> 
>>> I would think that interactive zooming would be quite nice
>>> ("illuminating")   and for that 13 secs would not be tolerable
>>> Well... it's not at the top of my priority list ... ;-)
>>> 
>> 
>> Sure, that comes under the 'fast enough' issue. But even Fortran might
>> be too slow here?
>> 
>> For zooming Mandelbrot I'd use PyOpenGL and a GLSL fragment shader
>> (which would be a text string in Python):
>> 
>> madelbrot_fragment_shader = """
>> 
>> uniform sampler1D tex;
>> uniform vec2 center;
>> uniform float scale;
>> uniform int iter;
>> void main() {
>>  vec2 z, c;
>>  c.x = 1. * (gl_TexCoord[0].x - 0.5) * scale - center.x;
>>  c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y;
>>  int i;
>>  z = c;
>>  for(i=0; i<iter; i++) {
>>  float x = (z.x * z.x - z.y * z.y) + c.x;
>>  float y = (z.y * z.x + z.x * z.y) + c.y;
>>  if((x * x + y * y) > 4.0) break;
>>  z.x = x;
>>  z.y = y;
>>  }
>>  gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0);
>> }
>> 
>> """
>> 
>> The rest is just boiler-plate OpenGL...
>> 
>> Sources:
>> 
>> http://nuclear.mutantstargoat.com/articles/sdr_fract/
>> 
>> http://pyopengl.sourceforge.net/context/tutorials/shader_1.xhtml
> 
> Off-topic comment: Or use some algorithmic cleverness, see [1]. I recall Xaos 
> had interactive, extremely fast a fluid fractal zooming more than 10 (or 15?) 
> years ago (-> on a laughable hardware by today's standards).
> 
> r.
> 
> [1] http://wmi.math.u-szeged.hu/xaos/doku.php



Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-23 Thread Robert Cimrman
On 01/23/12 13:51, Sturla Molden wrote:
> On 23.01.2012 13:09, Sebastian Haase wrote:
>>
>> I would think that interactive zooming would be quite nice
>> ("illuminating")   and for that 13 secs would not be tolerable
>> Well... it's not at the top of my priority list ... ;-)
>>
>
> Sure, that comes under the 'fast enough' issue. But even Fortran might
> be too slow here?
>
> For zooming Mandelbrot I'd use PyOpenGL and a GLSL fragment shader
> (which would be a text string in Python):
>
> madelbrot_fragment_shader = """
>
> uniform sampler1D tex;
> uniform vec2 center;
> uniform float scale;
> uniform int iter;
> void main() {
>   vec2 z, c;
>   c.x = 1. * (gl_TexCoord[0].x - 0.5) * scale - center.x;
>   c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y;
>   int i;
>   z = c;
>   for(i=0; i<iter; i++) {
>   float x = (z.x * z.x - z.y * z.y) + c.x;
>   float y = (z.y * z.x + z.x * z.y) + c.y;
>   if((x * x + y * y) > 4.0) break;
>   z.x = x;
>   z.y = y;
>   }
>   gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0);
> }
>
> """
>
> The rest is just boiler-plate OpenGL...
>
> Sources:
>
> http://nuclear.mutantstargoat.com/articles/sdr_fract/
>
> http://pyopengl.sourceforge.net/context/tutorials/shader_1.xhtml

Off-topic comment: Or use some algorithmic cleverness, see [1]. I recall Xaos
had interactive, extremely fast and fluid fractal zooming more than 10 (or 15?)
years ago (-> on laughable hardware by today's standards).

r.

[1] http://wmi.math.u-szeged.hu/xaos/doku.php


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-23 Thread Sturla Molden
On 23.01.2012 13:09, Sebastian Haase wrote:
>
> I would think that interactive zooming would be quite nice
> ("illuminating")   and for that 13 secs would not be tolerable
> Well... it's not at the top of my priority list ... ;-)
>

Sure, that comes under the 'fast enough' issue. But even Fortran might 
be too slow here?

For zooming Mandelbrot I'd use PyOpenGL and a GLSL fragment shader 
(which would be a text string in Python):

madelbrot_fragment_shader = """

uniform sampler1D tex;
uniform vec2 center;
uniform float scale;
uniform int iter;
void main() {
 vec2 z, c;
 c.x = 1. * (gl_TexCoord[0].x - 0.5) * scale - center.x;
 c.y = (gl_TexCoord[0].y - 0.5) * scale - center.y;
 int i;
 z = c;
 for(i=0; i<iter; i++) {
 float x = (z.x * z.x - z.y * z.y) + c.x;
 float y = (z.y * z.x + z.x * z.y) + c.y;
 if((x * x + y * y) > 4.0) break;
 z.x = x;
 z.y = y;
 }
 gl_FragColor = texture1D(tex, (i == iter ? 0.0 : float(i)) / 100.0);
}

"""

The rest is just boiler-plate OpenGL...

Sources:

http://nuclear.mutantstargoat.com/articles/sdr_fract/

http://pyopengl.sourceforge.net/context/tutorials/shader_1.xhtml
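
The boiler-plate on the Python side is roughly the following sketch (window
and 1D-texture setup plus the draw loop are omitted, the uniform values are
placeholders, and it assumes a compatibility-profile context is current):

    from OpenGL.GL import *
    from OpenGL.GL.shaders import compileProgram, compileShader

    # compile the fragment shader string shown above into a program
    program = compileProgram(
        compileShader(madelbrot_fragment_shader, GL_FRAGMENT_SHADER))

    glUseProgram(program)
    # interactive zooming is then just updating these uniforms each frame
    # and redrawing a screen-filling quad
    glUniform2f(glGetUniformLocation(program, "center"), 0.5, 0.0)
    glUniform1f(glGetUniformLocation(program, "scale"), 2.5)
    glUniform1i(glGetUniformLocation(program, "iter"), 100)
    glUniform1i(glGetUniformLocation(program, "tex"), 0)  # palette on texture unit 0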


Sturla


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-23 Thread Dag Sverre Seljebotn
On 01/23/2012 12:23 PM, Sturla Molden wrote:
> On 23.01.2012 10:04, Dag Sverre Seljebotn wrote:
>> On 01/23/2012 05:35 AM, Jonathan Rocher wrote:
>>> Hi all,
>>>
>>> I was reading this while learning about Pytables in more details and the
>>> origin of its efficiency. This sounds like a problem where out of core
>>> computation using pytables would shine since the dataset doesn't fit
>>> into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course
>>> C/Cythonizing the problem would be another good way...
>> Well, since the data certainly fits in RAM, one would use numexpr
>> directly (which is what pytables also uses).
>>
>>
>
> Personally I feel this debate is asking the wrong question.
>
> It is not uncommon for NumPy code to be 16x slower than C or Fortran.
> But that is not really interesting.
>
> This is what I think matters:
>
> - Is the NumPy code FAST ENOUGH?  If not, then go ahead and optimize. If
> it's fast enough, then just leave it.
>
> In this case, it seems Python takes ~13 seconds compared to ~1 second
> for Fortran. Sure, those extra 12 seconds could be annoying. But how
> much coding time should we spend to avoid them? 15 minutes? An hour? Two
> hours?
>
> Taking the time spent optimizing into account, then perhaps Python is
> 'faster' anyway? It is common to ask what is fastest for the computer.
> But we should really be asking what is fastest for our selves.
>
> For example: I have a computation that will take a day in Fortran or a
> month in Python (estimated). And I am going to run this code several
> times (20 or so, I think). In this case, yes, coding the bottlenecks in
> Fortran matters to me. But 13 seconds versus 1 second? I find that
> hardly interesting.

You, me, Ondrej, and many more are happy to learn 4 languages and use 
them where they are most appropriate.

But most scientists only want to learn and use one tool. And most 
scientists have both problems where performance doesn't matter, and 
problems where it does. So as long as examples like this exist, many 
people will prefer Fortran for *all* their tasks.

(Of course, that's why I got involved in Cython...)

Dag Sverre


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-23 Thread Sebastian Haase
On Mon, Jan 23, 2012 at 12:23 PM, Sturla Molden  wrote:
> On 23.01.2012 10:04, Dag Sverre Seljebotn wrote:
>> On 01/23/2012 05:35 AM, Jonathan Rocher wrote:
>>> Hi all,
>>>
>>> I was reading this while learning about Pytables in more details and the
>>> origin of its efficiency. This sounds like a problem where out of core
>>> computation using pytables would shine since the dataset doesn't fit
>>> into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course
>>> C/Cythonizing the problem would be another good way...
>> Well, since the data certainly fits in RAM, one would use numexpr
>> directly (which is what pytables also uses).
>>
>>
>
> Personally I feel this debate is asking the wrong question.
>
> It is not uncommon for NumPy code to be 16x slower than C or Fortran.
> But that is not really interesting.
>
> This is what I think matters:
>
> - Is the NumPy code FAST ENOUGH?  If not, then go ahead and optimize. If
> it's fast enough, then just leave it.
>
> In this case, it seems Python takes ~13 seconds compared to ~1 second
> for Fortran. Sure, those extra 12 seconds could be annoying. But how
> much coding time should we spend to avoid them? 15 minutes? An hour? Two
> hours?
>
> Taking the time spent optimizing into account, then perhaps Python is
> 'faster' anyway? It is common to ask what is fastest for the computer.
> But we should really be asking what is fastest for our selves.
>
> For example: I have a computation that will take a day in Fortran or a
> month in Python (estimated). And I am going to run this code several
> times (20 or so, I think). In this case, yes, coding the bottlenecks in
> Fortran matters to me. But 13 seconds versus 1 second? I find that
> hardly interesting.
>
> Sturla


I would think that interactive zooming would be quite nice
("illuminating")   and for that 13 secs would not be tolerable
Well... it's not at the top of my priority list ... ;-)

-Sebastian Haase


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-23 Thread Sturla Molden
On 23.01.2012 10:04, Dag Sverre Seljebotn wrote:
> On 01/23/2012 05:35 AM, Jonathan Rocher wrote:
>> Hi all,
>>
>> I was reading this while learning about Pytables in more details and the
>> origin of its efficiency. This sounds like a problem where out of core
>> computation using pytables would shine since the dataset doesn't fit
>> into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course
>> C/Cythonizing the problem would be another good way...
> Well, since the data certainly fits in RAM, one would use numexpr
> directly (which is what pytables also uses).
>
>

Personally I feel this debate is asking the wrong question.

It is not uncommon for NumPy code to be 16x slower than C or Fortran. 
But that is not really interesting.

This is what I think matters:

- Is the NumPy code FAST ENOUGH?  If not, then go ahead and optimize. If 
it's fast enough, then just leave it.

In this case, it seems Python takes ~13 seconds compared to ~1 second 
for Fortran. Sure, those extra 12 seconds could be annoying. But how 
much coding time should we spend to avoid them? 15 minutes? An hour? Two 
hours?

Taking the time spent optimizing into account, perhaps Python is 
'faster' anyway? It is common to ask what is fastest for the computer. 
But we should really be asking what is fastest for ourselves.

For example: I have a computation that will take a day in Fortran or a 
month in Python (estimated). And I am going to run this code several 
times (20 or so, I think). In this case, yes, coding the bottlenecks in 
Fortran matters to me. But 13 seconds versus 1 second? I find that 
hardly interesting.

Sturla



Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-23 Thread Dag Sverre Seljebotn
On 01/23/2012 05:35 AM, Jonathan Rocher wrote:
> Hi all,
>
> I was reading this while learning about Pytables in more details and the
> origin of its efficiency. This sounds like a problem where out of core
> computation using pytables would shine since the dataset doesn't fit
> into CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course
> C/Cythonizing the problem would be another good way...

Well, since the data certainly fits in RAM, one would use numexpr 
directly (which is what pytables also uses).

Dag Sverre

>
> HTH,
> Jonathan
>
> 2012/1/22 Ondřej Čertík:
>
> On Sun, Jan 22, 2012 at 3:13 AM, Sebastian Haase
> <seb.ha...@gmail.com> wrote:
>  > How does the algorithm and timing compare to this one:
>  >
>  > http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f
>  >
>  > The author of original version is  Dan Goodman
>  > # FAST FRACTALS WITH PYTHON AND NUMPY
>
> Thanks Sebastian. This one is much faster  2.7s on my laptop with
> the same dimensions/iterations.
>
> It uses a better datastructures -- it only keeps track of points that
> still need to be iterated --- very clever.
> If I have time, I'll try to provide an equivalent Fortran version too,
> for comparison.
>
> Ondrej
>
>
>
>
> --
> Jonathan Rocher, PhD
> Scientific software developer
> Enthought, Inc.
> jroc...@enthought.com 
> 1-512-536-1057
> http://www.enthought.com 
>
>
>



Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-22 Thread Dan Goodman

On 22/01/2012 20:01, Ondřej Čertík wrote:

> On Sun, Jan 22, 2012 at 3:13 AM, Sebastian Haase  wrote:
>
> > How does the algorithm and timing compare to this one:
> >
> > http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f
> >
> > The author of original version is  Dan Goodman
> > # FAST FRACTALS WITH PYTHON AND NUMPY
>
> Thanks Sebastian. This one is much faster  2.7s on my laptop with
> the same dimensions/iterations.
>
> It uses a better datastructures -- it only keeps track of points that
> still need to be iterated --- very clever.
>
> If I have time, I'll try to provide an equivalent Fortran version too,
> for comparison.

I spent a little while trying to optimise my algorithm using only numpy 
and couldn't get it running much faster than that. Given the relatively 
low number of iterations it's probably not a problem of Python 
overheads, so I guess it is indeed memory access that is the problem. 
One way to get round this using numexpr would be something like this: 
write f(z) = z^2 + c and then f(n+1, z) = f(n, f(z)). Now, instead of 
computing z -> f(z) each iteration, write down the formula for z -> f(n, z) 
for a few different n and use that in numexpr, e.g. z -> f(2, z), i.e. 
z -> (z^2+c)^2+c. This amounts to doing several iterations per step, but 
it means that you'll be spending more time doing floating point ops and 
less time waiting for memory operations, so it might get closer to 
Fortran/C speeds.


Actually, my curiosity was piqued so I tried it out. On my laptop I get 
that using the idea above gives a maximum speed increase for n=8, and 
after that you start to get overflow errors so it runs slower. At n=8 it 
runs about 4.5x faster than the original version. So if you got the same 
speedup it would be running in about 0.6s compared to your fortran 0.7s. 
However it's not a fair comparison as numexpr is using multiple cores 
(but only about 60% peak on my dual core laptop), but still nice to see 
what can be achieved with numexpr. :)


Code attached.

Dan
from numpy import *
import numexpr as ne

def mandel(n, m, itermax, xmin, xmax, ymin, ymax):
    ix, iy = mgrid[0:n, 0:m]
    x = linspace(xmin, xmax, n)[ix]
    y = linspace(ymin, ymax, m)[iy]
    c = x+complex(0,1)*y
    del x, y # save a bit of memory, we only need z
    img = zeros(c.shape, dtype=int)
    ix.shape = n*m
    iy.shape = n*m
    c.shape = n*m
    z = copy(c)
    for i in xrange(itermax):
        if not len(z): break # all points have escaped
        multiply(z, z, z)
        add(z, c, z)
        rem = abs(z)>2.0
        img[ix[rem], iy[rem]] = i+1
        rem = -rem
        z = z[rem]
        ix, iy = ix[rem], iy[rem]
        c = c[rem]
    return img

def nemandel(n, m, itermax, xmin, xmax, ymin, ymax,
             depth=1):
    expr = 'z**2+c'
    for _ in xrange(depth-1):
        expr = '({expr})**2+c'.format(expr=expr)
    itermax = itermax/depth
    print 'Expression used:', expr
    ix, iy = mgrid[0:n, 0:m]
    x = linspace(xmin, xmax, n)[ix]
    y = linspace(ymin, ymax, m)[iy]
    c = x+complex(0,1)*y
    del x, y # save a bit of memory, we only need z
    img = zeros(c.shape, dtype=int)
    ix.shape = n*m
    iy.shape = n*m
    c.shape = n*m
    z = copy(c)
    for i in xrange(itermax):
        if not len(z): break # all points have escaped
        z = ne.evaluate(expr)
        rem = abs(z)>2.0
        img[ix[rem], iy[rem]] = i+1
        rem = -rem
        z = z[rem]
        ix, iy = ix[rem], iy[rem]
        c = c[rem]
    img[img==0] = itermax+1
    return img

if __name__=='__main__':
    from pylab import *
    import time
    doplot = True
    args = (1000, 1000, 100, -2, .5, -1.25, 1.25)
    start = time.time()
    I = mandel(*args)
    print 'Mandel time taken:', time.time()-start
    start = time.time()
    I2 = nemandel(*args)
    print 'Nemandel time taken:', time.time()-start
    start = time.time()
    I3 = nemandel(*args, depth=2)
    print 'Nemandel 2 time taken:', time.time()-start
    start = time.time()
    I4 = nemandel(*args, depth=3)
    print 'Nemandel 3 time taken:', time.time()-start
    for d in xrange(4, 9):
        start = time.time()
        I4 = nemandel(*args, depth=d)
        print 'Nemandel', d, 'time taken:', time.time()-start

    if doplot:
        subplot(221)
        img = imshow(I.T, origin='lower left')
        subplot(222)
        img = imshow(I2.T, origin='lower left')
        subplot(223)
        img = imshow(I3.T, origin='lower left')
        subplot(224)
        img = imshow(I4.T, origin='lower left')
        show()


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-22 Thread Jonathan Rocher
Hi all,

I was reading this while learning about Pytables in more detail and the
origin of its efficiency. This sounds like a problem where out-of-core
computation using pytables would shine, since the dataset doesn't fit into
CPU cache: http://www.pytables.org/moin/ComputingKernel. Of course
C/Cythonizing the problem would be another good way...

HTH,
Jonathan

2012/1/22 Ondřej Čertík 

> On Sun, Jan 22, 2012 at 3:13 AM, Sebastian Haase 
> wrote:
> > How does the algorithm and timing compare to this one:
> >
> >
> http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f
> >
> > The author of original version is  Dan Goodman
> > # FAST FRACTALS WITH PYTHON AND NUMPY
>
> Thanks Sebastian. This one is much faster  2.7s on my laptop with
> the same dimensions/iterations.
>
> It uses a better datastructures -- it only keeps track of points that
> still need to be iterated --- very clever.
> If I have time, I'll try to provide an equivalent Fortran version too,
> for comparison.
>
> Ondrej
>



-- 
Jonathan Rocher, PhD
Scientific software developer
Enthought, Inc.
jroc...@enthought.com
1-512-536-1057
http://www.enthought.com


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-22 Thread Chris Barker
2012/1/22 Ondřej Čertík :
> If I have time, I'll try to provide an equivalent Fortran version too,
> for comparison.
>
> Ondrej

here is a Cython example:

http://wiki.cython.org/examples/mandelbrot

I haven't looked to see if it's the same algorithm, but it may be
instructive, none the less.

-Chris

-- 
--

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

chris.bar...@noaa.gov


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-22 Thread Ondřej Čertík
On Sun, Jan 22, 2012 at 3:13 AM, Sebastian Haase  wrote:
> How does the algorithm and timing compare to this one:
>
> http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f
>
> The author of original version is  Dan Goodman
> # FAST FRACTALS WITH PYTHON AND NUMPY

Thanks Sebastian. This one is much faster: 2.7s on my laptop with
the same dimensions/iterations.

It uses a better data structure -- it only keeps track of points that
still need to be iterated --- very clever.
If I have time, I'll try to provide an equivalent Fortran version too,
for comparison.

Ondrej


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-22 Thread Dag Sverre Seljebotn
On 01/22/2012 04:55 AM, Ondřej Čertík wrote:
> Hi,
>
> I read the Mandelbrot code using NumPy at this page:
>
> http://mentat.za.net/numpy/intro/intro.html
>
> but when I run it, it gives me integer overflows. As such, I have
> fixed the code, so that it doesn't overflow here:
>
> https://gist.github.com/1655320
>
> and I have also written an equivalent Fortran program.
>
> You can compare both source codes to see
> that that it is pretty much one-to-one translation.
> The main idea in the above gist is to take an
> algorithm written in NumPy, and translate
> it directly to Fortran, without any special
> optimizations. So the above is my first try
> in Fortran. You can plot the result
> using this simple script (you
> can also just click on this gist to
> see the image there):
>
> https://gist.github.com/1655377
>
> Here are my timings:
>
>               Python   Fortran  Speedup
> Calculation   12.749   00.784   16.3x
> Saving        01.904   01.456   1.3x
> Total         14.653   02.240   6.5x
>
> I save the matrices to disk in an ascii format,
> so it's quite slow in both cases. The pure computation
> is however 16x faster in Fortran (in gfortran,
> I didn't even try Intel Fortran, that will probably be
> even faster).
>
> As such, I wonder how the NumPy version could be sped up?
> I have compiled NumPy with Lapack+Blas from source.

This is a pretty well known weakness with NumPy. In the Python code at 
least, each of c and z is about 15 MB, and the mask about 1 MB. That 
doesn't fit in CPU cache, so each and every statement you do in the 
loop transfers that data in and out of CPU cache over the memory bus.

There's no quick fix -- you can try to reduce the working set so that it 
fits in CPU cache, but then the Python overhead often comes into play. 
Solutions include numexpr and Theano -- and as often as not, Cython and 
Fortran.
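
As a rough sketch of the numexpr route (escape bookkeeping omitted; the grid
matches the ~15 MB, 100-iteration case discussed here):

    import numpy as np
    import numexpr as ne

    x = np.linspace(-2.0, 0.5, 1000)
    y = np.linspace(-1.25, 1.25, 1000)
    c = x[None, :] + 1j * y[:, None]
    z = np.zeros_like(c)

    for _ in range(100):
        # one fused pass per iteration: numexpr walks the arrays in
        # cache-sized blocks, so the sub-expressions never make a full
        # round trip through RAM as separate temporaries
        z = ne.evaluate("z*z + c")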

It's a good example, thanks!,

Dag Sverre


Re: [Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-22 Thread Sebastian Haase
How does the algorithm and timing compare to this one:

http://code.google.com/p/priithon/source/browse/Priithon/mandel.py?spec=svna6117f5e81ec00abcfb037f0f9da2937bb2ea47f&r=a6117f5e81ec00abcfb037f0f9da2937bb2ea47f

The author of the original version is Dan Goodman
# FAST FRACTALS WITH PYTHON AND NUMPY


-Sebastian Haase

2012/1/22 Ondřej Čertík 
>
> Hi,
>
> I read the Mandelbrot code using NumPy at this page:
>
> http://mentat.za.net/numpy/intro/intro.html
>
> but when I run it, it gives me integer overflows. As such, I have
> fixed the code, so that it doesn't overflow here:
>
> https://gist.github.com/1655320
>
> and I have also written an equivalent Fortran program.
>
> You can compare both source codes to see
> that that it is pretty much one-to-one translation.
> The main idea in the above gist is to take an
> algorithm written in NumPy, and translate
> it directly to Fortran, without any special
> optimizations. So the above is my first try
> in Fortran. You can plot the result
> using this simple script (you
> can also just click on this gist to
> see the image there):
>
> https://gist.github.com/1655377
>
> Here are my timings:
>
>               Python   Fortran  Speedup
> Calculation   12.749   00.784   16.3x
> Saving        01.904   01.456   1.3x
> Total         14.653   02.240   6.5x
>
> I save the matrices to disk in an ascii format,
> so it's quite slow in both cases. The pure computation
> is however 16x faster in Fortran (in gfortran,
> I didn't even try Intel Fortran, that will probably be
> even faster).
>
> As such, I wonder how the NumPy version could be sped up?
> I have compiled NumPy with Lapack+Blas from source.
>
> Would anyone be willing to run the NumPy version? Just copy+paste
> should do it.
>
> If you want to run the Fortran version, the above gist uses
> some of my other modules that I use in my other programs, my goal
> was to see how much more complicated the Fortran code gets,
> compared to NumPy. As such, I put here
>
> https://gist.github.com/1655350
>
> a single file
> with all the dependencies, just compile it like this:
>
> gfortran -fPIC -O3 -march=native -ffast-math -funroll-loops mandelbrot.f90
>
> and run:
>
> $ ./a.out
> Iteration 1
> Iteration 2
> ...
> Iteration 100
>  Saving...
>  Times:
>  Calculation:  0.748045999
>  Saving:   1.36408502
>  Total:   2.11213102
>
>
> Let me know if you figure out something. I think the "mask" thing is
> quite slow, but the problem is that it needs to be there, to catch
> overflows (and it is there in Fortran as well, see the
> "where" statement, which does the same thing). Maybe there is some
> other way to write the same thing in NumPy?
>
> Ondrej


[Numpy-discussion] The NumPy Mandelbrot code 16x slower than Fortran

2012-01-21 Thread Ondřej Čertík
Hi,

I read the Mandelbrot code using NumPy at this page:

http://mentat.za.net/numpy/intro/intro.html

but when I run it, it gives me integer overflows. As such, I have
fixed the code, so that it doesn't overflow here:

https://gist.github.com/1655320

and I have also written an equivalent Fortran program.

You can compare both source codes to see
that it is pretty much a one-to-one translation.
The main idea in the above gist is to take an
algorithm written in NumPy, and translate
it directly to Fortran, without any special
optimizations. So the above is my first try
in Fortran. You can plot the result
using this simple script (you
can also just click on this gist to
see the image there):

https://gist.github.com/1655377

Here are my timings:

              Python   Fortran  Speedup
Calculation   12.749   00.784   16.3x
Saving        01.904   01.456   1.3x
Total         14.653   02.240   6.5x

I save the matrices to disk in an ascii format,
so it's quite slow in both cases. The pure computation
is however 16x faster in Fortran (in gfortran,
I didn't even try Intel Fortran, that will probably be
even faster).

As such, I wonder how the NumPy version could be sped up?
I have compiled NumPy with Lapack+Blas from source.

Would anyone be willing to run the NumPy version? Just copy+paste
should do it.

If you want to run the Fortran version: the above gist uses
some of my other modules that I use in my other programs; my goal
was to see how much more complicated the Fortran code gets
compared to NumPy. As such, I put here

https://gist.github.com/1655350

a single file
with all the dependencies, just compile it like this:

gfortran -fPIC -O3 -march=native -ffast-math -funroll-loops mandelbrot.f90

and run:

$ ./a.out
Iteration 1
Iteration 2
...
Iteration 100
 Saving...
 Times:
 Calculation:  0.748045999
 Saving:   1.36408502
 Total:   2.11213102


Let me know if you figure out something. I think the "mask" thing is
quite slow, but the problem is that it needs to be there, to catch
overflows (and it is there in Fortran as well, see the
"where" statement, which does the same thing). Maybe there is some
other way to write the same thing in NumPy?
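
For readers who don't open the gist: the masked update being discussed looks
roughly like this (a paraphrase, not the gist verbatim):

    import numpy as np

    def mandel(c, itermax=100):
        z = np.zeros_like(c)
        counts = np.zeros(c.shape, dtype=int)
        mask = np.ones(c.shape, dtype=bool)
        for _ in range(itermax):
            # fancy indexing builds fresh z[mask]/c[mask] copies each pass
            z[mask] = z[mask]**2 + c[mask]
            # the "catch overflows" test -- the analogue of Fortran's where
            mask &= np.abs(z) < 2.0
            counts[mask] += 1
        return counts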

Ondrej