Re: [Numpy-discussion] strange performance on mac 2.5/2.6 32/64 bit

2009-11-04 Thread George Nurser
2009/11/3 Robin robi...@gmail.com:
 On Tue, Nov 3, 2009 at 6:14 PM, Robin robi...@gmail.com wrote:
 After some more pootling about I figured out a lot of the performance
 loss comes from using 32 bit integers by default when compiled for 64 bit.
 I asked this question on stackoverflow:
 http://stackoverflow.com/questions/1668899/fortran-32-bit-64-bit-performance-portability

This seems surprising -- our HPC fortran codes use 32 bit integers on
64 bit linux. Do you get a performance hit in a pure fortran program?
Is it a problem with the gfortran compiler perhaps?

 Is there any way to use fortran with f2py from python in a way that
 doesn't require the code to be changed depending on platform?

 Including the -DF2PY_REPORT_ON_ARRAY_COPY option showed that the big
 performance hit was from f2py copying the arrays to cast from 64 bit
 to 32 bit.

Fortran 90 introduced the INTERFACE block, which allows you to use
different variable types as arguments to what appears externally to be
the same routine. It then feeds the arguments to the appropriate
version of the routine. I don't think f2py supports this, but it would
be really useful if it could.
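
A rough Python-side analogue of that idea, offered only as a sketch: the wrapper below picks a variant by the item size of its argument, much as a generic interface picks a routine by argument type. The names `bincount_i4`, `bincount_i8` and `_VARIANTS` are hypothetical stand-ins for two separately compiled routines; nothing here is f2py API.

```python
from array import array

def _count(x, m):
    # Shared counting loop used by both stand-in variants.
    counts = [0] * m
    for v in x:
        counts[v] += 1
    return counts

# Hypothetical stand-ins for two compiled variants, one per integer width.
def bincount_i4(x, m):
    return "i4", _count(x, m)

def bincount_i8(x, m):
    return "i8", _count(x, m)

# Map item size in bytes to the variant that accepts it without a cast.
_VARIANTS = {4: bincount_i4, 8: bincount_i8}

def bincount(x, m):
    """Dispatch on the item size of `x`, as a generic interface would on argument type."""
    try:
        variant = _VARIANTS[x.itemsize]
    except KeyError:
        raise TypeError(f"no variant for {x.itemsize}-byte integers") from None
    return variant(x, m)

# array("i") is assumed to hold 4-byte items here, as on mainstream platforms.
used32, counts32 = bincount(array("i", [0, 1, 1]), 2)
used64, counts64 = bincount(array("q", [0, 1, 1]), 2)
print(used32, counts32, used64, counts64)
```

The caller sees one name, and no input is converted before the call, which is the property the INTERFACE block would give on the Fortran side.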

Regards, George Nurser.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] strange performance on mac 2.5/2.6 32/64 bit

2009-11-03 Thread Robin
Hi,

I'm not sure if this is of much interest but it's been really puzzling
me so I thought I'd ask.

In an earlier post I described how I was surprised a simple f2py
wrapped fortran bincount was 4x faster than np.bincount - but that
differential only seemed to be on my mac; on moving to linux they both
took more or less the same time. I'm trying to work out if it is worth
moving some of my bottlenecks to fortran (most of which are np
builtins). So far it looks like it is - but only on my mac and only
32bit (see below).
Well, the only explanation I could think of was that the gcc-4.0 used to build
numpy on a mac didn't perform so well, so after upgrading to snow
leopard I've been trying to look at this again. I was hoping I could
get the equivalent performance on my mac, like on linux, which would
result in the np c stuff being a couple of times faster.

So far, with Python 2.6.3 in 64 bit - numpy seems to be significantly
slower and my fortran code _much_ slower - even from the same
compiler. Can anyone help me understand what is going on?

I have only been able to build 32 bit numpy against 2.5.4 with apple
gcc-4.0 and 64 bit numpy against 2.6.3 universal with gcc-4.2. I
haven't been able to get a numpy I can import on 2.6.3 in 32 bit mode
( http://projects.scipy.org/numpy/ticket/1221 ).

Here are the results for python.org 32 bit 2.5.4, numpy compiled with
apple gcc 4.0, f2py using att gfortran 4.2:
In [2]: timeit x = np.random.random_integers(0,1023,1).astype(int)
1 loops, best of 3: 2.86 s per loop
In [3]: x = np.random.random_integers(0,1023,1).astype(int)
In [4]: timeit np.bincount(x)
1 loops, best of 3: 435 ms per loop
In [6]: timeit gf42.bincount(x,1024)
10 loops, best of 3: 129 ms per loop
In [7]: np.__version__
Out[7]: '1.4.0.dev7618'

And for self-built (apple gcc 4.2) 64 bit 2.6.3, numpy compiled with
apple gcc 4.2, f2py using the same att gfortran 4.2:
In [3]: timeit x = np.random.random_integers(0,1023,1).astype(int)
1 loops, best of 3: 3.91 s per loop  # 37% slower than 32bit
In [4]: x = np.random.random_integers(0,1023,1).astype(int)
In [5]: timeit np.bincount(x)
1 loops, best of 3: 582 ms per loop # 34 % slower than 32 bit
In [8]: timeit gf42_64.bincount(x,1024)
1 loops, best of 3: 803 ms per loop # 522% slower than 32 bit
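
The percentages in the comments above follow from the ratio of the two timings. A small check of that arithmetic, using only the figures quoted in this message (the helper name `percent_slower` is made up for illustration):

```python
def percent_slower(slow, fast):
    """Return how much slower `slow` is than `fast`, in percent."""
    return (slow / fast - 1.0) * 100.0

# Timings quoted above as (32 bit, 64 bit); units cancel in the ratio.
timings = {
    "random_integers": (2.86, 3.91),   # seconds
    "np.bincount": (435.0, 582.0),     # milliseconds
    "f2py bincount": (129.0, 803.0),   # milliseconds
}

slowdowns = {
    name: round(percent_slower(t64, t32))
    for name, (t32, t64) in timings.items()
}
for name, pct in slowdowns.items():
    print(f"{name}: {pct}% slower in 64 bit")
```

Put another way, the f2py routine takes a little over six times as long in the 64 bit build, while the two numpy calls take roughly a third longer.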


So why is there this big difference in performance? I'd really like to
know why the fortran compiled with the same compiler is so much slower
in 64 bit mode. As far as I can tell the flags used are the same. Also,
why is numpy slower? I was surprised that I was able to import the 64
bit universal module built with f2py from 2.6 inside 32 bit 2.5 and
there it was quick again - so it seems the x86_64 code generated by
the fortran compiler is much slower (rather than any wrappers or
such).

I tried using some more recent gfortrans from macports - but could
only use them to build modules against the 64 bit python/numpy since I
couldn't find a way to get f2py to force 32 bit output. But the
performance was more or less the same (always several times slower than
the 32 bit att gfortran).

Any advice appreciated.

Cheers

Robin


subroutine bincount (x,c,n,m)
    implicit none
    integer, intent(in) :: n,m
    integer, dimension(0:n-1), intent(in) :: x
    integer, dimension(0:m-1), intent(out) :: c
    integer :: i

    c = 0
    do i = 0, n-1
        c(x(i)) = c(x(i)) + 1
    end do
end
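
For readers who do not use Fortran, a pure-Python reference of what the subroutine computes may help. This is a sketch for checking results, not a performance substitute; the argument `m` plays the same role as in the Fortran version.

```python
def bincount(x, m):
    """Count occurrences of each value 0..m-1 in x (pure-Python reference)."""
    c = [0] * m              # c = 0
    for value in x:          # do i = 0, n-1
        c[value] += 1        #     c(x(i)) = c(x(i)) + 1
    return c

counts = bincount([0, 3, 3, 1, 0, 3], 4)
print(counts)  # [2, 1, 0, 3]
```

Unlike the Fortran loop, Python raises IndexError for values of `m` or above, and negative values index from the end of the list, so inputs are assumed to lie in 0..m-1.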


Re: [Numpy-discussion] strange performance on mac 2.5/2.6 32/64 bit

2009-11-03 Thread Robin
After some more pootling about I figured out a lot of the performance
loss comes from using 32 bit integers by default when compiled for 64 bit.
I asked this question on stackoverflow:
http://stackoverflow.com/questions/1668899/fortran-32-bit-64-bit-performance-portability

Is there any way to use fortran with f2py from python in a way that
doesn't require the code to be changed depending on platform?

Or should I just pack it all in and use weave?

Robin



Re: [Numpy-discussion] strange performance on mac 2.5/2.6 32/64 bit

2009-11-03 Thread Robin
On Tue, Nov 3, 2009 at 6:14 PM, Robin robi...@gmail.com wrote:
 After some more pootling about I figured out a lot of the performance
 loss comes from using 32 bit integers by default when compiled for 64 bit.
 I asked this question on stackoverflow:
 http://stackoverflow.com/questions/1668899/fortran-32-bit-64-bit-performance-portability

 Is there any way to use fortran with f2py from python in a way that
 doesn't require the code to be changed depending on platform?

Including the -DF2PY_REPORT_ON_ARRAY_COPY option showed that the big
performance hit was from f2py copying the arrays to cast from 64 bit
to 32 bit.
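
To illustrate why such a cast cannot be free, here is a small stdlib-only sketch. It uses the `array` module as a stand-in for numpy arrays, which is an assumption made purely for illustration: narrowing 8-byte items to 4-byte items needs a new buffer, whereas data already stored at the expected width can be shared.

```python
import ctypes
from array import array

# Sizes of the C types involved on this machine. A default Fortran `integer`
# usually matches C int; C long is 8 bytes on 64 bit Mac and Linux builds.
c_int_bytes = ctypes.sizeof(ctypes.c_int)
c_long_bytes = ctypes.sizeof(ctypes.c_long)
print(f"C int: {c_int_bytes} bytes, C long: {c_long_bytes} bytes")

# Stand-in for the cast: converting 8-byte items to narrower items
# allocates a second buffer and copies every element into it.
wide = array("q", [1, 2, 3])        # 8-byte signed integers
narrow = array("i", list(wide))     # C int items, a separate buffer
narrow[0] = 99
print(wide[0], narrow[0])           # the original is untouched: data was copied

# Data that already has the expected item size can be shared without a copy.
view = memoryview(narrow)
view[1] = 42
print(narrow[1])                    # the write went through to the same buffer
```

For an input as large as the one being timed, that extra allocation and element-by-element conversion happens on every call, which fits the report from the copy-reporting option.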

Is there a recommended way to easily write fortran extensions that
work on both 32bit and 64bit machines (something like using
-fdefault-integer-8 and f2py not casting on a 64 bit platform, and not
using the option and not casting on a 32 bit platform)?
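
The platform switch described in that question could be sketched as below. `extra_fortran_flags` is a hypothetical helper, and `-fdefault-integer-8` is the gfortran spelling of the option. Whether f2py then skips the cast is exactly the open question, so this only shows how the choice might be made at build time.

```python
import struct

def pointer_bits():
    """Return 32 or 64 depending on the running interpreter's pointer size."""
    return struct.calcsize("P") * 8

def extra_fortran_flags(bits):
    """Hypothetical helper: ask gfortran for 8-byte default integers on 64 bit only."""
    return ["-fdefault-integer-8"] if bits == 64 else []

bits = pointer_bits()
print(bits, extra_fortran_flags(bits))
```

The helper takes the bit width as an argument rather than detecting it internally, so both branches can be exercised on a single machine.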

Cheers

Robin