Re: [Numpy-discussion] strange performance on mac 2.5/2.6 32/64 bit
2009/11/3 Robin robi...@gmail.com:
> On Tue, Nov 3, 2009 at 6:14 PM, Robin robi...@gmail.com wrote:
>> After some more pootling about I figured out that a lot of the performance loss comes from using 32 bit integers by default when compiling 64 bit. I asked this question on stackoverflow: http://stackoverflow.com/questions/1668899/fortran-32-bit-64-bit-performance-portability

This seems surprising -- our HPC Fortran codes use 32 bit integers on 64 bit Linux. Do you get a performance hit in a pure Fortran program? Is it a problem with the gfortran compiler perhaps?

>> Is there any way to use Fortran with f2py from Python in a way that doesn't require the code to be changed depending on platform?
>
> Including the -DF2PY_REPORT_ON_ARRAY_COPY option showed that the big performance hit was from f2py copying the arrays to cast from 64 bit to 32 bit.

Fortran 90 introduced the INTERFACE block, which allows you to use different variable types as arguments to what appears externally to be the same routine. It then feeds the arguments to the appropriate version of the routine. I don't think f2py supports this, but it would be really useful if it could.

Regards,
George Nurser.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
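Since f2py does not wrap Fortran generic interfaces, something similar to George's INTERFACE suggestion can be approximated on the Python side: build one routine per integer kind and pick between them by dtype in a thin wrapper, so the input is never cast or copied. This is a minimal sketch, not an f2py feature. `_bincount_i4`, `_bincount_i8` and the dispatch table are hypothetical names, and their NumPy bodies only stand in for two separately compiled Fortran subroutines.

```python
import numpy as np

# Hypothetical stand-ins for two compiled routines: one built for 4-byte
# (default kind) integers, one for 8-byte integers. In practice each would be
# a separate Fortran subroutine wrapped by f2py; plain NumPy is used here.
def _bincount_i4(x, m):
    return np.bincount(x, minlength=m).astype(np.int32)

def _bincount_i8(x, m):
    return np.bincount(x, minlength=m).astype(np.int64)

# Dispatch table keyed on the argument's dtype, playing the role of the
# Fortran 90 generic INTERFACE: one external name, several specific routines.
_ROUTINES = {
    np.dtype(np.int32): _bincount_i4,
    np.dtype(np.int64): _bincount_i8,
}

def bincount(x, m):
    """Count occurrences of 0..m-1 in x without casting x to another integer kind."""
    x = np.asarray(x)
    try:
        routine = _ROUTINES[x.dtype]
    except KeyError:
        raise TypeError(f"no bincount variant for dtype {x.dtype}") from None
    return routine(x, m)
```

The caller always writes `bincount(x, 1024)`; which variant runs depends only on `x.dtype`, much as the Fortran compiler resolves a generic call from the argument kinds.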
[Numpy-discussion] strange performance on mac 2.5/2.6 32/64 bit
Hi,

I'm not sure if this is of much interest, but it's been really puzzling me so I thought I'd ask.

In an earlier post I described how I was surprised that a simple f2py-wrapped Fortran bincount was 4x faster than np.bincount, but that differential only seemed to appear on my Mac; on moving to Linux they both took more or less the same time. I'm trying to work out whether it is worth moving some of my bottlenecks (most of which are np builtins) to Fortran. So far it looks like it is, but only on my Mac and only in 32 bit (see below).

The only explanation I could think of was that the gcc-4.0 used to build numpy on a Mac didn't perform so well, so after upgrading to Snow Leopard I've been trying to look at this again. I was hoping I could get the same performance on my Mac as on Linux, which would result in the np C code being a couple of times faster.

So far, with Python 2.6.3 in 64 bit, numpy seems to be significantly slower and my Fortran code _much_ slower, even from the same compiler. Can anyone help me understand what is going on?

I have only been able to build 32 bit numpy against 2.5.4 with Apple gcc-4.0, and 64 bit numpy against 2.6.3 universal with gcc-4.2. I haven't been able to get a numpy I can import on 2.6.3 in 32 bit mode ( http://projects.scipy.org/numpy/ticket/1221 ).
Here are the results for python.org 32 bit 2.5.4, numpy compiled with Apple gcc 4.0, f2py using att gfortran 4.2:

In [2]: timeit x = np.random.random_integers(0,1023,1).astype(int)
1 loops, best of 3: 2.86 s per loop

In [3]: x = np.random.random_integers(0,1023,1).astype(int)

In [4]: timeit np.bincount(x)
1 loops, best of 3: 435 ms per loop

In [6]: timeit gf42.bincount(x,1024)
10 loops, best of 3: 129 ms per loop

In [7]: np.__version__
Out[7]: '1.4.0.dev7618'

And for self-built (Apple gcc 4.2) 64 bit 2.6.3, numpy compiled with Apple gcc 4.2, f2py using the same att gfortran 4.2:

In [3]: timeit x = np.random.random_integers(0,1023,1).astype(int)
1 loops, best of 3: 3.91 s per loop   # 37% slower than 32 bit

In [4]: x = np.random.random_integers(0,1023,1).astype(int)

In [5]: timeit np.bincount(x)
1 loops, best of 3: 582 ms per loop   # 34% slower than 32 bit

In [8]: timeit gf42_64.bincount(x,1024)
1 loops, best of 3: 803 ms per loop   # 522% slower than 32 bit

So why is there this big difference in performance? I'd really like to know why the Fortran compiled with the same compiler is so much slower in 64 bit mode. As far as I can tell the flags used are the same. Also, why is numpy slower?

I was surprised that I was able to import the 64 bit universal module built with f2py from 2.6 inside 32 bit 2.5, and there it was quick again, so it seems the x86_64 code generated by the Fortran compiler is much slower (rather than any wrappers or such).

I tried using some more recent gfortrans from MacPorts, but could only use them to build modules against the 64 bit Python/numpy, since I couldn't find a way to get f2py to force 32 bit output. The performance was more or less the same (always several times slower than the 32 bit att gfortran).

Any advice appreciated.
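The percentages annotated in the 64 bit session are relative increases in run time, (t64 / t32 - 1) x 100, so "522% slower" means the 64 bit f2py call takes about 6.2 times as long as the 32 bit one. A short sketch recomputing them from the quoted timings (the dictionary keys are only labels):

```python
# Timings quoted above in seconds, as (32 bit build, 64 bit build).
timings = {
    "random_integers": (2.86, 3.91),
    "np.bincount": (0.435, 0.582),
    "f2py bincount": (0.129, 0.803),
}

def percent_slower(t32, t64):
    """Extra run time of the 64 bit build, as a percentage of the 32 bit time."""
    return (t64 / t32 - 1.0) * 100.0

for name, (t32, t64) in timings.items():
    print(f"{name}: {percent_slower(t32, t64):.0f}% slower, {t64 / t32:.1f}x the time")
```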
Cheers
Robin

subroutine bincount (x,c,n,m)
    ! Count how often each value 0..m-1 occurs in x(0:n-1).
    ! Plain INTEGER is the default kind: 4 bytes with gfortran unless
    ! compiled with -fdefault-integer-8.
    implicit none
    integer, intent(in) :: n,m
    integer, dimension(0:n-1), intent(in) :: x
    integer, dimension(0:m-1), intent(out) :: c
    integer :: i
    c = 0
    do i = 0, n-1
        c(x(i)) = c(x(i)) + 1
    end do
end
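When comparing a compiled build against NumPy, it can help to have the subroutine's logic transliterated line by line as a reference for correctness (it is far too slow for benchmarking). A minimal sketch; `bincount_loop` is a hypothetical name. np.random.random_integers, used in the timings, has been deprecated in later NumPy releases, so the sketch draws data with `default_rng().integers`, whose exclusive upper bound of 1024 gives the same 0..1023 range.

```python
import numpy as np

def bincount_loop(x, m):
    """Pure-Python transliteration of the Fortran bincount subroutine above."""
    c = np.zeros(m, dtype=np.int32)  # c = 0 (default-kind INTEGER is 4 bytes)
    for v in x:                      # do i = 0, n-1
        c[v] += 1                    #   c(x(i)) = c(x(i)) + 1
    return c

rng = np.random.default_rng(0)
x = rng.integers(0, 1024, size=10_000)  # same 0..1023 range as random_integers(0, 1023)
counts = bincount_loop(x, 1024)
print(np.array_equal(counts, np.bincount(x, minlength=1024)))
```

Values of m or more raise IndexError here, whereas the Fortran loop performs no bounds checking by default.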
Re: [Numpy-discussion] strange performance on mac 2.5/2.6 32/64 bit
After some more pootling about I figured out that a lot of the performance loss comes from using 32 bit integers by default when compiling 64 bit. I asked this question on stackoverflow: http://stackoverflow.com/questions/1668899/fortran-32-bit-64-bit-performance-portability

Is there any way to use Fortran with f2py from Python in a way that doesn't require the code to be changed depending on platform? Or should I just pack it all in and use weave?

Robin

On Tue, Nov 3, 2009 at 4:29 PM, Robin robi...@gmail.com wrote:
> Hi, I'm not sure if this is of much interest but it's been really puzzling me so I thought I'd ask. [...]
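The mismatch Robin describes can be checked from Python: gfortran's default-kind INTEGER is 4 bytes, the size of a C int (NumPy's `intc`), while `.astype(int)` in the sessions above yields NumPy's default integer, which is 8 bytes on 64 bit Mac OS X and Linux. A minimal sketch; `FORTRAN_DEFAULT_INT` and `needs_cast` are hypothetical names, and it assumes the routine is built without -fdefault-integer-8.

```python
import numpy as np

# Assumption: the Fortran routine uses default-kind INTEGER (4 bytes with
# gfortran), which matches a C int, exposed by NumPy as intc.
FORTRAN_DEFAULT_INT = np.dtype(np.intc)

def needs_cast(arr, target=FORTRAN_DEFAULT_INT):
    """True if handing ``arr`` to a routine declared with ``target`` forces a converted copy."""
    return np.asarray(arr).dtype != target

x = np.arange(10).astype(int)  # NumPy's default integer, as in the timings above
print("numpy default int:", np.dtype(int).itemsize, "bytes")
print("Fortran default INTEGER:", FORTRAN_DEFAULT_INT.itemsize, "bytes")
print("cast needed:", needs_cast(x))
```

On a 64 bit Mac or Linux build this reports 8 vs 4 bytes and a cast, which is the 64 to 32 bit copy reported later in the thread via -DF2PY_REPORT_ON_ARRAY_COPY.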
Re: [Numpy-discussion] strange performance on mac 2.5/2.6 32/64 bit
On Tue, Nov 3, 2009 at 6:14 PM, Robin robi...@gmail.com wrote:
> After some more pootling about I figured out that a lot of the performance loss comes from using 32 bit integers by default when compiling 64 bit. I asked this question on stackoverflow: http://stackoverflow.com/questions/1668899/fortran-32-bit-64-bit-performance-portability
>
> Is there any way to use Fortran with f2py from Python in a way that doesn't require the code to be changed depending on platform?

Including the -DF2PY_REPORT_ON_ARRAY_COPY option showed that the big performance hit was from f2py copying the arrays to cast from 64 bit to 32 bit.

Is there a recommended way to easily write Fortran extensions that work on both 32 bit and 64 bit machines (something like using -fdefault-integer-8 with f2py not casting on a 64 bit platform, and not using the option and not casting on a 32 bit platform)?

Cheers
Robin
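One way to avoid per-platform compiler flags is to leave the Fortran at default INTEGER and convert the data to the matching dtype once, before a series of calls, so f2py has nothing left to cast on each call. A minimal sketch, assuming the values fit in a C int; `FORTRAN_INT` and `as_fortran_int` are hypothetical names, not part of f2py.

```python
import numpy as np

# Assumption: the wrapped routine keeps default-kind INTEGER arguments, so the
# matching dtype is intc on both 32 and 64 bit builds; no per-platform flags.
FORTRAN_INT = np.intc

def as_fortran_int(a):
    """Return ``a`` as a Fortran-ordered intc array, copying only when needed."""
    return np.asfortranarray(a, dtype=FORTRAN_INT)

x = np.arange(1_000_000) % 1024   # NumPy default integer (8 bytes on 64 bit Mac/Linux)
xf = as_fortran_int(x)            # converted once here if the dtype differs
same = as_fortran_int(xf) is xf   # already intc: no further copy is made
print(xf.dtype, same)
```

This only pays off when the same array is passed to compiled code repeatedly; for a single call the one conversion costs about as much as the copy f2py would make anyway.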