Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-28 Thread Eelco Hoogendoorn
To rephrase my most pressing question: may np.ones((N,2)).mean(0) and
np.ones((2,N)).mean(1) produce different results with the implementation in
the current master? If so, I think that would be very much regrettable; and
if this is a minority opinion, I do hope that at least this gets documented
in a most explicit manner.


On Sun, Jul 27, 2014 at 8:26 PM, Sturla Molden sturla.mol...@gmail.com
wrote:

 Nathaniel Smith n...@pobox.com wrote:

  The problem here is that when summing up the values, the sum gets
  large enough that after rounding, x + 1 = x and the sum stops
  increasing.

 Interesting. That explains why the divide-and-conquer reduction is much
 more robust.
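
 A minimal sketch of that saturation effect, scaled down to float16
 (an 11-bit significand, so the spacing between adjacent values already
 exceeds 1 at 2**11 = 2048) to keep the loop short; float32 hits the
 same wall at 2**24:

 import numpy as np

 acc = np.float16(0)
 for _ in range(3000):
     acc += np.float16(1)  # once acc reaches 2048, acc + 1 rounds back to acc
 print(acc)  # 2048.0, not 3000.0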

 Thanks :)


 Sturla

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-28 Thread Sebastian Berg
On Mo, 2014-07-28 at 14:37 +0200, Eelco Hoogendoorn wrote:
 To rephrase my most pressing question: may np.ones((N,2)).mean(0) and
 np.ones((2,N)).mean(1) produce different results with the
 implementation in the current master? If so, I think that would be
 very much regrettable; and if this is a minority opinion, I do hope
 that at least this gets documented in a most explicit manner.
 

This will always give you different results. Though in master the
difference is more likely to be large, since (often) the second one may
be less likely to run into bigger numerical issues.

 
 On Sun, Jul 27, 2014 at 8:26 PM, Sturla Molden
 sturla.mol...@gmail.com wrote:
 Nathaniel Smith n...@pobox.com wrote:
 
  The problem here is that when summing up the values, the sum
 gets
  large enough that after rounding, x + 1 = x and the sum
 stops
  increasing.
 
 
 Interesting. That explains why the divide-and-conquer
 reduction is much
 more robust.
 
 Thanks :)
 
 
 Sturla
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-28 Thread alex
On Mon, Jul 28, 2014 at 8:46 AM, Sebastian Berg
sebast...@sipsolutions.net wrote:
 On Mo, 2014-07-28 at 14:37 +0200, Eelco Hoogendoorn wrote:
 To rephrase my most pressing question: may np.ones((N,2)).mean(0) and
 np.ones((2,N)).mean(1) produce different results with the
 implementation in the current master? If so, I think that would be
 very much regrettable; and if this is a minority opinion, I do hope
 that at least this gets documented in a most explicit manner.


 This will always give you different results. Though in master the
 difference is more likely to be large, since (often) the second one may
 be less likely to run into bigger numerical issues.

Are you sure they always give different results?  Notice that
np.ones((N,2)).mean(0)
np.ones((2,N)).mean(1)
compute means of different axes on transposed arrays so these
differences 'cancel out'.

My understanding of the question is to clarify how numpy reduction
algorithms are special-cased for the fast axis vs. other axes.
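
A quick empirical check (a sketch; whether either result deviates from
1.0 depends on the numpy version and which reduction path gets the more
accurate summation):

import numpy as np

N = 2**25  # past 2**24, where naive float32 accumulation of ones stalls
print(np.ones((N, 2), dtype=np.float32).mean(0))  # reduce long axis 0
print(np.ones((2, N), dtype=np.float32).mean(1))  # reduce long axis 1
# Mathematically both results are [1. 1.]; any deviation reveals a path
# that accumulated naively in float32.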
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-07-28 Thread Carl Kleffner
Hi,

on https://bitbucket.org/carlkl/mingw-w64-for-python/downloads I uploaded
7z-archives for mingw-w64 and for OpenBLAS-0.2.10 for 32 bit and for 64
bit.
To use mingw-w64 for Python >= 3.3 you have to manually tweak the so-called
specs file - see readme.txt in the archive.

Regards

Carl



2014-07-28 4:32 GMT+02:00 Robert McGibbon rmcgi...@gmail.com:

 I forked Olivier's example project to use the same infrastructure for
 building conda binaries and deploying them to binstar, which might also be
 useful for some projects.

 https://github.com/rmcgibbo/python-appveyor-conda-example

 -Robert


 On Wed, Jul 9, 2014 at 3:53 PM, Robert McGibbon rmcgi...@gmail.com
 wrote:

 This is an awesome resource for tons of projects.

 Thanks Olivier!

 -Robert


 On Wed, Jul 9, 2014 at 7:00 AM, Olivier Grisel olivier.gri...@ensta.org
 wrote:

 Feodor updated the AppVeyor nodes to have the Windows SDK matching
 MSVC 2008 Express for Python 2. I have updated my sample scripts and
 we now have a working example of a free CI system for:

 Python 2 and 3 both for 32 and 64 bit architectures.

 https://github.com/ogrisel/python-appveyor-demo

 Best,

 --
 Olivier
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-28 Thread Daπid
On 28 July 2014 14:46, Sebastian Berg sebast...@sipsolutions.net wrote:

  To rephrase my most pressing question: may np.ones((N,2)).mean(0) and
  np.ones((2,N)).mean(1) produce different results with the
  implementation in the current master? If so, I think that would be
  very much regrettable; and if this is a minority opinion, I do hope
  that at least this gets documented in a most explicit manner.
 

 This will always give you different results. Though in master the
 difference is more likely to be large, since (often) the second one may
 be less likely to run into bigger numerical issues.



An example using float16 on Numpy 1.8.1 (I haven't seen differences with
float32):

import numpy as np

for N in np.logspace(2, 6):
    N = int(N)  # logspace yields floats; shapes want integers
    print N, (np.ones((N, 2), dtype=np.float16).mean(0),
              np.ones((2, N), dtype=np.float16).mean(1))

The first one gives correct results up to 2049, beyond which the values
start to fall. The second one, on the other hand, gives correct results up
to 65519, where it blows up to infinity.

Interestingly, in the second case there are fluctuations. For example, for
N = 65424, the mean is 0.99951172, but 1 for the next and previous numbers.
But I think they are just an effect of the rounding, as:

In [33]: np.ones(N+1, dtype=np.float16).sum() - N
Out[33]: 16.0

In [35]: np.ones(N+1, dtype=np.float16).sum() - (N + 1)
Out[35]: 15.0

In [36]: np.ones(N-1, dtype=np.float16).sum() - (N - 1)
Out[36]: -15.0

In [37]: N = 65519 - 20

In [38]: np.ones(N, dtype=np.float16).sum() - N
Out[38]: 5.0

In [39]: np.ones(N+1, dtype=np.float16).sum() - (N + 1)
Out[39]: 4.0

In [40]: np.ones(N-1, dtype=np.float16).sum() - (N - 1)
Out[40]: 6.0
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-28 Thread Sturla Molden
On 28/07/14 15:21, alex wrote:

 Are you sure they always give different results?  Notice that
 np.ones((N,2)).mean(0)
 np.ones((2,N)).mean(1)
 compute means of different axes on transposed arrays so these
 differences 'cancel out'.

They will be different if different algorithms are used. np.ones((N,2)).mean(0)
will have larger accumulated rounding error than np.ones((2,N)).mean(1), 
if only the latter uses the divide-and-conquer summation.
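
For concreteness, a rough Python sketch of divide-and-conquer (pairwise)
summation; numpy's C implementation uses a larger, unrolled base case, so
this is illustrative only:

import numpy as np

def pairwise_sum(x, base=8):
    # Rounding error grows roughly O(log n), versus O(n) for a naive loop.
    n = len(x)
    if n <= base:
        s = x.dtype.type(0)
        for v in x:
            s += v  # naive accumulation, but only over a tiny block
        return s
    return pairwise_sum(x[:n // 2], base) + pairwise_sum(x[n // 2:], base)

a = np.ones(2**13, dtype=np.float16)
print(pairwise_sum(a))  # 8192.0 exactly; a naive float16 loop stalls at 2048.0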

I would suggest that in the first case we try to copy the array to a 
temporary contiguous buffer and use the same divide-and-conquer 
algorithm, unless some heuristics on memory usage fails.

Sturla



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-28 Thread Fabien
On 28.07.2014 15:30, Daπid wrote:
 An example using float16 on Numpy 1.8.1 (I haven't seen differences with
 float32):

Why aren't there differences between float16 and float32?

Could this be related to my earlier post in this thread where I 
mentioned summation problems occurring much earlier in numpy than in IDL?

Fabien

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-28 Thread Sebastian Berg
On Mo, 2014-07-28 at 15:35 +0200, Sturla Molden wrote:
 On 28/07/14 15:21, alex wrote:
 
  Are you sure they always give different results?  Notice that
  np.ones((N,2)).mean(0)
  np.ones((2,N)).mean(1)
  compute means of different axes on transposed arrays so these
  differences 'cancel out'.
 
 They will be different if different algorithms are used. np.ones((N,2)).mean(0)
 will have larger accumulated rounding error than np.ones((2,N)).mean(1), 
 if only the latter uses the divide-and-conquer summation.
 

What I wanted to point out is that to some extent the algorithm does not
matter. You will not necessarily get identical results already if you
use a different iteration order, and we have been doing that for years
for speed reasons. All libs like BLAS do the same.
Yes, the new changes make this much more dramatic, but they only make
some paths much better, never worse. It might be dangerous, but only in
the sense that you test it with the good path and it works good enough,
but later (also) use the other one in some lib. I am not even sure if I

 I would suggest that in the first case we try to copy the array to a 
 temporary contiguous buffer and use the same divide-and-conquer 
 algorithm, unless some heuristics on memory usage fails.
 

Sure, but you have to make major changes to the buffered iterator to do
that without larger speed implications. It might be a good idea, but it
requires someone who knows this stuff to spend a lot of time and care in
the depths of numpy.

 Sturla
 
 
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-28 Thread Sebastian Berg
On Mo, 2014-07-28 at 15:50 +0200, Fabien wrote:
 On 28.07.2014 15:30, Daπid wrote:
  An example using float16 on Numpy 1.8.1 (I haven't seen differences with
  float32):
 
 Why aren't there differences between float16 and float32?
 

float16 calculations are actually float32 calculations. If done along
the fast axis they will not get rounded in between (within those
8192-element chunks). Basically, something like the difference we are
talking about for float32 and float64 has existed for years for float16.
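
A rough illustration, assuming the default 8192-element buffer mentioned
above:

import numpy as np

a = np.ones(8192, dtype=np.float16)
# The chunk is accumulated in float32 internally, so the reduction gets
# past 2**11 = 2048, where a pure float16 accumulation would stall:
print(a.sum())  # 8192.0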

 Could this be related to my earlier post in this thread where I 
 mentioned summation problems occurring much earlier in numpy than in IDL?
 
 Fabien
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-28 Thread Eelco Hoogendoorn
Sebastian:

Those are good points. Indeed iteration order may already produce different
results, even though the semantics of numpy suggest identical operations.
Still, I feel this different behavior without any semantical clues is
something to be minimized.

Indeed copying might have large speed implications. But on second thought,
does it? Either the data is already aligned and no copy is required, or it
isn't aligned, and we need one pass of cache inefficient access to the data
anyway. In fact, if we had one low level function which does
cache-intelligent transposition of numpy arrays (using some block
strategy), this might be faster even than the current simple reduction
operations when forced to work on awkwardly aligned data. Ideally, this
intelligent access and intelligent reduction would be part of a single pass
of course; but that wouldn't really fit within the numpy design, and merely
such an intelligent transpose would provide most of the benefit I think. Or
is the mechanism behind ascontiguousarray already intelligent in this sense?
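
For what it's worth, a minimal sketch of such a blocked transpose (the
block size is an illustrative guess, not a tuned value):

import numpy as np

def blocked_transpose(a, block=64):
    # Copy tile by tile so reads and writes both stay within cache-sized
    # blocks, instead of striding through the whole array on every row.
    n, m = a.shape
    out = np.empty((m, n), dtype=a.dtype)
    for i in range(0, n, block):
        for j in range(0, m, block):
            out[j:j + block, i:i + block] = a[i:i + block, j:j + block].T
    return out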


On Mon, Jul 28, 2014 at 4:06 PM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Mo, 2014-07-28 at 15:35 +0200, Sturla Molden wrote:
  On 28/07/14 15:21, alex wrote:
 
   Are you sure they always give different results?  Notice that
   np.ones((N,2)).mean(0)
   np.ones((2,N)).mean(1)
   compute means of different axes on transposed arrays so these
   differences 'cancel out'.
 
  They will be different if different algorithms are used. np.ones((N,2)).mean(0)
  will have larger accumulated rounding error than np.ones((2,N)).mean(1),
  if only the latter uses the divide-and-conquer summation.
 

 What I wanted to point out is that to some extent the algorithm does not
 matter. You will not necessarily get identical results already if you
 use a different iteration order, and we have been doing that for years
 for speed reasons. All libs like BLAS do the same.
 Yes, the new changes make this much more dramatic, but they only make
 some paths much better, never worse. It might be dangerous, but only in
 the sense that you test it with the good path and it works good enough,
 but later (also) use the other one in some lib. I am not even sure if I

  I would suggest that in the first case we try to copy the array to a
  temporary contiguous buffer and use the same divide-and-conquer
  algorithm, unless some heuristics on memory usage fails.
 

 Sure, but you have to make major changes to the buffered iterator to do
 that without larger speed implications. It might be a good idea, but it
 requires someone who knows this stuff to spend a lot of time and care in
 the depths of numpy.

  Sturla
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 


 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-07-28 Thread Olivier Grisel
2014-07-28 15:25 GMT+02:00 Carl Kleffner cmkleff...@gmail.com:
 Hi,

 on https://bitbucket.org/carlkl/mingw-w64-for-python/downloads I uploaded
 7z-archives for mingw-w64 and for OpenBLAS-0.2.10 for 32 bit and for 64 bit.
  To use mingw-w64 for Python >= 3.3 you have to manually tweak the so-called
  specs file - see readme.txt in the archive.

Have the patches to build numpy and scipy with mingw-w64 been merged
in the master branches of those projects?


-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] 64-bit windows numpy / scipy wheels for testing

2014-07-28 Thread Carl Kleffner
I had to move my development environment to a different Windows box recently
(still in progress). On this box I don't have full access, unfortunately.
The patch for the scipy build was merged into scipy master some time ago, see
https://github.com/scipy/scipy/pull/3484. I have some additional patches
for scipy.test.
The pull request for the numpy build has not yet been made, for the reasons I
mentioned.

Cheers,

Carl


2014-07-28 16:46 GMT+02:00 Olivier Grisel olivier.gri...@ensta.org:

 2014-07-28 15:25 GMT+02:00 Carl Kleffner cmkleff...@gmail.com:
  Hi,
 
  on https://bitbucket.org/carlkl/mingw-w64-for-python/downloads I
 uploaded
  7z-archives for mingw-w64 and for OpenBLAS-0.2.10 for 32 bit and for 64
 bit.
  To use mingw-w64 for Python >= 3.3 you have to manually tweak the
  so-called specs file - see readme.txt in the archive.

 Have the patches to build numpy and scipy with mingw-w64 been merged
 in the master branches of those projects?


 --
 Olivier
 http://twitter.com/ogrisel - http://github.com/ogrisel
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-28 Thread Sebastian Berg
On Mo, 2014-07-28 at 16:31 +0200, Eelco Hoogendoorn wrote:
 Sebastian:
 
 
 Those are good points. Indeed iteration order may already produce
 different results, even though the semantics of numpy suggest
 identical operations. Still, I feel this different behavior without
 any semantical clues is something to be minimized.
 
 Indeed copying might have large speed implications. But on second
 thought, does it? Either the data is already aligned and no copy is
 required, or it isn't aligned, and we need one pass of cache
 inefficient access to the data anyway. In fact, if we had one low level
 function which does cache-intelligent transposition of numpy arrays
 (using some block strategy), this might be faster even than the
 current simple reduction operations when forced to work on awkwardly
 aligned data. Ideally, this intelligent access and intelligent
 reduction would be part of a single pass of course; but that wouldn't
 really fit within the numpy design, and merely such an intelligent
 transpose would provide most of the benefit I think. Or is the
 mechanism behind ascontiguousarray already intelligent in this sense?
 

The iterator is currently smart in the sense that it will (at a low
level, obviously) do something like [1]. Most things in numpy use that
iterator; ascontiguousarray does so as well. Such a blocked, cache-aware
iterator is what I mean when I say that someone who knows this stuff
would have to spend a lot of time on it :).

[1] Appendix:

arr = np.ones((100, 100))
arr.sum(0)
# being equivalent (iteration order wise) to:
res = np.zeros(100)
for i in range(100):
    res += arr[i, :]
# while arr.sum(1) would be:
for i in range(100):
    res[i] = arr[i, :].sum()

 
 On Mon, Jul 28, 2014 at 4:06 PM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
 On Mo, 2014-07-28 at 15:35 +0200, Sturla Molden wrote:
  On 28/07/14 15:21, alex wrote:
 
   Are you sure they always give different results?  Notice
 that
   np.ones((N,2)).mean(0)
   np.ones((2,N)).mean(1)
   compute means of different axes on transposed arrays so
 these
   differences 'cancel out'.
 
  They will be different if different algorithms are used.
 np.ones((N,2)).mean(0)
  will have larger accumulated rounding error than
 np.ones((2,N)).mean(1),
  if only the latter uses the divide-and-conquer summation.
 
 
 
 What I wanted to point out is that to some extent the
 algorithm does not
 matter. You will not necessarily get identical results already
 if you
 use a different iteration order, and we have been doing that
 for years
 for speed reasons. All libs like BLAS do the same.
 Yes, the new changes make this much more dramatic, but they
 only make
 some paths much better, never worse. It might be dangerous,
 but only in
 the sense that you test it with the good path and it works
 good enough,
 but later (also) use the other one in some lib. I am not even
 sure if I
 
  I would suggest that in the first case we try to copy the
 array to a
  temporary contiguous buffer and use the same
 divide-and-conquer
  algorithm, unless some heuristics on memory usage fails.
 
 
 
 Sure, but you have to make major changes to the buffered
 iterator to do
 that without larger speed implications. It might be a good
 idea, but it
 requires someone who knows this stuff to spend a lot of time
 and care in
 the depths of numpy.
 
  Sturla
 
 
 
  ___
  NumPy-Discussion mailing list
  NumPy-Discussion@scipy.org
  http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
 
 
 
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-28 Thread Eelco Hoogendoorn
I see, thanks for the clarification. Just for the sake of argument, since
unfortunately I don't have the time to go dig in the guts of numpy myself:
a design which always produces results of the same (high) accuracy, but
only optimizes the common access patterns in a hacky way, and may be
inefficient in case it needs to fall back on dumb iteration or array
copying, is the best compromise between features and the ever limiting
amount of time available, I would argue, no? It's preferable if your code
works, but may be hacked to work more efficiently, than that it works
efficiently, but may need hacking to work correctly under all circumstances.

But fun as it is to think about what ought to be, I suppose the people who
do actually pour in the effort have thought about these things already. A
numpy 2.0 could probably borrow/integrate a lot from numexpr, I suppose.

By the way, the hierarchical summation would make it fairly easy to erase
(and in any case would minimize) summation differences due to differences
between logical and actual ordering in memory of the data, no?


On Mon, Jul 28, 2014 at 5:22 PM, Sebastian Berg sebast...@sipsolutions.net
wrote:

 On Mo, 2014-07-28 at 16:31 +0200, Eelco Hoogendoorn wrote:
  Sebastian:
 
 
  Those are good points. Indeed iteration order may already produce
  different results, even though the semantics of numpy suggest
  identical operations. Still, I feel this different behavior without
  any semantical clues is something to be minimized.
 
  Indeed copying might have large speed implications. But on second
  thought, does it? Either the data is already aligned and no copy is
  required, or it isn't aligned, and we need one pass of cache
  inefficient access to the data anyway. In fact, if we had one low level
  function which does cache-intelligent transposition of numpy arrays
  (using some block strategy), this might be faster even than the
  current simple reduction operations when forced to work on awkwardly
  aligned data. Ideally, this intelligent access and intelligent
  reduction would be part of a single pass of course; but that wouldn't
  really fit within the numpy design, and merely such an intelligent
  transpose would provide most of the benefit I think. Or is the
  mechanism behind ascontiguousarray already intelligent in this sense?
 

 The iterator is currently smart in the sense that it will (at a low
 level, obviously) do something like [1]. Most things in numpy use that
 iterator; ascontiguousarray does so as well. Such a blocked, cache-aware
 iterator is what I mean when I say that someone who knows this stuff
 would have to spend a lot of time on it :).

 [1] Appendix:

 arr = np.ones((100, 100))
 arr.sum(0)
 # being equivalent (iteration order wise) to:
 res = np.zeros(100)
 for i in range(100):
     res += arr[i, :]
 # while arr.sum(1) would be:
 for i in range(100):
     res[i] = arr[i, :].sum()

 
  On Mon, Jul 28, 2014 at 4:06 PM, Sebastian Berg
  sebast...@sipsolutions.net wrote:
  On Mo, 2014-07-28 at 15:35 +0200, Sturla Molden wrote:
   On 28/07/14 15:21, alex wrote:
  
Are you sure they always give different results?  Notice
  that
np.ones((N,2)).mean(0)
np.ones((2,N)).mean(1)
compute means of different axes on transposed arrays so
  these
differences 'cancel out'.
  
   They will be different if different algorithms are used.
  np.ones((N,2)).mean(0)
   will have larger accumulated rounding error than
  np.ones((2,N)).mean(1),
   if only the latter uses the divide-and-conquer summation.
  
 
 
  What I wanted to point out is that to some extent the
  algorithm does not
  matter. You will not necessarily get identical results already
  if you
  use a different iteration order, and we have been doing that
  for years
  for speed reasons. All libs like BLAS do the same.
  Yes, the new changes make this much more dramatic, but they
  only make
  some paths much better, never worse. It might be dangerous,
  but only in
  the sense that you test it with the good path and it works
  good enough,
  but later (also) use the other one in some lib. I am not even
  sure if I
 
   I would suggest that in the first case we try to copy the
  array to a
   temporary contiguous buffer and use the same
  divide-and-conquer
   algorithm, unless some heuristics on memory usage fails.
  
 
 
  Sure, but you have to make major changes to the buffered
  iterator to do
  that without larger speed implications. It might be a good
  idea, but it
  requires someone who knows this stuff to spend a lot of time
  and care in
  the depths of numpy.
 
   Sturla
  
  
  
   

Re: [Numpy-discussion] numpy.mean still broken for large float32 arrays

2014-07-28 Thread Julian Taylor
On 28.07.2014 23:32, Eelco Hoogendoorn wrote:
 I see, thanks for the clarification. Just for the sake of argument,
 since unfortunately I don't have the time to go dig in the guts of numpy
 myself: a design which always produces results of the same (high)
 accuracy, but only optimizes the common access patterns in a hacky way,
 and may be inefficient in case it needs to fall back on dumb iteration
 or array copying, is the best compromise between features and the ever
 limiting amount of time available, I would argue, no? It's preferable if
 your code works, but may be hacked to work more efficiently, than that
 it works efficiently, but may need hacking to work correctly under all
 circumstances.

I don't see the inconsistency as such a big problem. If applications are
so sensitive to accurate summations over large uniform datasets, they
will most likely implement their own algorithm instead of relying on the
black box in numpy (which has never documented any accuracy bounds or the
algorithms used for summation, as far as I know).
If they do, they should add test suites that will detect accidental use of
the less accurate path in numpy and fix it before they even release.

General-purpose libraries that may not be able to test every input their
third-party users may give them usually don't have the luxury of
supporting only the latest version of numpy, where pairwise summation is
guaranteed in the first place, so they would just have to implement
their own algorithms anyway.
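
To illustrate what "their own algorithm" could look like, a compensated
(Kahan) summation sketch; this is one standard choice, not what numpy
itself implements:

import numpy as np

def kahan_sum(x):
    s = x.dtype.type(0)
    c = x.dtype.type(0)  # running compensation for lost low-order bits
    for v in x:
        y = v - c
        t = s + y
        c = (t - s) - y  # recover the part of y that rounding dropped
        s = t
    return s

# Stays close to 8192 even in half precision, where a naive left-to-right
# loop stalls at 2048:
print(kahan_sum(np.ones(2**13, dtype=np.float16)))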


 
 But fun as it is to think about what ought to be, I suppose the people
 who do actually pour in the effort have thought about these things
 already. A numpy 2.0 could probably borrow/integrate a lot from numexpr,
 I suppose.
 
 By the way, the hierarchical summation would make it fairly easy to
 erase (and in any case would minimize) summation differences due to
 differences between logical and actual ordering in memory of the data, no?
 
 
 On Mon, Jul 28, 2014 at 5:22 PM, Sebastian Berg
 sebast...@sipsolutions.net wrote:
 
 On Mo, 2014-07-28 at 16:31 +0200, Eelco Hoogendoorn wrote:
  Sebastian:
 
 
  Those are good points. Indeed iteration order may already produce
  different results, even though the semantics of numpy suggest
  identical operations. Still, I feel this different behavior without
  any semantical clues is something to be minimized.
 
  Indeed copying might have large speed implications. But on second
  thought, does it? Either the data is already aligned and no copy is
  required, or it isn't aligned, and we need one pass of cache
  inefficient access to the data anyway. In fact, if we had one low level
  function which does cache-intelligent transposition of numpy arrays
  (using some block strategy), this might be faster even than the
  current simple reduction operations when forced to work on awkwardly
  aligned data. Ideally, this intelligent access and intelligent
  reduction would be part of a single pass of course; but that wouldn't
  really fit within the numpy design, and merely such an intelligent
  transpose would provide most of the benefit I think. Or is the
  mechanism behind ascontiguousarray already intelligent in this sense?
 
 
 The iterator is currently smart in the sense that it will (at a low
 level, obviously) do something like [1]. Most things in numpy use that
 iterator; ascontiguousarray does so as well. Such a blocked, cache-aware
 iterator is what I mean when I say that someone who knows this stuff
 would have to spend a lot of time on it :).
 
 [1] Appendix:
 
 arr = np.ones((100, 100))
 arr.sum(0)
 # being equivalent (iteration order wise) to:
 res = np.zeros(100)
 for i in range(100):
     res += arr[i, :]
 # while arr.sum(1) would be:
 for i in range(100):
     res[i] = arr[i, :].sum()
 
 
  On Mon, Jul 28, 2014 at 4:06 PM, Sebastian Berg
  sebast...@sipsolutions.net
 wrote:
  On Mo, 2014-07-28 at 15:35 +0200, Sturla Molden wrote:
   On 28/07/14 15:21, alex wrote:
  
Are you sure they always give different results?  Notice
  that
np.ones((N,2)).mean(0)
np.ones((2,N)).mean(1)
compute means of different axes on transposed arrays so
  these
differences 'cancel out'.
  
   They will be different if different algorithms are used.
  np.ones((N,2)).mean(0)
   will have larger accumulated rounding error than
  np.ones((2,N)).mean(1),
   if only the latter uses the divide-and-conquer summation.
  
 
 
  What I wanted to point out is that to some extent the
  algorithm does not
  matter. You will not necessarily get identical results already