Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-20 Thread Rick White
On Sep 19, 2006, at 9:45 PM, Tim Hochberg wrote:

 Perhaps there's some use for the sort to end behaviour that I'm  
 missing,
 but the raise an exception behaviour sure looks a lot more  
 appealing to me.

FYI, in IDL the NaN values wind up at the end of the sorted array.   
That's true despite the fact that IDL does respect all the comparison  
properties of NaNs (i.e. ValueNaN, ValueNaN, and Value==NaN are all  
false for any value).  So clearly the sort behavior was created  
deliberately.  It's also the case that the median for arrays  
including NaN values is computed as the median of the defined values,  
ignoring the NaNs.

My view is that if the user has NaN values in the array, sort should  
respect the float exception flags and should only raise an exception  
if that is what the user has requested.

 Here's a strawman proposal:

 Sort the array. Then examine numpy.geterr()['invalid']. If it  
 is not
 'ignore', then check examine sometrue(isnan(thearray)). If the
 latter is true then raise and error, issue a warning or call the
 error reporting functioni as appropriate. Note that we always sort
 the array to be consistent with the behaviour of the ufuncs that
 proceed even when they end up raising an exception.

Here's another proposal: Make a first pass through the array,  
replacing NaN values with Inf and counting the NaNs (nancount).  
Raise an exception at this point if NaNs are not supposed to be  
allowed.  Otherwise sort the array, and then as the last step replace  
the trailing NaNcount values with NaN.

It seems to me that this would give predictable results while  
respecting the exception flags at little extra cost.
Rick

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-20 Thread Christopher Barker
Charles R Harris wrote:
 Thinking a bit, keeping the values in place isn't easy.

Why the heck would in place be desirable for sorted data anyway? I 
understand that it means that if there is a NaN in the nth position 
before sorting, there will be one in the nth position after sorting. 
However, I see absolutely no reason at all why that would be useful (or 
any more useful than putting them anywhere else)

A couple years ago, there was a long debate on this list about whether 
numpy should pass -inf, NaN, and +inf through all the ufuncs without 
error. there were two schools of thought:

1) They indicate a problem, the programmer should know about hat problem 
as soon as it occurs, not at the end of the computation, many steps 
later, when they might get presented with nothing but NaNs.

2) The whole point of vector computation is that you can act on a 
whole bunch of numbers at once. If only subset of those numbers are 
invalid, why stop the process. Raising an error when a single number has 
a problem defeats the purpose of vector operations.

It seems that numpy has settled on school of thought (2), at least by 
default. That being the case, it should apply to sorting also. If it 
does, then that means no exception will be raised, but it makes no 
difference where the heck the NaNs end up in the sorted array, as long 
as everything else is in order. NaN means exactly what it's called: it's 
not a number, so it doesn't matter what you do with them, as long as 
they are preserved and don't mess up other things. Let the coder decide 
what they want to so with them, and when they want to do it. Personally, 
I'd prefer that they all ended up at the beginning or end after sorting, 
but it really doesn't much matter.

That being said, if it's impossible to do a efficient sort with NaNs 
mixed in, then we'll just have to live with it. It really would be best 
if an exception was raised if the non-NaN values are not going to be 
sorted correctly -- that really would surprise people!

  It would probably also not be unreasonable to punt and document sort
  as failing in the presence of nans.

That would be one of the worst options, but may be the only one available.


-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

NOAA/ORR/HAZMAT (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

[EMAIL PROTECTED]

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread Tim Hochberg
Keith Goodman wrote:
 In what order would you like argsort to sort the values -inf, nan, inf?
   
Ideally, -inf should sort first, inf should sort last and nan should 
raise an exception if present.

-tim


 In numpy 1.0b1 nan is always left where it began:

 EXAMPLE 1

   
 x
   

 matrix([[  inf],
 [ -inf],
 [  nan]])
   
 x[x.argsort(0),:]
   

 matrix([[ -inf],
 [  inf],
 [  nan]])

 EXAMPLE 2

   
 x
   

 matrix([[  nan],
 [  inf],
 [ -inf]])
   
 x[x.argsort(0),:]
   

 matrix([[  nan],
 [ -inf],
 [  inf]])


 I would like nan to be in the middle between -inf and inf.

 -
 Take Surveys. Earn Cash. Influence the Future of IT
 Join SourceForge.net's Techsay panel and you'll get the chance to share your
 opinions on IT  business topics through brief surveys -- and earn cash
 http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
 ___
 Numpy-discussion mailing list
 Numpy-discussion@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/numpy-discussion


   



-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread Keith Goodman
On 9/19/06, A. M. Archibald [EMAIL PROTECTED] wrote:
 On 19/09/06, Tim Hochberg [EMAIL PROTECTED] wrote:
  Keith Goodman wrote:
   In what order would you like argsort to sort the values -inf, nan, inf?
  
  Ideally, -inf should sort first, inf should sort last and nan should
  raise an exception if present.
 
  -tim

 Mmm. Somebody who's working with NaNs has more or less already decided
 they don't want to be pestered with exceptions for invalid data. I'd
 be happy if they wound up at either end, but I'm not sure it's worth
 hacking up the sort algorithm when a simple isnan() can pull them out.

Is there an easy way to use isnan to pull out the nans if the matrix I
am sorting has more than one column?

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread Sasha
On 9/19/06, Keith Goodman [EMAIL PROTECTED] wrote:
 Is there an easy way to use isnan to pull out the nans if the matrix I
 am sorting has more than one column?

There seems to be a nan_to_num function that converts nans to zeros,
but I would suggest just using fancy indexing to fill the nans with
appropriate values:

x[isnan(x)] = inf # if you want nans on the right

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread Tim Hochberg
A. M. Archibald wrote:
 On 19/09/06, Tim Hochberg [EMAIL PROTECTED] wrote:
   
 Keith Goodman wrote:
 
 In what order would you like argsort to sort the values -inf, nan, inf?

   
 Ideally, -inf should sort first, inf should sort last and nan should
 raise an exception if present.

 -tim
 

 Mmm. Somebody who's working with NaNs has more or less already decided
 they don't want to be pestered with exceptions for invalid data.
Do you really think so? In my experience NaNs are nearly always just an 
indication of a mistake somewhere that didn't get trapped for one reason 
or another.

  I'd
 be happy if they wound up at either end, but I'm not sure it's worth
 hacking up the sort algorithm when a simple isnan() can pull them out.
   
Moving them to the end seems to be the worst choice to me. Leaving them 
alone is fine with me. Or raising an exception would be fine. Or doing 
one or the other depending on the error mode settings would be even 
better if it is practical.

 What's happening now is that NaNa, NaN==a, and NaNa are all false,
 which rather confuses the sorting algorithm. But as long as it doesn't
 actually *break* (that is, leave some of the non-NaNs incorrectly
 sorted) I don't care.
   
Is that true? Are all of numpy's sorting algorithms robust against 
nontransitive objects laying around? The answer to that appears to be 
no. Try running this a couple of times to see what I mean:

n = 10
for i in range(10):
for kind in 'quicksort', 'heapsort', 'mergesort':
 a = rand(n)
 b = a.copy()
 a[n//2] = nan
 a.sort(kind=kind)
 b.sort(kind=kind)
 assert alltrue(a[:n//2] == b[:n//2]), (kind, a, b)


The values don't correctly cross the inserted NaN and the sort is incorrect.

-tim


-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread Charles R Harris
On 9/19/06, Tim Hochberg [EMAIL PROTECTED] wrote:
A. M. Archibald wrote: On 19/09/06, Tim Hochberg [EMAIL PROTECTED] wrote: Keith Goodman wrote: In what order would you like argsort to sort the values -inf, nan, inf?
 Ideally, -inf should sort first, inf should sort last and nan should raise an exception if present. -tim Mmm. Somebody who's working with NaNs has more or less already decided
 they don't want to be pestered with exceptions for invalid data.Do you really think so? In my experience NaNs are nearly always just anindication of a mistake somewhere that didn't get trapped for one reason
or another.I'd be happy if they wound up at either end, but I'm not sure it's worth hacking up the sort algorithm when a simple isnan() can pull them out.Moving them to the end seems to be the worst choice to me. Leaving them
alone is fine with me. Or raising an exception would be fine. Or doingone or the other depending on the error mode settings would be evenbetter if it is practical. What's happening now is that NaNa, NaN==a, and NaNa are all false,
 which rather confuses the sorting algorithm. But as long as it doesn't actually *break* (that is, leave some of the non-NaNs incorrectly sorted) I don't care.Is that true? Are all of numpy's sorting algorithms robust against
nontransitive objects laying around? The answer to that appears to beno. Try running this a couple of times to see what I mean:n = 10for i in range(10):for kind in 'quicksort', 'heapsort', 'mergesort':
 a = rand(n) b = a.copy() a[n//2] = nan a.sort(kind=kind) b.sort(kind=kind) assert alltrue(a[:n//2] == b[:n//2]), (kind, a, b)
The values don't correctly cross the inserted NaN and the sort is incorrect.Except for heapsort those are doing insertion sorts because n is so small. Heapsort is trickier to understand because of the way the heap is formed and values pulled off. I'll look into that. I bet searchsorted gets messed up by nan's. Do you think it worthwhile to worry about it?
Chuck
-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread Tim Hochberg
Charles R Harris wrote:


 On 9/19/06, *Tim Hochberg* [EMAIL PROTECTED] 
 mailto:[EMAIL PROTECTED] wrote:

 A. M. Archibald wrote:
  On 19/09/06, Tim Hochberg [EMAIL PROTECTED]
 mailto:[EMAIL PROTECTED] wrote:
 
  Keith Goodman wrote:
 
  In what order would you like argsort to sort the values -inf,
 nan, inf?
 
 
  Ideally, -inf should sort first, inf should sort last and nan
 should
  raise an exception if present.
 
  -tim
 
 
  Mmm. Somebody who's working with NaNs has more or less already
 decided
  they don't want to be pestered with exceptions for invalid data.
 Do you really think so? In my experience NaNs are nearly always
 just an
 indication of a mistake somewhere that didn't get trapped for one
 reason
 or another.

   I'd
  be happy if they wound up at either end, but I'm not sure it's worth
  hacking up the sort algorithm when a simple isnan() can pull
 them out.
 
 Moving them to the end seems to be the worst choice to me. Leaving
 them
 alone is fine with me. Or raising an exception would be fine. Or doing
 one or the other depending on the error mode settings would be even
 better if it is practical.

  What's happening now is that NaNa, NaN==a, and NaNa are all
 false,
  which rather confuses the sorting algorithm. But as long as it
 doesn't
  actually *break* (that is, leave some of the non-NaNs incorrectly
  sorted) I don't care.
 
 Is that true? Are all of numpy's sorting algorithms robust against
 nontransitive objects laying around? The answer to that appears to be
 no. Try running this a couple of times to see what I mean:

 n = 10
 for i in range(10):
 for kind in 'quicksort', 'heapsort', 'mergesort':
  a = rand(n)
  b = a.copy()
  a[n//2] = nan
  a.sort(kind=kind)
  b.sort(kind=kind)
  assert alltrue(a[:n//2] == b[:n//2]), (kind, a, b)


 The values don't correctly cross the inserted NaN and the sort is
 incorrect.


 Except for heapsort those are doing insertion sorts because n is so 
 small. Heapsort is trickier to understand because of the way the heap 
 is formed and values pulled off.
I'm not sure where the breakpoint is, but I was seeing failures for all 
three sort types with N as high as 1. I suspect that they're all 
broken in the presence of  NaNs.  I further suspect you'd need some 
punishingly slow n**2 algorithm to be robust in the presence of NaNs.

 I'll look into that. I bet searchsorted gets messed up by nan's. Do 
 you think it worthwhile to worry about it?

No. But that's because it's my contention is that any sequence with NaNs 
in it is *not sorted* and thus isn't suitable input for searchsorted.

-tim


-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread A. M. Archibald
On 19/09/06, Tim Hochberg [EMAIL PROTECTED] wrote:
 A. M. Archibald wrote:
  Mmm. Somebody who's working with NaNs has more or less already decided
  they don't want to be pestered with exceptions for invalid data.
 Do you really think so? In my experience NaNs are nearly always just an
 indication of a mistake somewhere that didn't get trapped for one reason
 or another.

Well, I said that because for an image porcessing project I was doing,
the easiest thing to do with certain troublesome pixels was to fill in
NaNs, and then at the end replace the NaNs with sensible values. It
seems as if the point of NaNs is to allow you to keep working with
those numbers that make sense while ingoring those that don't. If you
wanted exceptions, why not get them as soon as the first NaN would
have been generated?

   I'd
  be happy if they wound up at either end, but I'm not sure it's worth
  hacking up the sort algorithm when a simple isnan() can pull them out.
 
 Moving them to the end seems to be the worst choice to me. Leaving them
 alone is fine with me. Or raising an exception would be fine. Or doing
 one or the other depending on the error mode settings would be even
 better if it is practical.

I was just thinking in terms of easy removal.

 Is that true? Are all of numpy's sorting algorithms robust against
 nontransitive objects laying around? The answer to that appears to be
 no. Try running this a couple of times to see what I mean:

 The values don't correctly cross the inserted NaN and the sort is incorrect.

You're quite right: when NaNs are present in the array, sorting and
then removing them does not yield a sorted array. For example,
mergesort just output
[ 2.  4.  6.  9. nan  0.  1.
  3.  5.  7.  8.]

The other two are no better (and arguably worse).  Python's built-in
sort() for lists has the same problem.

This is definitely a bug, and the best way to fix it is not clear to
me - perhaps sort() needs to always do any(isnan(A)) before starting
to sort. I don't really like raising an exception, but sort() isn't
really very meaningful with NaNs in the array. The only other option I
can think of is to somehow remove them, sort without, and reintroduce
them at the end, which is going to be a nightmare when sorting a
single axis of a large array. Or, I suppose, sort() could simply fill
the array with NaNs; I'm sure users will love that.

A. M. Archibald

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread A. M. Archibald
On 19/09/06, Tim Hochberg [EMAIL PROTECTED] wrote:

 I'm not sure where the breakpoint is, but I was seeing failures for all
 three sort types with N as high as 1. I suspect that they're all
 broken in the presence of  NaNs.  I further suspect you'd need some
 punishingly slow n**2 algorithm to be robust in the presence of NaNs.

Not at all. Just temporarily make NaNs compare greater than any other
floating-point value and use quicksort (say). You can even do this for
python lists without much trouble.

That's actually a viable suggestion for numpy's sorting, although it
would be kind of ugly to implement: do a quick any(isnan(A)), and if
not, use the fast stock sorting algorithms; if there is a NaN
somewhere in the array, use a version of the sort that has a tweaked
comparison function so the NaNs wind up at the end and are easy to
trim off.

But the current situation, silently returning arrays in which the
non-NaNs are unsorted, is really bad.

A. M. Archibald

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread Charles R Harris
On 9/19/06, Tim Hochberg [EMAIL PROTECTED] wrote:
Charles R Harris wrote: On 9/19/06, *Tim Hochberg* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]
 wrote: A. M. Archibald wrote:  On 19/09/06, Tim Hochberg [EMAIL PROTECTED] mailto:
[EMAIL PROTECTED] wrote:   Keith Goodman wrote:   In what order would you like argsort to sort the values -inf, nan, inf?
Ideally, -inf should sort first, inf should sort last and nan should  raise an exception if present. 
  -timMmm. Somebody who's working with NaNs has more or less already decided  they don't want to be pestered with exceptions for invalid data.
 Do you really think so? In my experience NaNs are nearly always just an indication of a mistake somewhere that didn't get trapped for one reason or another.
 I'd  be happy if they wound up at either end, but I'm not sure it's worth  hacking up the sort algorithm when a simple isnan() can pull them out. 
 Moving them to the end seems to be the worst choice to me. Leaving them alone is fine with me. Or raising an exception would be fine. Or doing one or the other depending on the error mode settings would be even
 better if it is practical.  What's happening now is that NaNa, NaN==a, and NaNa are all false,  which rather confuses the sorting algorithm. But as long as it
 doesn't  actually *break* (that is, leave some of the non-NaNs incorrectly  sorted) I don't care.  Is that true? Are all of numpy's sorting algorithms robust against
 nontransitive objects laying around? The answer to that appears to be no. Try running this a couple of times to see what I mean: n = 10 for i in range(10): for kind in 'quicksort', 'heapsort', 'mergesort':
a = rand(n)b = a.copy()a[n//2] = nana.sort(kind=kind)b.sort(kind=kind)assert alltrue(a[:n//2] == b[:n//2]), (kind, a, b)
 The values don't correctly cross the inserted NaN and the sort is incorrect. Except for heapsort those are doing insertion sorts because n is so small. Heapsort is trickier to understand because of the way the heap
 is formed and values pulled off.I'm not sure where the breakpoint is, but I was seeing failures for allthree sort types with N as high as 1. I suspect that they're allbroken in the presence ofNaNs.I further suspect you'd need some
punishingly slow n**2 algorithm to be robust in the presence of NaNs.Quicksort and Mergesort are divide and conquer types. When the pieces get below a certain length they finish up with insertion sort as it is more efficient for small bits. In numpy the breakover points are 15 and 20 respectively. I believe microsoft did a project on this and ended up with a number somewhere around 7, but I didn't see that when doing the tuning for numarray way back when. Not that I did a *lot* of careful tuning work ;)
 I'll look into that. I bet searchsorted gets messed up by nan's. Do you think it worthwhile to worry about it?
No. But that's because it's my contention is that any sequence with NaNsin it is *not sorted* and thus isn't suitable input for searchsorted.If this sort of thing can cause unexpected errors I wonder if it would be worth it to have a global debugging flag that essentially causes isnan to be called before any function applications.
Chuck 
-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread A. M. Archibald
On 19/09/06, Charles R Harris [EMAIL PROTECTED] wrote:

 If this sort of thing can cause unexpected errors I wonder if it would be
 worth it to have a global debugging flag that essentially causes  isnan to
 be called before any function applications.

That sounds very like the IEEE floating-point flags, which would be
extremely useful to have, and which are being wored on, IIRC.

A. M. Archibald

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread Tim Hochberg
A. M. Archibald wrote:
 On 19/09/06, Charles R Harris [EMAIL PROTECTED] wrote:

   
 If this sort of thing can cause unexpected errors I wonder if it would be
 worth it to have a global debugging flag that essentially causes  isnan to
 be called before any function applications.
 

 That sounds very like the IEEE floating-point flags, which would be
 extremely useful to have, and which are being wored on, IIRC.
   
Are you referring to numpy.geterr / seterr? These exist, but they don't 
seem to be working right as of 1.0b4 (my installation is a little out of 
date).

-tim


-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread Tim Hochberg
A. M. Archibald wrote:
 On 19/09/06, Tim Hochberg [EMAIL PROTECTED] wrote:

   
 I'm not sure where the breakpoint is, but I was seeing failures for all
 three sort types with N as high as 1. I suspect that they're all
 broken in the presence of  NaNs.  I further suspect you'd need some
 punishingly slow n**2 algorithm to be robust in the presence of NaNs.
 

 Not at all. Just temporarily make NaNs compare greater than any other
 floating-point value and use quicksort (say). You can even do this for
 python lists without much trouble.
   
I misspoke. What I meant here was keeping the behavior that people think 
that we already have but don't: NaNs stay in place and everything is 
sorted around them. And even that's not true, since you could just 
record where the NaNs are, remove them, sort and put them back. What I 
was really getting at was, that I'm guessing, and it's just a guess, 
that (a) none of the fast sorting algorithms do anything sensible unless 
special cased and (b) one could come up with a naive n**2 sort that does 
do something sensible without special casing (where sensible means leave 
the NaNs alone).
 That's actually a viable suggestion for numpy's sorting, although it
 would be kind of ugly to implement: do a quick any(isnan(A)), and if
 not, use the fast stock sorting algorithms; if there is a NaN
 somewhere in the array, use a version of the sort that has a tweaked
 comparison function so the NaNs wind up at the end and are easy to
 trim off.

 But the current situation, silently returning arrays in which the
 non-NaNs are unsorted, is really bad.
   
If your going to do isnan anyway, why not just raise an exception. An 
array with NaNs in it can't be sorted by any common sense definition of 
sorting. Any treatment of NaNs is going to be arbitrary, so we might as 
well make the user specify what they want. In the face of ambiguity, 
refuse the temptation to guess and all that.

My favorite solution would be to make sort respect the invalid mode of 
seterr/geterr. However at the moment it doesn't seem to (in beta4 at 
least) but neither does add or multiply so those probably need to be 
looked at again

-tim




-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread Travis Oliphant
Tim Hochberg wrote:

A. M. Archibald wrote:
  

On 19/09/06, Tim Hochberg [EMAIL PROTECTED] wrote:

  


I'm not sure where the breakpoint is, but I was seeing failures for all
three sort types with N as high as 1. I suspect that they're all
broken in the presence of  NaNs.  I further suspect you'd need some
punishingly slow n**2 algorithm to be robust in the presence of NaNs.

  

Not at all. Just temporarily make NaNs compare greater than any other
floating-point value and use quicksort (say). You can even do this for
python lists without much trouble.
  


I misspoke. What I meant here was keeping the behavior that people think 
that we already have but don't: NaNs stay in place and everything is 
sorted around them. And even that's not true, since you could just 
record where the NaNs are, remove them, sort and put them back. What I 
was really getting at was, that I'm guessing, and it's just a guess, 
that (a) none of the fast sorting algorithms do anything sensible unless 
special cased and (b) one could come up with a naive n**2 sort that does 
do something sensible without special casing (where sensible means leave 
the NaNs alone).
  

That's actually a viable suggestion for numpy's sorting, although it
would be kind of ugly to implement: do a quick any(isnan(A)), and if
not, use the fast stock sorting algorithms; if there is a NaN
somewhere in the array, use a version of the sort that has a tweaked
comparison function so the NaNs wind up at the end and are easy to
trim off.

But the current situation, silently returning arrays in which the
non-NaNs are unsorted, is really bad.
  


If your going to do isnan anyway, why not just raise an exception. An 
array with NaNs in it can't be sorted by any common sense definition of 
sorting. Any treatment of NaNs is going to be arbitrary, so we might as 
well make the user specify what they want. In the face of ambiguity, 
refuse the temptation to guess and all that.

My favorite solution would be to make sort respect the invalid mode of 
seterr/geterr. However at the moment it doesn't seem to (in beta4 at 
least) but neither does add or multiply so those probably need to be 
looked at again
  

The geterr/seterr stuff changes how IEEE hardware flags are handled in 
ufuncs.  Currently they are not even looked at elsewhere. Are you 
saying that add and multiply don't respect the invalid flag?   If not, 
then this might be hardware related.  Does the IEEE invalid hardware 
flag get raised on multiplication by nan or only on creation of nan?   
All the seterr/geterr stuff relies on the hardware flags.   We don't do 
any other checking.


-Travis


-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread Charles R Harris
On 9/19/06, A. M. Archibald [EMAIL PROTECTED] wrote:
On 19/09/06, Charles R Harris [EMAIL PROTECTED] wrote: If this sort of thing can cause unexpected errors I wonder if it would be worth it to have a global debugging flag that essentially causesisnan to
 be called before any function applications.That sounds very like the IEEE floating-point flags, which would beextremely useful to have, and which are being wored on, IIRC.Thinking a bit, keeping the values in place isn't easy. Mergesort isn't fixable because values can be moved in front of the nan before it is ever looked at. Nor can it be easily set up to leave all the nans at one end because both a  nan and nan  a return false. Quicksort might be doable with some checks. I mean, what if the selected pivot is a nan? The median of three version used also needs thinking about. Hmm. But I think it is the insertion sort that is messing up the order in mergesort as now nothing will move past the nan even if it has to. That could be fixed, but the nan's would still move around.
I think the best thing to do is punt unless the hardware can be set to do something.Chuck
-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread Tim Hochberg
Travis Oliphant wrote:
 Tim Hochberg wrote:

   
 A. M. Archibald wrote:
  

 
 On 19/09/06, Tim Hochberg [EMAIL PROTECTED] wrote:

  


   
 I'm not sure where the breakpoint is, but I was seeing failures for all
 three sort types with N as high as 1. I suspect that they're all
 broken in the presence of  NaNs.  I further suspect you'd need some
 punishingly slow n**2 algorithm to be robust in the presence of NaNs.

  

 
 Not at all. Just temporarily make NaNs compare greater than any other
 floating-point value and use quicksort (say). You can even do this for
 python lists without much trouble.
  


   
 I misspoke. What I meant here was keeping the behavior that people think 
 that we already have but don't: NaNs stay in place and everything is 
 sorted around them. And even that's not true, since you could just 
 record where the NaNs are, remove them, sort and put them back. What I 
 was really getting at was, that I'm guessing, and it's just a guess, 
 that (a) none of the fast sorting algorithms do anything sensible unless 
 special cased and (b) one could come up with a naive n**2 sort that does 
 do something sensible without special casing (where sensible means leave 
 the NaNs alone).
  

 
 That's actually a viable suggestion for numpy's sorting, although it
 would be kind of ugly to implement: do a quick any(isnan(A)), and if
 not, use the fast stock sorting algorithms; if there is a NaN
 somewhere in the array, use a version of the sort that has a tweaked
 comparison function so the NaNs wind up at the end and are easy to
 trim off.

 But the current situation, silently returning arrays in which the
 non-NaNs are unsorted, is really bad.
  


   
 If your going to do isnan anyway, why not just raise an exception. An 
 array with NaNs in it can't be sorted by any common sense definition of 
 sorting. Any treatment of NaNs is going to be arbitrary, so we might as 
 well make the user specify what they want. In the face of ambiguity, 
 refuse the temptation to guess and all that.

 My favorite solution would be to make sort respect the invalid mode of 
 seterr/geterr. However at the moment it doesn't seem to (in beta4 at 
 least) but neither does add or multiply so those probably need to be 
 looked at again
  

 
 The geterr/seterr stuff changes how IEEE hardware flags are handled in 
 ufuncs.  Currently they are not even looked at elsewhere. Are you 
 saying that add and multiply don't respect the invalid flag?   If not, 
 then this might be hardware related.  Does the IEEE invalid hardware 
 flag get raised on multiplication by nan or only on creation of nan? 
It looks like I jumped to conclusions here. I was expecting that with 
invalid set to 'raise' that an array that (+-*operations on nans would 
raise an exception. It appears that is incorrect -- only operations that 
create nans from non-nans will trigger this as you suspected. Similarly, 
I expected that over='raise' would cause inf+something to raise an 
exception. Again, not true; this raises an exception only when a new inf 
is created from non infs. At least on my box.

Interesting. And a little surprising.

An interesting tidbit (in an array) inf/inf will raise invalid, but 
nan/nan will not, it just returns nan. Fun, fun fun!
   
 All the seterr/geterr stuff relies on the hardware flags.   We don't do 
 any other checking.
   

Yeah. And just by itself this isn't going to do anything for sort since 
comparing nans will not set any flags, it's just that the result will be 
problematic.If one wanted to use these flags for this one would have to 
use/abuse the result of geterr to trigger an isnan test at the beginning 
of sort and then warn, raise or call as appropriate.

It would probably also not be unreasonable to punt and document sort as 
failing in the presence of nans.


-tim


-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread Charles R Harris
On 9/19/06, Tim Hochberg [EMAIL PROTECTED] wrote:
Travis Oliphant wrote: Tim Hochberg wrote: A. M. Archibald wrote: On 19/09/06, Tim Hochberg 
[EMAIL PROTECTED] wrote: I'm not sure where the breakpoint is, but I was seeing failures for all three sort types with N as high as 1. I suspect that they're all
 broken in the presence ofNaNs.I further suspect you'd need some punishingly slow n**2 algorithm to be robust in the presence of NaNs.
 Not at all. Just temporarily make NaNs compare greater than any other floating-point value and use quicksort (say). You can even do this for python lists without much trouble.
 I misspoke. What I meant here was keeping the behavior that people think that we already have but don't: NaNs stay in place and everything is
 sorted around them. And even that's not true, since you could just record where the NaNs are, remove them, sort and put them back. What I was really getting at was, that I'm guessing, and it's just a guess,
 that (a) none of the fast sorting algorithms do anything sensible unless special cased and (b) one could come up with a naive n**2 sort that does do something sensible without special casing (where sensible means leave
 the NaNs alone). That's actually a viable suggestion for numpy's sorting, although it would be kind of ugly to implement: do a quick any(isnan(A)), and if
 not, use the fast stock sorting algorithms; if there is a NaN somewhere in the array, use a version of the sort that has a tweaked comparison function so the NaNs wind up at the end and are easy to
 trim off. But the current situation, silently returning arrays in which the non-NaNs are unsorted, is really bad.
 If your going to do isnan anyway, why not just raise an exception. An array with NaNs in it can't be sorted by any common sense definition of sorting. Any treatment of NaNs is going to be arbitrary, so we might as
 well make the user specify what they want. In the face of ambiguity, refuse the temptation to guess and all that. My favorite solution would be to make sort respect the invalid mode of
 seterr/geterr. However at the moment it doesn't seem to (in beta4 at least) but neither does add or multiply so those probably need to be looked at again
 The geterr/seterr stuff changes how IEEE hardware flags are handled in ufuncs.Currently they are not even looked at elsewhere. Are you saying that add and multiply don't respect the invalid flag? If not,
 then this might be hardware related.Does the IEEE invalid hardware flag get raised on multiplication by nan or only on creation of nan?It looks like I jumped to conclusions here. I was expecting that with
invalid set to 'raise' that an array that (+-*operations on nans wouldraise an exception. It appears that is incorrect -- only operations thatcreate nans from non-nans will trigger this as you suspected. Similarly,
I expected that over='raise' would cause inf+something to raise anexception. Again, not true; this raises an exception only when a new infis created from non infs. At least on my box.Interesting. And a little surprising.
An interesting tidbit (in an array) inf/inf will raise invalid, butnan/nan will not, it just returns nan. Fun, fun fun! All the seterr/geterr stuff relies on the hardware flags. We don't do
 any other checking.Yeah. And just by itself this isn't going to do anything for sort sincecomparing nans will not set any flags, it's just that the result will beproblematic.If one wanted to use these flags for this one would have to
use/abuse the result of geterr to trigger an isnan test at the beginningof sort and then warn, raise or call as appropriate.It would probably also not be unreasonable to punt and document sort asfailing in the presence of nans.
For floats we could use something like:lessthan(a,b) := a  b || (a == nan  b != nan)Which would put all the nans at one end and might not add too much overhead.
Chuck
-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread A. M. Archibald
On 19/09/06, Charles R Harris [EMAIL PROTECTED] wrote:



 For floats we could use something like:

 lessthan(a,b) := a  b || (a == nan  b != nan)

 Which would put all the nans at one end and might not add too much overhead.

You could put an any(isnan()) out front and run this slower version
only if there are any NaNs (also, you can't use == for NaNs, you have
to use C isNaN). But I'm starting to see the wisdom in simply throwing
an exception, since sorting is not well-defined with NaNs.

A. M. Archibald

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread Charles R Harris
On 9/19/06, A. M. Archibald [EMAIL PROTECTED] wrote:
On 19/09/06, Charles R Harris [EMAIL PROTECTED] wrote: For floats we could use something like: lessthan(a,b) := a  b || (a == nan  b != nan)
 Which would put all the nans at one end and might not add too much overhead.You could put an any(isnan()) out front and run this slower versiononly if there are any NaNs (also, you can't use == for NaNs, you have
to use C isNaN). But I'm starting to see the wisdom in simply throwingan exception, since sorting is not well-defined with NaNs.Looks like mergesort can be modified to sort around the NaNs without too much trouble if there is a good isnan function available: just cause the pointers to skip over them. I see that most of the isnan stuff seems to be in the ufunc source and isn't terribly simple. Could be broken out into a separate include, I suppose.
I still wonder if it is worth the trouble. As to raising an exception, I seem to recall reading somewhere that exception code tends to be expensive, I haven't done any benchmarks myself.Chuck

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread Tim Hochberg
Charles R Harris wrote:


 On 9/19/06, *A. M. Archibald* [EMAIL PROTECTED] 
 mailto:[EMAIL PROTECTED] wrote:

 On 19/09/06, Charles R Harris [EMAIL PROTECTED]
 mailto:[EMAIL PROTECTED] wrote:
 
 
 
  For floats we could use something like:
 
  lessthan(a,b) := a  b || (a == nan  b != nan)

I believe this would have to be some sort of isnan macros since 
everything compares not equal to nan. I forget the correct spelling at 
the moment though,

 
  Which would put all the nans at one end and might not add too
 much overhead.

 You could put an any(isnan()) out front and run this slower version
 only if there are any NaNs (also, you can't use == for NaNs, you have
 to use C isNaN). But I'm starting to see the wisdom in simply throwing
 an exception, since sorting is not well-defined with NaNs.


 Looks like mergesort can be modified to sort around the NaNs without 
 too much trouble if there is a good isnan function available: just 
 cause the pointers to skip over them. I see that most of the isnan 
 stuff seems to be in the ufunc source and isn't terribly simple. Could 
 be broken out into a separate include, I suppose.

 I still wonder if it is worth the trouble. As to raising an exception, 
 I seem to recall reading somewhere that exception code tends to be 
 expensive, I haven't done any benchmarks myself.

I'm still somewhat mystified by the desire to move the nans to one end 
of the sorted object. I see two scenarios:

   1. The user is not expecting to have nans in the data. In this case
  moving the nans to end is not helpful. The resulting array is
  still not sorted in the sense that a[i-1]=a[i]=a[i+1] does not
  hold and thus is likely to break code that relies on the array
  being sorted. The most prominent example of which is searchsorted.
  In this case you really want to raise an exception if possible
  since no good will come from letting the code continue to run. In
  this case the time in involved in throwing and catching an
  exception is irrelevant.
   2. The user *is* expecting to have nans in the data. This is
  presumably the case that the sorting-nans-to-the-end idea is aimed
  at. So far at least the suggested use has been to sort and then
  strip the nans. I suggest that a better approach is to test for
  and strip the nans before the sort. For example:

# a is an array that may have some nans
# you can do this more pithily, but I'm aiming to minimize isnan calls
# note that this *sometimes* makes a copy.
nanmask = isnan(a)
if sometrue(nanmask):
a = a[not nanmask]
a.sort()
#.

I presume that isnan is somewhat more expensive than the basic ''
operator. In the proposed sort to end version we need N*log(N) isnan
calls versus just N in the above case. The sort to end case probably
won't look any cleaner than the above either since you still need to
count the nans to determine how many to strip.

Perhaps there's some use for the sort to end behaviour that I'm missing, 
but the raise an exception behaviour sure looks a lot more appealing to me.

Here's a strawman proposal:

Sort the array. Then examine numpy.geterr()['invalid']. If it is not
'ignore', then check examine sometrue(isnan(thearray)). If the
latter is true then raise and error, issue a warning or call the
error reporting functioni as appropriate. Note that we always sort
the array to be consistent with the behaviour of the ufuncs that
proceed even when they end up raising an exception.

-tim


-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread A. M. Archibald
On 19/09/06, Tim Hochberg [EMAIL PROTECTED] wrote:

 I'm still somewhat mystified by the desire to move the nans to one end
 of the sorted object. I see two scenarios:

It's mostly to have something to do with them other than throw an
exception. Leaving them in place while the rest of the array is
reshuffled requires a lot of work and isn't particularly better. I
mostly presented it as an alternative to throwing an exception.

Throwing a Python exception now seems like the most reasonable idea.

A. M. Archibald

-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV
___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion


Re: [Numpy-discussion] sorting -inf, nan, inf

2006-09-19 Thread Charles R Harris
On 9/19/06, A. M. Archibald [EMAIL PROTECTED] wrote:
On 19/09/06, Tim Hochberg [EMAIL PROTECTED] wrote: I'm still somewhat mystified by the desire to move the nans to one end of the sorted object. I see two scenarios:
It's mostly to have something to do with them other than throw anexception. Leaving them in place while the rest of the array isreshuffled requires a lot of work and isn't particularly better. Imostly presented it as an alternative to throwing an exception.
Throwing a Python exception now seems like the most reasonable idea.Well, mergesort can be adapted without a lot of work. Could be used to sort masked arrays too, not that I know why anyone would want that, but then again, I haven't used masked arrays.
Agreed, throwing some sort of error still seems the simplest thing to do.
-
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT  business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.phpp=sourceforgeCID=DEVDEV___
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion