Re: [Numpy-discussion] suggested change of behavior for interp

2013-06-05 Thread Nathaniel Smith
On 5 Jun 2013 03:21, Eric Firing efir...@hawaii.edu wrote:

 On 2013/06/04 4:15 PM, Benjamin Root wrote:
  Could non-monotonicity be detected as part of the interp process?
  Perhaps a sign switch in the deltas?

 There are two code paths, depending on the number of points to be
 interpolated.  When it is greater than the size of the table, the deltas
 are pre-computed in a single sweep.  Non-monotonicity could be detected
 there at moderate cost.  In the other code path, for a smaller number of
 points, the deltas are computed only as needed, so monotonicity testing
 would require a separate sweep through the points.  That's the costly
 case that I think might reasonably be an option but that should not be
 required.

Nonetheless, perhaps the function should at least be safe by default? I'm
worried by these multiple reports of people silently getting wrong answers
in practice...

-n
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] suggested change of behavior for interp

2013-06-05 Thread Slavin, Jonathan
The simplest monotonicity test that I've seen is:

dx = np.diff(x)
monotonic = np.all(dx > 0.) or np.all(dx < 0.)

I expect that this is pretty fast, though I haven't tested it yet.  If we
want to make checking optional, then I think the default should be to check
with the option to skip the check.

Jon

On Tue, Jun 4, 2013 at 9:03 PM, numpy-discussion-requ...@scipy.org wrote:

 From: Eric Firing efir...@hawaii.edu
 To: numpy-discussion@scipy.org
 Cc:
 Date: Tue, 04 Jun 2013 15:08:29 -1000
 Subject: Re: [Numpy-discussion] suggested change of behavior for interp
 On 2013/06/04 2:05 PM, Charles R Harris wrote:



 On Tue, Jun 4, 2013 at 12:07 PM, Slavin, Jonathan
 jsla...@cfa.harvard.edu wrote:

 Hi,

 I would like to suggest that the behavior of numpy.interp be changed
 regarding treatment of situations in which the x-coordinates are not
 monotonically increasing.  Specifically, it seems to me that interp
 should work correctly when the x-coordinate is decreasing
 monotonically.  Clearly it cannot work if the x-coordinate is not
 monotonic, but in that case it should raise an exception.  Currently
 if x is not increasing it simply silently fails, providing incorrect
 values.  This fix could be as simple as a monotonicity test and
 inversion if necessary (plus a raise statement for non-monotonic
 cases).


 Seems reasonable, although it might add a bit of execution time.


 The monotonicity test should be an option if it is available at all; when
 interpolating a small number of points from a large pair of arrays, the
 single sweep through the whole array could dominate the execution time.
  Checking for increasing versus decreasing, in contrast, can be done fast,
 so handling the decreasing case transparently is reasonable.

 Eric





Jonathan D. Slavin        Harvard-Smithsonian CfA
jsla...@cfa.harvard.edu   60 Garden Street, MS 83
phone: (617) 496-7981     Cambridge, MA 02138-1516
fax: (617) 496-7577       USA



Re: [Numpy-discussion] suggested change of behavior for interp

2013-06-05 Thread Nathaniel Smith
On Wed, Jun 5, 2013 at 3:16 PM, Slavin, Jonathan
jsla...@cfa.harvard.edu wrote:
 The simplest monotonicity test that I've seen is:

 dx = np.diff(x)
 monotonic = np.all(dx > 0.) or np.all(dx < 0.)

 I expect that this is pretty fast, though I haven't tested it yet.  If we
 want to make checking optional, then I think the default should be to check
 with the option to skip the check.

The slow down people are worried about is, suppose that 'xp' has
1,000,000 entries, and the user wants to interpolate 1 point. If we
can assume the array is sorted, then we can find which bin the 1 point
falls into by using binary search (np.searchsorted), which will
require examining ~log2(1,000,000) = 20 entries in the array. Checking
that it's sorted, though, will require examining all 1,000,000 points
-- it turns an O(log n) operation into an O(n) operation. And if this
is being called as part of some larger algorithm, this effect can
cascade -- if it gets called m times from a loop, now that's O(mn)
instead of (m log n), etc. That's often the difference between
tractable and intractable.
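The asymmetry described above can be made concrete with a small sketch (illustrative only, not NumPy's actual C internals; the array sizes are made up):

```python
import numpy as np

xp = np.linspace(0.0, 1.0, 1_000_000)  # large, sorted table of abscissae
x = 0.3141                             # a single point to interpolate

# Binary search: examines only ~log2(1e6) ~ 20 entries of xp.
i = np.searchsorted(xp, x)
assert xp[i - 1] < x < xp[i]

# A monotonicity check, by contrast, must sweep all 1e6 entries.
assert np.all(np.diff(xp) > 0)
```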

If you submit a PR that gives interp the option to check for
monotonicity, then we'll almost certainly merge it :-). If you also
make it the default then there might be some debate, though I'd argue
for it...

Whether interp should accept descending inputs is a separate issue and
probably more contentious; I'm weakly against it myself, as
unnecessary magic when you can just say np.interp(x, xp[::-1],
fp[::-1]).
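The reversal workaround for decreasing abscissae looks like this (a minimal example with made-up values):

```python
import numpy as np

xp = np.array([3.0, 2.0, 1.0])   # monotonically decreasing abscissae
fp = np.array([9.0, 4.0, 1.0])   # f(x) = x**2 sampled at xp

# np.interp requires increasing xp, so reverse both arrays first.
y = np.interp(2.5, xp[::-1], fp[::-1])
# linear interpolation between (2, 4) and (3, 9) gives 6.5
assert abs(y - 6.5) < 1e-12
```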

-n


Re: [Numpy-discussion] suggested change of behavior for interp

2013-06-05 Thread Julian Taylor
On 05.06.2013 16:33, Nathaniel Smith wrote:
 On Wed, Jun 5, 2013 at 3:16 PM, Slavin, Jonathan
 jsla...@cfa.harvard.edu wrote:
 The simplest monotonicity test that I've seen is:

 dx = np.diff(x)
 monotonic = np.all(dx > 0.) or np.all(dx < 0.)

 I expect that this is pretty fast, though I haven't tested it yet.  If we
 want to make checking optional, then I think the default should be to check
 with the option to skip the check.
 
 The slow down people are worried about is, suppose that 'xp' has
 1,000,000 entries, and the user wants to interpolate 1 point. If we
 can assume the array is sorted, then we can find which bin the 1 point
 falls into by using binary search (np.searchsorted), which will
 require examining ~log2(1,000,000) = 20 entries in the array. Checking
 that it's sorted, though, will require examining all 1,000,000 points
 -- it turns an O(log n) operation into an O(n) operation. And if this
 is being called as part of some larger algorithm, this effect can
 cascade -- if it gets called m times from a loop, now that's O(mn)
 instead of (m log n), etc. That's often the difference between
 tractable and intractable.
 
 If you submit a PR that gives interp the option to check for
 monotonicity, then we'll almost certainly merge it :-). If you also
 make it the default then there might be some debate, though I'd argue
 for it...

I would not like the performance to go from log to linear by default.
It is documented that the coordinates must be increasing, after all.

How about, as a compromise, the function checks the one or two closest
neighbors around the interpolation point and warns/errors if those are
not monotonic?

It's not foolproof, but it should at least catch the very wrong cases with
an acceptable performance loss.
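That compromise could be sketched in pure Python along these lines (a hypothetical helper, not an actual or proposed NumPy API; the window size of two neighbors per side is an assumption):

```python
import numpy as np

def interp_with_local_check(x, xp, fp):
    # Check only a small window of xp around each located bin,
    # instead of sweeping the whole array.
    xq = np.atleast_1d(np.asarray(x, dtype=float))
    for i in np.searchsorted(xp, xq):
        lo, hi = max(i - 2, 0), min(i + 2, len(xp))
        if np.any(np.diff(xp[lo:hi]) <= 0):
            raise ValueError("xp is not increasing near the interpolation point")
    return np.interp(x, xp, fp)
```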


Re: [Numpy-discussion] suggested change of behavior for interp

2013-06-05 Thread Nathaniel Smith
On Wed, Jun 5, 2013 at 6:08 PM, Julian Taylor
jtaylor.deb...@googlemail.com wrote:
 On 05.06.2013 16:33, Nathaniel Smith wrote:
 The slow down people are worried about is, suppose that 'xp' has
 1,000,000 entries, and the user wants to interpolate 1 point. If we
 can assume the array is sorted, then we can find which bin the 1 point
 falls into by using binary search (np.searchsorted), which will
 require examining ~log2(1,000,000) = 20 entries in the array. Checking
 that it's sorted, though, will require examining all 1,000,000 points
 -- it turns an O(log n) operation into an O(n) operation. And if this
 is being called as part of some larger algorithm, this effect can
 cascade -- if it gets called m times from a loop, now that's O(mn)
 instead of (m log n), etc. That's often the difference between
 tractable and intractable.

 If you submit a PR that gives interp the option to check for
 monotonicity, then we'll almost certainly merge it :-). If you also
 make it the default then there might be some debate, though I'd argue
 for it...

 I would not like it when the performance goes from log to linear by default.
 It is documented that the coordinates must be increasing after all.

I agree, I don't like it either. But the problem is there are two
contradictory things and I don't like either of them:
1) performance going from log to linear by default (for a subset of
somewhat extreme cases)
2) people silently getting the wrong results (and according to reports
in this thread, the warning in the docs does not actually prevent
this)

The question is which one do we dislike more. My feeling is that in
the cases where (1) comes up, it will annoy people and get them to
check the docs and find the go-faster switch, while the first warning
of (2) is when your spaceship crashes, so we should worry more about
(2).

 How about as a compromise the function checks one or two closest
 neighbors around the interpolation point and warns/errors if those are
 not monotonic.

 Its not fool proof but should at least catch the very wrong cases with
 an acceptable performance loss.

There are tons of ways for data to accidentally end up sorted within
each local region, but unsorted overall, e.g., if you're combining
results from parallel simulation runs. Such data would systematically
pass this check, but still give nonsensical answers.
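Concatenating two independently sorted runs produces exactly this failure mode (a toy illustration with made-up values):

```python
import numpy as np

# Two simulation chunks, each sorted on its own...
run_a = np.array([0.0, 1.0, 2.0])
run_b = np.array([0.5, 1.5, 2.5])
xp = np.concatenate([run_a, run_b])

# ...so a check confined to either run passes,
assert np.all(np.diff(run_a) > 0) and np.all(np.diff(run_b) > 0)
# yet the combined table is not sorted, and interp's answers are garbage.
assert not np.all(np.diff(xp) > 0)
```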

-n


Re: [Numpy-discussion] suggested change of behavior for interp

2013-06-05 Thread Charles R Harris
On Wed, Jun 5, 2013 at 11:48 AM, Nathaniel Smith n...@pobox.com wrote:

 On Wed, Jun 5, 2013 at 6:08 PM, Julian Taylor
 jtaylor.deb...@googlemail.com wrote:
  On 05.06.2013 16:33, Nathaniel Smith wrote:
  The slow down people are worried about is, suppose that 'xp' has
  1,000,000 entries, and the user wants to interpolate 1 point. If we
  can assume the array is sorted, then we can find which bin the 1 point
  falls into by using binary search (np.searchsorted), which will
  require examining ~log2(1,000,000) = 20 entries in the array. Checking
  that it's sorted, though, will require examining all 1,000,000 points
  -- it turns an O(log n) operation into an O(n) operation. And if this
  is being called as part of some larger algorithm, this effect can
  cascade -- if it gets called m times from a loop, now that's O(mn)
  instead of (m log n), etc. That's often the difference between
  tractable and intractable.
 
  If you submit a PR that gives interp the option to check for
  monotonicity, then we'll almost certainly merge it :-). If you also
  make it the default then there might be some debate, though I'd argue
  for it...
 
  I would not like it when the performance goes from log to linear by
 default.
  It is documented that the coordinates must be increasing after all.

 I agree, I don't like it either. But the problem is there are two
 contradictory things and I don't like either of them:
 1) performance going from log to linear by default (for a subset of
 somewhat extreme cases)
 2) people silently getting the wrong results (and according to reports
 in this thread, the warning in the docs does not actually prevent
 this)

 The question is which one do we dislike more. My feeling is that in
 the cases where (1) comes up, it will annoy people and get them to
 check the docs and find the go-faster switch, while the first warning
 of (2) is when your spaceship crashes, so we should worry more about
 (2).

  How about as a compromise the function checks one or two closest
  neighbors around the interpolation point and warns/errors if those are
  not monotonic.
 
  Its not fool proof but should at least catch the very wrong cases with
  an acceptable performance loss.

 There are tons of ways for data to accidentally end up sorted within
 each local region, but unsorted overall, e.g., if you're combining
 results from parallel simulation runs. Such data would systematically
 pass this check, but still give nonsensical answers.


Some actual benchmarks might be useful to the discussion.

Chuck


Re: [Numpy-discussion] suggested change of behavior for interp

2013-06-05 Thread Charles R Harris
On Wed, Jun 5, 2013 at 11:59 AM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Wed, Jun 5, 2013 at 11:48 AM, Nathaniel Smith n...@pobox.com wrote:

 On Wed, Jun 5, 2013 at 6:08 PM, Julian Taylor
 jtaylor.deb...@googlemail.com wrote:
  On 05.06.2013 16:33, Nathaniel Smith wrote:
  The slow down people are worried about is, suppose that 'xp' has
  1,000,000 entries, and the user wants to interpolate 1 point. If we
  can assume the array is sorted, then we can find which bin the 1 point
  falls into by using binary search (np.searchsorted), which will
  require examining ~log2(1,000,000) = 20 entries in the array. Checking
  that it's sorted, though, will require examining all 1,000,000 points
  -- it turns an O(log n) operation into an O(n) operation. And if this
  is being called as part of some larger algorithm, this effect can
  cascade -- if it gets called m times from a loop, now that's O(mn)
  instead of (m log n), etc. That's often the difference between
  tractable and intractable.
 
  If you submit a PR that gives interp the option to check for
  monotonicity, then we'll almost certainly merge it :-). If you also
  make it the default then there might be some debate, though I'd argue
  for it...
 
  I would not like it when the performance goes from log to linear by
 default.
  It is documented that the coordinates must be increasing after all.

 I agree, I don't like it either. But the problem is there are two
 contradictory things and I don't like either of them:
 1) performance going from log to linear by default (for a subset of
 somewhat extreme cases)
 2) people silently getting the wrong results (and according to reports
 in this thread, the warning in the docs does not actually prevent
 this)

 The question is which one do we dislike more. My feeling is that in
 the cases where (1) comes up, it will annoy people and get them to
 check the docs and find the go-faster switch, while the first warning
 of (2) is when your spaceship crashes, so we should worry more about
 (2).

  How about as a compromise the function checks one or two closest
  neighbors around the interpolation point and warns/errors if those are
  not monotonic.
 
  Its not fool proof but should at least catch the very wrong cases with
  an acceptable performance loss.

 There are tons of ways for data to accidentally end up sorted within
 each local region, but unsorted overall, e.g., if you're combining
 results from parallel simulation runs. Such data would systematically
 pass this check, but still give nonsensical answers.


 Some actual benchmarks might be useful to the discussion.


And perhaps an in-place C function ismonotone would be generally useful.

Chuck


Re: [Numpy-discussion] suggested change of behavior for interp

2013-06-05 Thread Charles R Harris
On Wed, Jun 5, 2013 at 12:00 PM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Wed, Jun 5, 2013 at 11:59 AM, Charles R Harris 
 charlesr.har...@gmail.com wrote:



 On Wed, Jun 5, 2013 at 11:48 AM, Nathaniel Smith n...@pobox.com wrote:

 On Wed, Jun 5, 2013 at 6:08 PM, Julian Taylor
 jtaylor.deb...@googlemail.com wrote:
  On 05.06.2013 16:33, Nathaniel Smith wrote:
  The slow down people are worried about is, suppose that 'xp' has
  1,000,000 entries, and the user wants to interpolate 1 point. If we
  can assume the array is sorted, then we can find which bin the 1 point
  falls into by using binary search (np.searchsorted), which will
  require examining ~log2(1,000,000) = 20 entries in the array. Checking
  that it's sorted, though, will require examining all 1,000,000 points
  -- it turns an O(log n) operation into an O(n) operation. And if this
  is being called as part of some larger algorithm, this effect can
  cascade -- if it gets called m times from a loop, now that's O(mn)
  instead of (m log n), etc. That's often the difference between
  tractable and intractable.
 
  If you submit a PR that gives interp the option to check for
  monotonicity, then we'll almost certainly merge it :-). If you also
  make it the default then there might be some debate, though I'd argue
  for it...
 
  I would not like it when the performance goes from log to linear by
 default.
  It is documented that the coordinates must be increasing after all.

 I agree, I don't like it either. But the problem is there are two
 contradictory things and I don't like either of them:
 1) performance going from log to linear by default (for a subset of
 somewhat extreme cases)
 2) people silently getting the wrong results (and according to reports
 in this thread, the warning in the docs does not actually prevent
 this)

 The question is which one do we dislike more. My feeling is that in
 the cases where (1) comes up, it will annoy people and get them to
 check the docs and find the go-faster switch, while the first warning
 of (2) is when your spaceship crashes, so we should worry more about
 (2).

  How about as a compromise the function checks one or two closest
  neighbors around the interpolation point and warns/errors if those are
  not monotonic.
 
  Its not fool proof but should at least catch the very wrong cases with
  an acceptable performance loss.

 There are tons of ways for data to accidentally end up sorted within
 each local region, but unsorted overall, e.g., if you're combining
 results from parallel simulation runs. Such data would systematically
 pass this check, but still give nonsensical answers.


 Some actual benchmarks might be useful to the discussion.


 And perhaps an inplace C function ismonotone would be generally useful.


Or make greater.reduce operate correctly.

Chuck


[Numpy-discussion] suggested change of behavior for interp

2013-06-04 Thread Slavin, Jonathan
Hi,

I would like to suggest that the behavior of numpy.interp be changed
regarding treatment of situations in which the x-coordinates are not
monotonically increasing.  Specifically, it seems to me that interp should
work correctly when the x-coordinate is decreasing monotonically.  Clearly
it cannot work if the x-coordinate is not monotonic, but in that case it
should raise an exception.  Currently if x is not increasing it simply
silently fails, providing incorrect values.  This fix could be as simple as
a monotonicity test and inversion if necessary (plus a raise statement for
non-monotonic cases).
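The suggested behavior could be prototyped in pure Python roughly as follows (a sketch only; safe_interp is a hypothetical name, not a proposed API):

```python
import numpy as np

def safe_interp(x, xp, fp):
    # Monotonicity test, inversion if decreasing, and a raise otherwise.
    dx = np.diff(xp)
    if np.all(dx > 0):                        # increasing: the usual case
        return np.interp(x, xp, fp)
    if np.all(dx < 0):                        # decreasing: invert both arrays
        return np.interp(x, xp[::-1], fp[::-1])
    raise ValueError("xp must be monotonically increasing or decreasing")
```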

Jon

Jonathan D. Slavin        Harvard-Smithsonian CfA
jsla...@cfa.harvard.edu   60 Garden Street, MS 83
phone: (617) 496-7981     Cambridge, MA 02138-1516
fax: (617) 496-7577       USA



Re: [Numpy-discussion] suggested change of behavior for interp

2013-06-04 Thread Charles R Harris
On Tue, Jun 4, 2013 at 12:07 PM, Slavin, Jonathan
jsla...@cfa.harvard.edu wrote:

 Hi,

 I would like to suggest that the behavior of numpy.interp be changed
 regarding treatment of situations in which the x-coordinates are not
 monotonically increasing.  Specifically, it seems to me that interp should
 work correctly when the x-coordinate is decreasing monotonically.  Clearly
 it cannot work if the x-coordinate is not monotonic, but in that case it
 should raise an exception.  Currently if x is not increasing it simply
 silently fails, providing incorrect values.  This fix could be as simple as
 a monotonicity test and inversion if necessary (plus a raise statement for
 non-monotonic cases).


Seems reasonable, although it might add a bit of execution time.

Chuck





Re: [Numpy-discussion] suggested change of behavior for interp

2013-06-04 Thread Eric Firing
On 2013/06/04 2:05 PM, Charles R Harris wrote:


 On Tue, Jun 4, 2013 at 12:07 PM, Slavin, Jonathan
 jsla...@cfa.harvard.edu wrote:

 Hi,

 I would like to suggest that the behavior of numpy.interp be changed
 regarding treatment of situations in which the x-coordinates are not
 monotonically increasing.  Specifically, it seems to me that interp
 should work correctly when the x-coordinate is decreasing
 monotonically.  Clearly it cannot work if the x-coordinate is not
 monotonic, but in that case it should raise an exception.  Currently
 if x is not increasing it simply silently fails, providing incorrect
 values.  This fix could be as simple as a monotonicity test and
 inversion if necessary (plus a raise statement for non-monotonic cases).


 Seems reasonable, although it might add a bit of execution time.

The monotonicity test should be an option if it is available at all; 
when interpolating a small number of points from a large pair of arrays, 
the single sweep through the whole array could dominate the execution 
time.  Checking for increasing versus decreasing, in contrast, can be 
done fast, so handling the decreasing case transparently is reasonable.

Eric


 Chuck



Re: [Numpy-discussion] suggested change of behavior for interp

2013-06-04 Thread Benjamin Root
Could non-monotonicity be detected as part of the interp process? Perhaps a
sign switch in the deltas?

I have been bitten by this problem too.

Cheers!
Ben Root

On Jun 4, 2013 9:08 PM, Eric Firing efir...@hawaii.edu wrote:

 On 2013/06/04 2:05 PM, Charles R Harris wrote:
 
 
  On Tue, Jun 4, 2013 at 12:07 PM, Slavin, Jonathan
  jsla...@cfa.harvard.edu wrote:
 
  Hi,
 
  I would like to suggest that the behavior of numpy.interp be changed
  regarding treatment of situations in which the x-coordinates are not
  monotonically increasing.  Specifically, it seems to me that interp
  should work correctly when the x-coordinate is decreasing
  monotonically.  Clearly it cannot work if the x-coordinate is not
  monotonic, but in that case it should raise an exception.  Currently
  if x is not increasing it simply silently fails, providing incorrect
  values.  This fix could be as simple as a monotonicity test and
  inversion if necessary (plus a raise statement for non-monotonic
cases).
 
 
  Seems reasonable, although it might add a bit of execution time.

 The monotonicity test should be an option if it is available at all;
 when interpolating a small number of points from a large pair of arrays,
 the single sweep through the whole array could dominate the execution
 time.  Checking for increasing versus decreasing, in contrast, can be
 done fast, so handling the decreasing case transparently is reasonable.

 Eric

 
  Chuck



Re: [Numpy-discussion] suggested change of behavior for interp

2013-06-04 Thread Eric Firing
On 2013/06/04 4:15 PM, Benjamin Root wrote:
 Could non-monotonicity be detected as part of the interp process?
 Perhaps a sign switch in the deltas?

There are two code paths, depending on the number of points to be 
interpolated.  When it is greater than the size of the table, the deltas 
are pre-computed in a single sweep.  Non-monotonicity could be detected 
there at moderate cost.  In the other code path, for a smaller number of 
points, the deltas are computed only as needed, so monotonicity testing 
would require a separate sweep through the points.  That's the costly 
case that I think might reasonably be an option but that should not be 
required.

Eric


 I have been bitten by this problem too.

 Cheers!
 Ben Root

 On Jun 4, 2013 9:08 PM, Eric Firing efir...@hawaii.edu wrote:
  
   On 2013/06/04 2:05 PM, Charles R Harris wrote:
   
   
On Tue, Jun 4, 2013 at 12:07 PM, Slavin, Jonathan
jsla...@cfa.harvard.edu wrote:
   
Hi,
   
I would like to suggest that the behavior of numpy.interp be
 changed
regarding treatment of situations in which the x-coordinates
 are not
monotonically increasing.  Specifically, it seems to me that interp
should work correctly when the x-coordinate is decreasing
monotonically.  Clearly it cannot work if the x-coordinate is not
monotonic, but in that case it should raise an exception.
   Currently
if x is not increasing it simply silently fails, providing
 incorrect
values.  This fix could be as simple as a monotonicity test and
inversion if necessary (plus a raise statement for
 non-monotonic cases).
   
   
Seems reasonable, although it might add a bit of execution time.
  
   The monotonicity test should be an option if it is available at all;
   when interpolating a small number of points from a large pair of arrays,
   the single sweep through the whole array could dominate the execution
   time.  Checking for increasing versus decreasing, in contrast, can be
   done fast, so handling the decreasing case transparently is reasonable.
  
   Eric
  
   
Chuck
  