Re: [Numpy-discussion] suggested change of behavior for interp
On 5 Jun 2013 03:21, Eric Firing efir...@hawaii.edu wrote:

> On 2013/06/04 4:15 PM, Benjamin Root wrote:
>> Could non-monotonicity be detected as part of the interp process? Perhaps a sign switch in the deltas?
>
> There are two code paths, depending on the number of points to be interpolated. When it is greater than the size of the table, the deltas are pre-computed in a single sweep. Non-monotonicity could be detected there at moderate cost. In the other code path, for a smaller number of points, the deltas are computed only as needed, so monotonicity testing would require a separate sweep through the points. That's the costly case that I think might reasonably be an option but that should not be required.

Nonetheless, perhaps the function should at least be safe by default? I'm worried by these multiple reports of people silently getting wrong answers in practice...

-n

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] suggested change of behavior for interp
The simplest monotonicity test that I've seen is:

dx = np.diff(x)
monotonic = np.all(dx > 0.) or np.all(dx < 0.)

I expect that this is pretty fast, though I haven't tested it yet. If we want to make checking optional, then I think the default should be to check, with the option to skip the check.

Jon

On Tue, Jun 4, 2013 at 9:03 PM, numpy-discussion-requ...@scipy.org wrote:

> From: Eric Firing efir...@hawaii.edu
> To: numpy-discussion@scipy.org
> Date: Tue, 04 Jun 2013 15:08:29 -1000
> Subject: Re: [Numpy-discussion] suggested change of behavior for interp
>
> On 2013/06/04 2:05 PM, Charles R Harris wrote:
>> On Tue, Jun 4, 2013 at 12:07 PM, Slavin, Jonathan jsla...@cfa.harvard.edu wrote:
>>> Hi, I would like to suggest that the behavior of numpy.interp be changed regarding treatment of situations in which the x-coordinates are not monotonically increasing. [...]
>>
>> Seems reasonable, although it might add a bit of execution time.
>
> The monotonicity test should be an option if it is available at all; when interpolating a small number of points from a large pair of arrays, the single sweep through the whole array could dominate the execution time. Checking for increasing versus decreasing, in contrast, can be done fast, so handling the decreasing case transparently is reasonable.
>
> Eric

Jonathan D. Slavin
Harvard-Smithsonian CfA
jsla...@cfa.harvard.edu
60 Garden Street, MS 83
Cambridge, MA 02138-1516 USA
phone: (617) 496-7981
fax: (617) 496-7577
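Jon's test, with the comparison operators restored (the archive dropped them) and wrapped in a hypothetical helper for illustration:

```python
import numpy as np

def is_strictly_monotonic(x):
    """Return True if x is strictly increasing or strictly decreasing.

    This is the one-sweep O(n) check discussed in the thread; the
    function name is illustrative, not an existing NumPy API.
    """
    dx = np.diff(x)
    return bool(np.all(dx > 0.) or np.all(dx < 0.))
```

A single pass over the differences suffices because strict monotonicity in either direction means all consecutive differences share one sign.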
Re: [Numpy-discussion] suggested change of behavior for interp
On Wed, Jun 5, 2013 at 3:16 PM, Slavin, Jonathan jsla...@cfa.harvard.edu wrote:

> The simplest monotonicity test that I've seen is:
>
> dx = np.diff(x)
> monotonic = np.all(dx > 0.) or np.all(dx < 0.)
>
> I expect that this is pretty fast, though I haven't tested it yet. If we want to make checking optional, then I think the default should be to check, with the option to skip the check.

The slowdown people are worried about is: suppose that 'xp' has 1,000,000 entries, and the user wants to interpolate 1 point. If we can assume the array is sorted, then we can find which bin the 1 point falls into by using binary search (np.searchsorted), which will require examining ~log2(1,000,000) = 20 entries in the array. Checking that it's sorted, though, will require examining all 1,000,000 points -- it turns an O(log n) operation into an O(n) operation. And if this is being called as part of some larger algorithm, this effect can cascade -- if it gets called m times from a loop, now that's O(mn) instead of O(m log n), etc. That's often the difference between tractable and intractable.

If you submit a PR that gives interp the option to check for monotonicity, then we'll almost certainly merge it :-). If you also make it the default then there might be some debate, though I'd argue for it...

Whether interp should accept descending inputs is a separate issue and probably more contentious; I'm weakly against it myself, as unnecessary magic when you can just say np.interp(x, xp[::-1], fp[::-1]).

-n
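The O(log n) lookup described above is exactly what np.searchsorted provides; a small sketch, with the array size chosen to mirror the example:

```python
import numpy as np

# A sorted table of 10**6 knots, as in the example above
xp = np.linspace(0.0, 1.0, 1_000_000)

# Binary search: locating the bin for one query point examines only
# ~log2(1e6) = 20 elements
i = np.searchsorted(xp, 0.5)

# Verifying sortedness, by contrast, must touch every element: O(n)
is_sorted = bool(np.all(np.diff(xp) > 0))
```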
Re: [Numpy-discussion] suggested change of behavior for interp
On 05.06.2013 16:33, Nathaniel Smith wrote:

> On Wed, Jun 5, 2013 at 3:16 PM, Slavin, Jonathan jsla...@cfa.harvard.edu wrote:
>> The simplest monotonicity test that I've seen is:
>>
>> dx = np.diff(x)
>> monotonic = np.all(dx > 0.) or np.all(dx < 0.)
>> [...]
>
> The slow down people are worried about is: suppose that 'xp' has 1,000,000 entries, and the user wants to interpolate 1 point. If we can assume the array is sorted, then we can find which bin the 1 point falls into by using binary search (np.searchsorted), which will require examining ~log2(1,000,000) = 20 entries in the array. Checking that it's sorted, though, will require examining all 1,000,000 points -- it turns an O(log n) operation into an O(n) operation. [...]
>
> If you submit a PR that gives interp the option to check for monotonicity, then we'll almost certainly merge it :-). If you also make it the default then there might be some debate, though I'd argue for it...

I would not like it when the performance goes from log to linear by default. It is documented that the coordinates must be increasing, after all.

How about, as a compromise, the function checks one or two closest neighbors around the interpolation point and warns/errors if those are not monotonic? It's not foolproof, but it should at least catch the very wrong cases with an acceptable performance loss.
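Julian's compromise could be sketched roughly as follows (the helper and its name are hypothetical, not an existing interp option):

```python
import numpy as np

def spot_check(xp, i):
    """Verify xp is increasing in a small window around bin i.

    Cheap (O(1)) but not foolproof: only the immediate neighbours of
    the bin found by the binary search are inspected.
    """
    lo = max(i - 1, 0)
    hi = min(i + 2, len(xp))
    return bool(np.all(np.diff(xp[lo:hi]) > 0))
```

This catches a non-monotonic point sitting right next to the queried bin, at the cost of at most two extra comparisons per lookup.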
Re: [Numpy-discussion] suggested change of behavior for interp
On Wed, Jun 5, 2013 at 6:08 PM, Julian Taylor jtaylor.deb...@googlemail.com wrote:

> On 05.06.2013 16:33, Nathaniel Smith wrote:
>> [...]
>
> I would not like it when the performance goes from log to linear by default. It is documented that the coordinates must be increasing after all.

I agree, I don't like it either. But the problem is there are two contradictory things and I don't like either of them:

1) performance going from log to linear by default (for a subset of somewhat extreme cases)
2) people silently getting the wrong results (and according to reports in this thread, the warning in the docs does not actually prevent this)

The question is which one do we dislike more. My feeling is that in the cases where (1) comes up, it will annoy people and get them to check the docs and find the go-faster switch, while the first warning of (2) is when your spaceship crashes, so we should worry more about (2).

> How about as a compromise the function checks one or two closest neighbors around the interpolation point and warns/errors if those are not monotonic. It's not foolproof but should at least catch the very wrong cases with an acceptable performance loss.

There are tons of ways for data to accidentally end up sorted within each local region, but unsorted overall, e.g., if you're combining results from parallel simulation runs. Such data would systematically pass this check, but still give nonsensical answers.

-n
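The failure mode Nathaniel describes is easy to construct: concatenating two runs that are each sorted (say, outputs of two parallel simulation chunks) passes any local window check away from the seam, yet is not monotonic overall. A small illustration:

```python
import numpy as np

# Two locally sorted chunks, e.g. results of two simulation runs
chunk_a = np.array([0.0, 1.0, 2.0])
chunk_b = np.array([0.5, 1.5, 2.5])
xp = np.concatenate([chunk_a, chunk_b])

# Windows inside either chunk look perfectly sorted...
within_a = bool(np.all(np.diff(xp[0:3]) > 0))
within_b = bool(np.all(np.diff(xp[3:6]) > 0))

# ...but the concatenated array is not monotonic, so interp's
# precondition is silently violated for queries away from the seam
overall = bool(np.all(np.diff(xp) > 0))
```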
Re: [Numpy-discussion] suggested change of behavior for interp
On Wed, Jun 5, 2013 at 11:48 AM, Nathaniel Smith n...@pobox.com wrote:

> [...]
>
> There are tons of ways for data to accidentally end up sorted within each local region, but unsorted overall, e.g., if you're combining results from parallel simulation runs. Such data would systematically pass this check, but still give nonsensical answers.

Some actual benchmarks might be useful to the discussion.

Chuck
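A minimal sketch of the kind of benchmark Chuck asks for (sizes and iteration counts are arbitrary choices, not from the thread); it times interpolating a single point against a full monotonicity sweep of the table:

```python
import timeit
import numpy as np

# A large sorted table and a single query point
xp = np.sort(np.random.default_rng(0).random(1_000_000))
fp = np.sin(xp)

t_interp = timeit.timeit(lambda: np.interp(0.5, xp, fp), number=100)
t_check = timeit.timeit(lambda: bool(np.all(np.diff(xp) > 0)), number=100)

# The full check dominates: it is O(n) per call, while the lookup
# itself is O(log n) plus constant Python-call overhead
print(f"interp one point: {t_interp:.5f}s   full check: {t_check:.5f}s")
```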
Re: [Numpy-discussion] suggested change of behavior for interp
On Wed, Jun 5, 2013 at 11:59 AM, Charles R Harris charlesr.har...@gmail.com wrote:

> [...]
>
> Some actual benchmarks might be useful to the discussion.

And perhaps an in-place C function ismonotone would be generally useful.

Chuck
Re: [Numpy-discussion] suggested change of behavior for interp
On Wed, Jun 5, 2013 at 12:00 PM, Charles R Harris charlesr.har...@gmail.com wrote:

> [...]
>
> And perhaps an in-place C function ismonotone would be generally useful.

Or make greater.reduce operate correctly.

Chuck
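For context on that last remark: `np.greater.reduce` folds left-to-right, comparing the boolean result of one step against the next element, so it does not express "strictly decreasing". A reduction that does work goes through the pairwise differences (this is just Jon's earlier test rephrased as a reduction):

```python
import numpy as np

x = np.array([5.0, 3.0, 1.0])

# greater.reduce would compute greater(greater(5, 3), 1), i.e. compare
# a boolean against a number -- not a chained comparison.  Reducing
# over the sign of the differences gives the intended test:
strictly_decreasing = bool(np.logical_and.reduce(np.diff(x) < 0))
```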
[Numpy-discussion] suggested change of behavior for interp
Hi,

I would like to suggest that the behavior of numpy.interp be changed regarding treatment of situations in which the x-coordinates are not monotonically increasing. Specifically, it seems to me that interp should work correctly when the x-coordinate is decreasing monotonically. Clearly it cannot work if the x-coordinate is not monotonic, but in that case it should raise an exception. Currently if x is not increasing it simply silently fails, providing incorrect values. This fix could be as simple as a monotonicity test and inversion if necessary (plus a raise statement for non-monotonic cases).

Jon

Jonathan D. Slavin
Harvard-Smithsonian CfA
jsla...@cfa.harvard.edu
60 Garden Street, MS 83
Cambridge, MA 02138-1516 USA
phone: (617) 496-7981
fax: (617) 496-7577
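The silent failure described here is easy to reproduce. With a decreasing table, np.interp returns values without any warning, but they are not a meaningful interpolation; reversing both arrays restores the documented precondition (the data values are illustrative):

```python
import numpy as np

xp = np.array([3.0, 2.0, 1.0])     # monotonically decreasing
fp = np.array([30.0, 20.0, 10.0])

# Violates interp's documented assumption: no error is raised, but
# the result is not a meaningful interpolation
bad = np.interp(2.5, xp, fp)

# Reversing both arrays yields the correct answer, 25.0
good = np.interp(2.5, xp[::-1], fp[::-1])
```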
Re: [Numpy-discussion] suggested change of behavior for interp
On Tue, Jun 4, 2013 at 12:07 PM, Slavin, Jonathan jsla...@cfa.harvard.edu wrote:

> Hi, I would like to suggest that the behavior of numpy.interp be changed regarding treatment of situations in which the x-coordinates are not monotonically increasing. Specifically, it seems to me that interp should work correctly when the x-coordinate is decreasing monotonically. Clearly it cannot work if the x-coordinate is not monotonic, but in that case it should raise an exception. Currently if x is not increasing it simply silently fails, providing incorrect values. This fix could be as simple as a monotonicity test and inversion if necessary (plus a raise statement for non-monotonic cases).

Seems reasonable, although it might add a bit of execution time.

Chuck
Re: [Numpy-discussion] suggested change of behavior for interp
On 2013/06/04 2:05 PM, Charles R Harris wrote:

> On Tue, Jun 4, 2013 at 12:07 PM, Slavin, Jonathan jsla...@cfa.harvard.edu wrote:
>> Hi, I would like to suggest that the behavior of numpy.interp be changed regarding treatment of situations in which the x-coordinates are not monotonically increasing. [...]
>
> Seems reasonable, although it might add a bit of execution time.

The monotonicity test should be an option if it is available at all; when interpolating a small number of points from a large pair of arrays, the single sweep through the whole array could dominate the execution time. Checking for increasing versus decreasing, in contrast, can be done fast, so handling the decreasing case transparently is reasonable.

Eric
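The cheap direction check Eric alludes to needs only the endpoints once monotonicity is assumed, so it costs O(1) rather than a full sweep. A sketch of how interp could handle the decreasing case transparently (both helpers are hypothetical):

```python
import numpy as np

def is_decreasing(xp):
    """O(1) direction test: valid only if xp is already known to be
    monotonic, in which case comparing the endpoints suffices."""
    return bool(xp[0] > xp[-1])

def interp_any_direction(x, xp, fp):
    """Transparently handle a decreasing table by reversing it first."""
    if is_decreasing(xp):
        xp, fp = xp[::-1], fp[::-1]
    return np.interp(x, xp, fp)
```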
Re: [Numpy-discussion] suggested change of behavior for interp
Could non-monotonicity be detected as part of the interp process? Perhaps a sign switch in the deltas?

I have been bitten by this problem too.

Cheers!
Ben Root

On Jun 4, 2013 9:08 PM, Eric Firing efir...@hawaii.edu wrote:

> On 2013/06/04 2:05 PM, Charles R Harris wrote:
>> [...]
>
> The monotonicity test should be an option if it is available at all; when interpolating a small number of points from a large pair of arrays, the single sweep through the whole array could dominate the execution time. Checking for increasing versus decreasing, in contrast, can be done fast, so handling the decreasing case transparently is reasonable.
>
> Eric
Re: [Numpy-discussion] suggested change of behavior for interp
On 2013/06/04 4:15 PM, Benjamin Root wrote:

> Could non-monotonicity be detected as part of the interp process? Perhaps a sign switch in the deltas?

There are two code paths, depending on the number of points to be interpolated. When it is greater than the size of the table, the deltas are pre-computed in a single sweep. Non-monotonicity could be detected there at moderate cost. In the other code path, for a smaller number of points, the deltas are computed only as needed, so monotonicity testing would require a separate sweep through the points. That's the costly case that I think might reasonably be an option but that should not be required.

Eric

> I have been bitten by this problem too. Cheers! Ben Root
>
> On Jun 4, 2013 9:08 PM, Eric Firing efir...@hawaii.edu wrote:
>> [...]