On Tue, 27 Sep 2011 11:25:48 +1000, Steven D'Aprano wrote:

> The audience for numpy is a small minority of Python users, and they

Certainly, though I'd like to mention that scientific computing is a
major success story for Python, so hopefully it's a minority with
something to contribute <wink>

> tend to be more sophisticated. I'm sure they can cope with two
> functions with different APIs <wink>

No problem with having different APIs, but in that case I'd hope the
builtin wouldn't be named linspace, to avoid confusion.  In numpy/scipy
we try hard to avoid collisions with existing builtin names; hopefully
in this case we can prevent the reverse by having a dialogue.

> While continuity of API might be a good thing, we shouldn't accept a
> poor API just for the sake of continuity. I have some criticisms of
> the linspace API.
>
> numpy.linspace(start, stop, num=50, endpoint=True, retstep=False)
>
> http://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html
>
> * It returns a sequence, which is appropriate for numpy but in
> standard Python it should return an iterator or something like a
> range object.

Sure, no problem there.

> * Why does num have a default of 50? That seems to be an arbitrary
> choice.

Yup.  linspace was modeled after matlab's identically named command:

http://www.mathworks.com/help/techdoc/ref/linspace.html

but I have no idea why the author went with 50 instead of 100 as the
default (not that 100 is any better, just that it was matlab's choice).
Given how linspace is often used for plotting, 100 is arguably a more
sensible choice to get reasonable graphs on normal-resolution displays
at typical sizes, absent adaptive plotting algorithms.

> * It arbitrarily singles out the end point for special treatment.
> When integrating, it is just as common for the first point to be
> singular as the end point, and therefore needing to be excluded.
Numerical integration is *not* the focus of linspace(): in numerical
integration, if an end point is singular you have an improper integral
and *must* approach the singularity much more carefully than by simply
dropping the last point and hoping for the best.  Whether you can get
away with using (desired_end_point - very_small_number) --the dumb,
naive approach-- or not depends a lot on the nature of the singularity.

Since numerical integration is a complex and specialized domain, and
the subject of an entire subcomponent of the (much bigger than numpy)
scipy library, there's no point in arguing the linspace API based on
numerical integration considerations.

Now, I *suspect* (but don't remember for sure) that the option to make
it right-hand-open-ended was added to match the mental model people
have for range:

In [5]: linspace(0, 10, 10, endpoint=False)
Out[5]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])

In [6]: range(0, 10)
Out[6]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

I'm not arguing this was necessarily a good idea, just offering my
theory on how it came to be.  Perhaps R. Kern or one of the numpy
lurkers in here will pitch in with a better recollection.

> * If you exclude the end point, the stepsize, and hence the values
> returned, change:
>
> >>> linspace(1, 2, 4)
> array([ 1.        ,  1.33333333,  1.66666667,  2.        ])
> >>> linspace(1, 2, 4, endpoint=False)
> array([ 1.  ,  1.25,  1.5 ,  1.75])
>
> This surprises me. I expect that excluding the end point will just
> exclude the end point, i.e. return one fewer point. That is, I expect
> num to count the number of subdivisions, not the number of points.

I find it very natural.  It's important to remember that *the whole
point* of linspace's existence is to provide arrays with a known,
fixed number of points:

In [17]: npts = 10

In [18]: len(linspace(0, 5, npts))
Out[18]: 10

In [19]: len(linspace(0, 5, npts, endpoint=False))
Out[19]: 10

So the invariant to preserve is *precisely* the number of points, not
the step size.
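To make that invariant concrete, here is a minimal pure-Python sketch of
linspace's num/endpoint semantics (illustrative only -- the name
linspace_sketch is mine, and this is not numpy's actual implementation):

```python
def linspace_sketch(start, stop, num=50, endpoint=True):
    """Illustrative sketch of numpy.linspace's num/endpoint semantics.

    `num` always fixes the number of points returned; toggling
    `endpoint` changes the step size, never the length.
    """
    if num == 1:
        return [float(start)]
    # endpoint=True: num - 1 subdivisions (last point lands on stop).
    # endpoint=False: num subdivisions (stop itself is excluded).
    div = (num - 1) if endpoint else num
    step = (stop - start) / div
    return [start + i * step for i in range(num)]

print(linspace_sketch(1, 2, 4))                  # ends at 2.0
print(linspace_sketch(1, 2, 4, endpoint=False))  # [1.0, 1.25, 1.5, 1.75]
# Either way, the length is exactly num:
print(len(linspace_sketch(0, 5, 10, endpoint=False)))  # 10
```

Note that the divisor, not the point count, is what changes between the
two modes -- which is exactly why the values shift while len() stays put.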
As Guido has pointed out several times, the value of this function is
precisely to steer people *away* from thinking in terms of step sizes,
in a context where they are more likely than not to get it wrong.  So
linspace focuses on a guaranteed number of points, and lets the
step-size chips fall where they may.

> * The retstep argument changes the return signature from => array to
> => (array, number). I think that's a pretty ugly thing to do. If
> linspace returned a special iterator object, the step size could be
> exposed as an attribute.

Yup, it's not pretty, but it's understandable in numpy's context: a
library with a very strong design focus on arrays, and numpy arrays
don't have writable attributes:

In [20]: a = linspace(0, 10)

In [21]: a.stepsize = 0.1
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/home/fperez/<ipython-input-21-ded7f1198857> in <module>()
----> 1 a.stepsize = 0.1

AttributeError: 'numpy.ndarray' object has no attribute 'stepsize'

So while not the most elegant solution (and I agree that with a
different return object a different approach could be taken), I think
it's a practical compromise that works well for numpy.

> * I'm not sure that start/end/count is a better API than
> start/step/count.

Guido has argued this point quite well, I think, but let me add that
many years of experience and millions of lines of numerical code beg
to differ.  start/end/count is *precisely* the right api for this
problem, and exposing step directly is very much the wrong thing to do
here.

I should add that numpy does provide an 'arange' function that does
match the built-in range() api, but returns an array instead of a
list/iterator.  This function does happen to allow floating-point
steps, but comes with the following warning about them in its
docstring:

Docstring:
arange([start,] stop[, step,], dtype=None, maskna=False)

Return evenly spaced values within a given interval.
Values are generated within the half-open interval ``[start, stop)``
(in other words, the interval including `start` but excluding `stop`).
For integer arguments the function is equivalent to the Python built-in
`range <http://docs.python.org/lib/built-in-funcs.html>`_ function,
but returns an ndarray rather than a list.

When using a non-integer step, such as 0.1, the results will often not
be consistent.  It is better to use ``linspace`` for these cases.

# END docstring

> * This one is pure bike-shedding: I don't like the name linspace.

Sure, in numpy's case it was chosen purely to make existing matlab
users more comfortable, I think.  I don't particularly like it either
(I don't come from a matlab background myself), FWIW.

I do hope, though, that the chosen name is *not*:

- 'interval': an interval in mathematics has a strong notion of
  containing *all* the elements of the underlying ordered set that lie
  between its endpoints, which is not what this function returns.

- 'interpolate' or similar: numerical interpolation is a whole 'nother
  topic, and I think this name would be more likely to confuse people
  expecting function interpolation than anything else.

But thanks for looking into this, and I do hope that feedback from the
numpy/scipy users and accumulated experience is useful.

Cheers,

f

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com