Nick Coghlan <ncoghlan <at> gmail.com> writes:
>
> Short version: +1 for the first point (but for different reasons), -1 for the
> rest. Use cases for advanced slicing operations are not provided by the
> standard library, but by Numpy's sophisticated data manipulation capabilities.
>

I am glad you mentioned Numpy, because my post was mostly motivated by my Numpy experiences. Numpy's integration into the standard Python library was on the table for many years (PEP 209), but for reasons that I don't completely understand that proposal was never accepted. Whatever the real reasons were, in my view the problem with adding Numpy to the standard library is that in many respects Numpy is not a package, but a different language (now complete with its own scalars and arithmetic rules that make 1/0 = 0!). Numpy is a perfect language for scientific computing, borrowing more from APL than from Python, but I would rather see Py3K provide ways to implement scientific libraries without becoming an APL-like language.

> Alexander Belopolsky wrote:
> <get rid of ...>
> > 1. l[:] syntax for shallow copy.
>
> I kind of agree with this one, mainly because I'd like standard library data
> types to return views for slicing operations. Making a copy based on a view is
> as easy as wrapping the view in a call to the appropriate constructor.
> Avoiding the memory impact of multiple slicing operations that copy data
> around is much harder.
>
> Returning views rather than copies would also eliminate some of the use cases
> for islice().
>

I understand that Numpy's implementation of views was not acceptable because Python lists often relocate their storage. Maybe Py3K views will provide a way to solve that problem.

> > 2. Overloading of [] syntax. The primary meaning of c[i] is: get the
> > i-th item from the collection. This meaning is consistent between
> > lists/tuples and dicts. The only difference is that i may not be an
> > integer in the case of dict.
>
> The c[x] syntax isn't really overloaded - it always means "ask the container c
> for the item corresponding to subscript x"
>

I disagree. If type(x) is slice, then in the c[x] notation (1) x is not a subscript and (2) the value of c[x] is not an item.

(1) Depending on what you mean by "subscript," your statement is either a tautology or incorrect. If subscript == whatever appears between square brackets, then x is a subscript by definition, but in my view subscript is a typographical term referring in the present context to the tradition of denoting vector components by adding a subscript to the name of the vector. I am not familiar with any scientific notation for slices.

(2) c[x] is not an item:

>>> c = range(10)
>>> x = slice(3,6)
>>> c[x] in c
False

This is another case where strings differ from other containers:

>>> c = 'abcbdefghij'
>>> c[x] in c
True

> For a dict, x must be hashable, but otherwise both x and the item returned are
> unconstrained. Other mappings may remove the requirement for hashability.
>

Yes, and Numpy is the prime example. However, in my view Numpy takes [] overloading a little bit too far. In Numpy c[x] can have the following meanings (the list is probably incomplete):

1. Traditional subscripting. Multidimensional arrays are indexed by tuples. c[x] is a scalar.

2. Projection. Happens when len(x) < rank(c) or if ellipsis is present (to allow obtaining rank-0 arrays). If you view multidimensional arrays as functions (think of "()" replacing "[]"), then projection is similar to functional.partial. c[x] is an array view of rank <= rank(c).

3. Slicing. Nominally rank-preserving, but can be combined with projection in the same expression. c[x] is an array view of rank = rank(c).

4. Special kind of reshape. As in c[newaxis]. c[x] is an array view of rank > rank(c).

5. Selection. AKA "fancy indexing": c[x] is a copy.

I probably missed a few, but you get the picture. All that functionality could be implemented without asking Python to add new syntax. Slicing can be a function or a method. Tuple-based indexing did not require any additional syntax, but it could easily be implemented by overloading __call__ instead of __getitem__, with the additional benefit of a natural way to support named dimensions.

> Sequences use the rule that x must be either an integer (object with an
> __index__ method), or a slice object. The key characteristic that
> distinguishes a sequence from a general mapping is that c[0:0] == type(c)().
>

... and c[x] in c may be False.

> Multi-dimensional arrays then loosen the restrictions on x imposed by
> sequences slightly to also permit tuples. The key characteristic to
> distinguish Numpy-style arrays from other sequences is that c[0:0] == c[0:0,].
>

As I explained above, that loosening was not entirely necessary. Numpy could easily use "()" syntax for that and make itself more familiar to Fortran and C++ programmers.

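To make the "()" spelling concrete, here is a minimal sketch of a container that is indexed through __call__ and sliced through a method. Grid, its row/col dimension names, and the slice() method are hypothetical names made up for illustration; none of this is Numpy's API, and the projections return copies rather than views to keep the sketch short.

```python
class Grid:
    """Hypothetical 2-d container indexed with () instead of []."""

    def __init__(self, rows):
        # Copy the input so the Grid owns its storage.
        self.rows = [list(r) for r in rows]

    def __call__(self, row=None, col=None):
        # Both dimensions given: traditional subscripting, returns an item.
        if row is not None and col is not None:
            return self.rows[row][col]
        # Only one dimension given: projection (a copy here, not a view).
        if row is not None:
            return list(self.rows[row])
        if col is not None:
            return [r[col] for r in self.rows]
        # No index at all: a full copy of the data.
        return [list(r) for r in self.rows]

    def slice(self, start, stop, by=1):
        # Row slicing spelled as a method instead of [start:stop:by].
        return Grid(self.rows[start:stop:by])


g = Grid([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
item = g(1, 2)                 # positional indices, Fortran style
column = g(col=0)              # named dimension, no tuple or ':' needed
every_other = g.slice(0, 3, by=2)
```

With this spelling g(1, 2) is an item, g(col=0) is a projection, and g.slice(0, 3, by=2) is a slice, so the three operations are distinguishable at the call site without any dedicated syntax.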
> These behaviours aren't fundamental rules of programming that need to be
> embedded in the underlying language implementation. The kinds of subscript
> that makes sense may vary from container to container. Python's current
> approach avoids embedding particular interpretations in the language allowing
> each data structure designer to make their own decisions (hopefully guided by
> the conventions used for existing data structures).
>

That's true, but using similar notation to perform different operations depending on the type of the operands often leads to confusion, particularly when the operands are otherwise similar. This type of confusion is real: I've seen a bug report for Numeric filed by a user who discovered that he could not resize an array using slice assignment.

> > Slicing is specific to lists, tuples
> > and strings (I am ignoring non-built-in types for now).
>
> Ignoring external types when discussing slicing is a mistake. Much of Python's
> slicing design was driven by the Numpy folks, rather than the needs of the
> standard library.
>

Isn't it a sign of a weakness in the language design when an external library dictates changes to the syntax? Recognizing ':' and '...' only inside '[]' feels a little odd. There are probably some parsing issues with making a:b a shortcut for range(a,b) and allowing it anywhere, but I don't see a problem with making ... a keyword. (Parsing problems with ':' could be solved by spelling it '..' or 'to', but I know that's not going to happen :-).

> > 3. Overloading of []= syntax. Similarly to #2, this is the case when
> > the same notation is used to do conceptually different operations.
> > In addition it provides alternative ways to do the same thing (e.g. l
> > += a vs. l[len(l):] = a).
>
> The OOW in TOOWTDI stands for "One Obvious Way" not "Only One Way" :)
>
> As Josiah said, for manipulating data structures, that obvious way is
> typically the appropriate methods of the collection being used.
>

I probably chose a bad example. My main gripe about slice assignment is that it gives the impression that a slice is a view when it is not.

> > 4. Extended slicing. I believe the most common use case l[::-1] was
> > eliminated with the introduction of "reversed". The remaining
> > functionality in case of a tuple c can be expressed as tuple(c[i] for
> > i in range(start,stop, stride)). The later is more verbose than
> > c[start:stop:stride], but also more flexible.
>
> Extended slicing was added to provide syntactic support for various operations
> on Numpy's multi-dimensional arrays. As I understand it, the later addition of
> support to the types in the standard library was more due to consistency
> reasons than really compelling uses cases.

In Numpy the need for extended slicing is often a sign of an inappropriate choice of dimension. For example, if you often use ::2 slices of 1-d arrays, you can probably represent your data by an Nx2 matrix and refer to columns instead of ::2 slices. However, I am not against the functionality; I am only against the syntax. I would prefer writing c.slice(0, 10, by=2) instead of c[0:10:2].

-- sasha

_______________________________________________
Python-3000 mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-3000
Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
