I did some work at last year's Google sprint on removing the simple slicing API (__getslice__, tp_as_sequence->sq_slice) in favour of the more flexible sliceobject API (__getitem__ and tp_as_mapping->mp_subscript using slice objects as index.) For some more detail, see the semi-PEP below. (I hesitate to call it a PEP because it's way past the Py3k PEP deadline, but the email I was originally going to send on this subject grew in such a size that I figured I might as well use PEP layout and use the opportunity to record some best practices and behaviour. And the change should probably be recorded in a PEP anyway, even though it has never been formally proposed, just taken as a given.)
If anyone is bored and/or interested in doing some complicated work, there is still a bit of (optional) work to be done in this area: I uploaded patches to be applied to the trunk SF 8 months ago -- extended slicing support for a bunch of types. Some of that extended slicing support is limited to step-1 slices, though, most notably UserString.MutableString and ctypes. I can guarantee adding non-step-1 support to them is a challenging and fulfilling exercise, having done it for several types, but I can't muster the intellectual stamina to do it for these (to me) fringe types. The patches can be found in Roundup: http://bugs.python.org/issue?%40search_text=&title=&%40columns=title&id=&%40columns=id&creation=&creator=twouters&activity=&%40columns=activity&%40sort=activity&actor=&type=&components=&versions=&severity=&dependencies=&assignee=&keywords=&priority=&%40group=priority&status=1&%40columns=status&resolution=&%40pagesize=50&%40startwith=0&%40action=search(there doesn't seem to be a shorter URL; just search for issues created by 'twouters' instead.) If nobody cares, I will be checking these patches into the trunk this weekend (after updating them), and then update and check in the rest of the p3yk-noslice branch into the py3k branch. Abstract ======== This proposal discusses getting rid of the two types of slicing Python uses, ``simple`` and ``extended``. Extended slicing was added later, and uses a different API at both the C and the Python level for backward compatibility. Extended slicing can express everything simple slicing can express, however, making the simple slicing API practically redundant. A Tale of Two APIs ================== Simple slicing is a slice operation without a step, Ellipsis or tuple of slices -- the archetypical slice of just `start` and/or `stop`, with a single colon separating them and both sides being optional:: L[1:3] L[2:] L[:-5] L[:] An extended slice is any slice that isn't simple:: L[1:5:2] L[1:3, 8:10] L[1, ..., 5:-2] L[1:3:] (Note that the presence of an extra colon in the last example makes the very first simple slice an extended slice, but otherwise expresses the exact same slicing operation.) In applying a simple slice, Python does the work of translating omitted, out of bounds or negative indices into the appropriate actual indices, based on the length of the sequence. The normalized ``start`` and ``stop`` indices are then passed to the appropriate method: ``__getslice__``, ``__setslice__`` or ``__delslice__`` for Python classes, ``tp_as_sequence``'s ``sq_slice`` or ``sq_ass_slice`` for C types. For extended slicing, no special handling of slice indices is done. The indices in ``start:stop:step`` are wrapped in a ``slice`` object, with missing indices represented as None. The indices are otherwise taken as-is. The sequence object is then indexed with the slice object as if it were a mapping: ``__getitem__``,`` __setitem__`` or ``__delitem__`` for Python classes, ``tp_as_mapping``'s ``mp_subscript`` or ``mp_ass_subscript``. It is entirely up to the sequence to interpret the meaning of missing, out of bounds or negative indices, let alone non-numerical indices like tuples or Ellipsis or arbitrary objects. Since at least Python 2.1, applying a simple slice to an object that does not implement the simple slicing API will fall back to using extended slicing, calling __getitem__ (or mp_subscript) instead of __getslice__ (or sq_slice), and similarly for slice assignment/deletion. Problems ======== Aside from the obvious disadvantage of having two ways to do the same thing, simple slicing is an inconvenient wart for several reasons: 1) It (passively) promotes supporting only simple slicing, as observed by the builtin types only supporting extended slicing many years after extended slicing was introduced. 2) The Python VM dedicates 12 of its opcodes, about 11%, to support simple slicing, and effectively reserves another 13 for code convenience. Reducing the Big Switch in the bytecode interpreter would certainly not hurt Python performance. 5) The same goes for the number of functions, macros and function-pointers supporting simple slicing, although the impact would be maintainability and readability of the source rather than performance. Proposed Solution ================= The proposed solution, as implemented in the p3yk-noslice SVN branch, gets rid of the simple slicing methods and PyType entries. The simple C API (using ``Py_ssize_t`` for start and stop) remains, but creates a slice object as necessary instead. Various types had to be updated to support slice objects, or improve the simple slicing case of extended slicing. The result is that ``__getslice__``, ``__setslice__`` and ``__delslice__`` are no longer called in any situation. Classes that delegate ``__getitem__`` (or the C equivalent) to a sequence type get any slicing behaviour of that type for free. Classes that implement their own slicing will have to be modified to accept slice objects and process the indices themselves. This means that at the C level, like is already the case at the Python level, the same method is used for mapping-like access as for slicing. C types will still want to implement ``tp_as_sequence->sq_item``, but that function will only be called when using the ``PySequence_*Item()`` API. Those API functions do not (yet) fall back to using ``tp_as_mapping->mp_subscript``, although they possibly should. A casualty of this change is ``PyMapping_Check()``. It used to check for ``tp_as_mapping`` being available, and was modified to check for ``tp_as_mapping`` but *not* ``tp_as_sequence->sq_slice`` when extended slicing was added to the builtin types. It could conceivably check for ``tp_as_sequence->sq_item`` instead of ``sq_slice``, but the added value is unclear (especially considering ABCs.) In the standard library and CPython itself, ``PyMapping_Check()`` is used mostly to provide early errors, for instance by checking the arguments to ``exec()``. Alternate Solution ------------------ A possible alternative to removing simple slicing completely, would be to introduce a new typestruct hook, with the same signature as ``tp_as_mapping->mp_subscript``, which would be called for slicing operations. All as-mapping index operations would have to fall back to this new ``sq_extended_slice`` hook, in order for ``seq[slice(...)]`` to work as expected. For some added efficiency and error-checking, expressions using actual slice syntax could compile into bytecodes specific for slicing (of which there would only be three, instead of twelve.) This approach would simplify C types wanting to support extended slicing but not arbitrary-object indexing (and vice-versa) somewhat, but the benefit seems too small to warrant the added complexity in the CPython runtime itself. Implementing Extended Slicing ============================= Supporting extended slicing in C types is not as easily done as supporting simple slicing. There are a number of edgecases in interpreting the odder combinations of ``start``, ``stop`` and ``step``. This section tries to give some explanations and best practices. Extended Slicing in C --------------------- Because the mapping API takes precedence over the sequence API, any ``tp_as_mapping->mp_subscript`` and ``tp_as_mapping->mp_ass_subscript`` functions need to proper typechecks on their argument. In Python 2.5 and later, this is best done using ``PyIndex_Check()`` and ``PySlice_Check()`` (and possibly ``PyTuple_Check()`` and comparison against ``Py_Ellipsis``.) For compatibility with Python 2.4 and earlier, ``PyIndex_Check()`` would have to be replaced with ``PyInt_Check()`` and ``PyLong_Check()``. Indices that pass ``PyIndex_Check()`` should be converted to a ``Py_ssize_t`` using ``PyIndex_AsSsizeT()`` and delegated to ``tp_as_sequence->sq_item``. (For compatibility with Python 2.4, use ``PyNumber_AsLong()`` and downcast to an ``int`` instead.) The exact meaning of tuples of slices, and of Ellipsis, is up to the type, as no standard-library types support it. It may be useful to use the same convention as the Numpy package. Slices inside tuples, if supported, should probably follow the same rules as direct slices. >From slice objects, correct indices can be extracted with ``PySlice_GetIndicesEx()``. Negative and out-of-bounds indices will be adjusted based on the provided length, but a negative ``step``, and a ``stop`` before a ``step`` are kept as-is. This means that, for a getslice operation, a simple for-loop can be used to visit the correct items in the correct order:: for (cur = start, i = 0; i < slicelength; cur += step, i++) dest[i] = src[cur]; If ``PySlice_GetIndicesEx()`` is not appropriate, the individual indices can be extracted from the ``PySlice`` object. If the indices are to be converted to C types, that should be done using ``PyIndex_Check()``, ``PyIndex_AsSsizeT()`` and the ``Py_ssize_t`` type, except that ``None`` should be accepted as the default value for the index. For deleting slices (``mp_ass_subscript`` called with ``NULL`` as value) where the order does not matter, a reverse slice can be turned into the equivalent forward slice with:: if (step < 0) { stop = start + 1; start = stop + step*(slicelength - 1) - 1; step = -step; } For slice assignment with a ``step`` other than 1, it's usually necessary to require the source iterable to have the same length as the slice. When assigning to a slice of length 0, care needs to be taken to select the right insertion point. For a slice S[5:2], the correct insertion point is before index 5, not before index 2. For both deleting slice and slice assignment, it is important to remember arbitrary Python code may be executed when calling Py_DECREF() or otherwise interacting with arbitrary objects. Because of that, it's important your datatype stays consistent throughout the operation. Either operate on a copy of your datatype, or delay (for instance) Py_DECREF() calls until the datatype is updated. The latter is usually done by keeping a scratchpad of to-be-DECREF'ed items. Extended slicing in Python -------------------------- The simplest way to support extended slicing in Python is by delegating to an underlying type that already supports extended slicing. The class can simply index the underlying type with the slice object (or tuple) it was indexed with. Barring that, the Python code will have to pretty much apply the same logic as the C type. ``PyIndex_AsSsizeT()`` is available as ``operator.index()``, with a ``try/except`` block replacing ``PyIndex_Check()``. ``isinstance(o, slice)`` and ``sliceobj.indices()`` replace ``PySlice_Check()`` and ``PySlice_GetIndices()``, but the slicelength (which is provided by ``PySlice_GetIndicesEx()``) has to be calculated manually. Testing extended slicing ------------------------ Proper tests of extended slicing capabilities should at least include the following (if the operations are supported), assuming a sequence of length 10. Triple-colon notation is used everywhere so it uses extended slicing even in Python 2.5 and earlier:: S[2:5:] (same as S[2:5]) S[5:2:] (same as S[5:2], an empty slice) S[::] (same as S[:], a copy of the sequence) S[:2:] (same as S[:2]) S[:11:] (same as S[:11], a copy of the sequence) S[5::] (same as S[5:]) S[-11::] (same as S[-11:], a copy of the sequence) S[-5:2:1] (same as S[:2]) S[-5:-2:2] (same as S[-5:-2], an empty slice) S[5:2:-1] (the reverse of S[2:4]) S[-2:-5:-1] (the reverse of S[-4:-1]) S[:5:2] ([ S[0], S[2], S[4] ])) S[9::2] ([ S[9] ]) S[8::2] ([ S[8] ]) S[7::2] ([ S[7], S[9]]) S[1::-1] ([ S[1], S[0] ]) S[1:0:-1] ([ S[1] ], does not include S[0]!) S[1:-1:-1] (an empty slice) S[::10] ([ S[0] ]) S[::-10] ([ S[9] ]) S[2:5:] = [1, 2, 3] ([ S[2], S[3], S[4] ] become [1, 2, 3]) S[2:5:] = [1] (S[2] becomes 1, S[3] and S[4] are deleted) S[5:2:] = [1, 2, 3] ([1, 2, 3] inserted before S[5]) S[2:5:2] = [1, 2] ([ S[2], S[4] ] become [1, 2]) S[5:2:-2] = [1, 2] ([ S[3], S[5] ] become [2, 1]) S[3::3] = [1, 2, 3] ([ S[3], S[6], S[9] ] become [1, 2, 3]) S[:-5:-2] = [1, 2] ([ S[7], S[9] ] become [2, 1]) S[::-1] = S (reverse S in-place awkwardly) S[:5:] = S (replaces S[:5] with a copy of S) S[2:5:2] = [1, 2, 3] (error: assigning length-3 to slicelength-2) S[2:5:2] = None (error: need iterable)
_______________________________________________ Python-3000 mailing list [email protected] http://mail.python.org/mailman/listinfo/python-3000 Unsubscribe: http://mail.python.org/mailman/options/python-3000/archive%40mail-archive.com
