New submission from Stefan Behnel <[email protected]>:

In the context of better interfacing of PyPy with Cython, it appears that
simple-looking things like PyTuple_GET_ITEM() are often rather involved in
PyPy's C-API implementation. However, since functions/macros like these are
used very frequently, this has an effect on the achievable performance.

It occurred to me that there are cases that involve many C-API calls where the
intention is simply to unpack a sequence (or iterable) of known length, often
just 2 or 3 items. Argument unpacking is one such situation (for which there
are appropriate C-API functions), dict item iteration or iteration over
enumerate() are other well known cases (at least in Python space). As the one
obvious way to handle the general use case, I propose the following addition of
a convenience function to the C-API:

    int PyIter_Unpack(PyObject* iterable, Py_ssize_t min_unpack,
                      Py_ssize_t max_unpack, ...)

As indicated by the names, it's meant to unpack any iterable or iterator,
really, i.e. it would fall back to iteration if the iterable is neither a tuple
nor list, for which special handling code makes the most sense. I thought about
naming it PySequence_Unpack(), but that would imply that it should reject
unordered (or, for safety, any unknown) iterables and non-sequence iterators as
input, which IMHO would complicate matters more than it would help. A warning
about unordered iterables in the documentation should be enough. I would expect
that most users would actually know the type of sequence that they are
processing.

The "max_unpack" parameter gives the number of varargs that follow, which are
all either of type PyObject** or NULL, the latter indicating that the value is
not of interest. Non-NULL pointers will receive a new reference to the item at
the corresponding index.

The "min_unpack" parameter is made available for error checking. If fewer items
are found in the iterable, the function sets a ValueError and returns -1.
Assignments may or may not have taken place at this point, but no owned
references are passed back in this case. If, on successful unpacking, the
number of unpacked items is smaller than "max_unpack", all remaining item
pointers will be set to NULL. Users who do not care about the number of items
would pass 0 and those who know the exact length would pass that as both
"min_unpack" and "max_unpack".
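
To make the intended behaviour concrete, here is a rough pure-Python model of
the semantics described so far. It is only a sketch under assumptions: the name
iter_unpack() and the error message are hypothetical, None stands in for a NULL
item pointer, and the option of passing NULL for uninteresting slots is left
out.

```python
from itertools import islice


def iter_unpack(iterable, min_unpack, max_unpack):
    """Hypothetical Python model of the proposed PyIter_Unpack() semantics."""
    # Take at most max_unpack items; anything beyond that is not looked at.
    items = list(islice(iterable, max_unpack))
    if len(items) < min_unpack:
        # Models "sets a ValueError and returns -1" in the C proposal.
        raise ValueError(
            "need at least %d items to unpack, got %d" % (min_unpack, len(items))
        )
    # Slots beyond the items found are filled with None (NULL in C).
    return items + [None] * (max_unpack - len(items))


# Exact length known: pass it as both min_unpack and max_unpack.
key, value = iter_unpack(("a", 1), 2, 2)

# Length not of interest: pass 0 and check the slots for None afterwards.
first, second, third = iter_unpack([10, 20], 0, 3)
```

In the second call "third" ends up as None, which corresponds to the remaining
item pointers being set to NULL.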

There is one case I'm not sure about yet, and that's how to handle the case of
finding more items than "max_unpack" requests. I think it's just as convenient
in some cases to automatically raise an exception, as it is in other cases to
just ignore them. I think a way to solve this could be to not raise an
exception, but to return 0 when all items were processed and 1 when there are
remaining items. In this case, users who care could check the result and if
they consider left-over items an error, clean up the returned references and
raise an error manually. Alternatively, the function could return the number of
unpacked items, but that may involve more work on the user side in order to
find out what needs to be done. The drawback of a tristate return with and
without errors set is that the straightforward "if (PyIter_Unpack(...))" check
is no longer enough to correctly detect and propagate errors. Also, when
passing an iterator, the function would have to eat one more value in order
to determine the return code. That may not be what the caller wants.
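
The iterator problem can be illustrated with a small pure-Python model of the
0/1 return variant. This is a sketch, not a proposed implementation: the
function name and the (status, items) return shape are assumptions made for
illustration, with the status playing the role of the C return code.

```python
from itertools import islice

_MISSING = object()  # sentinel: distinguishes "no more items" from a None item


def iter_unpack_status(iterable, min_unpack, max_unpack):
    """Hypothetical model: return (status, items); status 1 means leftovers."""
    it = iter(iterable)
    items = list(islice(it, max_unpack))
    if len(items) < min_unpack:
        raise ValueError("too few items to unpack")
    # Finding out whether anything is left pulls one more value out of "it".
    leftover = next(it, _MISSING) is not _MISSING
    items += [None] * (max_unpack - len(items))
    return (1 if leftover else 0), items


source = iter([1, 2, 3, 4])
status, items = iter_unpack_status(source, 2, 2)
remaining = list(source)  # what the caller can still read from the iterator
```

Here the status is 1, but "remaining" only contains 4: the value 3 was consumed
just to compute the return code and is lost to the caller.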

Maybe an additional flag parameter ("check_size") could solve this. If true,
the function will check the size of sequences and report longer sequences as
errors, and for iterators, will unpack the next item and report it as an error if
available. If false, additional values will be ignored for sequences and no
attempt will be made for iterators to unpack more items than requested.
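
A possible reading of the "check_size" variant, again as a pure-Python sketch.
The function name is hypothetical, and raising ValueError for surplus items is
an assumption, since the exception type for that case is not settled above.

```python
from itertools import islice

_MISSING = object()  # sentinel: distinguishes "no more items" from a None item


def iter_unpack_checked(iterable, min_unpack, max_unpack, check_size):
    """Hypothetical model of the "check_size" flag."""
    if isinstance(iterable, (tuple, list)):
        # Sequences: the length is known, so nothing has to be consumed.
        if check_size and len(iterable) > max_unpack:
            raise ValueError("too many items to unpack")  # assumed error type
        items = list(iterable[:max_unpack])
    else:
        it = iter(iterable)
        items = list(islice(it, max_unpack))
        # Iterators: only probe for one more item if the caller asked for it.
        if check_size and next(it, _MISSING) is not _MISSING:
            raise ValueError("too many items to unpack")  # assumed error type
    if len(items) < min_unpack:
        raise ValueError("too few items to unpack")
    return items + [None] * (max_unpack - len(items))


source = iter("abcd")
pair = iter_unpack_checked(source, 2, 2, check_size=False)
rest = list(source)  # untouched remainder of the iterator
```

With check_size false, "rest" still holds both "c" and "d", so no value is
eaten behind the caller's back.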

Because of the questions above, and because this addition involves a certain
redundancy with what's there already (namely the argument and tuple unpacking
functions which do not work on lists or arbitrary iterables and/or raise the
wrong exceptions), I'm asking for comments before writing up a patch. Any
thoughts on this?
----------
components: Interpreter Core
messages: 154217
nosy: scoder
priority: normal
severity: normal
status: open
title: add a convenience C-API function for unpacking iterables
type: enhancement
versions: Python 3.3
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue14121>
_______________________________________