In Python 2.5, list objects were special-cased to skip PyObject_GetItem and go straight to PyList_GET_ITEM. That special case gave made other sequences 20% slower than lists for getitem operations. The special case was removed in Python 3 (haven't checked 2.6).
Today I was tracing through how PyObject_GetItem works, trying to figure it why it's so much slower, to see if we can get back some of that speed without special-casing just one type. Below is a rough sketch of what PyObject_GetItem does for the common case of a sequence with a valid positive integer parameter, and my observations on where there is room for improvement. I'm willing to write patches and test them, but I wanted to get some general feedback first. After all, it may be the way it is for A Very Good Reason that I'm not aware of. ;-) The code paths for PyObject_SetItem and PyObject_DelItem are essentially the same, so any speed improvments to one could easily be applied to the other two operations. PyObject_GetItem ob i ob==NULL # can't be null anyway i==NULL # can't be null anyway is mapping? ob is sequence? ob PyIndex_Check i tp_as_number i tp_flags i tp_as_number->nb_index i PyNumber_AsSsize_t i PyNumber_Index i i==NULL # still can't be null PyLong_Check i i==NULL # still can't be null PyLong_Check i # redundant => v v == -1 PySequence_GetItem ob v ob==NULL # still can't be null is sequence? ob # redundant sq_item? ob sq_item ob v I think there are two avenues for speed improvement, both of which involve removing redundancy: 1) Assume the index is a PyLong until proven otherwise The PyIndex_Check in PyObject_GetItem looks pretty useless. If it returns false, PyObject_GetItem throws a type error. If we skipped the PyIndex_Check when it would have returned false, PyNumber_AsSsize_t would later call PyIndex_Check and throw a type error. Unless I'm missing something, this is a clear win and makes the code simpler. In PyNumber_AsSsize_t, we could speed up the common case by trying PyLong_Check first, and if it fails then calling PyNumber_Index. This change would make the common case faster, but make the code a few lines longer. 2) Remove some of the exessive checking for NULL unless Py_DEBUG is defined Many of the routines in abstract.c check their parameters for NULL, as a sanity check, and throw a SystemError if NULL is detected. I'm tempted to suggest making these checks only when Py_DEBUG is defined, but I suspect if you wanted it that way, you would have changed it already. ;) Assuming you really want the NULL checks in production Python, there are 5 checks for NULL even though there are only two parameters. Seems like overkill? -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com