While benchmarking PEP 393, I noticed that many UTF-8 decode
calls originate from C code that passes static strings, in particular
through PyObject_CallMethod. Many such call sites have already been
optimized to cache a string object; PyObject_CallMethod, however,
remains unoptimized because it takes a char*.

I find the ad-hoc approach of declaring and initializing variables
inadequate, in particular since it is difficult to clean up all
those string objects at interpreter shutdown.

I propose to add an explicit API to deal with such identifiers.
With this API,

    tmp = PyObject_CallMethod(result, "update", "O", other);

would be replaced with

    PyObject *tmp;
    Py_identifier(update);
    ...
    tmp = PyObject_CallMethodId(result, &PyId_update, "O", other);

Py_identifier(update) declares a variable PyId_update of this struct type:

typedef struct Py_Identifier {
    struct Py_Identifier *next;
    const char* string;
    PyObject *object;
} Py_Identifier;

The string field is initialized by the compiler; next and object are
set on first use. The new API for that will be

PyObject* PyUnicode_FromId(Py_Identifier*);
PyObject* PyObject_CallMethodId(PyObject*, Py_Identifier*, char*, ...);
PyObject* PyObject_GetAttrId(PyObject*, Py_Identifier*);
int PyObject_SetAttrId(PyObject*, Py_Identifier*, PyObject*);
int PyObject_HasAttrId(PyObject*, Py_Identifier*);
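
To illustrate the first-use initialization, here is a toy model in
plain C that does not use the Python headers. It is only a sketch:
ToyObject, Toy_Identifier, TOY_IDENTIFIER, toy_from_id and
toy_clear_identifiers are made-up names, and chaining identifiers
through next so they can be released at shutdown is my assumption
about how the field would be used, not a description of the actual
patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a Python string object. */
typedef struct ToyObject {
    char *text;
} ToyObject;

/* Same layout as the proposed Py_Identifier, with toy types. */
typedef struct Toy_Identifier {
    struct Toy_Identifier *next;
    const char *string;
    ToyObject *object;
} Toy_Identifier;

/* Declares ToyId_<name>; only `string` is set at compile time. */
#define TOY_IDENTIFIER(name) \
    static Toy_Identifier ToyId_##name = { NULL, #name, NULL }

/* All identifiers that have been initialized so far (assumed design). */
static Toy_Identifier *toy_id_list = NULL;
static int toy_objects_created = 0;

/* Return the cached object, creating and registering it on first use. */
static ToyObject *toy_from_id(Toy_Identifier *id)
{
    assert(id != NULL && id->string != NULL);
    if (id->object != NULL)
        return id->object;              /* fast path: no new object */

    ToyObject *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    size_t len = strlen(id->string) + 1;
    obj->text = malloc(len);
    if (obj->text == NULL) {
        free(obj);
        return NULL;
    }
    memcpy(obj->text, id->string, len);

    id->object = obj;
    id->next = toy_id_list;             /* link for later cleanup */
    toy_id_list = id;
    toy_objects_created++;
    return obj;
}

/* Release every cached object, as one would at interpreter shutdown. */
static void toy_clear_identifiers(void)
{
    Toy_Identifier *id = toy_id_list;
    while (id != NULL) {
        Toy_Identifier *next = id->next;
        free(id->object->text);
        free(id->object);
        id->object = NULL;
        id->next = NULL;
        id = next;
    }
    toy_id_list = NULL;
}

TOY_IDENTIFIER(update);
```

In this model, calling toy_from_id(&ToyId_update) twice returns the
same pointer and creates only one object, which is the effect the Id
functions are meant to have for static strings.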

I have micro-benchmarked this; for

import time
d={}
i=d.items()
t=time.time()
for _ in range(10**6):
    i | d
print(time.time()-t)

I get a speed-up of 30% (note that "i | d" invokes
the PyObject_CallMethod call shown above).

Regards,
Martin