Daniel Veillard wrote:

> You have a python function calling a native function. That function returns
> a string. That C string is translated to a Python string by the wrapper
> using PyString_FromString(). That operation seems to be extremely expensive.
PyString_FromString() basically boils down to:

    determine the length of the string
    call fast allocator
    copy string to area allocated by fast allocator

for UTF-8 data, the steps are:

    determine maximum possible length of the string
    call fast allocator
    copy string to area allocated by fast allocator, character
    by character, handling UTF-8 code sequences
    adjust size of allocated area, if necessary

cElementTree has to do all this for all strings in the document, of
course, and the time it takes is included in my parsing benchmark.
and I guess libxml2 is doing something very similar, but using your
own allocator and object layout.

but parsing is one thing, using the data from Python code is another.

to return data to Python, all cElementTree has to do (in the normal
case) is to return the string object it created during the parse.
that's a pointer copy, not a buffer copy.

libxml2, in contrast, has to copy the strings once again, using
Python's allocator and Python's string object layout. and if you
don't cache stuff, you end up doing this every time someone accesses
a node...

</F>

_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
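
to make the "pointer copy vs. buffer copy" point concrete, here is a rough pure-Python sketch. it is not how cElementTree or the libxml2 bindings are actually written. the class names, the `copies` counter and `read_many` are all made up for illustration, and a `bytes` object stands in for memory owned by a C library. the only thing it tries to show is how often a new string object gets built under each approach.

```python
class CopyOnAccessNode:
    """Hypothetical wrapper: the text lives in a 'foreign' buffer and a
    fresh Python string is built every time it is read."""

    def __init__(self, raw: bytes):
        self._raw = raw    # stands in for memory owned by a C library
        self.copies = 0    # number of Python strings built so far

    @property
    def text(self) -> str:
        self.copies += 1
        return self._raw.decode("utf-8")    # allocate + copy on every access


class StoredStringNode:
    """Hypothetical node: the Python string is built once, at parse time,
    and the same object is handed back on every access."""

    def __init__(self, raw: bytes):
        self._text = raw.decode("utf-8")    # allocate + copy once
        self.copies = 1

    @property
    def text(self) -> str:
        return self._text                   # returns a reference, no copy


def read_many(node, times):
    """Simulate Python code touching the same node over and over."""
    return [node.text for _ in range(times)]


raw = "caf\u00e9 au lait".encode("utf-8")
copying = CopyOnAccessNode(raw)
stored = StoredStringNode(raw)

read_many(copying, 1000)
read_many(stored, 1000)

print(copying.copies, stored.copies)    # prints: 1000 1
```

a wrapper in the first style could get most of the way to the second by caching the converted string on the node after the first access, at the cost of keeping both copies alive.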