Daniel Veillard wrote:
On Fri, Jan 14, 2005 at 11:43:04AM +0100, Fredrik Lundh wrote:

Daniel Veillard wrote:

Seriously, with respect to performance, one of the problems I have seen when
doing a bit of profiling is that interning strings, i.e. the process of
taking strings coming from C and turning them into Python string objects,
seems to be extremely costly. I don't know if it's the hash function or the way
the string hash works, but it was one of the biggest costs when I tried
(with Python 2.3 or 2.2, I can't remember precisely which).

in python, conversion and interning and hash calculations are three different things, so I'm not sure what your problem really was. but I'm curious. can you elaborate?

You have a python function calling a native function. That function returns
a string. That C string is translated to a Python string by the wrapper using PyString_FromString(). That operation seems to be extremely expensive.

That's nothing. It's even worse if you have to transform the UTF-8 strings that libxml2 delivers into Python unicode strings. :)
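
To make the distinction between conversion, interning and hashing concrete, here is a minimal sketch in present-day Python 3 (not the 2.2/2.3 C API discussed above). The byte string `raw` and the helper `time_step` are made up for illustration; `bytes.decode`, `sys.intern` and `hash` stand in for what a C wrapper would do, so the timings only hint at relative cost and say nothing about libxml2 itself.

```python
import sys
import timeit

# Hypothetical input: UTF-8 bytes as a C library might hand them over.
raw = b"caf\xc3\xa9 element-name"

# 1. Conversion: build a new Python string object from the raw bytes.
text = raw.decode("utf-8")

# 2. Interning: register the string in the interpreter's table of shared
#    strings, so equal strings interned later resolve to this same object.
shared = sys.intern(text)

# 3. Hashing: compute the hash used for dict lookups (cached on the object).
h = hash(shared)

# A second conversion yields a fresh object; interning maps it back
# to the object that was interned first.
again = sys.intern(raw.decode("utf-8"))


def time_step(stmt, number=10000):
    """Time one step in isolation (hypothetical helper, rough numbers only)."""
    return timeit.timeit(stmt, globals=globals(), number=number)


timings = {
    "convert": time_step("raw.decode('utf-8')"),
    "convert+intern": time_step("sys.intern(raw.decode('utf-8'))"),
    "hash (cached)": time_step("hash(shared)"),
}
```

Printing `timings` on your own machine shows which of the three steps dominates for a given input; the numbers will vary by interpreter version and string length.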


By the way, I'm at least one of the people Fredrik has been mailing with concerning the speed comparisons, as I've been implementing the ElementTree API on top of libxml2. This now works, without you having to clean up memory yourself, and with unicode strings, etc. You can also do XPath and XSLT a lot more easily with lxml.etree, though XSLT support especially is still coming together.

lxml.etree is likely to be a lot slower than a more low-level binding at various operations, but it's a ton more convenient (aka "Pythonic"). You can do things like this:

>>> from lxml import etree
>>> tree = etree.parse('ot.xml')
>>> tree.xpath('(//v)[5]/text()')
[u'And God called the light Day, and the darkness he called Night. And the evening and the morning were the first day.\n']


or even this:

>>> result = tree.xpath('(//v)[5]')
>>> result[0].text = 'The day and night verse.'
>>> tree.xpath('(//v)[5]/text()')
[u'The day and night verse.']

i.e. the results of XPath queries are ElementTree-style objects, and the whole XML tree is navigable using the ElementTree API.
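
For readers without lxml at hand, the same idea can be sketched with the standard library's `xml.etree.ElementTree`, which is a different implementation of the ElementTree API and supports only a limited XPath subset. The inline document below is a hypothetical stand-in for `ot.xml`, and `findall` plus list indexing replaces the `(//v)[5]` expression, which the standard library does not accept.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for ot.xml: five <v> (verse) elements.
doc = (
    "<book>"
    "<v>one</v><v>two</v><v>three</v><v>four</v><v>five</v>"
    "</book>"
)
root = ET.fromstring(doc)

# Query results are ordinary Element objects that belong to the tree...
fifth = root.findall(".//v")[4]

# ...so assigning to .text changes the tree itself, not a copy.
fifth.text = "The day and night verse."

# Re-reading the tree shows the modified verse in place.
texts = [v.text for v in root.iter("v")]
```

Because the query hands back live elements, any later traversal or serialization of `root` reflects the edit.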

Regards,

Martijn
_______________________________________________
XML-SIG maillist  -  XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
