Re: xpathEval fails for large files

Fredrik Lundh Tue, 22 Jul 2008 02:20:48 -0700

Kanchana wrote:

I tried to extract some data with xpathEval. The path contains more than
100,000 elements.


doc = libxml2.parseFile("test.xml")
ctxt = doc.xpathNewContext()
result = ctxt.xpathEval('//src_ref/@editions')
doc.freeDoc()
ctxt.xpathFreeContext()

This gets stuck on the following line and results in high CPU usage:
result = ctxt.xpathEval('//src_ref/@editions')

Any suggestions to resolve this?

what happens if you just search for "//src_ref"? what happens if you use
libxml's command line tools to do the same search?

Is there any better alternative to handle large documents?

the raw libxml2 API is pretty hopeless; there's a much nicer binding
called lxml:


    http://codespeak.net/lxml/

but that won't help if the problem is with libxml2 itself (in which
case you should probably check with an appropriate libxml2 forum).
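
For comparison, here is a minimal sketch of the same query written against lxml. It assumes lxml is installed; the helper name `editions_via_lxml` is made up for illustration. lxml still runs the query through libxml2's XPath engine, so this shows the nicer API rather than a guaranteed fix for the slowdown.

```python
from lxml import etree


def editions_via_lxml(path):
    """Return the 'editions' attribute values of all src_ref elements."""
    # Like libxml2.parseFile, this builds the whole tree in memory.
    tree = etree.parse(path)
    # An attribute selection comes back as a list of string-like objects;
    # convert them to plain strings for the caller.
    return [str(value) for value in tree.xpath("//src_ref/@editions")]
```

No explicit freeDoc() or xpathFreeContext() calls are needed here; lxml manages the underlying libxml2 objects itself.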

there's also cElementTree (bundled with Python 2.5), but that has only
limited xpath support in the current version.
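
As a rough sketch of what that limited support still covers: the ET API can locate every src_ref with a simple path and then read the attribute in Python, instead of selecting `@editions` inside the path expression. The helper name `editions_via_findall` is hypothetical, and the plain `xml.etree.ElementTree` import is an assumption; cElementTree exposes the same calls.

```python
import xml.etree.ElementTree as ET


def editions_via_findall(path):
    """Collect 'editions' attributes using only ET's path subset."""
    root = ET.parse(path).getroot()
    values = []
    # ".//src_ref" matches src_ref elements at any depth below the root.
    for elem in root.findall(".//src_ref"):
        editions = elem.get("editions")
        if editions is not None:
            values.append(editions)
    return values
```

This still loads the whole document into memory, so it only sidesteps the XPath evaluation, not the size of the tree.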

both lxml and other implementations of the ET API support incremental
tree parsing:


    http://effbot.org/zone/element-iterparse.htm

which handles huge documents quite nicely, but requires you to write the
search logic in Python:


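    import xml.etree.ElementTree as ET  # cElementTree and lxml.etree offer the same call

    for event, elem in ET.iterparse("test.xml"):
        if elem.tag == "src_ref" and elem.get("editions"):
            # ... process element ...
            elem.clear()  # release the element's contents once handled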
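
A slightly fuller, self-contained sketch of the same idea follows. The generator name `iter_editions` is made up for illustration, and unlike the loop above it clears every finished element rather than only the matching ones, which is an assumption about what suits a document with many non-matching elements.

```python
import xml.etree.ElementTree as ET


def iter_editions(source):
    """Yield the 'editions' value of each src_ref element as it is parsed."""
    # By default iterparse reports only "end" events, i.e. it hands over an
    # element once its closing tag has been read and its attributes are set.
    for event, elem in ET.iterparse(source):
        if elem.tag == "src_ref" and elem.get("editions"):
            yield elem.get("editions")
        # Clear every finished element, matching or not, so text and
        # subtrees do not pile up in memory while the file is read.
        elem.clear()
```

It can be driven with a plain loop such as `for editions in iter_editions("test.xml"):`. Cleared elements remain attached to their parent as empty shells until the parent itself is cleared, so memory use still grows slowly for a very flat document.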

</F>

