On Jul 23, 2:03 am, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Fredrik Lundh wrote:
> > Kanchana wrote:
> >> I tried to extract some data with xpathEval. The path matches more
> >> than 100,000 elements.
> >>
> >>     import libxml2
> >>
> >>     doc = libxml2.parseFile("test.xml")
> >>     ctxt = doc.xpathNewContext()
> >>     result = ctxt.xpathEval('//src_ref/@editions')
> >>     doc.freeDoc()
> >>     ctxt.xpathFreeContext()
> >>
> >> It gets stuck on the following line and results in high CPU usage:
> >>
> >>     result = ctxt.xpathEval('//src_ref/@editions')
> >>
> >> Any suggestions to resolve this?
>
> > what happens if you just search for "//src_ref"? what happens if you
> > use libxml's command line tools to do the same search?
>
> >> Is there any better alternative to handle large documents?
>
> > the raw libxml2 API is pretty hopeless; there's a much nicer binding
> > called lxml:
> >
> >     http://codespeak.net/lxml/
> >
> > but that won't help if the problem is with libxml2 itself, though
>
> It may still help a bit, as lxml's setup of libxml2 is pretty memory
> friendly and hand-tuned in a lot of places. But it's definitely worth
> trying both cElementTree and lxml to see which works better for you.
> Depending on your data, this may be fastest in lxml 2.1:
>
>     import lxml.etree
>
>     doc = lxml.etree.parse("test.xml")
>     for el in doc.iter("src_ref"):
>         attrval = el.get("editions")
>         if attrval is not None:
>             pass  # do something with attrval
>
> Stefan
The original file was 18MB and contained 288,328 attributes matching that path. I wonder whether the for loop will cause a problem, iterating 288,328 times.
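
For what it's worth, a pure-Python loop that runs 288,328 times is cheap on its own (well under a second on ordinary hardware); the more likely cost is parsing the whole 18MB tree into memory first. If that becomes a problem, iterparse() can stream the file and discard each element once it has been processed. A rough, untested sketch, assuming the same src_ref/editions names as above (the editions list is just a placeholder for whatever you do with the values):

    import xml.etree.cElementTree as etree  # or: from lxml import etree

    editions = []
    # iterparse yields each element as soon as its closing tag is seen,
    # so attribute values can be collected while the file is still
    # being read
    for event, elem in etree.iterparse("test.xml"):
        if elem.tag == "src_ref":
            attrval = elem.get("editions")
            if attrval is not None:
                editions.append(attrval)
            elem.clear()  # drop children/text to keep memory flat

The elem.clear() call is what keeps memory bounded; without it, iterparse still accumulates the full tree behind the scenes.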