Kanchana wrote:
> Hi,
>
> I tried to extract some data with xpathEval. The path matches more
> than 100,000 elements.
>
> doc = libxml2.parseFile("test.xml")
> ctxt = doc.xpathNewContext()
> result = ctxt.xpathEval('//src_ref/@editions')
> doc.freeDoc()
> ctxt.xpathFreeContext()
>
> It gets stuck on the following line and results in high CPU usage:
>
> result = ctxt.xpathEval('//src_ref/@editions')
>
> Any suggestions on how to resolve this?
>
> Is there any better alternative for handling large documents?
One option might be an XML database. I'm familiar with Sedna
(http://modis.ispras.ru/sedna/). In practice, you store the document in
the database and let the database do the extracting for you. Sedna does
XQuery, which is a very nice way to get just what you want out of your
document or collection of documents (there's a quick sketch in the P.S.
below).

Good:

- It's free (Apache 2.0 license).
- It's cross-platform (Windows x86, Linux x86, FreeBSD, Mac OS X).
- It has Python bindings (zif.sedna at the Cheese Shop, among others).
- It's pretty fast, particularly if you set up indexes.
- Document and document-collection size are limited only by disk space.

Not so good:

- Sedna runs as a server. Expect it to use in the range of 100M of RAM
  per database. A database can contain many, many documents, so you
  probably only want one database anyway.

Disclosure: I'm the author of the zif.sedna package, and I'm
interpreting the fact that I have not received much feedback as "it
works pretty well" :)

- Jim Washington
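P.S. In case a concrete example helps, here's a minimal sketch of what
the above looks like from Python. I'm writing this from memory, so
treat the connection class, method names, and credentials as
illustrative rather than zif.sedna's exact API; the XQuery is the part
to take away. It assumes test.xml has already been bulk-loaded into the
database under the name "test".

    # Sketch only: the class/method names below are illustrative and
    # may differ from zif.sedna's actual API; check the package docs.
    from zif.sedna import protocol

    # Connect to a running Sedna server (SYSTEM/MANAGER are Sedna's
    # stock defaults; substitute your own database and credentials).
    conn = protocol.SednaProtocol('localhost', 'testdb',
                                  login='SYSTEM', passwd='MANAGER')

    # The query runs inside the database, so only the matching
    # attribute values travel back to Python -- no 100,000-node DOM
    # is ever built on the client side.
    result = conn.execute(
        u'for $s in doc("test")//src_ref return string($s/@editions)')
    for value in result:
        print value

That last point is the real win for your case: the client never has to
hold the whole document or the whole result set in memory at once.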