Thanks for this. I've studied some algorhitms and depth-first search looks as it's what I want.
2010/6/25 Stefan Behnel <[email protected]>: > Michael Ludwig, 23.06.2010 23:29: >> >> Oliver Kindernay schrieb am 23.06.2010 um 18:39 (+0200): >>> >>> I am using libxml2 HTML 4.0 parser to parse HTML and XHTML web pages. >>> I want to found specific tags (i.e a), so I have to walk through the >>> tree of parsed document. And I don't want to use recursion like in >>> this example http://xmlsoft.org/examples/tree1.c. Is there some >>> mechanism in libxml which provides parsed nodes in some queue? >> >> Sounds like you should be using a high-level approach such as XPath >> or XSLT. Forgoing the benefits provided by these technologies is like >> deliberately using flintstone to make fire. > > Not necessarily. lxml.etree (Pythonic Python bindings for libxml2) has a > pair of macros for an iterative tree traversal loop. When I introduced it, > it gave me a 30% speed-up compared to my original recursive traversal code, > and it was almost 10% faster than plain XPath at the time. See the > bench_lxml_xpath() and bench_lxml_getiterator() functions here: > > http://codespeak.net/lxml/performance.html#a-longer-example > > The code is near the end of this file (look for a long comment starting with > "depth first tree walker"): > > http://codespeak.net/svn/lxml/trunk/src/lxml/etree_defs.h > > These macros are the main reason why tree iteration is so blazingly fast in > lxml.etree. Just look at these numbers: > > http://codespeak.net/lxml/performance.html#tree-traversal > > When searching for a specific tag (and when XML-ID is not an option), a well > forged loop can be a lot faster than a generic XPath implementation. > > Stefan > _______________________________________________ > xml mailing list, project page http://xmlsoft.org/ > [email protected] > http://mail.gnome.org/mailman/listinfo/xml > _______________________________________________ xml mailing list, project page http://xmlsoft.org/ [email protected] http://mail.gnome.org/mailman/listinfo/xml
