Hi,

Gloria W wrote:
> Stefan, congratulations. This is definitely useful.

Thanks! :)

> Please talk a bit about the API, and how it differs/varies from
> cElementTree,

http://codespeak.net/lxml/dev/compatibility.html

> or link to some examples.

The docs are full of doctest examples. However, as lxml.html is still
pretty new, its docs are not yet as comprehensive as those for
lxml.etree.

> For example, the node nesting,
> the usage of a 'tail' for trailing text. I wonder if lxml offers more of
> a DOM compliant node nesting, or if it conforms to the
> conventions/oddities of ElementTree.

lxml.etree aims for ElementTree compatibility, so these things work
alike. The above link describes the differences that we either cannot
work around for technical reasons (or performance reasons) or that are
deliberate decisions where we think ElementTree is wrong.

Note that the ElementTree API is more and more becoming a basis for
other APIs in lxml. There is lxml.objectify, which replaces a lot of
this API with something that works more like Python objects themselves
(a data binding approach). lxml.html extends the API with a bunch of
helper methods for link handling and also changes the way the
serialisation works to better adapt it to HTML. There's also MathDOM, a
MathML implementation, which was the original reason for making lxml
extensible at the Element level, back in the days of lxml 0.7. The
original idea was actually 'stolen' from Xist, although lxml has
definitely found its own way of dealing with it.

The one thing I like most about lxml is the tool integration. For
example, you can use the Element API in lxml.etree or lxml.objectify or
lxml.html, with any of the five path languages: ElementPath, ETXPath,
XPath, CSS selectors or ObjectPath.

I think this is a trend that should continue. Most XML/HTML formats can
benefit from specialised Element classes with specially adapted or added
methods, properties and even different tree behaviour, while still
taking advantage of all the other tools that lxml provides.
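
To make the 'tail' convention asked about above a bit more concrete, here
is a minimal sketch. The sample markup and the variable names are made up
for illustration, and the import falls back to the standard library's
ElementTree, which should behave the same way for the calls used here.

```python
try:
    from lxml import etree                  # used if lxml is installed
except ImportError:
    import xml.etree.ElementTree as etree   # stdlib fallback, same calls

# Illustrative markup only, not taken from any real document.
root = etree.fromstring("<p>Hello <b>big</b> world, <i>again</i>!</p>")

# ElementPath lookups through the plain Element API.
b = root.find("b")
i = root.find("i")

# There are no separate text nodes in the public tree model:
# text before the first child is the parent's .text, and text that
# follows an element's end tag is stored in that element's .tail.
print(repr(root.text))              # 'Hello '
print(repr(b.text), repr(b.tail))   # 'big' ' world, '
print(repr(i.text), repr(i.tail))   # 'again' '!'

# The pieces come back in document order when the tree is flattened.
print("".join(root.itertext()))     # Hello big world, again!
```

A DOM-style tree would represent 'Hello ', ' world, ' and '!' as three
extra child nodes of the p element; here the element has exactly two
children and the text hangs off them as plain string attributes.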
The possibilities that lxml offers here are close to unlimited (both at
the Python level and at the C level) - even with the 'oddities' (as you
called them) of ElementTree. I personally believe that .tail attributes
are actually a big advantage, as the absence of text nodes simplifies
the tree model considerably (well, the public one, not necessarily the
internal one...)

> Also show us how it differs from BeautifulSoup, which has extremely
> robust unicode handling and mangled XML/HTML tag completion, but may
> benchmark a bit slower.

libxml2 does not have as robust support for HTML-like tag soup as
BeautifulSoup, but it does a pretty good job anyway. In lxml 2.0,
lxml.html comes with BeautifulSoup integration (as ElementTree does), so
now you can have both: a tag soup parser and all the features of lxml.

Stefan
_______________________________________________
XML-SIG maillist - XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig