Package: python-html5lib
Version: 0.999-2
Severity: important

(Filing this as “important” since it breaks the entire package for my use case and a reverse dependency; feel free to downgrade the severity.)

python-html5lib seems to have changed its API in the 0.999 upstream release. I get the following error when running planet-venus:

ERROR:planet.runner:Error processing http://blogs.noname-ev.de/commandline-tools/feeds/index.rss2
ERROR:planet.runner:AttributeError: 'module' object has no attribute 'TreeBuilder'
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/spider.py", line 472, in spiderPlanet
    writeCache(uri, feed_info, data)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/spider.py", line 279, in writeCache
    reconstitute.source(xdoc.documentElement,data.feed,data.bozo,data.version)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/reconstitute.py", line 231, in source
    content(xsource, 'subtitle', source.get('subtitle_detail',None), bozo)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/reconstitute.py", line 167, in content
    parser = html5parser.HTMLParser(tree=dom.TreeBuilder)

This error can be addressed by patching planet-venus as follows:

--- i/planet/reconstitute.py
+++ w/planet/reconstitute.py
@@ -18,6 +18,7 @@ from xml.sax.saxutils import escape
 from xml.dom import minidom, Node
 from html5lib import html5parser
 from html5lib.treebuilders import dom
+from html5lib import treebuilders
 import planet, config
 
 try:
@@ -164,7 +165,7 @@ def content(xentry, name, detail, bozo):
         bozo=1
 
     if detail.type.find('xhtml')<0 or bozo:
-        parser = html5parser.HTMLParser(tree=dom.TreeBuilder)
+        parser = html5parser.HTMLParser(tree=treebuilders.getTreeBuilder("dom"))
         html = parser.parse(xdiv % detail.value, encoding="utf-8")
         for body in html.documentElement.childNodes:
             if body.nodeType != Node.ELEMENT_NODE: continue

But even after that is fixed, there are errors which I wasn’t able to fix:

ERROR:planet.runner:Error processing https://raumzeitlabor.de/feed/
ERROR:planet.runner:AttributeError: 'module' object has no attribute 'XHTMLSerializer'
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/spider.py", line 472, in spiderPlanet
    writeCache(uri, feed_info, data)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/spider.py", line 166, in writeCache
    scrub.scrub(feed_uri, data)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/scrub.py", line 137, in scrub
    xhtml = serializer.XHTMLSerializer(inject_meta_charset = False)

The XHTMLSerializer seems to be gone entirely. I am not sure whether the HTMLSerializer is enough for what planet-venus does. Just using that did not work for me, though; I get exceptions with this patch:

--- i/planet/scrub.py
+++ w/planet/scrub.py
@@ -131,10 +131,11 @@ def scrub(feed_uri, data):
             # Run this through HTML5's serializer
             from html5lib import html5parser, sanitizer, treebuilders
             from html5lib import treewalkers, serializer
+            from html5lib.serializer import HTMLSerializer
             p = html5parser.HTMLParser(tokenizer=sanitizer.HTMLSanitizer,
               tree=treebuilders.getTreeBuilder('dom'))
             doc = p.parseFragment(node.value, encoding='utf-8')
-            xhtml = serializer.XHTMLSerializer(inject_meta_charset = False)
+            xhtml = HTMLSerializer(inject_meta_charset = False)
             walker = treewalkers.getTreeWalker('dom')
-            tree = xhtml.serialize(walker(doc), encoding='utf-8')
+            tree = xhtml.render(walker(doc), encoding='utf-8')
             node['value'] = ''.join([str(token) for token in tree])

The error is:

ERROR:planet.runner:Error processing http://soup.wobbl.es/rss
ERROR:planet.runner:TypeError: 'NoneType' object has no attribute '__getitem__'
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/spider.py", line 472, in spiderPlanet
    writeCache(uri, feed_info, data)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/spider.py", line 166, in writeCache
    scrub.scrub(feed_uri, data)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/scrub.py", line 140, in scrub
    tree = xhtml.render(walker(doc), encoding='utf-8')
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/serializer/htmlserializer.py", line 307, in render
    return b"".join(list(self.serialize(treewalker, encoding)))
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/serializer/htmlserializer.py", line 199, in serialize
    for token in treewalker:
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/filters/optionaltags.py", line 18, in __iter__
    type = token["type"]

I then tried s/dom/lxml/ and installed python-lxml, but no dice:

ERROR:planet.runner:Error processing http://soup.wobbl.es/rss
ERROR:planet.runner:AttributeError: DocumentFragment instance has no attribute 'tag'
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/spider.py", line 472, in spiderPlanet
    writeCache(uri, feed_info, data)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/spider.py", line 166, in writeCache
    scrub.scrub(feed_uri, data)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/scrub.py", line 140, in scrub
    tree = xhtml.render(walker(doc), encoding='utf-8')
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/serializer/htmlserializer.py", line 307, in render
    return b"".join(list(self.serialize(treewalker, encoding)))
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/serializer/htmlserializer.py", line 199, in serialize
    for token in treewalker:
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/filters/optionaltags.py", line 17, in __iter__
    for previous, token, next in self.slider():
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/filters/optionaltags.py", line 9, in slider
    for token in self.source:
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/treewalkers/_base.py", line 144, in __iter__
    details = self.getNodeDetails(currentNode)
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/treewalkers/lxmletree.py", line 145, in getNodeDetails
    elif node.tag == etree.Comment:

Any idea on how to fix planet-venus? Would it make sense to add some compatibility code?

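To illustrate what such compatibility code might look like, here is a minimal sketch of a name-fallback helper. The helper name first_available and the stand-in modules old_serializer and new_serializer are hypothetical and not part of html5lib or planet-venus; the stand-ins only mimic a module that has XHTMLSerializer and one that does not, so that the sketch runs without html5lib installed. It only papers over the missing names and would not, by itself, address the TypeError raised from optionaltags.py.

```python
import types


def first_available(module, *names):
    """Return the first attribute of `module` found among `names`.

    Hypothetical helper: lets calling code prefer an old name and
    fall back to a newer one when the old name has been removed.
    """
    for name in names:
        try:
            return getattr(module, name)
        except AttributeError:
            continue
    raise AttributeError(
        "%s provides none of: %s" % (module.__name__, ", ".join(names))
    )


# Stand-in modules (assumptions, NOT the real html5lib.serializer).
# The "old" one offers both serializer names, the "new" one only one.
old_serializer = types.ModuleType("old_serializer")
old_serializer.XHTMLSerializer = type("XHTMLSerializer", (object,), {})
old_serializer.HTMLSerializer = type("HTMLSerializer", (object,), {})

new_serializer = types.ModuleType("new_serializer")
new_serializer.HTMLSerializer = type("HTMLSerializer", (object,), {})

# Prefer XHTMLSerializer where it exists, otherwise use HTMLSerializer.
old_choice = first_available(old_serializer, "XHTMLSerializer", "HTMLSerializer")
new_choice = first_available(new_serializer, "XHTMLSerializer", "HTMLSerializer")

print(old_choice.__name__)
print(new_choice.__name__)
```

In scrub.py this could presumably be used as first_available(serializer, "XHTMLSerializer", "HTMLSerializer") in place of the direct serializer.XHTMLSerializer lookup. Whether HTMLSerializer output is an acceptable substitute for planet-venus remains the open question above.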
_______________________________________________
Python-modules-team mailing list
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/python-modules-team

