Your message dated Thu, 08 May 2014 22:02:57 +0200
with message-id <[email protected]>
and subject line Re: Bug#735837: Please move planet-venus from Experimental to Sid
has caused the Debian Bug report #735837,
regarding python-html5lib 0.999-2 breaks planet-venus
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
735837: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=735837
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: python-html5lib
Version: 0.999-2
Severity: important

(Filing this as “important” since it breaks the entire package for my
 use case and a reverse dependency; feel free to downgrade the severity.)

python-html5lib seems to have changed its API in the 0.999 upstream
release.

I get the following error when running planet-venus:

ERROR:planet.runner:Error processing http://blogs.noname-ev.de/commandline-tools/feeds/index.rss2
ERROR:planet.runner:AttributeError: 'module' object has no attribute 'TreeBuilder'
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/spider.py", line 472, in spiderPlanet
    writeCache(uri, feed_info, data)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/spider.py", line 279, in writeCache
    reconstitute.source(xdoc.documentElement,data.feed,data.bozo,data.version)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/reconstitute.py", line 231, in source
    content(xsource, 'subtitle', source.get('subtitle_detail',None), bozo)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/reconstitute.py", line 167, in content
    parser = html5parser.HTMLParser(tree=dom.TreeBuilder)

This error can be addressed by patching planet-venus as follows:

--- i/planet/reconstitute.py
+++ w/planet/reconstitute.py
@@ -18,6 +18,7 @@ from xml.sax.saxutils import escape
 from xml.dom import minidom, Node
 from html5lib import html5parser
 from html5lib.treebuilders import dom
+from html5lib import treebuilders
 import planet, config
 
 try:
@@ -164,7 +165,7 @@ def content(xentry, name, detail, bozo):
             bozo=1
 
     if detail.type.find('xhtml')<0 or bozo:
-        parser = html5parser.HTMLParser(tree=dom.TreeBuilder)
+        parser = html5parser.HTMLParser(tree=treebuilders.getTreeBuilder("dom"))
         html = parser.parse(xdiv % detail.value, encoding="utf-8")
         for body in html.documentElement.childNodes:
             if body.nodeType != Node.ELEMENT_NODE: continue
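
A variant of the same change that keeps working with both the old and the
new module layout could look like the sketch below. It is only an
illustration: the helper name resolve_dom_treebuilder is hypothetical and
not part of planet-venus or html5lib, and the modules are passed in as
arguments so the helper can be tried without html5lib installed.

```python
def resolve_dom_treebuilder(treebuilders_module, dom_module=None):
    """Return a DOM tree builder class for old or new html5lib layouts.

    treebuilders_module is expected to be html5lib.treebuilders and
    dom_module html5lib.treebuilders.dom (both supplied by the caller).
    """
    # Older releases exposed TreeBuilder directly on the dom module.
    legacy = getattr(dom_module, "TreeBuilder", None)
    if legacy is not None:
        return legacy
    # With 0.999 the attribute is gone, so ask the factory instead.
    return treebuilders_module.getTreeBuilder("dom")


# Intended use inside planet/reconstitute.py (not executed here):
#   from html5lib import html5parser, treebuilders
#   from html5lib.treebuilders import dom
#   builder = resolve_dom_treebuilder(treebuilders, dom)
#   parser = html5parser.HTMLParser(tree=builder)
```

The getattr() lookup avoids the AttributeError from the traceback above and
only falls back to getTreeBuilder("dom") when the old attribute is missing.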

But, even after that is fixed, there are errors which I wasn’t able to
fix:

ERROR:planet.runner:Error processing https://raumzeitlabor.de/feed/
ERROR:planet.runner:AttributeError: 'module' object has no attribute 'XHTMLSerializer'
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/spider.py", line 472, in spiderPlanet
    writeCache(uri, feed_info, data)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/spider.py", line 166, in writeCache
    scrub.scrub(feed_uri, data)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/scrub.py", line 137, in scrub
    xhtml = serializer.XHTMLSerializer(inject_meta_charset = False)

The XHTMLSerializer seems to be gone entirely. I am not sure whether the
HTMLSerializer is enough for what planet-venus does. Simply substituting
it did not work for me, though; with the following patch I get exceptions:

--- i/planet/scrub.py
+++ w/planet/scrub.py
@@ -131,10 +131,11 @@ def scrub(feed_uri, data):
             # Run this through HTML5's serializer
             from html5lib import html5parser, sanitizer, treebuilders
             from html5lib import treewalkers, serializer
+            from html5lib.serializer import HTMLSerializer
             p = html5parser.HTMLParser(tokenizer=sanitizer.HTMLSanitizer,
               tree=treebuilders.getTreeBuilder('dom'))
             doc = p.parseFragment(node.value, encoding='utf-8')
-            xhtml = serializer.XHTMLSerializer(inject_meta_charset = False)
+            xhtml = HTMLSerializer(inject_meta_charset = False)
             walker = treewalkers.getTreeWalker('dom')
-            tree = xhtml.serialize(walker(doc), encoding='utf-8')
+            tree = xhtml.render(walker(doc), encoding='utf-8')
             node['value'] = ''.join([str(token) for token in tree])

The error is:

ERROR:planet.runner:Error processing http://soup.wobbl.es/rss
ERROR:planet.runner:TypeError: 'NoneType' object has no attribute '__getitem__'
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/spider.py", line 472, in spiderPlanet
    writeCache(uri, feed_info, data)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/spider.py", line 166, in writeCache
    scrub.scrub(feed_uri, data)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/scrub.py", line 140, in scrub
    tree = xhtml.render(walker(doc), encoding='utf-8')
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/serializer/htmlserializer.py", line 307, in render
    return b"".join(list(self.serialize(treewalker, encoding)))
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/serializer/htmlserializer.py", line 199, in serialize
    for token in treewalker:
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/filters/optionaltags.py", line 18, in __iter__
    type = token["type"]

I then tried s/dom/lxml/ and installed python-lxml, but no dice:

ERROR:planet.runner:Error processing http://soup.wobbl.es/rss
ERROR:planet.runner:AttributeError: DocumentFragment instance has no attribute 'tag'
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/spider.py", line 472, in spiderPlanet
    writeCache(uri, feed_info, data)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/spider.py", line 166, in writeCache
    scrub.scrub(feed_uri, data)
ERROR:planet.runner:  File "/usr/lib/pymodules/python2.7/planet/scrub.py", line 140, in scrub
    tree = xhtml.render(walker(doc), encoding='utf-8')
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/serializer/htmlserializer.py", line 307, in render
    return b"".join(list(self.serialize(treewalker, encoding)))
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/serializer/htmlserializer.py", line 199, in serialize
    for token in treewalker:
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/filters/optionaltags.py", line 17, in __iter__
    for previous, token, next in self.slider():
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/filters/optionaltags.py", line 9, in slider
    for token in self.source:
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/treewalkers/_base.py", line 144, in __iter__
    details = self.getNodeDetails(currentNode)
ERROR:planet.runner:  File "/usr/lib/python2.7/dist-packages/html5lib/treewalkers/lxmletree.py", line 145, in getNodeDetails
    elif node.tag == etree.Comment:

Any idea on how to fix planet-venus? Would it make sense to add some
compatibility code?
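
One possible shape for such compatibility code in scrub.py, as a rough and
untested sketch rather than a known fix: prefer XHTMLSerializer where it
still exists and otherwise fall back to HTMLSerializer. The dom traceback
above fails inside filters/optionaltags.py, which HTMLSerializer applies
only when omit_optional_tags is true (as far as I can tell from the
html5lib source), so the fallback path turns that option off unless the
caller sets it. The helper name make_serializer is hypothetical, the
serializer module is passed in by the caller, and whether the output
matches the old XHTMLSerializer has not been checked.

```python
def make_serializer(serializer_module, **options):
    """Build a serializer from html5lib.serializer across API variants.

    serializer_module is expected to be html5lib.serializer; options are
    forwarded unchanged to whichever serializer class is chosen.
    """
    # Older releases: XHTMLSerializer is still available, use it as before.
    xhtml_cls = getattr(serializer_module, "XHTMLSerializer", None)
    if xhtml_cls is not None:
        return xhtml_cls(**options)
    # Fallback: HTMLSerializer without the optional-tags filter
    # (an assumption about what may avoid the failure shown above).
    options.setdefault("omit_optional_tags", False)
    return serializer_module.HTMLSerializer(**options)


# Intended use inside planet/scrub.py (not executed here):
#   from html5lib import serializer
#   xhtml = make_serializer(serializer, inject_meta_charset=False)
```

This does not address the lxml traceback, which fails in the tree walker
itself before any serializer filter runs.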

--- End Message ---
--- Begin Message ---
Hi.

Aiko Barz <[email protected]> writes:

> Hello,
>
> please move this package from Experimental to Sid. The difference
> between those versions is:
>
> Sid: Not working == Bad user experience
> Exp: Working == Good user experience
>

Thanks for your report (although #735837 was filed against
python-html5lib and not planet-venus).

Having 3 reports of success with the experimental version convinced me it
is probably time to move it to unstable.

So I uploaded an almost identical planet-venus as 0~git9de2109-2 in
unstable [0].

I think this report on python-html5lib can now be closed.

Best regards,

[0] http://packages.qa.debian.org/p/planet-venus/news/20140508T193350Z.html
-- 
Olivier BERGER 
http://www-public.telecom-sudparis.eu/~berger_o/ - OpenPGP-Id: 2048R/5819D7E8
Ingenieur Recherche - Dept INF
Institut Mines-Telecom, Telecom SudParis, Evry (France)

--- End Message ---