Hi.
It seems the problem still happens with version 0.99 (from a package prepared
for upload to experimental but not yet uploaded):
$ python
Python 2.7.5+ (default, Sep 17 2013, 17:31:54)
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import html5lib
>>> html5lib.parse('foo\bfoo', treebuilder='lxml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/html5lib/html5parser.py", line 28, in parse
    return p.parse(doc, encoding=encoding)
  File "/usr/lib/python2.7/dist-packages/html5lib/html5parser.py", line 224, in parse
    parseMeta=parseMeta, useChardet=useChardet)
  File "/usr/lib/python2.7/dist-packages/html5lib/html5parser.py", line 93, in _parse
    self.mainLoop()
  File "/usr/lib/python2.7/dist-packages/html5lib/html5parser.py", line 183, in mainLoop
    new_token = phase.processCharacters(new_token)
  File "/usr/lib/python2.7/dist-packages/html5lib/html5parser.py", line 991, in processCharacters
    self.tree.insertText(token["data"])
  File "/usr/lib/python2.7/dist-packages/html5lib/treebuilders/_base.py", line 320, in insertText
    parent.insertText(data)
  File "/usr/lib/python2.7/dist-packages/html5lib/treebuilders/etree_lxml.py", line 240, in insertText
    builder.Element.insertText(self, data, insertBefore)
  File "/usr/lib/python2.7/dist-packages/html5lib/treebuilders/etree.py", line 108, in insertText
    self._element.text += data
  File "lxml.etree.pyx", line 921, in lxml.etree._Element.text.__set__ (src/lxml/lxml.etree.c:41264)
  File "apihelpers.pxi", line 652, in lxml.etree._setNodeText (src/lxml/lxml.etree.c:18755)
  File "apihelpers.pxi", line 1335, in lxml.etree._utf8 (src/lxml/lxml.etree.c:24545)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
>>>
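In case it helps the discussion: the ValueError comes from lxml refusing the
U+0008 (BACKSPACE) character that '\b' produces in the test string, and XML 1.0
does not allow that character at all. Below is a minimal sketch (Python 3,
standard library only) of a check one could run on text before it reaches the
lxml treebuilder. The helper names are hypothetical, not part of html5lib or
lxml, and the pattern follows the XML 1.0 "Char" production, which may be
stricter than the exact test lxml performs internally:

```python
import re

# XML 1.0 "Char" production (section 2.2):
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class below matches any single character OUTSIDE that set.
_INVALID_XML_CHARS = re.compile(
    r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (index, code point) pairs for characters XML 1.0 forbids."""
    return [(m.start(), ord(m.group())) for m in _INVALID_XML_CHARS.finditer(text)]


def replace_invalid_xml_chars(text, replacement="\uFFFD"):
    """Replace every XML-incompatible character with `replacement`.

    A callable is passed to re.sub so backslashes in `replacement`
    are never interpreted as group references.
    """
    return _INVALID_XML_CHARS.sub(lambda _m: replacement, text)


if __name__ == "__main__":
    sample = "foo\bfoo"  # same input as in the session above
    print(find_invalid_xml_chars(sample))          # [(3, 8)]
    print(replace_invalid_xml_chars(sample, "?"))  # foo?foo
```

Whether such a replacement belongs inside html5lib's lxml treebuilder or is
left to the caller is precisely what upstream would have to decide.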
olivier@inf-8660:~/svn/svn.debian.org/python-modules/packages/build-area$ dpkg -l python-html5lib
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name             Version    Architecture Description
+++-================-==========-============-===============================================================
ii  python-html5lib  0.99-1     all          HTML parser/tokenizer based on the WHATWG HTML5 specification
Are you sure this is a bug?
Would you mind checking with upstream and/or forwarding the issue there?
Best regards,
--
Olivier BERGER
(OpenPGP: 4096R/7C5BB6A5 : http://weusepgp.info)
http://www.olivierberger.com/weblog/
_______________________________________________
Python-modules-team mailing list
[email protected]
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/python-modules-team