[Repoze-dev] [issue103] repoze.bitblt removes doctype

Damien Baty Tue, 03 Nov 2009 18:25:53 -0800

New submission from Damien Baty <damien.b...@gmail.com>:

When rewriting image tags, repoze.bitblt removes the doctype of any (X)HTML 
content (cf. attached test). It should not.


I have found a fix for XHTML content (cf. attached patch) by changing how the 
content is parsed. However, the bug persists for HTML content (when 'try_xhtml' 
is not set). I tried the same technique as for XHTML (using 
lxml.etree.parse() instead of lxml.html.document_fromstring()), but the 
transformed content then always includes a doctype. Perhaps we could remove it 
when it was not present in the original content, but that starts to be a bit 
more complicated than it should be... (I admit that I did not dig very deep 
into lxml...)
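
For illustration only, the "remove it when it was not present in the original 
content" idea could be sketched roughly as below. This is not part of the 
attached patch or of repoze.bitblt: the helper names (has_doctype, 
restore_doctype_state) are hypothetical, the sketch works on plain strings 
rather than lxml trees, and the simple regular expression assumes a doctype 
without an internal subset.

```python
import re

# Assumption: the doctype has no internal subset ("[ ... ]"), so it ends at
# the first ">" after "<!DOCTYPE".
_DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>\s*", re.IGNORECASE)


def has_doctype(markup):
    """Return True if the markup contains a DOCTYPE declaration."""
    return _DOCTYPE_RE.search(markup) is not None


def restore_doctype_state(original, transformed):
    """Strip the doctype from `transformed` if `original` had none.

    Hypothetical post-processing step: when the parser adds a doctype that
    the original content lacked, remove it again; otherwise leave the
    transformed content untouched.
    """
    if has_doctype(original):
        return transformed
    return _DOCTYPE_RE.sub("", transformed, count=1)


if __name__ == "__main__":
    original = "<html><body><img src='a.png'></body></html>"
    transformed = (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">\n'
        '<html><body><img src="a.png"></body></html>'
    )
    print(restore_doctype_state(original, transformed))
```

Comparing against the original body keeps the decision local to the 
transform step, at the cost of scanning the content a second time.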

In a nutshell, the attached patch will keep the doctype for XHTML content. For 
HTML content, the current (bogus) behaviour is kept (and the doctype is 
removed). Malthe (or anyone who uses this package), if you do not object, I'll 
commit the patch.

----------
assignedto: dbaty
files: transform.py.patch
messages: 291
nosy: dbaty
priority: bug
status: unread
title: repoze.bitblt removes doctype

__________________________________
Repoze Bugs <b...@bugs.repoze.org>
<http://bugs.repoze.org/issue103>
__________________________________

Index: transform.py
===================================================================
--- transform.py	(revision 6960)
+++ transform.py	(working copy)
@@ -1,4 +1,6 @@
+import lxml.etree
 import lxml.html
+from StringIO import StringIO
 import urlparse
 
 try:
@@ -17,7 +19,7 @@
     if try_xhtml:
         try:
             parser = lxml.html.XHTMLParser(resolve_entities=False)
-            root = lxml.html.document_fromstring(body, parser=parser)
+            root = lxml.etree.parse(StringIO(body), parser)
             isxml = True
         except lxml.etree.XMLSyntaxError, e:
             root = lxml.html.document_fromstring(body)

_______________________________________________
Repoze-dev mailing list
Repoze-dev@lists.repoze.org
http://lists.repoze.org/listinfo/repoze-dev
