[Zope-dev] Content Type Meta tag stripping in zope.pagetemplate

2012-02-22 Thread Miano Njoka
Hello all,

I'm a fairly new Zope developer and came across a bug in my application:
<meta http-equiv="content-type" content="text/html;charset=UTF-8" />
tags were being stripped out of ZPT templates. Is there a reason
for this? The stripping is done in the _prepare_html method of
zope.pagetemplate.pagetemplatefile.PageTemplateFile. My application
produces XHTML containing non-ASCII characters, which is then consumed by
other applications, so the content type needs to be set in the
document itself in addition to the HTTP headers.

Secondly, the meta tag is found and stripped using a regular
expression, so simply changing the order of the attributes on the
meta tag makes the regex fail to match.
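
A minimal sketch of the attribute-order problem. It assumes the existing pattern reads as below once its quotes and angle brackets are restored; the two sample tags are invented for illustration:

```python
import re

# Assumed form of the existing pattern in pagetemplatefile.py.
meta_pattern = re.compile(
    r'\s*<meta\s+http-equiv=["\']?Content-Type["\']?'
    r'\s+content=["\']?([^;]+);\s*charset=([^"\']+)["\']?\s*/?\s*>\s*',
    re.IGNORECASE)

# http-equiv first: the pattern matches and captures type and charset.
in_order = ('<meta http-equiv="Content-Type" '
            'content="text/html; charset=UTF-8" />')
# content first: the same tag semantically, but the pattern does not match.
swapped = ('<meta content="text/html; charset=UTF-8" '
           'http-equiv="Content-Type" />')

in_order_match = meta_pattern.search(in_order)
swapped_match = meta_pattern.search(swapped)
```

With the swapped tag, the template would silently fall back to the default encoding.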

Attached is a patch that uses HTMLParser to find the content type meta
tag instead of a regex. It stops parsing the HTML as soon as it
encounters the required meta tag.
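
The idea can be sketched in isolation as follows. This is not the attached patch: it is a Python 3 adaptation using html.parser, it takes already-decoded text, and the names MetaContentTypeFinder and find_meta_content_type are hypothetical. It hooks handle_starttag so that both <meta ...> and <meta ... /> are seen.

```python
from html.parser import HTMLParser


class _FoundContentType(Exception):
    """Raised to abort parsing once the meta tag has been seen."""


class MetaContentTypeFinder(HTMLParser):
    """Hypothetical helper: records the first Content-Type meta tag."""

    def __init__(self):
        super().__init__()
        self.content_type = None
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # Also reached for <meta ... /> through the default
        # handle_startendtag implementation.
        if tag != "meta":
            return
        attrs = dict(attrs)  # attribute order no longer matters
        if (attrs.get("http-equiv") or "").lower() != "content-type":
            return
        content = attrs.get("content") or ""
        content_type, _, params = content.partition(";")
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "charset":
                self.content_type = content_type.strip()
                self.charset = value.strip()
                raise _FoundContentType()  # stop parsing early


def find_meta_content_type(text, default_charset="utf-8"):
    """Return (content_type, charset); (None, default) if no tag is found."""
    finder = MetaContentTypeFinder()
    try:
        finder.feed(text)
    except _FoundContentType:
        return finder.content_type, finder.charset
    return None, default_charset
```

Raising an exception from the handler is simply a way to stop feed() early; the rest of the document is never parsed.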

Miano
Index: src/zope/pagetemplate/pagetemplatefile.py
===
--- src/zope/pagetemplate/pagetemplatefile.py	(revision 124430)
+++ src/zope/pagetemplate/pagetemplatefile.py	(working copy)
@@ -23,19 +23,49 @@
 import re
 import logging
 
+from HTMLParser import HTMLParser, HTMLParseError
+
 from zope.pagetemplate.pagetemplate import PageTemplate
 
 DEFAULT_ENCODING = "utf-8"
 
-meta_pattern = re.compile(
-    r'\s*<meta\s+http-equiv=["\']?Content-Type["\']?'
-    r'\s+content=["\']?([^;]+);\s*charset=([^"\']+)["\']?\s*/?\s*>\s*',
+meta_pattern = re.compile(r'\s*["\']?([^;]+);\s*charset=([^"\']+)',
     re.IGNORECASE)
 
+
 def package_home(gdict):
     filename = gdict["__file__"]
     return os.path.dirname(filename)
 
+
+class FoundMetaContentTypeTag(Exception):
+    def __init__(self, value):
+        self.parameter = value
+    def __str__(self):
+        return repr(self.parameter)
+
+
+class FindMetaContentTypeHTMLParser(HTMLParser):
+    def __init__(self):
+        HTMLParser.__init__(self)
+        self.content_type = None
+        self.encoding = DEFAULT_ENCODING
+
+    def handle_startendtag(self, tag, attrs):
+        if tag == "meta":
+            http_equiv = [a[1] for a in attrs if a[0] == "http-equiv"]
+            if http_equiv and http_equiv[0].lower() == "content-type":
+                content = [a[1] for a in attrs if a[0] == "content"]
+                if content:
+                    match = meta_pattern.search(content[0])
+                    if match is not None:
+                        self.content_type, self.encoding = match.groups()
+                        raise FoundMetaContentTypeTag("Content Type Meta tag found")
+
+    def get_params(self):
+        return self.content_type, self.encoding
+
+
 class PageTemplateFile(PageTemplate):
     "Zope wrapper for filesystem Page Template using TAL, TALES, and METAL"
 
@@ -57,16 +87,16 @@
         return path
 
     def _prepare_html(self, text):
-        match = meta_pattern.search(text)
-        if match is not None:
-            type_, encoding = match.groups()
-            # TODO: Shouldn't <meta>/<?xml?> stripping
-            # be in PageTemplate.__call__()?
-            text = meta_pattern.sub("", text)
-        else:
-            type_ = None
-            encoding = DEFAULT_ENCODING
-        return unicode(text, encoding), type_
+        parser = FindMetaContentTypeHTMLParser()
+        content_type = None
+        encoding = DEFAULT_ENCODING
+        try:
+            parser.feed(text)
+        except FoundMetaContentTypeTag:
+            content_type, encoding = parser.get_params()
+        except HTMLParseError:
+            pass
+        return unicode(text, encoding), content_type
 
     def _read_file(self):
         __traceback_info__ = self.filename
Index: src/zope/pagetemplate/tests/test_ptfile.py
===
--- src/zope/pagetemplate/tests/test_ptfile.py	(revision 124430)
+++ src/zope/pagetemplate/tests/test_ptfile.py	(working copy)
@@ -161,7 +161,9 @@
         self.failUnlessEqual(rendered.strip(),
             u"<html><head><title>"
             u"\u0422\u0435\u0441\u0442"
-            u"</title></head></html>")
+            u'</title><meta http-equiv="Content-Type"'
+            u' content="text/html; charset=windows-1251" />'
+            u"</head></html>")
 
     def test_xhtml(self):
         pt = self.get_pt(
@@ -176,7 +178,9 @@
         self.failUnlessEqual(rendered.strip(),
             u"<html><head><title>"
             u"\u0422\u0435\u0441\u0442"
-            u"</title></head></html>")
+            u'</title><meta http-equiv="Content-Type"'
+            u' content="text/html; charset=windows-1251" />'
+            u"</head></html>")
 
 
 
___
Zope-Dev maillist  -  Zope-Dev@zope.org
https://mail.zope.org/mailman/listinfo/zope-dev
**  No cross posts or HTML encoding!  **
(Related lists -
 https://mail.zope.org/mailman/listinfo/zope-announce
 https://mail.zope.org/mailman/listinfo/zope )


Re: [Zope-dev] Content Type Meta tag stripping in zope.pagetemplate

2012-02-22 Thread Fred Drake
On Wed, Feb 22, 2012 at 10:28 AM, Miano Njoka mianonj...@gmail.com wrote:
> <meta http-equiv="content-type" content="text/html;charset=UTF-8"
> /> tags were being stripped out from ZPT templates. Is there a reason
> for this?

As I recall, the rationale goes like this:

1. We're sniffing the input encoding from the charset setting.

2. We're storing the content-type on the instance (I hope this
   is still true).

3. The template/application/publisher is responsible for
   delivering the output with an appropriate content-type
   header.
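
The three steps above can be sketched roughly as follows. This is an illustration in Python 3, not the zope.pagetemplate code: prepare_html and content_type_header are hypothetical names, the pattern is an assumed bytes version of the existing regex, and the UTF-8 output encoding is an assumption.

```python
import re

# Assumed bytes version of the existing meta_pattern.
META = re.compile(
    rb'\s*<meta\s+http-equiv=["\']?Content-Type["\']?'
    rb'\s+content=["\']?([^;]+);\s*charset=([^"\']+)["\']?\s*/?\s*>\s*',
    re.IGNORECASE)


def prepare_html(raw, default_encoding="utf-8"):
    """Return (decoded_text, content_type) for raw template bytes."""
    match = META.search(raw)
    if match is None:
        return raw.decode(default_encoding), None
    # Step 1: sniff the input encoding from the charset setting.
    content_type, encoding = (g.decode("ascii") for g in match.groups())
    # Mirror the current behaviour: drop the tag, then decode the rest.
    text = META.sub(b"", raw).decode(encoding)
    # Step 2: the caller keeps content_type on the template instance.
    return text, content_type


def content_type_header(content_type, output_encoding="utf-8"):
    """Step 3: the publisher builds the header for the rendered output."""
    return "%s; charset=%s" % (content_type or "text/html", output_encoding)
```

Note that the charset in the header describes the rendered output, which need not be the encoding the template source was written in.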


-- 
Fred L. Drake, Jr.    fred at fdrake.net
"A storm broke loose in my mind."  --Albert Einstein


[Zope-dev] zope-tests - OK: 39

2012-02-22 Thread Zope tests summarizer
This is the summary for test reports received on the 
zope-tests list between 2012-02-21 00:00:00 UTC and 2012-02-22 00:00:00 UTC:

See the footnotes for test reports of unsuccessful builds.

An up-to-date view of the builders is also available in our 
buildbot documentation: 
http://docs.zope.org/zopetoolkit/process/buildbots.html#the-nightly-builds

Reports received
----------------

   ZTK 1.0 / Python2.4.6 Linux 64bit
   ZTK 1.0 / Python2.5.5 Linux 64bit
   ZTK 1.0 / Python2.6.7 Linux 64bit
   ZTK 1.0dev / Python2.4.6 Linux 64bit
   ZTK 1.0dev / Python2.5.5 Linux 64bit
   ZTK 1.0dev / Python2.6.7 Linux 64bit
   ZTK 1.1 / Python2.5.5 Linux 64bit
   ZTK 1.1 / Python2.6.7 Linux 64bit
   ZTK 1.1 / Python2.7.2 Linux 64bit
   Zope 3.4 KGS / Python2.4.6 64bit linux
   Zope 3.4 KGS / Python2.5.5 64bit linux
   Zope 3.4 Known Good Set / py2.4-32bit-linux
   Zope 3.4 Known Good Set / py2.4-64bit-linux
   Zope 3.4 Known Good Set / py2.5-32bit-linux
   Zope 3.4 Known Good Set / py2.5-64bit-linux
   Zope-2.10 Python-2.4.6 : Linux
   Zope-2.11 Python-2.4.6 : Linux
   Zope-2.12 Python-2.6.6 : Linux
   Zope-2.12-alltests Python-2.6.6 : Linux
   Zope-2.13 Python-2.6.6 : Linux
   Zope-2.13-alltests Python-2.6.6 : Linux
   Zope-trunk Python-2.6.6 : Linux
   Zope-trunk-alltests Python-2.6.6 : Linux
   winbot / ZODB_dev py_265_win32
   winbot / ZODB_dev py_265_win64
   winbot / ZODB_dev py_270_win32
   winbot / ZODB_dev py_270_win64
   winbot / ztk_10 py_254_win32
   winbot / ztk_10 py_265_win32
   winbot / ztk_10 py_265_win64
   winbot / ztk_11 py_254_win32
   winbot / ztk_11 py_265_win32
   winbot / ztk_11 py_265_win64
   winbot / ztk_11 py_270_win32
   winbot / ztk_11 py_270_win64
   winbot / ztk_dev py_265_win32
   winbot / ztk_dev py_265_win64
   winbot / ztk_dev py_270_win32
   winbot / ztk_dev py_270_win64

Non-OK results
--------------



Re: [Zope-dev] Content Type Meta tag stripping in zope.pagetemplate

2012-02-22 Thread Miano Njoka
On Wed, Feb 22, 2012 at 8:08 PM, Fred Drake f...@fdrake.net wrote:
> On Wed, Feb 22, 2012 at 10:28 AM, Miano Njoka mianonj...@gmail.com wrote:
>> <meta http-equiv="content-type" content="text/html;charset=UTF-8"
>> /> tags were being stripped out from ZPT templates. Is there a reason
>> for this?
>
> As I recall, the rationale goes like this:
>
> 1. We're sniffing the input encoding from the charset setting.
>
> 2. We're storing the content-type on the instance (I hope this
>    is still true).
>
> 3. The template/application/publisher is responsible for
>    delivering the output with an appropriate content-type
>    header.


Yes, this is true, but why strip out the meta tag from the resulting HTML?