On Fri, Feb 24, 2012 at 09:57:57PM +0100, Charlie Clark wrote:
> Am 24.02.2012, 09:47 Uhr, schrieb Miano Njoka <mianonj...@gmail.com>:
> >While it is not essential, it is necessary in some cases where the
> >finished document will be read from disk or is used by other
> >applications eg. Deliverance[http://packages.python.org/Deliverance/].
> >In fact w3c's HTML validator throws a warning that one should declare
> >the character encoding in the document itself if it is missing.
> This is actually what the validator says:
> """
> No character encoding information was found within the document,
> either in an HTML meta element or an XML declaration. It is often
> recommended to declare the character encoding in the document
> itself, especially if there is a chance that the document will be
> read from or saved to disk, CD, etc.
> """
> As ZPT produces XHTML the proper place for any encoding declaration
> is in the XML declaration, defaulting to UTF-8, which should throw a
> validation error if incorrect.

A strong -1 for zope.pagetemplate adding <?xml ... ?> declarations

> Like much of the HTML standard the
> meta tags were never really thought through and, because invisible
> to the user, all too often copied mindlessly from one project to
> another: I have customers today with completely invalid and
> misleading meta tags of which they and the rest of the world is
> blissfully unware. And as a result browsers - the main consumers of
> the format were made fault tolerant - after all the user often had
> no idea what was causing the problem or how to rectify it. I have
> seen many examples of the server saying one think and the meta
> something else entirely. I think nearly all browsers believe what
> the server says over what's in the meta tag.

The HTML spec requires that:

  "To sum up, conforming user agents must observe the following
   priorities when determining a document's character encoding (from
   highest priority to lowest):

    1. An HTTP "charset" parameter in a "Content-Type" field.
    2. A META declaration with "http-equiv" set to "Content-Type" and a
       value set for "charset".
    3. The charset attribute set on an element that designates an
       external resource."

            -- http://www.w3.org/TR/html4/charset.html#h-5.2.2

(Aside: The rationale for this ordering, IIRC, is that it allows HTTP
servers to do on-the-fly charset conversion from one 8-bit charset to a
different one, without having to parse HTML and modify the charset name
in the <meta> declaration.)

> According to MAMA, which was instrumental in developing HTML 5 based
> on what has actually been written, the charset was set in the
> http-headersover 99 % of the time. Unfortunately, it doesn't contain
> any stats on discrepancies between the http-header and the meta.
> http://dev.opera.com/articles/view/mama
> While there is apparently a possible security risk when not
> declaring the charset I think the Pythonic principle of "there
> should be preferably one obvious way to do something" should apply
> when within Zope trying to decide the charset of a file and that
> should be well documented. I'd suggest keeping the stripping but
> implementing a more rigorous approach such as you suggest.

I'm not a big fan of the stripping.

Consider people using wget to mirror websites (or some equivalent way --
hitting Save As in a browser and selecting "Web Page (original)" instead
of "Web Page (complete)").  The Content-Type header is not going to be
saved on disk.

Why should zope.pagetemplate forbid programmers from duplicating the
charset information in the <meta> element, at least as long as that
information is correct (i.e. matches the content type)?

Marius Gedminas
http://pov.lt/ -- Zope 3/BlueBream consulting and development

Attachment: signature.asc
Description: Digital signature

Zope-Dev maillist  -  Zope-Dev@zope.org
**  No cross posts or HTML encoding!  **
(Related lists -
 https://mail.zope.org/mailman/listinfo/zope )

Reply via email to