I haven't received much feedback on the ZPT mailing list, so I thought I'd bring it over here to a wider audience (thread is at http://mail.zope.org/pipermail/zpt/2004-March/005218.html ).
Begin forwarded message:
From: Stuart Bishop <[EMAIL PROTECTED]>
Date: 29 March 2004 6:13:06 PM
To: Dieter Maurer <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED]
Subject: Re: [ZPT] Making PageTemplate's edit pages Unicode aware
On 27/03/2004, at 9:57 PM, Dieter Maurer wrote:
Stuart Bishop wrote at 2004-3-25 12:27 +1100:
Currently, if you enter non-ASCII text into the title or contents
fields on a PageTemplate's edit page, the data ends up stored as
an encoded string (using management_page_charset if it is set, and
an unknown encoding if it is not).
This should be easy to fix using the foo:charset:ustring notation to have Zope convert the encoded strings to Unicode. However, the file upload feature is more problematic. Should the file upload try converting the file to Unicode from UTF-8 and raise an exception if this is not possible? I personally feel this is preferable to ending up with arbitrarily encoded document source, with no idea of the character set used.
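A rough model of what the `name:charset:ustring` marshalling asks for may help. This is not ZPublisher's code: `convert_field` is a hypothetical name, only the `:charset:ustring` suffix is handled, and Python 3 `bytes` stand in for Zope 2's encoded strings.

```python
def convert_field(field_name: str, raw: bytes):
    """Split a form field name such as 'title:utf-8:ustring' and convert its value.

    Hypothetical, simplified model of a ZPublisher-style type marker; real
    Zope supports many more converters than this sketch does.
    """
    parts = field_name.split(":")
    name = parts[0]
    if len(parts) == 3 and parts[2] == "ustring":
        charset = parts[1]
        # Strict decoding: undecodable bytes raise UnicodeDecodeError rather
        # than being stored as an encoded string of unknown charset.
        return name, raw.decode(charset)
    # No conversion requested: the value stays an encoded byte string.
    return name, raw
```

For example, `convert_field("title:utf-8:ustring", "My 2¢".encode("utf-8"))` yields `("title", "My 2¢")`, while a plain `title` field keeps its raw bytes.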
I do not think that Zope should convert when it does not know the
encoding. I am not aware that a missing "management_page_charset"
can be interpreted as "UTF-8". If this were the case, conversion
to Unicode might be correct. By the way: the HTML specification
says that uploaded files should come with a "content-type" declaration.
In that case, the charset specified there (if any) should be used
to determine the encoding.
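A sketch of an upload policy that combines both suggestions so far: prefer the charset declared in the upload's Content-Type, otherwise attempt strict UTF-8 and let the error propagate. It uses Python 3's `email.message.Message` only to parse the header parameters; `decode_upload` and `content_type_charset` are hypothetical names, and the policy assumes the browser actually sends a meaningful charset, which may not hold in practice.

```python
from email.message import Message
from typing import Optional


def content_type_charset(content_type: str) -> Optional[str]:
    """Return the (lower-cased) charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def decode_upload(data: bytes, content_type: str = "") -> str:
    """Decode an uploaded template file to a Unicode string.

    Assumed policy:
    1. use the charset declared in the upload's Content-Type, if present;
    2. otherwise try UTF-8 and let UnicodeDecodeError propagate.
    """
    charset = content_type_charset(content_type) if content_type else None
    return data.decode(charset or "utf-8")
```

Raising on a failed UTF-8 decode keeps the stored source unambiguous, at the cost of rejecting some legitimate uploads in other encodings.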
Yes, a missing management_page_charset should probably be interpreted as either US-ASCII or ISO-8859-1. US-ASCII is probably more correct, but I would guess that most browsers are configured to use ISO-8859-1 as their default (and this might be specified in the HTML spec?)
I guess using the charset the browser reports for file uploads means we can blame the browser. I don't see how this could be reliable, though, since text files do not declare their own character set unless they happen to be UTF-16 or start with a BOM. I am wondering whether having a file upload function is incompatible with a Unicode-aware page templates product.
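The BOM case is the only one where a file's bytes identify their own encoding. A sketch of sniffing for it, using the BOM constants from Python's `codecs` module; `decode_if_bom` is a hypothetical name, and it returns None when there is no BOM because the charset then cannot be inferred from the data alone.

```python
import codecs
from typing import Optional

# Order matters: the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE
# BOM (FF FE), so the longer marks are checked first. A UTF-16 LE file whose
# first character is U+0000 would be misread as UTF-32; that case is ignored.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_if_bom(data: bytes) -> Optional[str]:
    """Decode data whose encoding is given by a leading BOM; otherwise return None."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # Strip the BOM itself so it does not end up in the template source.
            return data[len(bom):].decode(encoding)
    return None  # no BOM: the bytes alone do not reveal the charset
```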
If management_page_charset is not set, the charset being used is unknown. The only way to know the character set of submitted data is to know the character set of the form it was submitted from; no other mechanism works reliably, because browsers behave inconsistently.
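To illustrate why the form's charset matters: the same submitted bytes can decode without error under more than one charset, so a wrong guess (for example the ISO-8859-1 default suggested earlier) silently produces garbled text instead of raising. `decode_form_value` is a hypothetical helper; the bytes are what a UTF-8 form would submit for "My 2¢".

```python
def decode_form_value(raw: bytes, form_charset: str) -> str:
    """Decode submitted bytes using the charset of the page that served the form."""
    return raw.decode(form_charset)


# Bytes a browser would submit for "My 2¢" from a form served as UTF-8.
submitted = "My 2\u00a2".encode("utf-8")  # b'My 2\xc2\xa2'

as_utf8 = decode_form_value(submitted, "utf-8")        # 'My 2¢'
as_latin1 = decode_form_value(submitted, "iso-8859-1")  # 'My 2Â¢': wrong guess, yet no error
```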
Currently, if you create a page template that contains non-ASCII characters, any tal:content or tal:replace expression that returns Unicode will raise a Unicode error. This can be demonstrated simply:

    <html>
      <div>My 2¢</div>
      <div tal:content="python:u'My 2\N{CENT SIGN}'">My 2¢</div>
    </html>

These are the things I think need to be fixed in Zope's Page Templates implementation to make them Unicode aware. There may be more (?):
- It should be possible for the actual page template source to be stored as a Unicode string. Currently, there is an assert ensuring it is a traditional string.
- The title property should be a Unicode string.
- PageTemplateFile should grow an optional charset parameter, defaulting to US-ASCII.
- PageTemplate.write(text) should raise an exception if text is not either a Unicode string or an ASCII string.
- The ZopePageTemplate edit page should use Zope's :charset:ustring notation so Unicode strings get passed to its handler.
- The file upload widget needs to either be removed or grow a charset box. I don't think either of these solutions is ideal :-(
Note that when I say 'Unicode string', we can still store ASCII text using a traditional string to save space.
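The Unicode error in the `<div>` example above arises because, under Python 2, the template source is an encoded byte string while the tal:content result is Unicode; combining the two makes Python implicitly decode the bytes with the ASCII codec. The sketch below reproduces that implicit step explicitly in Python 3. `py2_style_join` is a hypothetical helper, not part of TAL or Zope, and UTF-8 as the source encoding is an assumption.

```python
def py2_style_join(parts):
    """Join a mix of byte strings and Unicode strings the way Python 2 did.

    Python 2 implicitly decoded byte strings with the ASCII codec whenever
    they were combined with unicode; this helper makes that step explicit.
    """
    if any(isinstance(p, str) for p in parts):
        return "".join(p.decode("ascii") if isinstance(p, bytes) else p for p in parts)
    return b"".join(parts)


# Template source stored as an encoded (here UTF-8) byte string.
static_source = "<div>My 2\u00a2</div>".encode("utf-8")
# Unicode value returned by the tal:content expression.
dynamic_value = "My 2\N{CENT SIGN}"

try:
    py2_style_join([static_source, "<div>", dynamic_value, "</div>"])
    failed = False
except UnicodeDecodeError:
    failed = True  # the ASCII codec cannot decode byte 0xc2

# Keeping the source itself as Unicode, as proposed in the list above, avoids the error.
rendered = py2_style_join([static_source.decode("utf-8"), "<div>", dynamic_value, "</div>"])
```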
My application is currently using a ZopePageTemplate subclass that has been modified to use Unicode strings for the document source and title, and it seems to be functioning just fine. Does anyone know if that "assert type(text) == type('')" in PageTemplate.write is there for a reason?
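A sketch of what relaxing that assert might look like, together with the proposed charset parameter for file-based templates. `SketchPageTemplate`, `from_file`, and the choice of `ValueError`/`TypeError` are hypothetical; this is not Zope's `PageTemplate` class, only an illustration of the rules proposed in the list above, written in Python 3.

```python
from typing import Union


class SketchPageTemplate:
    """Hypothetical stand-in for a Unicode-aware PageTemplate (not Zope's class)."""

    def __init__(self) -> None:
        self._text = ""

    def write(self, text: Union[str, bytes]) -> None:
        """Store template source in place of `assert type(text) == type('')`.

        Unicode is accepted as-is; byte strings are accepted only if they are
        pure ASCII, since their charset is otherwise unknown.
        """
        if isinstance(text, bytes):
            try:
                text = text.decode("ascii")
            except UnicodeDecodeError:
                raise ValueError(
                    "non-ASCII byte string passed to write(); decode it first"
                ) from None
        elif not isinstance(text, str):
            raise TypeError("template source must be str or ASCII bytes")
        self._text = text

    def read(self) -> str:
        """Return the stored template source."""
        return self._text

    @classmethod
    def from_file(cls, path: str, charset: str = "us-ascii") -> "SketchPageTemplate":
        """Load a template file, decoding it with an explicit charset (default US-ASCII)."""
        with open(path, "rb") as f:
            data = f.read()
        template = cls()
        template.write(data.decode(charset))
        return template
```

Rejecting non-ASCII byte strings outright forces the caller to decide the encoding, which is the same trade-off argued for in the upload discussion.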
-- Stuart Bishop <[EMAIL PROTECTED]>
http://www.stuartbishop.net/
_______________________________________________ Zope-Dev maillist - [EMAIL PROTECTED] http://mail.zope.org/mailman/listinfo/zope-dev ** No cross posts or HTML encoding! ** (Related lists - http://mail.zope.org/mailman/listinfo/zope-announce http://mail.zope.org/mailman/listinfo/zope )