Re: [Zope] Re: Zope iso-8859-1 to utf-8
Also, watch out for the gotcha in BaseResponse.py, which can end up applying a default encoding of latin-1 in some circumstances. I really want to make that hard-coded value configurable in zope.conf at some stage...

Chris

--
Simplistix - Content Management, Zope Python Consulting
- http://www.simplistix.co.uk
___
Zope maillist - Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
** No cross posts or HTML encoding! **
(Related lists -
http://mail.zope.org/mailman/listinfo/zope-announce
http://mail.zope.org/mailman/listinfo/zope-dev )
RE: [Zope] Re: Zope iso-8859-1 to utf-8
I see... And what Python function would you use for the conversion?

I made some tests and was surprised by the results. I switched the ZMI to UTF-8 (management_page_charset), edited some of my documents and properties, and all went fine. The generated documents are still sent to browsers as iso-8859-1, and are not broken.

So my question would be: which valid UTF-8 characters (for typical Western languages like English, French, Spanish, ...) would be invalid in iso-8859-1?

Last thing: if the ZMI is switched to UTF-8, then what is the difference between the ustring/string, etc. properties?

Thanks.

Pascal

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] on behalf of Max M
Sent: Tuesday, 13 September 2005 14:51
To: zope@zope.org
Subject: [Zope] Re: Zope iso-8859-1 to utf-8

Pascal Peregrina wrote:
> Hi,
> I have been running a Zope installation for 2 years, so there are now lots of objects, properties, etc...
> I would like to know what possible issues I may have to face if I change the default encoding from iso-8859-1 to utf-8 in the ZMI.

You must write a script that converts any property on any object in your site that is latin-1 to utf-8.

So first find all the objects you use. See what types they are. Find all text and string attributes on those objects. Write a function that converts from latin-1 to utf-8 and run that on every object.

The hard part will be finding all the attributes, but perhaps you can write a method that helps find those properties for you using introspection.

--
hilsen/regards Max M, Denmark
http://www.mxm.dk/
IT's Mad Science

**
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the system manager. This footnote also confirms that this email message has been swept by MIMEsweeper for the presence of computer viruses. www.mimesweeper.com
**
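The procedure Max M describes (find every object, find its string properties, re-encode each one) could be sketched roughly as follows. This is only an illustration in modern Python 3, not Zope code: the nested dicts with "props" and "children" keys are a hypothetical stand-in for a Zope object tree, and the function names are made up for the example. A real script would have to use whatever property API the objects in the site actually expose.

```python
def latin1_to_utf8(value):
    """Re-encode a Latin-1 byte string as UTF-8; pass other values through."""
    if isinstance(value, bytes):
        return value.decode("iso-8859-1").encode("utf-8")
    return value


def convert_tree(node):
    """Convert every byte-string property on node and on all its children.

    `node` is a hypothetical stand-in for a Zope object: a dict with a
    "props" mapping and an optional "children" list.
    """
    node["props"] = {k: latin1_to_utf8(v) for k, v in node["props"].items()}
    for child in node.get("children", []):
        convert_tree(child)
    return node


# Hypothetical two-level "site" with Latin-1 encoded byte strings.
site = {
    "props": {"title": b"Caf\xe9", "hits": 3},
    "children": [{"props": {"description": b"Se\xf1or"}}],
}
convert_tree(site)
print(site["props"]["title"])  # b'Caf\xc3\xa9'
```

Note that every byte sequence decodes successfully as Latin-1, so a function like this cannot tell whether a value has already been converted. Running it twice over the same data would double-encode the non-ASCII characters, so a real migration should run exactly once against a backup.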
RE: [Zope] Re: Zope iso-8859-1 to utf-8
Pascal Peregrina wrote at 2005-9-13 14:21 +0100:
> I see... And what python function would you use for conversion ?

    unicode(iso_string, 'iso-8859-1').encode('utf-8')

> I made some tests and was surprised of the results...
> I switched ZMI to UTF-8 (management_page_charset) and edited some of my documents / properties and all went fine.

Strange. I would have expected non-ASCII characters to be displayed incorrectly.

> The generated documents are still sent to browsers as iso-8859-1, and are not broken.

If you switched to utf-8, then *you* should ensure that they are sent as utf-8.

> So my question would be : which valid UTF-8 characters (for typical Western languages like English, French, Spanish, ...) would be invalid in iso-8859-1

This is a strange question... The problem does not lie with the characters but with their codes.

The codes agree between UTF-8 and iso-8859-1 for precisely the ASCII characters (unicode chars 0-127). Unicode characters 128-255 use 2 bytes in UTF-8 but 1 in iso-8859-1. Unicode characters 256 and up can be encoded in UTF-8 but not in iso-8859-1.

> ...
> Last thing, if ZMI is switched to UTF-8, then what is the difference between ustring/string, etc properties ?

ustring is a unicode string: stored inside Zope as unicode, sent to the browser UTF-8 encoded and expected to come back UTF-8 encoded.

string is a plain (non-unicode) string. It should use the encoding of your page (UTF-8, once you have switched to UTF-8).

--
Dieter
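The three ranges Dieter describes can be checked with a few lines of Python. This is a small sketch in Python 3 (the one-liner in the message is Python 2, where `unicode()` still exists); the helper name `encoded_lengths` is made up for the illustration.

```python
def encoded_lengths(ch):
    """Return (latin1_len, utf8_len) for one character.

    latin1_len is None when the character has no iso-8859-1 encoding.
    """
    utf8_len = len(ch.encode("utf-8"))
    try:
        latin1_len = len(ch.encode("iso-8859-1"))
    except UnicodeEncodeError:
        latin1_len = None
    return latin1_len, utf8_len


# One sample from each range: ASCII, 128-255, and 256 and up.
for ch in ("A", "\u00e9", "\u20ac"):
    print(hex(ord(ch)), encoded_lengths(ch))

# UTF-8 bytes for e-acute, misread as iso-8859-1, become two characters.
misread = "\u00e9".encode("utf-8").decode("iso-8859-1")
print(misread == "\u00c3\u00a9")  # True
```

The last two lines show the kind of breakage Dieter expected to see: a page stored as UTF-8 but served as iso-8859-1 turns each accented character into a two-character sequence.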
Re: [Zope] Re: Zope iso-8859-1 to utf-8
Thanks Dieter, you are right, I was a little confused.

I think that if my pages have not broken so far, it's because we use HTML entities for non-standard characters...

I will have to convert everything in a rush now to avoid issues :(

--
(sent from my BlackBerry)

-----Original Message-----
From: Dieter Maurer [EMAIL PROTECTED]
To: Pascal Peregrina [EMAIL PROTECTED]
CC: 'Max M' [EMAIL PROTECTED]; zope@zope.org zope@zope.org
Sent: Tue Sep 13 19:10:08 2005
Subject: RE: [Zope] Re: Zope iso-8859-1 to utf-8
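Pascal's explanation is plausible: an HTML entity such as `&eacute;` consists only of ASCII characters, so the bytes of the page are the same whether it is labelled iso-8859-1 or UTF-8. A minimal Python 3 sketch of that point follows; the sample string and the helper name are made up for the illustration.

```python
import html


def same_bytes_in_both(text):
    """True if text encodes to identical bytes in iso-8859-1 and UTF-8."""
    try:
        return text.encode("iso-8859-1") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains a character that iso-8859-1 cannot represent.
        return False


page = "Caf&eacute; cr&egrave;me"
print(same_bytes_in_both(page))                 # True: entities are pure ASCII
print(same_bytes_in_both(html.unescape(page)))  # False: literal accents differ
```

Any property or document that contains literal accented characters instead of entities would still need the conversion discussed earlier in the thread.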