Re: [Zope] Re: Zope iso-8859-1 to utf-8

2005-09-23 Thread Chris Withers
Also, watch out for the gotcha in BaseResponse.py, which can end up
doing a default encoding to latin-1 in some circumstances.


I really want to make that hard-coded thing configurable in zope.conf at
some stage...
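Here is a minimal Python 3 sketch of why a hard-coded latin-1 default causes trouble. `encode_body` and `DEFAULT_CHARSET` are hypothetical names used only for illustration. This is not Zope's BaseResponse code. A Unicode body that contains a character outside iso-8859-1, such as the euro sign, cannot be encoded with the latin-1 default at all. Passing an explicit charset such as utf-8 works, provided the declared Content-Type names the same charset.

```python
# Hypothetical sketch, NOT Zope's BaseResponse code: a response encoder whose
# default charset is hard-coded, like the latin-1 fallback described above.
DEFAULT_CHARSET = "latin-1"


def encode_body(body: str, charset: str = DEFAULT_CHARSET) -> tuple:
    """Return (Content-Type value, encoded body), using one charset for both."""
    content_type = "text/html; charset=" + charset
    return content_type, body.encode(charset)


text = "Total: 10 \u20ac"  # the euro sign (U+20AC) is not in iso-8859-1

try:
    encode_body(text)  # relies on the hard-coded latin-1 default
except UnicodeEncodeError as exc:
    print("latin-1 default failed:", exc.reason)

header, data = encode_body(text, charset="utf-8")  # explicit, configurable
print(header)  # text/html; charset=utf-8
print(data)    # b'Total: 10 \xe2\x82\xac'
```

Keeping the header and the encoding in one function means they cannot disagree. A mismatch between the two is what produces garbled pages.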


Chris

Pascal Peregrina wrote:

I see... And what Python function would you use for the conversion?

I made some tests and was surprised by the results...
I switched ZMI to UTF-8 (management_page_charset) and edited some of my
documents / properties and all went fine.
The generated documents are still sent to browsers as iso-8859-1, and are
not broken.

So my question would be: which valid UTF-8 characters (for typical Western
languages like English, French, Spanish, ...) would be invalid in iso-8859-1?

Last thing, if ZMI is switched to UTF-8, then what is the difference between
ustring/string etc. properties?

Thanks.

Pascal

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
Max M
Sent: Tuesday, 13 September 2005 14:51
To: zope@zope.org
Subject: [Zope] Re: Zope iso-8859-1 to utf-8


Pascal Peregrina wrote:


Hi,

I have been running a Zope installation for 2 years, so there are now lots
of objects, properties, etc...

I would like to know what possible issues I may face if I
change the default encoding from iso-8859-1 to utf-8 in ZMI.



You must write a script that converts any property on any object in your 
site that is latin-1 to utf-8.


So first find all objects you use. See what types they are.

Find all text and string attributes on those objects.

Write a function that converts from latin-1 to utf-8 and run that on every
object.


The hard part will be finding all the attributes, but perhaps you can 
write a method that can help find those properties for you using 
introspection.
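The recipe above could be sketched roughly as follows. This is plain Python 3, not Zope code. The hypothetical `Node` class stands in for your content objects, `vars()` does the introspection, and `bytes` plays the role of a Python 2 plain string that holds latin-1 data. On a real site, the traversal and attribute access would go through Zope's own APIs instead.

```python
# Sketch of the "find objects, find string attributes, convert" recipe on
# plain Python objects (hypothetical; not the Zope API). Python 3 `bytes`
# stands in for a Python 2 plain string holding latin-1 text.


class Node:
    """Hypothetical stand-in for a content object with child objects."""

    def __init__(self, children=(), **attrs):
        self.children = list(children)
        for name, value in attrs.items():
            setattr(self, name, value)


def latin1_to_utf8(value: bytes) -> bytes:
    return value.decode("iso-8859-1").encode("utf-8")


def convert_tree(root: Node) -> int:
    """Convert every bytes attribute in the tree; return how many were processed."""
    processed = 0
    stack = [root]
    while stack:
        obj = stack.pop()
        # Introspection: look at the instance dict to find string attributes.
        for name, value in list(vars(obj).items()):
            if isinstance(value, bytes):
                setattr(obj, name, latin1_to_utf8(value))
                processed += 1
        stack.extend(obj.children)
    return processed


site = Node(
    title=b"Caf\xe9",
    children=[Node(title=b"Fa\xe7ade", description=b"plain ascii")],
)
print(convert_tree(site))      # 3
print(site.title)              # b'Caf\xc3\xa9'
print(site.children[0].title)  # b'Fa\xc3\xa7ade'
```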





--
Simplistix - Content Management, Zope & Python Consulting
   - http://www.simplistix.co.uk
___
Zope maillist  -  Zope@zope.org
http://mail.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **
(Related lists - 
http://mail.zope.org/mailman/listinfo/zope-announce

http://mail.zope.org/mailman/listinfo/zope-dev )


RE: [Zope] Re: Zope iso-8859-1 to utf-8

2005-09-13 Thread Pascal Peregrina
I see... And what Python function would you use for the conversion?

I made some tests and was surprised by the results...
I switched ZMI to UTF-8 (management_page_charset) and edited some of my
documents / properties and all went fine.
The generated documents are still sent to browsers as iso-8859-1, and are
not broken.

So my question would be: which valid UTF-8 characters (for typical Western
languages like English, French, Spanish, ...) would be invalid in iso-8859-1?

Last thing, if ZMI is switched to UTF-8, then what is the difference between
ustring/string etc. properties?

Thanks.

Pascal

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
Max M
Sent: Tuesday, 13 September 2005 14:51
To: zope@zope.org
Subject: [Zope] Re: Zope iso-8859-1 to utf-8


Pascal Peregrina wrote:
> Hi,
>
> I have been running a Zope installation for 2 years, so there are now lots
> of objects, properties, etc...
>
> I would like to know what possible issues I may face if I
> change the default encoding from iso-8859-1 to utf-8 in ZMI.

You must write a script that converts any property on any object in your 
site that is latin-1 to utf-8.

So first find all objects you use. See what types they are.

Find all text and string attributes on those objects.

Write a function that converts from latin-1 to utf-8 and run that on every
object.

The hard part will be finding all the attributes, but perhaps you can 
write a method that can help find those properties for you using 
introspection.


-- 

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science



**
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.

This footnote also confirms that this email message has been swept by
MIMEsweeper for the presence of computer viruses.

www.mimesweeper.com
**



RE: [Zope] Re: Zope iso-8859-1 to utf-8

2005-09-13 Thread Dieter Maurer
Pascal Peregrina wrote at 2005-9-13 14:21 +0100:
> I see... And what Python function would you use for the conversion?

   unicode(iso_string, 'iso-8859-1').encode('utf-8')
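That one-liner is Python 2. `unicode()` no longer exists in Python 3, where the same conversion works on `bytes`. The sketch below also shows why a conversion script must run exactly once. `convert_once` is a hypothetical heuristic guard added for illustration. It is not something proposed in this thread.

```python
def latin1_to_utf8(iso_bytes: bytes) -> bytes:
    """Python 3 counterpart of unicode(iso_string, 'iso-8859-1').encode('utf-8')."""
    return iso_bytes.decode("iso-8859-1").encode("utf-8")


def convert_once(value: bytes) -> bytes:
    """Hypothetical guard (a heuristic): leave values that already decode as
    UTF-8 alone, and convert everything else from latin-1."""
    try:
        value.decode("utf-8")
    except UnicodeDecodeError:
        return latin1_to_utf8(value)
    return value


original = "r\u00e9sum\u00e9".encode("iso-8859-1")  # b'r\xe9sum\xe9'
once = latin1_to_utf8(original)
print(once)  # b'r\xc3\xa9sum\xc3\xa9'

# Every byte is a valid iso-8859-1 character, so decoding never fails and a
# second blind run silently double-encodes instead of raising an error.
twice = latin1_to_utf8(once)
print(twice)  # b'r\xc3\x83\xc2\xa9sum\xc3\x83\xc2\xa9' (reads as "rÃ©sumÃ©")

print(convert_once(once) == once)  # True: already UTF-8, left unchanged
```

The guard can misfire on rare latin-1 strings that happen to form valid UTF-8 sequences, so treat it as a convenience, not a guarantee.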

> I made some tests and was surprised by the results...
> I switched ZMI to UTF-8 (management_page_charset) and edited some of my
> documents / properties and all went fine.

Strange. I had expected non-ASCII characters to be displayed
incorrectly.

> The generated documents are still sent to browsers as iso-8859-1, and are
> not broken.

If you switched to utf-8, then *you* should ensure that
they are sent as utf-8.

> So my question would be: which valid UTF-8 characters (for typical Western
> languages like English, French, Spanish, ...) would be invalid in iso-8859-1?

This is a strange question...
The problem does not lie with the characters but with their codes.

The codes agree between UTF-8 and iso-8859-1 for precisely the
ASCII characters (Unicode code points 0-127). Unicode characters
128-255 take 2 bytes in UTF-8 but 1 in iso-8859-1. Unicode characters
256 and up can be encoded in UTF-8 but not in iso-8859-1.
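The small Python 3 check below makes the three ranges concrete by printing the bytes for one character from each. The characters are illustrative picks. French `œ` (U+0153) and the euro sign `€` (U+20AC) are common in Western text, yet both lie above 255, so iso-8859-1 cannot encode them. The same is true of typographic quotes such as U+2019. Spanish accented letters and `ñ`, by contrast, all fall within 128-255.

```python
def encodings(ch: str):
    """Return (utf-8 bytes, iso-8859-1 bytes or None) for one character."""
    utf8 = ch.encode("utf-8")
    try:
        latin1 = ch.encode("iso-8859-1")
    except UnicodeEncodeError:
        latin1 = None  # code point 256 or above: iso-8859-1 has no byte for it
    return utf8, latin1


# 'e' (ASCII), 'é' (128-255), then two French characters above 255: 'œ', '€'
for ch in ["e", "\u00e9", "\u0153", "\u20ac"]:
    utf8, latin1 = encodings(ch)
    print(f"U+{ord(ch):04X}  utf-8={utf8!r}  iso-8859-1={latin1!r}")

# U+0065  utf-8=b'e'  iso-8859-1=b'e'
# U+00E9  utf-8=b'\xc3\xa9'  iso-8859-1=b'\xe9'
# U+0153  utf-8=b'\xc5\x93'  iso-8859-1=None
# U+20AC  utf-8=b'\xe2\x82\xac'  iso-8859-1=None
```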

> ...
> Last thing, if ZMI is switched to UTF-8, then what is the difference between
> ustring/string etc. properties?

ustring is a unicode string: stored inside Zope as unicode,
sent to the browser UTF-8 encoded and expected to come back
UTF-8 encoded.

string is a plain (non-unicode) string. It should use
the encoding of your page (UTF-8, once you switched to UTF-8).
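Here is a Python 3 analogy for the two property types. It assumes `str` maps to a Zope `ustring` and `bytes` to a plain `string`, and it models the idea only, not Zope's property machinery. `fits_page` is a hypothetical helper. A unicode value survives the UTF-8 round trip. A plain string that still holds latin-1 bytes stops being valid once the page charset becomes UTF-8, and that is the problem the conversion script has to fix.

```python
# Python 3 analogy (an assumption, not Zope's property code):
#   str   ~ a "ustring" property: stored as unicode
#   bytes ~ a "string" property: stored as raw bytes in the page encoding


def fits_page(value: bytes, page_charset: str = "utf-8") -> bool:
    """Hypothetical helper: True if a plain (bytes) value decodes in the page charset."""
    try:
        value.decode(page_charset)
    except UnicodeDecodeError:
        return False
    return True


# A ustring-style value: encoded to UTF-8 for the browser, decoded on the way back.
ustring_value = "Fa\u00e7ade"
sent = ustring_value.encode("utf-8")
assert sent.decode("utf-8") == ustring_value

# A string-style value left over from the latin-1 days.
legacy = b"Fa\xe7ade"
print(fits_page(legacy, "iso-8859-1"))  # True
print(fits_page(legacy))                # False: not valid UTF-8
print(fits_page(sent))                  # True
```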

-- 
Dieter


Re: [Zope] Re: Zope iso-8859-1 to utf-8

2005-09-13 Thread Pascal Peregrina
Thanks Dieter, you are right, I was a little confused.

I think my pages have not broken so far because we use HTML
entities for non-standard characters...
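This fits Dieter's point about codes. An HTML entity is pure ASCII, and ASCII bytes are identical in iso-8859-1 and UTF-8. The small Python 3 illustration below uses the standard `xmlcharrefreplace` error handler, which produces numeric references such as `&#231;`. Named entities such as `&eacute;` are plain ASCII in the same way.

```python
text = "Fa\u00e7ade \u00e0 Paris \u2013 10 \u20ac"  # ç, à, en dash, euro sign

# Replace each non-ASCII character with a numeric character reference.
ascii_html = text.encode("ascii", "xmlcharrefreplace")
print(ascii_html)  # b'Fa&#231;ade &#224; Paris &#8211; 10 &#8364;'

# Pure ASCII text encodes to the same bytes in both charsets, so the page looks
# the same whichever charset the browser is told to use.
as_text = ascii_html.decode("ascii")
print(as_text.encode("iso-8859-1") == as_text.encode("utf-8"))  # True
```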

I will have to convert everything in a rush now to avoid issues :(

--
(sent from my BlackBerry)


-----Original Message-----
From: Dieter Maurer [EMAIL PROTECTED]
To: Pascal Peregrina [EMAIL PROTECTED]
CC: 'Max M' [EMAIL PROTECTED]; zope@zope.org zope@zope.org
Sent: Tue Sep 13 19:10:08 2005
Subject: RE: [Zope] Re: Zope iso-8859-1 to utf-8


