According to Giuseppe Bonelli:
> sorry if this is not zope specific, but can someone please explain 
> to me the following behaviour when trying to convert an iso-8859-1 string 
> read from a file to an utf-8 encoded one?
> s='\x93test\x94' #an iso-8859-1 string
>                   #\x93 and \x94 are left and right
>                   #double quotation marks,
>                   #as seen in a browser set to iso-8859-1

\x93 and \x94 are *not* iso-8859-1 quotation marks: in iso-8859-1 the
bytes \x80-\x9f are control characters, not printable ones. See for example

Instead they seem to be from one of the Windows-125X (X=0,1,...) codepages:

> ss=unicode(s,'iso-8859-1').encode('utf-8')
> gives
> ss='\xc2\x93test\xc2\x94'
> which is wrong (as seen in a browser set to utf-8)!


  >>> unicode(s,'cp1250').encode('utf-8')

is right.
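
To see the two results side by side, here is a minimal sketch. It assumes
Python 3, where the unicode(s, enc) call above is written s.decode(enc) on a
bytes object; the variable names are only illustrative.

```python
import unicodedata

s = b'\x93test\x94'  # the raw bytes from the question

# iso-8859-1 maps every byte to the code point with the same value,
# so \x93 and \x94 become the control characters U+0093 and U+0094.
wrong = s.decode('iso-8859-1')
wrong_utf8 = wrong.encode('utf-8')

# cp1250 maps the same two bytes to typographic double quotation marks.
right = s.decode('cp1250')
right_utf8 = right.encode('utf-8')

print(repr(wrong_utf8))   # b'\xc2\x93test\xc2\x94'
print(repr(right_utf8))   # b'\xe2\x80\x9ctest\xe2\x80\x9d'

# 'Cc' is the Unicode category for control characters.
print(unicodedata.category(wrong[0]))
# LEFT DOUBLE QUOTATION MARK / RIGHT DOUBLE QUOTATION MARK
print(unicodedata.name(right[0]), '/', unicodedata.name(right[-1]))
```

The first result is valid UTF-8, but it encodes two invisible control
characters, which is why the browser shows no quotation marks.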

> Do I have to explicitly replace all characters above \x7F ?

No, you have to use the right encodings ;-)
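
For the file case in the question, a sketch along these lines might do. The
function name transcode and the default source encoding cp1250 are my
assumptions, not something from the original post; you have to pass whatever
encoding the file really uses.

```python
import io


def transcode(src_path, dst_path, src_encoding='cp1250', dst_encoding='utf-8'):
    """Re-encode a text file (hypothetical helper).

    src_encoding is an assumption: it must name the encoding the file
    was actually written in, otherwise the result is silently wrong.
    """
    # newline='' keeps the original line endings untouched.
    with io.open(src_path, 'r', encoding=src_encoding, newline='') as src:
        text = src.read()
    with io.open(dst_path, 'w', encoding=dst_encoding, newline='') as dst:
        dst.write(text)
```

No per-character replacement is needed; naming the correct source encoding
once is enough.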


[EMAIL PROTECTED]                Fax: +43/1/31336/9207
Zentrum fuer Informatikdienste, Wirtschaftsuniversitaet Wien, Austria