Hi all,
sorry if this is not Zope-specific, but can someone please explain the following behaviour when converting an ISO-8859-1 string read from a file to a UTF-8 encoded one?


s='\x93test\x94' #an iso-8859-1 string
                 #\x93 and \x94 are left and right
                 #double quotation marks,
                 #as seen in a browser set to iso-8859-1
ss=unicode(s,'iso-8859-1').encode('utf-8')
gives
ss='\xc2\x93test\xc2\x94'
which is wrong (as seen in a browser set to utf-8)!

but:
u=unicode(s,'iso-8859-1')
u=u.replace(u'\x93',u'\u201C') #U+201C is the Unicode left double quotation mark
u=u.replace(u'\x94',u'\u201D') #U+201D is the Unicode right double quotation mark
ss=u.encode('utf-8')
gives
ss='\xe2\x80\x9ctest\xe2\x80\x9d'
which is right (as seen in a browser set to utf-8)!

Do I have to explicitly replace all characters above \x7F?
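
A possible explanation, sketched under the assumption that the file is really Windows-1252 (cp1252) and not ISO-8859-1: in ISO-8859-1 the bytes \x93 and \x94 are C1 control characters, while in cp1252 they are the curly double quotes. Browsers commonly display pages labelled iso-8859-1 using the cp1252 table, which would account for the quotes seen above. If that assumption holds, decoding with 'cp1252' should make the per-character replace unnecessary. The variable names are only illustrative, and the b'' prefix needs Python 2.6 or later (on older versions a plain string literal behaves the same way).

```python
# Assumption: the file is actually Windows-1252 (cp1252), not ISO-8859-1.
s = b'\x93test\x94'  # raw bytes as read from the file

# ISO-8859-1 maps 0x93/0x94 to the control characters U+0093/U+0094,
# which is where the '\xc2\x93' and '\xc2\x94' sequences come from.
as_latin1 = s.decode('iso-8859-1')

# cp1252 maps the same two bytes to U+201C and U+201D (curly quotes).
as_cp1252 = s.decode('cp1252')

# Encode the correctly decoded text as UTF-8.
ss = as_cp1252.encode('utf-8')
```

If the file contains no bytes in the \x80-\x9f range, both decodings produce the same text, so switching the codec name should only affect those characters.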

TIA
__peppo


