[Zope] converting bytestreams from iso-8859-1 to utf-8

Giuseppe Bonelli Sat, 06 May 2006 11:07:21 -0700

Hi all,

Sorry if this is not Zope-specific, but can someone please explain to me the following behaviour when trying to convert an iso-8859-1 string read from a file to a utf-8 encoded one?


s='\x93test\x94' #an iso-8859-1 string
                 #\x93 and \x94 are left and right
                 #double quotation marks,
                 #as seen in a browser set to iso-8859-1
ss=unicode(s,'iso-8859-1').encode('utf-8')
gives
ss='\xc2\x93test\xc2\x94'
which is wrong (as seen in a browser set to utf-8)!

but:
u=unicode(s,'iso-8859-1')
u=u.replace(u'\x93',u'\u201C') #u201C is unicode left double quot mark
u=u.replace(u'\x94',u'\u201D') #u201D is unicode right double quot mark
ss=u.encode('utf-8')

gives
ss='\xe2\x80\x9ctest\xe2\x80\x9d'
which is right (as seen in a browser set to utf-8)!
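
The two results can be reproduced and inspected at the code-point level. What follows is a minimal sketch in Python 3 syntax (the snippets above are Python 2, where unicode(s, enc) plays the role of s.decode(enc)); the names raw, decoded, direct and fixed are illustrative only. It suggests that iso-8859-1 maps the bytes 0x93 and 0x94 to the C1 control characters U+0093 and U+0094 rather than to quotation marks, so the first conversion is a faithful encoding of control characters, not of quotes.

```python
import unicodedata

raw = b'\x93test\x94'                      # bytes as read from the file

# iso-8859-1 maps every byte to the code point with the same number,
# so 0x93 and 0x94 become U+0093 and U+0094.
decoded = raw.decode('iso-8859-1')
direct = decoded.encode('utf-8')           # first result from the post

# Unicode general categories: 'Cc' is a control character,
# 'Pi'/'Pf' are initial/final quotation punctuation.
cat_c1 = unicodedata.category(decoded[0])      # category of U+0093
cat_quote = unicodedata.category('\u201c')     # category of U+201C

# The manual replacement from the second snippet.
fixed = decoded.replace('\x93', '\u201c').replace('\x94', '\u201d')
fixed_utf8 = fixed.encode('utf-8')         # second result from the post

print(direct, cat_c1, cat_quote, fixed_utf8)
```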

Do I have to explicitly replace all characters above \x7F ?
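
One possible answer, under the assumption that the file was actually written in Windows-1252 (cp1252) rather than strict iso-8859-1: cp1252 assigns 0x93 and 0x94 to the curly double quotes, so decoding with that codec would make the per-character replacement unnecessary. This is only a sketch in Python 3 syntax; whether the assumption holds depends on where the file came from.

```python
raw = b'\x93test\x94'                  # same bytes as in the example above

# Assumption: the data is Windows-1252, not strict iso-8859-1.
text = raw.decode('cp1252')            # 0x93 -> U+201C, 0x94 -> U+201D
utf8 = text.encode('utf-8')

print(repr(text))
print(utf8)
```

In the Python 2 style of the snippets above, the equivalent would be unicode(s,'cp1252').encode('utf-8').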

TIA
__peppo


_______________________________________________
Zope maillist  -  [email protected]
http://mail.zope.org/mailman/listinfo/zope
**   No cross posts or HTML encoding!  **

(Related lists -
 http://mail.zope.org/mailman/listinfo/zope-announce
 http://mail.zope.org/mailman/listinfo/zope-dev )
