[issue7090] encoding unicode objects greater than FFFF

2009-10-09 Thread Ezio Melotti
Ezio Melotti added the comment:

If you want to specify codepoints greater than U+FFFF you have to use u'\Uxxxxxxxx' (eight hex digits). u'\uxxxx' reads exactly four hex digits, so in u'\u10380' the trailing 0 is a separate character:

>>> x = u'\u10380'
>>> x.encode('utf-8')
'\xe1\x80\xb80'
>>> x[0]
u'\u1038'
>>> x[1]
u'0'
>>> y = u'\U00010380'
>>> y.encode('utf-8')
'\xf0\x90\x8e\x80'

-- nosy: +ezio.me
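The same distinction can be checked on Python 3, where str literals follow the same escape rules. This is only a small sketch for illustration: the helper name code_points and the variable names are made up here and are not part of the original report.

```python
# Hypothetical helper (not from the thread): list the code points in a string.
def code_points(s):
    return ['U+%04X' % ord(ch) for ch in s]

# \u consumes exactly four hex digits, so this is U+1038 followed by '0'.
four_digit = '\u10380'
# \U consumes exactly eight hex digits, so this is the single code point U+10380.
eight_digit = '\U00010380'

four_points = code_points(four_digit)
eight_points = code_points(eight_digit)

four_bytes = four_digit.encode('utf-8')
eight_bytes = eight_digit.encode('utf-8')

print(four_points, four_bytes)
print(eight_points, eight_bytes)
```

chr(0x10380) builds the same one-character string as the eight-digit escape, which can be handy when the code point is computed rather than written as a literal.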

[issue7090] encoding unicode objects greater than FFFF

2009-10-09 Thread Mahmoud
New submission from Mahmoud:

Odd behaviour with str.encode, codecs.Codec.encode or similar functions when dealing with unicode objects above FFFF.

With 2.6:

>>> u'\u10380'.encode('utf')
'\xe1\x80\xb80'

With 3.x:

>>> '\u10380'.encode('utf')
'\xe1\x80\xb80'

The correct output must be: \xf0\x90\x8e\x80