Ezio Melotti added the comment:
If you want to specify code points greater than U+FFFF, you have to use
u'\Uxxxxxxxx' (eight hex digits); u'\uxxxx' reads only four hex digits:
>>> x = u'\u10380'
>>> x.encode('utf-8')
'\xe1\x80\xb80'
>>> x[0]
u'\u1038'
>>> x[1]
u'0'
>>> y = u'\U00010380'
>>> y.encode('utf-8')
'\xf0\x90\x8e\x80'
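
The same comparison can be sketched for Python 3, where the u prefix is not needed. This is only an illustration; the variable names are made up here, and the length of 1 for the second string assumes Python 3.3 or later (older narrow builds store such a character as a surrogate pair).

```python
# \u consumes exactly 4 hex digits, so this is U+1038 followed by a literal '0'.
short = '\u10380'
# \U consumes exactly 8 hex digits, so this is the single code point U+10380.
long_ = '\U00010380'

# Show how many code points each literal produced, and which ones.
print(len(short), [hex(ord(c)) for c in short])   # two code points
print(len(long_), [hex(ord(c)) for c in long_])   # one code point

# Encoding is consistent with what each string actually contains.
print(short.encode('utf-8'))
print(long_.encode('utf-8'))

# chr() is another way to build the character without an escape sequence.
print(chr(0x10380) == long_)
```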
--
nosy: +ezio.me
New submission from Mahmoud :
Odd behaviour with str.encode, codecs.Codec.encode, or similar
functions when dealing with unicode objects above
with 2.6
>>> u'\u10380'.encode('utf')
'\xe1\x80\xb80'
with 3.x
>>> '\u10380'.encode('utf')
b'\xe1\x80\xb80'
correct output must be:
\xf0\x90\x8e\x80
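
For reference, the expected bytes can be checked by encoding the code point U+10380 by hand. The helper below is a hypothetical sketch written for this illustration (Python 3), not part of any library; it follows the standard UTF-8 bit layout and, as a simplification, does not reject surrogate code points.

```python
def utf8_encode_code_point(cp):
    """Encode a single code point as UTF-8 (hypothetical helper, illustration only)."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

# U+10380 needs the 4-byte form.
print(utf8_encode_code_point(0x10380))
# U+1038 followed by '0' is what the '\u10380' literal actually contains.
print(utf8_encode_code_point(0x1038) + b'0')
```

The first call yields the four bytes the report expects, and the second reproduces the output that was actually observed, which is consistent with the literal being parsed as U+1038 plus the character '0'.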