Thank you for all your replies.

On Wed, Jan 28, 2015 at 4:56 PM, Steven D'Aprano <st...@pearwood.info> wrote:

> On Wed, Jan 28, 2015 at 03:05:58PM +0530, Sunil Tech wrote:
> > Hi All,
> >
> > When I copied some text from the web and pasted it into the Python
> > terminal, it was automatically converted into Unicode (I suppose).
> >
> > Can anyone tell me how it does that?
> > Eg:
> > >>> p = "你好"
> > >>> p
> > '\xe4\xbd\xa0\xe5\xa5\xbd'
>
> It is hard to tell exactly, since we cannot see what p is supposed to
> be. I am guessing that you are using Python 2.7, which uses
> byte-strings by default, not Unicode text-strings.
>
> To really answer your question correctly, we need to know the operating
> system, which terminal you are using, and the terminal's encoding. I
> will guess a Linux system, with UTF-8 encoding in the terminal.
>
> So, when you paste some Unicode text into the terminal, the terminal
> receives the UTF-8 bytes, and displays the characters:
>
> 你好
>
> On my system, they display like boxes, but I expect that they are:
>
> CJK UNIFIED IDEOGRAPH-4F60
> CJK UNIFIED IDEOGRAPH-597D
>
> But, because this is Python 2, and you used byte-strings "" instead of
> Unicode strings u"", Python sees the raw UTF-8 bytes.
>
> py> s = u'你好'  # Note this is a Unicode string u'...'
> py> import unicodedata
> py> for c in s:
> ...     print unicodedata.name(c)
> ...
> CJK UNIFIED IDEOGRAPH-4F60
> CJK UNIFIED IDEOGRAPH-597D
> py> s.encode('UTF-8')
> '\xe4\xbd\xa0\xe5\xa5\xbd'
>
> which matches your results.
>
> Likewise for this example:
>
> py> s = u'ªîV'  # make sure to use Unicode u'...'
> py> for c in s:
> ...     print unicodedata.name(c)
> ...
> FEMININE ORDINAL INDICATOR
> LATIN SMALL LETTER I WITH CIRCUMFLEX
> LATIN CAPITAL LETTER V
> py> s.encode('utf8')
> '\xc2\xaa\xc3\xaeV'
>
> which matches yours:
>
> > >>> o = 'ªîV'
> > >>> o
> > '\xc2\xaa\xc3\xaeV'
>
> Obviously all this is confusing and harmful. In Python 3, the interpreter
> defaults to Unicode text strings, so that this issue goes away.
>
> --
> Steve
> _______________________________________________
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
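
For the archive, here is a minimal Python 3 sketch of the explanation above. It assumes the same UTF-8 encoding that Steven guessed for the terminal, and the variable names (text, utf8_bytes) are my own, not anything from the thread. It is only meant to show that the escaped bytes in my first message are the UTF-8 encoding of the two characters I pasted:

```python
import unicodedata

# In Python 3, a plain "..." literal is already a Unicode text string.
text = "你好"

# Encoding to UTF-8 yields the bytes that Python 2 showed in the terminal.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)                  # b'\xe4\xbd\xa0\xe5\xa5\xbd'

# Each character has an official Unicode name.
for ch in text:
    print(unicodedata.name(ch))    # CJK UNIFIED IDEOGRAPH-4F60, then -597D

# Decoding the raw bytes with the same encoding recovers the text.
print(utf8_bytes.decode("utf-8") == text)   # True

# Two characters, but six bytes: each one takes three bytes in UTF-8.
print(len(text), len(utf8_bytes))  # 2 6
```

In Python 2.7, the string I pasted corresponds to utf8_bytes here, so calling p.decode('utf-8') on it should give back the Unicode string, provided the terminal really is using UTF-8.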