Re: [Tutor] Encoding

Mark Tolonen Fri, 05 Mar 2010 19:31:09 -0800

"Giorgio" <anothernetfel...@gmail.com> wrote in messagenews:23ce85921003050915p1a084c0co73d973282d8fb...@mail.gmail.com...

2010/3/5 Dave Angel <da...@ieee.org>

I think the problem is that I can't find any difference between the 2 snippets
quoted above:


a = u"ciao è ciao"

and

a = "ciao è ciao"
a = unicode(a)


Maybe this will help:

   # coding: utf-8

   a = "ciao è ciao"
   b = u"ciao è ciao".encode('latin-1')

a is a UTF-8 encoded byte string, because the source file is saved as UTF-8 (which is what the #coding line declares).
b is a latin-1 encoded byte string, due to the explicit .encode('latin-1').

   a = unicode(a)
   b = unicode(b)

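Now what will happen? unicode() uses 'ascii' if no encoding is specified, because it has no idea of the encoding of a or b. Only the programmer knows. It does not use the #coding line to decide. Both calls therefore raise UnicodeDecodeError, because the bytes that represent è are not valid ASCII.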
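To make the point about unicode() and 'ascii' concrete, here is a minimal sketch. It is written with .encode()/.decode() so that it runs unchanged on Python 2 and Python 3; under Python 2, data.decode('ascii') is what unicode(data) does when no encoding is given. The helper try_decode and the variable names are made up for this illustration.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; try_decode and the variable names are hypothetical.
text = u"ciao \xe8 ciao"          # \xe8 is U+00E8, the letter e-grave

a = text.encode('utf-8')          # the bytes a UTF-8 source file would hold
b = text.encode('latin-1')        # the same bytes as b in the example above


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


ascii_a = try_decode(a, 'ascii')      # fails: byte 0xC3 is not ASCII
ascii_b = try_decode(b, 'ascii')      # fails: byte 0xE8 is not ASCII
utf8_a = try_decode(a, 'utf-8')       # works: the programmer named the right encoding
latin1_b = try_decode(b, 'latin-1')   # works: the programmer named the right encoding
utf8_b = try_decode(b, 'utf-8')       # fails: latin-1 bytes are not valid UTF-8 here
```

The two successful calls differ only in the encoding the programmer supplies, which is the information unicode() cannot guess on its own.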

#coding is *only* used to specify the encoding the source file is saved in, so when Python executes the script, reads the source and parses a literal Unicode string (u'...', u"...", etc.), the bytes read from the file are decoded using the #coding specified.
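A rough way to watch the #coding line at work is to compile the same source bytes twice with different declarations. This sketch assumes Python 3, where compile() accepts raw source bytes and honours the coding line; the function name literal_as_parsed and the '<demo>' filename are invented for the example.

```python
# Illustrative sketch (assumes Python 3); literal_as_parsed is a hypothetical helper.
def literal_as_parsed(coding):
    """Compile a tiny source file, given as raw bytes, under the declared coding."""
    # The two bytes 0xC3 0xA8 sit between the quotes, exactly as a UTF-8
    # editor would save the letter e-grave.
    source = b'# coding: ' + coding.encode('ascii') + b'\ns = u"\xc3\xa8"\n'
    namespace = {}
    exec(compile(source, '<demo>', 'exec'), namespace)
    return namespace['s']


as_utf8 = literal_as_parsed('utf-8')      # the two bytes decode to one character
as_latin1 = literal_as_parsed('latin-1')  # each byte decodes to its own character
```

The bytes in the "file" are identical in both runs; only the declaration changes, and with it the Unicode string the parser produces.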

If Python parses a byte string ('...', "...", etc.) the bytes read from the file are stored directly in the string. The coding line isn't even used. The bytes will be exactly what was saved in the file between the quotes.
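As a small illustration of "exactly what was saved in the file", the sketch below builds the bytes an editor would write for the same text in two encodings. The variable names are hypothetical, and the code runs on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Illustrative sketch; the variable names are hypothetical.
text = u"ciao \xe8 ciao"               # 11 characters

utf8_bytes = text.encode('utf-8')      # what a file saved as UTF-8 contains
latin1_bytes = text.encode('latin-1')  # what a file saved as latin-1 contains

utf8_length = len(utf8_bytes)          # e-grave takes two bytes in UTF-8
latin1_length = len(latin1_bytes)      # e-grave takes one byte in latin-1
same_bytes = (utf8_bytes == latin1_bytes)
```

A byte string literal therefore holds different data depending on how the file was saved, even though the text on screen looks the same.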


-Mark


