"Giorgio" <anothernetfel...@gmail.com> wrote in message news:23ce85921003050915p1a084c0co73d973282d8fb...@mail.gmail.com...
2010/3/5 Dave Angel <da...@ieee.org>
I think the problem is that I can't find any difference between the two
lines quoted above:

a = u"ciao è ciao"

and

a = "ciao è ciao"
a = unicode(a)

Maybe this will help:

   # coding: utf-8

   a = "ciao è ciao"
   b = u"ciao è ciao".encode('latin-1')

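a is a UTF-8 encoded byte string, because of the #coding line in the source (and assuming the file really is saved as UTF-8).
b is a latin-1 encoded byte string, because of the explicit encode() call.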

   a = unicode(a)
   b = unicode(b)

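Now what will happen? When no encoding is given, unicode() falls back to the default 'ascii' codec, because it has no idea how a or b was encoded. Only the programmer knows. It does not use the #coding line to decide. Both strings contain non-ASCII bytes, so both calls raise UnicodeDecodeError. To make them work, name the encoding explicitly: unicode(a, 'utf-8') and unicode(b, 'latin-1').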
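
Here is a rough sketch of that behaviour. It is written for Python 3 so it can be run on a current interpreter; there, bytes.decode('ascii') stands in for Python 2's unicode(x) with no encoding argument (assuming the default encoding was left at 'ascii'). The names a_bytes, b_bytes and try_decode are made up for illustration.

```python
# Python 3 sketch. In Python 2, unicode(x) with no encoding argument behaves
# like x.decode('ascii') below (assumption: default encoding is 'ascii').
text = "ciao \u00e8 ciao"            # same text as u"ciao è ciao"

a_bytes = text.encode("utf-8")      # what a holds when the source file is UTF-8
b_bytes = text.encode("latin-1")    # what b holds after .encode('latin-1')


def try_decode(raw, encoding):
    """Return the decoded text, or None if raw is not valid in that encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Neither byte string is pure ASCII, so the default conversion fails for both.
ascii_a = try_decode(a_bytes, "ascii")
ascii_b = try_decode(b_bytes, "ascii")

# Naming the encoding that was actually used recovers the original text.
fixed_a = try_decode(a_bytes, "utf-8")
fixed_b = try_decode(b_bytes, "latin-1")
```

The point is that the byte strings carry no record of their encoding, so the decode step only works when the programmer supplies it.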

#coding is *only* used to specify the encoding the source file is saved in. When Python executes the script, it reads the source, and whenever it parses a Unicode literal (u'...', u"...", etc.) it decodes the bytes read from the file using the #coding specified.

When Python parses a byte string literal ('...', "...", etc.), the bytes read from the file are stored directly in the string. The coding line has no effect on the result. The bytes will be exactly what was saved in the file between the quotes.
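
To see why the saved encoding matters for byte strings, here is a small sketch of the bytes that would sit between the quotes for the same text under two file encodings. It uses Python 3's str.encode() to simulate saving the file; utf8_bytes and latin1_bytes are illustrative names, not anything from the thread.

```python
# Python 3 sketch: simulate the bytes an editor would write between the quotes
# for the text "ciao è ciao" under two different file encodings.
text = "ciao \u00e8 ciao"

utf8_bytes = text.encode("utf-8")      # è is stored as two bytes: C3 A8
latin1_bytes = text.encode("latin-1")  # è is stored as one byte: E8

# The visible text is the same, but the stored byte strings are not.
same_bytes = utf8_bytes == latin1_bytes
utf8_len = len(utf8_bytes)
latin1_len = len(latin1_bytes)
```

In Python 2, len(a) on the byte string from a UTF-8 file would report the larger count, since it counts bytes rather than characters.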

-Mark

