On 6/6/14 1:11 PM, Marko Rauhamaa wrote:
Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>:
On Fri, 06 Jun 2014 18:32:39 +0300, Marko Rauhamaa wrote:
Unicode, like ASCII, is a code. Representing text in unicode is
A Unicode string as an abstract data type has no encoding.
Unicode itself is an encoding. See it in action here:
72 101 108 108 111 44 32 119 111 114 108 100
It is a Platonic ideal, a pure form like the real numbers.
Far from it. It is a mapping from symbols to integers. The symbols are
the Platonic ones.
The Unicode/ASCII encoding above represents the same "Platonic" string
as this EBCDIC one:
200 133 147 147 150 107 64 166 150 153 147 132
Unicode string like this:
s = u"NOBODY expects the Spanish Inquisition!"
should not be thought of as a bunch of bytes in some encoding,
Encoding is not tied to bytes or even computers. People can speak in
code, after all.
Marko, you are right about the broader English meaning of the word
"encoding". The original point here was that "Unicode text" provides no
information about what sequence of bytes is at work.
In the Unicode ecosystem, an encoding is a specification of how the text
will be represented in a byte stream. Saying something is "Unicode"
doesn't provide that information. You have to say "UTF-8" or "UTF-16" or
"UCS-2", etc., in order to know how bytes will be involved.
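
As a small illustration (a sketch in Python 3, not taken from the earlier
messages), the same text produces different byte sequences depending on
which encoding you name. The codec "cp037" is just one of the EBCDIC code
pages that ships with Python, picked here as an assumption:

```python
# One piece of text, serialized under three different named encodings.
text = "Hello, world"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian, no byte-order mark
ebcdic_bytes = text.encode("cp037")     # assumed EBCDIC code page

# Show each byte stream as a list of integers.
print(list(utf8_bytes))
print(list(utf16_bytes))
print(list(ebcdic_bytes))
```

The three lists differ, yet each one decodes back to the same string when
the matching codec is used.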
When Ethan said, "a Unicode string, as a data type, has no encoding," he
meant (as he explained) that a Unicode string doesn't require or imply
any particular mapping to bytes.
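
To make that concrete, here is a rough Python 3 sketch (the variable names
are mine, chosen for illustration). Two byte streams in different encodings
decode to strings that compare equal, and nothing about the resulting str
records which encoding it came from:

```python
s = "NOBODY expects the Spanish Inquisition!"

# Serialize the same string two different ways.
as_utf8 = s.encode("utf-8")
as_utf32 = s.encode("utf-32-le")

# The byte streams differ in content and in length...
bytes_differ = as_utf8 != as_utf32

# ...but decoding each with its own codec yields equal strings.
from_utf8 = as_utf8.decode("utf-8")
from_utf32 = as_utf32.decode("utf-32-le")
same_text = from_utf8 == from_utf32 == s

# The string itself is a sequence of code points, with no bytes involved.
code_points = [ord(ch) for ch in s]
print(bytes_differ, same_text, code_points[:6])
```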
I'm sure you understand this; I'm just trying to clarify the different
meanings of the word "encoding".
Ned Batchelder, http://nedbatchelder.com