On 6/6/14 1:11 PM, Marko Rauhamaa wrote:
Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info>:

On Fri, 06 Jun 2014 18:32:39 +0300, Marko Rauhamaa wrote:
Unicode, like ASCII, is a code. Representing text in Unicode is
encoding.

A Unicode string as an abstract data type has no encoding.

Unicode itself is an encoding. See it in action here:

     72 101 108 108 111 44 32 119 111 114 108 100

It is a Platonic ideal, a pure form like the real numbers.

Far from it. It is a mapping from symbols to integers. The symbols are
the Platonic ones.

The Unicode/ASCII encoding above represents the same "Platonic" string
as this EBCDIC one:

     200 133 147 147 150 107 64 166 150 153 147 132

Unicode string like this:

s = u"NOBODY expects the Spanish Inquisition!"

should not be thought of as a bunch of bytes in some encoding,

Encoding is not tied to bytes or even computers. People can speak in
code, after all.



Marko, you are right about the broader English meaning of the word "encoding". The original point here was that "Unicode text" provides no information about what sequence of bytes is at work.

In the Unicode ecosystem, an encoding is a specification of how the text will be represented in a byte stream. Saying something is "Unicode" doesn't provide that information. You have to say "UTF-8" or "UTF-16" or "UCS-2", etc., in order to know how bytes will be involved.
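
As a rough illustration (a sketch assuming Python 3, where str holds code points and bytes only appear when you call encode()), the same text yields different byte sequences depending on which encoding you name. The variable names are mine, and "cp500" is just one of the EBCDIC code pages Python ships with:

```python
# Python 3: a str is a sequence of code points, not bytes.
text = "Hello, world"

# The code points are the same no matter how the text is later serialized.
code_points = [ord(ch) for ch in text]

# Bytes only come into play once a specific encoding is named.
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")   # two bytes per character here, no BOM
ebcdic = text.encode("cp500")      # one EBCDIC code page, chosen as an example

print(code_points)
print(list(utf8))
print(list(utf16))
print(list(ebcdic))
```

For pure ASCII text like this, the UTF-8 bytes happen to match the code point numbers, which is part of why the two ideas are so easy to conflate.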

When Ethan said, "a Unicode string, as a data type, has no encoding," he meant (as he explained) that a Unicode string doesn't require or imply any particular mapping to bytes.
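
Here is one way to see that in code, again only a sketch assuming Python 3. The string from earlier in the thread can be pushed through several encodings, and each gives different bytes, yet all of them decode back to an equal string. Nothing on the str object itself records which encoding was used:

```python
s = "NOBODY expects the Spanish Inquisition!"

names = ("utf-8", "utf-16", "utf-32")

# Each encoding produces a different byte sequence for the same string.
encoded = {name: s.encode(name) for name in names}

# Decoding with the matching encoding recovers an equal string every time.
decoded = {name: data.decode(name) for name, data in encoded.items()}

for name in names:
    print(name, len(s), len(encoded[name]), decoded[name] == s)
```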

I'm sure you understand this; I'm just trying to clarify the different meanings of the word "encoding".


Marko



--
Ned Batchelder, http://nedbatchelder.com

--
https://mail.python.org/mailman/listinfo/python-list
