Steven D'Aprano writes:
> On Fri, Jan 17, 2014 at 11:19:44AM +0900, Stephen J. Turnbull wrote:
> > "ASCII compatible" is a technical term in encodings, which means
> > "bytes in the range 0-127 always have ASCII coded character semantics,
> > do what you like with bytes in the range 128-255."[1]
>
> Examples, and counter-examples, may help. Let me see if I have got this
> right: an ASCII-compatible encoding may be an ASCII-superset like
> Latin-1, or a variable-width encoding like UTF-8 where the ASCII chars
> are encoded to the same bytes as ASCII, and non-ASCII chars are not. A
> counter-example would be UTF-16, or some of the Asian encodings like
> Big5. Am I right so far?

All correct.

> But Nick isn't talking about an encoding, he's talking about a data
> format. I think that an ASCII-compatible format means one where (in at
> least *some* parts of the data) bytes between 0 and 127 have the same
> meaning as in ASCII, e.g. byte 84 is to be interpreted as ASCII
> character "T". This doesn't mean that every byte 84 means "T", only that
> some of them do -- hopefully well-defined sections of the data. Below,
> you introduce the term "ASCII segments" for these.

Yes, except that I believe Nick, as well as the "file-and-wire guys",
strengthen "hopefully well-defined" to just "well-defined".

> > <specified bytes methods> are designed for use *only* on bytes
> > that are ASCII segments; use on other data is likely to cause
> > hard-to-diagnose corruption.
>
> An example: if you have the byte b'\x63', calling upper() on that will
> return b'\x43'. That is only meaningful if the byte is intended as the
> ASCII character "c".

Good example.

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
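For anyone who wants to poke at the distinction, here is a rough Python 3 sketch, not anything proposed in the thread. The helper `is_ascii_compatible` and the 8-byte record layout are hypothetical, made up for illustration. The helper tests the definition above in two steps: every ASCII character must encode to its own byte, and no non-ASCII character may produce a byte in 0-127. It only probes code points U+0080 to U+9FFF, so a True result means "no counter-example found", not a proof. The second half shows how bytes.upper() is harmless on an ASCII segment but silently corrupts a binary field that happens to contain a lowercase-letter byte.

```python
import struct


def is_ascii_compatible(codec: str, sample: range = range(0x80, 0xA000)) -> bool:
    """Heuristic check (hypothetical helper): do bytes 0-127 always mean ASCII?

    Only the code points in `sample` are probed, so True means "no
    counter-example found", not a proof of ASCII compatibility.
    """
    # 1. Each ASCII character must encode to the identical single byte.
    ascii_text = "".join(chr(i) for i in range(128))
    if ascii_text.encode(codec) != bytes(range(128)):
        return False
    # 2. No non-ASCII character may use a byte in the range 0-127.
    for cp in sample:
        try:
            encoded = chr(cp).encode(codec)
        except UnicodeEncodeError:
            continue  # not representable in this codec; nothing to check
        if any(b < 0x80 for b in encoded):
            return False
    return True


for codec in ("latin-1", "utf-8", "utf-16", "big5"):
    # latin-1 and utf-8 pass; utf-16 fails step 1 (BOM, two bytes per char);
    # big5 fails step 2 (many trail bytes fall in 0x40-0x7E).
    print(codec, is_ascii_compatible(codec))

# Steven's example: 0x63 is "c", and upper() maps it to 0x43, "C".
assert b"\x63".upper() == b"\x43"

# Hypothetical record format (assumed for illustration): a 4-byte ASCII
# command followed by a 4-byte little-endian unsigned length.
record = b"get " + struct.pack("<I", 97)

# Uppercasing only the ASCII segment leaves the length intact.
safe = record[:4].upper() + record[4:]
print(safe[:4], struct.unpack("<I", safe[4:])[0])  # b'GET ' 97

# Uppercasing the whole record silently rewrites the length, because the
# value 97 happens to be stored as the byte b"a".
unsafe = record.upper()
print(unsafe[:4], struct.unpack("<I", unsafe[4:])[0])  # b'GET ' 65
```

No exception is raised in the second case, which is exactly the "hard-to-diagnose corruption" the proposed wording warns about.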