On Wed, Mar 11, 2020 at 9:28 PM Steven D'Aprano <st...@pearwood.info> wrote:
>
> On Wed, Mar 11, 2020 at 07:28:06AM +1100, Chris Angelico wrote:
>
> > That's exactly what "ASCII compatible" means. Since ASCII is a
> > seven-bit encoding, an encoding is ASCII-compatible if (a) every ASCII
> > character is represented by the corresponding byte value, and (b)
> > every seven-bit value represents that ASCII character.
>
> Sorry Chris, that explanation left me more confused than I started :-(
>
> Let me have a go...
>
> The ASCII encoding is a mapping between *seven-bit numeric values* and
> 128 distinct characters, some of which are human-readable:
>
>     A = 1000001
>     B = 1000010
>     a = 1100001
>
> and some of which are considered to be "binary" characters:
>
>     NUL = 0000000
>     SOH = 0000001
>     DEL = 1111111

Correct.

> In practice today, seven bits are inconvenient, so these are always
> padded with a leading 0 bit.

Yes. There's no practical way to store ASCII characters in seven-bit
units, so we store those numbers in eight-bit bytes.

> An encoding is compatible with ASCII if, and only if, the following is
> true:
>
> * all 128 of the ASCII characters are handled by the encoding;
>
> * each of those characters are mapped to the same eight-bit value as
>   the ASCII encoding would use (including the leading 0 bit);

Correct - this is my "(a)" condition.

> * no non-ASCII character is mapped to one of those eight-bit values;
>
> * or to something which could be confused with one of those eight-bit
>   values by a naive application that processed them a byte at a time.

Also correct - this is my "(b)" condition. Any byte value below 128 must
represent the corresponding ASCII character, and nothing else.

> E.g. if an encoding mapped some character ∇ to the 16-bit value:
>
>     01000001 11110000
>
> that would not be considered ASCII-compatible, because the first byte
> would be interpreted as "A" by a naive application.

Exactly.

> Most (all?) of the "extended ASCII" eight-bit encodings are ASCII
> compatible, because they use only bytes with a leading 1 for the
> non-ASCII characters.

Right. Those are ASCII-compatible single-byte encodings: simple,
straightforward, and easy to work with. But, of course, limited to just
128 non-ASCII characters.

> UTF-8 is also ASCII compatible.
>
> UTF-16 and UTF-32 are *not* ASCII compatible.
>
> How did I go?

Nailed it. And explained it far more clearly than I did.

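For anyone who wants to poke at the two conditions, here is a rough sketch in Python. The helper name is_ascii_compatible and its limit parameter are made up for illustration and are not part of any library. It only samples code points below limit, so treat it as a sanity check rather than a proof.

```python
def is_ascii_compatible(codec: str, limit: int = 0x10000) -> bool:
    """Rough test of conditions (a) and (b) for a Python codec name.

    Hypothetical helper: only code points below `limit` are checked.
    """
    ascii_bytes = bytes(range(128))
    ascii_text = ascii_bytes.decode("ascii")

    # (a) every ASCII character is encoded as its own byte value
    try:
        if ascii_text.encode(codec) != ascii_bytes:
            return False
    except UnicodeEncodeError:
        return False

    # (b) no non-ASCII character produces a byte below 128
    for code_point in range(128, limit):
        try:
            encoded = chr(code_point).encode(codec)
        except UnicodeEncodeError:
            continue  # this codec cannot represent the character at all
        if any(byte < 128 for byte in encoded):
            return False
    return True


for name in ("utf-8", "latin-1", "utf-16", "utf-32"):
    print(name, is_ascii_compatible(name))

# U+2207 NABLA, seen one byte at a time
print("\u2207".encode("utf-8"))      # three bytes, all with the high bit set
print("\u2207".encode("utf-16-be"))  # two bytes, both below 128
```

With the codecs that ship with CPython, utf-8 and latin-1 should pass and utf-16 and utf-32 should fail at condition (a) already, since each ASCII character takes more than one byte there. The last line shows the kind of confusion described above: in UTF-16 the character ∇ becomes two bytes that a naive byte-at-a-time application would read as ASCII.
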
ChrisA