I find these to be true statements, but I don't see how they support or refute that which came before.
On Sun, Nov 18, 2012 at 3:58 PM, Philippe Verdy <[email protected]> wrote:

> The same chapter makes a normative reference to ISO/IEC 2022 for C0
> controls, it does not say that this concerns ISO/IEC 8859 (which does not
> reference itself ISO/IEC 2022 as being normative, but only informational
> just to say that it is compatible with it, as well as with ISO 6429, and a
> wide range of other international or national norms and various private
> standards, but not all of them: e.g. the VISCII national standard is not
> compatible with ISO/IEC 2022).
>
>
> 2012/11/17 Buck Golemon <[email protected]>
>
>> > So don't say that there are one-for-one equivalences.
>>
>> I was just quoting this section of the standard:
>> http://www.unicode.org/versions/Unicode6.2.0/ch16.pdf
>>
>> > There is a simple, one-to-one mapping between 7-bit (and 8-bit) control
>> > codes and the Unicode control codes: every 7-bit (or 8-bit) control
>> > code is numerically equal to its corresponding Unicode code point.
>>
>> A one-to-one equivalency between bytes and unicode-points is exactly what
>> is specified here, limited to the domain of "8-bit control codes".
>>
>>
>> On Fri, Nov 16, 2012 at 9:48 PM, Philippe Verdy <[email protected]> wrote:
>>
>>> If you are thinking about "byte values" you are working at the encoding
>>> scheme level (in fact another lower level which defines a protocol
>>> presentation layer, e.g. "transport syntaxes" in MIME). Unicode codepoints
>>> are conceptually not an encoding scheme, just a coded character set
>>> (independent of the encoding scheme).
>>>
>>> Separate the levels of abstraction and you'll be much more fine. Forget
>>> the apparent homonymies that exist between distinct layers of abstraction
>>> and use each standard in what it is designed for (including the Unicode
>>> "character/glyph model" which is not defining an encoding scheme).
>>>
>>> So don't say that there are one-for-one equivalences. This is wrong:
>>> the adaptation layer must exist between abstraction levels and between
>>> separate standards, but the Unicode standard does not specify them
>>> completely (with the only exception of standard UTF encodings schemes,
>>> which is just one possible adaptation across some abstraction levels, but
>>> is not made to adapt alone to other standards than what is in the Unicode
>>> standard itself).
>>>
>>>
>>> 2012/11/17 Buck Golemon <[email protected]>
>>>
>>>> On Fri, Nov 16, 2012 at 4:11 PM, Doug Ewell <[email protected]> wrote:
>>>>
>>>>> Buck Golemon wrote:
>>>>>
>>>>>> Is it incorrect to say that 0x81 is a non-semantic byte in cp1252, and
>>>>>> to map it to the equally-non-semantic U+81 ?
>>>>>>
>>>>>> This would allow systems that follow the html5 standard and use cp1252
>>>>>> in place of latin1 to continue to be binary-faithful and reversible.
>>>>>
>>>>> This isn't quite as black-and-white as the question about Latin-1. If
>>>>> you are targeting HTML5, you are probably safe in treating an incoming
>>>>> 0x81 (for example) as either U+0081 or U+FFFD, or throwing some kind of
>>>>> error.
>>>>
>>>> Why do you make this conditional on targeting html5?
>>>>
>>>> To me, replacement and error is out because it means the system loses
>>>> data or completely fails where it used to succeed.
>>>> Currently there's no reasonable way for me to implement the U+0081
>>>> option other than inventing a new "cp1252+latin1" codec, which seems
>>>> undesirable.
>>>>
>>>>> HTML5 insists that you treat 8859-1 as if it were CP1252, so it no
>>>>> longer matters what the byte is in 8859-1.
>>>>
>>>> I feel like you skipped a step. The byte is 0x81 full stop. I agree
>>>> that it doesn't matter how it's defined in latin1 (also it's not defined in
>>>> latin1).
>>>> The section of the unicode standard that says control codes are equal
>>>> to their unicode characters doesn't mention latin1. Should it?
>>>> I was under the impression that it meant any single-byte encoding,
>>>> since it goes out of its way to talk about "8-bit" control codes.
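
For illustration only, the "cp1252+latin1" behaviour discussed in the quoted
messages could be sketched in Python roughly as below. This is a sketch under
stated assumptions, not something any of the standards above specifies: the
handler name "cp1252-c1" and the two helper functions are hypothetical, and
the set of five unassigned bytes is the one Python's own cp1252 codec leaves
undefined. Each unassigned byte is mapped to the numerically equal code point
(0x81 to U+0081) so that decoding stays reversible.

```python
import codecs

# Bytes that Python's cp1252 codec leaves unassigned (assumption: we follow
# Python's table here, not any particular standard's wording).
UNASSIGNED = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def _c1_passthrough(err):
    """Hypothetical error handler: map an unassigned cp1252 byte to the
    numerically equal code point instead of failing or substituting U+FFFD."""
    if isinstance(err, UnicodeDecodeError):
        bad = err.object[err.start:err.end]
        if all(b in UNASSIGNED for b in bad):
            return "".join(chr(b) for b in bad), err.end
    raise err


# "cp1252-c1" is a made-up handler name used only in this sketch.
codecs.register_error("cp1252-c1", _c1_passthrough)


def decode_cp1252_lossless(data: bytes) -> str:
    """Decode as cp1252, passing the five unassigned bytes through."""
    return data.decode("cp1252", errors="cp1252-c1")


def encode_cp1252_lossless(text: str) -> bytes:
    """Inverse of decode_cp1252_lossless."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Only the five pass-through code points are accepted here;
            # anything else is a genuine encoding error.
            if ord(ch) in UNASSIGNED:
                out.append(ord(ch))
            else:
                raise
    return bytes(out)
```

With this sketch, decode_cp1252_lossless(b"\x80\x81") gives "\u20ac\x81" (the
euro sign followed by U+0081), and every one of the 256 byte values survives
a decode/encode round trip. Whether that mapping is a sanctioned reading of
the standard is exactly the question under discussion in this thread.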

