David Hopwood wrote:
> Below '#' is used to quote from the Unicode 3.2 standard as proposed
> in PDUTR #28, and '>' is used to quote my suggested changes.

I second David's thorough and clearly presented contribution.
However, I have to suggest one minor improvement:

> Conformance clauses
...
> This is what I think clauses C5 and C10 should be:
...
> > C10 A process shall make no change in a valid code sequence other
> >     than the possible replacement of character sequences by their
> >     canonical-equivalent sequences, if that process purports not to
> >     modify the interpretation of that code sequence.
...
> > - Changing the bit or byte ordering when transforming between different
> >   machine architectures does not modify the interpretation of the text.

I consider the bit ordering a hardware issue, invisible to the programmer
and the end-user; hence, I would not mention it in this note.

W.r.t. the byte ordering, this note applies only to UTF-16 and UTF-32
with a BOM. It does not apply to UTF-8, as this format implies a particular
byte ordering. Neither does it apply to UTF-16LE, UTF-16BE, UTF-32LE, or
UTF-32BE; rather, swapping the byte order in any one of these formats
amounts to transforming to a different UTF, viz. UTF-16BE, UTF-16LE,
UTF-32BE, and UTF-32LE, respectively.

> > - Transforming to a different Unicode Transformation Format does not
> >   modify the interpretation of the text.

Hence, I propose the following wording for the last two notes on the
proposed C10 clause:

| - Changing the byte ordering of a string encoded in either UTF-16
|   or UTF-32, when a Byte Order Mark is present, does not modify the
|   interpretation of the text.
|
| - Transforming to a different Unicode Transformation Format does not
|   modify the interpretation of the text. This includes transformations
|   between Unicode Transformation Formats that differ only by their
|   respective byte ordering, such as a transformation from UTF-16BE
|   to UTF-16LE (irrespective of whether the byte ordering is explicitly
|   specified, or is implied by the target environment the string is
|   ported to).
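
To illustrate the distinction between the two notes, here is a minimal
sketch in Python. It is an illustration only, not part of the proposed
wording. The helper swap_byte_pairs and the sample string are my own
assumptions; the codec names "utf-16", "utf-16-be" and "utf-16-le" are
those of Python's standard library, where "utf-16" honours a leading BOM
when decoding.

```python
import codecs


def swap_byte_pairs(data: bytes) -> bytes:
    """Swap the two bytes of every 16-bit code unit (hypothetical helper)."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(data)
    swapped[0::2], swapped[1::2] = data[1::2], data[0::2]
    return bytes(swapped)


# Sample text (an assumption): ASCII, a Latin-1 letter, and a
# supplementary character that needs a surrogate pair in UTF-16.
text = "Stolz \u00fc \U00010400"

# First note: UTF-16 with a BOM. Swapping the byte order also swaps the
# BOM, so a BOM-aware decoder still recovers the same text.
with_bom_be = codecs.BOM_UTF16_BE + text.encode("utf-16-be")
with_bom_le = swap_byte_pairs(with_bom_be)
assert with_bom_le.startswith(codecs.BOM_UTF16_LE)
assert with_bom_be.decode("utf-16") == text
assert with_bom_le.decode("utf-16") == text

# Second note: without a BOM, swapping the byte order of UTF-16BE data
# yields UTF-16LE data, i.e. a transformation to a different UTF. The
# text is preserved only if the result is labelled as UTF-16LE.
be = text.encode("utf-16-be")
le = swap_byte_pairs(be)
assert le == text.encode("utf-16-le")
assert le.decode("utf-16-le") == text
```

In the first case the byte order is carried by the data itself; in the
second it must be known from the label or the environment, which is why
I treat the swap there as a change of UTF.
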

I hope I have made my suggestion clear; improvements to my wording are
certainly possible, as I am not a native speaker of English.

Best wishes,
  Otto Stolz

