David Hopwood wrote:

> Below '#' is used to quote from the Unicode 3.2 standard as proposed
> in PDUTR #28, and '>' is used to quote my suggested changes.


I second David's thorough and clearly presented contribution.

However, I would like to suggest one minor improvement:

> Conformance clauses
...

>   This is what I think clauses C5 and C10 should be:
...
>   > C10 A process shall make no change in a valid code sequence other
>   >     than the possible replacement of character sequences by their
>   >     canonical-equivalent sequences, if that process purports not to
>   >     modify the interpretation of that code sequence. 
...

>   >   - Changing the bit or byte ordering when transforming between different
>   >     machine architectures does not modify the interpretation of the text.

I consider bit ordering a hardware issue, invisible to the programmer
and to the end user; hence, I would not mention it in this note.

With respect to byte ordering, this note applies only to UTF-16 and
UTF-32 with a BOM.

It does not apply to UTF-8, as that format implies a particular byte
ordering.

Neither does it apply to UTF-16LE, UTF-16BE, UTF-32LE, or UTF-32BE; rather,
swapping the byte order in any one of these formats amounts to
transforming to a different UTF, viz. UTF-16BE, UTF-16LE, UTF-32BE, and
UTF-32LE, respectively.
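To make the distinction concrete, here is a minimal sketch using Python's standard codecs (an illustration only; the helper name `swap_code_units` and the sample string are hypothetical). With a BOM, swapping every code unit, the BOM included, leaves the decoded text unchanged; for the byte-order-specific forms, the swapped UTF-16LE bytes are exactly the UTF-16BE encoding of the same text, i.e. a different UTF.

```python
def swap_code_units(data: bytes, width: int) -> bytes:
    """Reverse the byte order of every code unit of `width` bytes (2 or 4).

    Hypothetical helper for illustration only.
    """
    if len(data) % width:
        raise ValueError(f"length {len(data)} is not a multiple of {width}")
    return b"".join(data[i:i + width][::-1] for i in range(0, len(data), width))


# Sample text (an assumption); U+1F600 becomes a surrogate pair in UTF-16.
text = "Grüße, 世界 😀"

# Case 1: UTF-16 / UTF-32 with a BOM. Python's "utf-16" and "utf-32" codecs
# write a BOM in the platform's native byte order and honour it on decoding.
for codec, width in (("utf-16", 2), ("utf-32", 4)):
    encoded = text.encode(codec)
    swapped = swap_code_units(encoded, width)  # the BOM is swapped as well
    assert swapped != encoded                  # different bytes ...
    assert swapped.decode(codec) == text       # ... same interpretation

# Case 2: byte-order-specific forms. Swapping the bytes of UTF-16LE (or
# UTF-32LE) data yields the UTF-16BE (or UTF-32BE) encoding of the same text.
for le, be, width in (("utf-16-le", "utf-16-be", 2),
                      ("utf-32-le", "utf-32-be", 4)):
    swapped = swap_code_units(text.encode(le), width)
    assert swapped == text.encode(be)
```

Note that in case 2 the swapped bytes, if still labelled UTF-16LE, would decode to different (here even ill-formed) text, which is why the swap is better described as a change of UTF.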

>   > - Transforming to a different Unicode Transformation Format does not
>   >   modify the interpretation of the text.

Hence, I propose the following wording for the last two notes on the
proposed C10 clause:

| - Changing the byte ordering of a string encoded in UTF-16 or UTF-32,
|   when a Byte Order Mark is present, does not modify the
|   interpretation of the text.
|
| - Transforming to a different Unicode Transformation Format does not
|   modify the interpretation of the text. This includes transformations
|   between Unicode Transformation Formats that differ only in their
|   byte ordering, such as a transformation from UTF-16BE to UTF-16LE
|   (irrespective of whether the byte ordering is explicitly specified
|   or is implied by the target environment to which the string is
|   ported).
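The second proposed note can be illustrated with a small sketch, again assuming Python's standard codecs (the `transcode` helper, the `UTFS` tuple, and the sample text are hypothetical). Transcoding goes through the sequence of code points, so the bytes differ from one UTF to the next while the decoded text stays identical, including between forms that differ only in byte order.

```python
# The Unicode Transformation Formats as named by Python's codec registry;
# "utf-16" and "utf-32" (without a suffix) write and read a BOM.
UTFS = ("utf-8", "utf-16", "utf-16-le", "utf-16-be",
        "utf-32", "utf-32-le", "utf-32-be")


def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode text from one UTF to another via its code points.

    Hypothetical helper for illustration only.
    """
    return data.decode(source).encode(target)


text = "Grüße, 世界 😀"  # sample text (an assumption)

for source in UTFS:
    for target in UTFS:
        converted = transcode(text.encode(source), source, target)
        # The byte sequences differ between UTFs, the interpretation does not.
        assert converted.decode(target) == text

# A transformation that differs only in byte ordering, e.g. UTF-16BE -> UTF-16LE.
assert transcode(text.encode("utf-16-be"), "utf-16-be", "utf-16-le") \
    == text.encode("utf-16-le")
```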

I hope I have made my suggestion clear; my wording can certainly be
improved, as I am not a native speaker of English.

Best wishes,
   Otto Stolz

