A) Treating NUL as ignorable is really deep legacy. It is no longer appropriate for modern data.
B) Many Unicode characters are encoded in UTF-16 and UTF-32 with leading, trailing, or embedded NUL bytes, so UTF-16 and UTF-32 cannot be exchanged under the assumption that "NUL is ignorable".
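
To illustrate point B, here is a minimal Python sketch. The helper `strip_nuls` is hypothetical and stands in for any legacy process that discards NUL bytes; the sample string is chosen only for illustration.

```python
def strip_nuls(data: bytes) -> bytes:
    """Hypothetical legacy behaviour: drop every 0x00 byte."""
    return data.replace(b"\x00", b"")

text = "A\u0100"                   # U+0041 followed by U+0100
utf16 = text.encode("utf-16-le")   # bytes 41 00 00 01
utf32 = text.encode("utf-32-le")   # bytes 41 00 00 00 00 01 00 00

# Both encoded forms contain 0x00 bytes that carry information.
print(utf16.hex(), utf32.hex())

# Dropping the NUL bytes leaves 41 01, which decodes to a different character.
damaged = strip_nuls(utf16)
print(damaged.decode("utf-16-le") == text)
```

Under these assumptions the stripped bytes still decode, but to U+0141 rather than the original two characters, so the damage is silent.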

A./


On 7/13/2012 2:16 PM, Philippe Verdy wrote:
Null characters are almost always avoided in interchanged plain texts.
This is not a practical problem. The use of nulls as significant
characters is extremely exceptional, as they almost always require an
envelope format to specify data lengths. This envelope format is in a
file that is not plain-text by itself.

2012/7/13 Stephan Stiller <stephan.stil...@gmail.com>:
As an aside to the BOM discussion - something I've always been meaning to
ask.

So there is a BOM-ambiguity when a file starts with
     FF FE
and then a couple of U+0000 characters, yes? Because this could be either
UTF-16 or UTF-32 under little-endianness. Has this been pointed out and
discussed beforehand?

This is because the set of BOMs in different encodings doesn't constitute a
prefix-free code.
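
A minimal Python sketch of the ambiguity described above, using the standard `codecs` BOM constants. The sample bytes are an assumption chosen so that the length is valid under both readings: FF FE followed by six zero bytes.

```python
import codecs

# The UTF-16LE BOM (FF FE) is a prefix of the UTF-32LE BOM (FF FE 00 00).
print(codecs.BOM_UTF32_LE.startswith(codecs.BOM_UTF16_LE))

# FF FE followed by six zero bytes: eight bytes in total.
data = b"\xff\xfe" + b"\x00" * 6

# Read as UTF-16: a BOM followed by three U+0000 characters.
as_utf16 = data.decode("utf-16")

# Read as UTF-32: a BOM followed by one U+0000 character.
as_utf32 = data.decode("utf-32")

print(len(as_utf16), len(as_utf32))
```

Both decodes succeed on the same bytes, so the BOM alone cannot tell the two encodings apart in this case.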

Stephan





