A) Treating NUL as ignorable is really deep legacy, and no longer
appropriate at all for modern data.
B) There are many Unicode characters whose encoded forms contain
leading, trailing, or other NUL bytes, so UTF-16 and UTF-32 cannot be
exchanged under the assumption that "NUL is ignorable".
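A minimal Python sketch of point B, assuming a legacy filter that simply drops every 0x00 byte (the helper names strip_nul_bytes and survives_nul_stripping are hypothetical). Plain ASCII text survives the filter in UTF-8, but the same characters in UTF-16 or UTF-32 carry 0x00 bytes that the filter destroys:

```python
# Sketch: why "NUL is ignorable" breaks UTF-16 and UTF-32 data.
# strip_nul_bytes / survives_nul_stripping are hypothetical helpers.

def strip_nul_bytes(data: bytes) -> bytes:
    """Legacy-style filter that drops every 0x00 byte."""
    return data.replace(b"\x00", b"")

def survives_nul_stripping(text: str, encoding: str) -> bool:
    """True if the encoded text is unchanged after dropping 0x00 bytes."""
    encoded = text.encode(encoding)
    return strip_nul_bytes(encoded) == encoded

# U+0041 has a 0x00 high byte and U+0100 a 0x00 low byte in UTF-16.
sample = "A\u0100"
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, sample.encode(enc).hex(" "), survives_nul_stripping(sample, enc))

# In UTF-32 every code unit has a 0x00 high byte, since code points
# never exceed U+10FFFF.
assert all(chr(cp).encode("utf-32-be")[0] == 0
           for cp in (0x41, 0x4E2D, 0x1F600, 0x10FFFF))
```

The last check makes the point sharper for UTF-32: no character at all can be encoded without at least one NUL byte.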
A./
On 7/13/2012 2:16 PM, Philippe Verdy wrote:
Null characters are almost always avoided in interchanged plain text.
This is not a practical problem. The use of nulls as significant
characters is extremely exceptional, as it almost always requires an
envelope format to specify data lengths, and a file using such an
envelope format is no longer plain text by itself.
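To illustrate the kind of envelope meant here, a minimal sketch assuming a hypothetical framing in which each record is preceded by its byte length as a 4-byte big-endian integer (not any particular standard format). Because lengths are explicit, NUL bytes inside a record are just data:

```python
import struct
from typing import List

# Hypothetical length-prefixed envelope: 4-byte big-endian length + payload.

def pack_record(payload: bytes) -> bytes:
    """Prefix the payload with its length as an unsigned 32-bit integer."""
    return struct.pack(">I", len(payload)) + payload

def unpack_records(stream: bytes) -> List[bytes]:
    """Split a stream of length-prefixed records back into payloads."""
    records = []
    offset = 0
    while offset < len(stream):
        if offset + 4 > len(stream):
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        end = offset + length
        if end > len(stream):
            raise ValueError("truncated payload")
        records.append(stream[offset:end])
        offset = end
    return records

# A NUL inside UTF-8 text, and a UTF-16 record full of 0x00 bytes.
framed = pack_record("NUL\u0000inside".encode("utf-8")) + pack_record("second".encode("utf-16-le"))
print(unpack_records(framed))
```

The framing bytes themselves (the length prefixes) are binary, which is why such a file no longer counts as plain text.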
2012/7/13 Stephan Stiller <stephan.stil...@gmail.com>:
As an aside to the BOM discussion - something I've always been meaning to
ask.
So there is a BOM ambiguity when a file starts with
FF FE
followed by a couple of U+0000 characters, yes? This could be either
UTF-16 or UTF-32 in little-endian byte order. Has this been pointed out
and discussed before?
This is because the set of BOMs across the different encodings doesn't
constitute a prefix-free code.
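A small Python sketch of the ambiguity, using the standard codecs.BOM_* constants; the helpers prefix_pairs and candidate_decodings are hypothetical names. The UTF-16LE BOM is a proper prefix of the UTF-32LE BOM, so the bytes FF FE 00 00 00 00 00 00 decode validly both as UTF-16LE (three U+0000) and as UTF-32LE (one U+0000):

```python
import codecs

# Standard BOM byte sequences from the codecs module.
BOMS = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
    "utf-32-le": codecs.BOM_UTF32_LE,
    "utf-32-be": codecs.BOM_UTF32_BE,
}

def prefix_pairs(boms):
    """Return (a, b) pairs where BOM a is a proper prefix of BOM b."""
    return [(a, b) for a, x in boms.items() for b, y in boms.items()
            if a != b and y.startswith(x)]

def candidate_decodings(data: bytes):
    """Every (encoding, text) pair consistent with the data's leading BOM."""
    results = []
    for name, bom in BOMS.items():
        if data.startswith(bom):
            try:
                results.append((name, data[len(bom):].decode(name)))
            except UnicodeDecodeError:
                pass  # BOM matched, but the rest is not valid in this encoding
    return results

print(prefix_pairs(BOMS))
# UTF-16LE BOM followed by three U+0000 characters.
data = codecs.BOM_UTF16_LE + ("\u0000" * 3).encode("utf-16-le")
print(candidate_decodings(data))
```

With an even number of U+0000 characters the UTF-32LE reading fails (the byte count is not a multiple of 4), so the ambiguity only bites for some inputs; a longest-match BOM sniffer would still pick UTF-32LE whenever it can.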
Stephan