Re: BOM ambiguity?

Asmus Freytag Fri, 13 Jul 2012 14:48:47 -0700

A) Treating NUL as ignorable is really deep legacy. Totally no longer
appropriate for modern data.

B) There are many Unicode character codes with leading, trailing, or
other NUL bytes, so UTF-16 and UTF-32 cannot be exchanged under the
assumption that "NUL is ignorable".
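
To make point B concrete, here is a minimal Python sketch (the sample
string and variable names are illustrative assumptions, not taken from
any real data). It shows that ordinary characters carry NUL bytes in
UTF-16 and UTF-32, and that a receiver which drops NUL bytes silently
turns the text into something else.

```python
# Ordinary characters contain NUL bytes once encoded as UTF-16 or UTF-32.
a_utf16le = "A".encode("utf-16-le")        # 41 00       (trailing NUL)
a_utf32le = "A".encode("utf-32-le")        # 41 00 00 00 (three NULs)
amacron_le = "\u0100".encode("utf-16-le")  # 00 01       (leading NUL)

# Hypothetical sample text: U+0041 followed by U+0100.
original = "A\u0100"
encoded = original.encode("utf-16-le")     # 41 00 00 01

# A legacy receiver that treats NUL as ignorable drops those bytes.
stripped = encoded.replace(b"\x00", b"")   # 41 01

# The two surviving bytes now form a single, different code unit: U+0141.
corrupted = stripped.decode("utf-16-le")

print(encoded.hex(" "), "->", stripped.hex(" "), "->", repr(corrupted))
```

The two-character input comes back as one unrelated character, so the
damage is not even detectable as a decoding error.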

A./



On 7/13/2012 2:16 PM, Philippe Verdy wrote:

Null characters are almost always avoided in interchanged plain texts.
This is not a practical problem. The use of nulls as significant
characters is extremely exceptional, as they almost always require an
envelope format to specify data lengths. This envelope format is in a
file that is not plain-text by itself.

2012/7/13 Stephan Stiller <stephan.stil...@gmail.com>:

As an aside to the BOM discussion - something I've always been meaning to
ask.

So there is a BOM-ambiguity when a file starts with
     FF FE
and then a couple of U+0000 characters, yes? Because this could be either
UTF-16 or UTF-32 under little-endianness. Has this been pointed out and
discussed beforehand?

I ask because the set of BOMs in the different encodings doesn't
constitute a prefix-free code.
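
A minimal Python sketch of the ambiguity described above, using the BOM
constants from the standard codecs module. The helper name
prefix_conflicts and the four-byte sample are illustrative assumptions.
It checks which BOMs are proper prefixes of others, then reads the same
bytes both ways.

```python
import codecs

# BOM byte sequences as defined by Python's codecs module.
BOMS = {
    "UTF-8": codecs.BOM_UTF8,
    "UTF-16LE": codecs.BOM_UTF16_LE,
    "UTF-16BE": codecs.BOM_UTF16_BE,
    "UTF-32LE": codecs.BOM_UTF32_LE,
    "UTF-32BE": codecs.BOM_UTF32_BE,
}


def prefix_conflicts(boms):
    """Return (shorter, longer) pairs where one BOM is a prefix of another."""
    return [
        (a, b)
        for a, x in boms.items()
        for b, y in boms.items()
        if a != b and y.startswith(x)
    ]


# The only conflict is FF FE (UTF-16LE) versus FF FE 00 00 (UTF-32LE).
conflicts = prefix_conflicts(BOMS)

# The same four bytes admit two readings.
data = b"\xff\xfe\x00\x00"
as_utf16 = data[2:].decode("utf-16-le")  # 2-byte BOM, then one U+0000
as_utf32 = data[4:].decode("utf-32-le")  # 4-byte BOM, then no characters

print(conflicts, repr(as_utf16), repr(as_utf32))
```

Both decodes succeed, so nothing in the bytes themselves settles which
encoding was meant. The big-endian BOMs do not collide in the same way,
since FE FF is not a prefix of 00 00 FE FF.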

Stephan
