Re: BOM ambiguity?

2012-07-15 Thread Doug Ewell
Stephan Stiller wrote: With that in mind, there is value in documenting, however briefly, that reading FF FE 00 00 is by itself technically ambiguous. I have seen this documented many times, though I can't say for sure that it was in official Unicode literature. Even though you can never
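
A minimal sketch of the ambiguity discussed in this thread, assuming Python's standard `codecs` BOM constants. The helper `bom_candidates` is a hypothetical name used only for illustration; it is not taken from any message in the thread. The sketch shows that the bytes FF FE 00 00 match both the UTF-16LE and the UTF-32LE signature, and that each reading decodes to different text.

```python
import codecs

# The four bytes discussed in the thread.
data = b"\xff\xfe\x00\x00"


def bom_candidates(prefix: bytes) -> list[str]:
    """Return every encoding whose BOM matches the start of `prefix`.

    Hypothetical helper for illustration only.
    """
    boms = [
        ("utf-32-le", codecs.BOM_UTF32_LE),  # FF FE 00 00
        ("utf-32-be", codecs.BOM_UTF32_BE),  # 00 00 FE FF
        ("utf-16-le", codecs.BOM_UTF16_LE),  # FF FE
        ("utf-16-be", codecs.BOM_UTF16_BE),  # FE FF
        ("utf-8", codecs.BOM_UTF8),          # EF BB BF
    ]
    return [name for name, bom in boms if prefix.startswith(bom)]


candidates = bom_candidates(data)

# Read as UTF-16: the BOM is consumed and one U+0000 character remains.
as_utf16 = data.decode("utf-16")

# Read as UTF-32: all four bytes are the BOM, so the text is empty.
as_utf32 = data.decode("utf-32")

print(candidates)
print(repr(as_utf16), repr(as_utf32))
```

Under these assumptions the two readings differ only in whether a leading U+0000 is present, which is why the thread turns on how often NUL appears at the start of interchanged plain text.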

Re: BOM ambiguity?

2012-07-15 Thread Stephan Stiller
I have seen this documented many times, though I can't say for sure that it was in official Unicode literature. Excellent, so let's have someone state whether it's in the official Unicode literature. And independent of whether it is or not, I know that some mention of the content of this

BOM ambiguity?

2012-07-13 Thread Stephan Stiller
As an aside to the BOM discussion - something I've always been meaning to ask. So there is a BOM-ambiguity when a file starts with FF FE and then a couple of U+ characters, yes? Because this could be either UTF-16 or UTF-32 under little-endianness. Has this been pointed out

Re: BOM ambiguity?

2012-07-13 Thread Philippe Verdy
-text by itself. 2012/7/13 Stephan Stiller stephan.stil...@gmail.com: As an aside to the BOM discussion - something I've always been meaning to ask. So there is a BOM-ambiguity when a file starts with FF FE and then a couple of U+ characters, yes? Because this could be either UTF-16

Re: BOM ambiguity?

2012-07-13 Thread Stephan Stiller
Null characters are almost always avoided in interchanged plain texts. This is not a practical problem. The use of nulls as significant characters is extremely exceptional Yes, but still I think that the BOM ambiguity needs to be documented. If it already is, the documentation isn't visible

Re: BOM ambiguity?

2012-07-13 Thread Asmus Freytag
is in a file that is not plain-text by itself. 2012/7/13 Stephan Stiller stephan.stil...@gmail.com: As an aside to the BOM discussion - something I've always been meaning to ask. So there is a BOM-ambiguity when a file starts with FF FE and then a couple of U+ characters, yes? Because

Re: BOM ambiguity?

2012-07-13 Thread Philippe Verdy
2012/7/13 Asmus Freytag asm...@ix.netcom.com: A) treating NUL as ignorable is really deep legacy. Totally no longer appropriate for modern data. I did not say that. But modern data heavily uses bytes as fillers for padding, or as terminators in various envelope formats. There are some more

Re: BOM ambiguity?

2012-07-13 Thread Ken Whistler
On 7/13/2012 1:54 PM, Stephan Stiller wrote: So there is a BOM-ambiguity when a file starts with FF FE and then a couple of U+ characters, yes? Because this could be either UTF-16 or UTF-32 under little-endianness. Has this been pointed out and discussed beforehand

Re: BOM ambiguity?

2012-07-13 Thread Philippe Verdy
: the ambiguity persists in arbitrary plain text files, but not from HTML and XML documents) 2012/7/14 Ken Whistler k...@sybase.com: On 7/13/2012 1:54 PM, Stephan Stiller wrote: So there is a BOM-ambiguity when a file starts with FF FE and then a couple of U+ characters, yes? Because this could

Re: BOM ambiguity?

2012-07-13 Thread John W Kennedy
On Jul 13, 2012, at 4:54 PM, Stephan Stiller wrote: As an aside to the BOM discussion - something I've always been meaning to ask. So there is a BOM-ambiguity when a file starts with FF FE and then a couple of U+ characters, yes? Because this could be either UTF-16 or UTF-32 under

Re: BOM ambiguity?

2012-07-13 Thread Stephan Stiller
So there is a BOM-ambiguity when a file starts with FF FE and then a couple of U+ characters, yes? Because this could be either UTF-16 or UTF-32 under little-endianness. Has this been pointed out and discussed beforehand? No, there is not a BOM-ambiguity. Rather, there is an English

Fwd: Re: BOM ambiguity?

2012-07-13 Thread Stephan Stiller
PS: I mean, what you (Ken W) are writing is an argument for documenting the format outside of the file proper, and that's good, but then one wouldn't/shouldn't use a BOM in the first place. So if one uses the BOM as a format indicator (not a perfect situation, I understand), that often