On Wed, 04 Jun 2014 11:40:11 -0700 Asmus Freytag <asm...@ix.netcom.com> wrote:
> On 6/4/2014 11:26 AM, Doug Ewell wrote:
> > I meant U+FEFF as a zero-width no-break space. Obviously it is very
> > common to see U+FEFF as a signature or BOM.
>
> The semantics of it were chosen at the time to make no sense
> at the start, and to make the character invisible in most situations.
> The remnant of its semantic was later taken up by Word Joiner, so that
> there is now NO use for this as part of text.
>
> The use as part of a convention has always been clear. If you stick
> this at the front, readers will byte-reverse your data; that should
> weed out accidental use pretty quickly :) Or prevent people from
> getting "cute" with it in other ways.

Wrong! If you stick U+FEFF at the start of a file, expect it to be
stripped. If you stick U+FFFE at the start of a file, expect the rest
of the text to be byte-reversed.

> So, I would think that for this particular code point, you can safely
> assume that it's buggy or test data.

The example that's usually given is that of a text file sliced into
segments to avoid file size limits. In these cases, there is the risk
that U+FEFF as ZWNBSP will wind up at the start of a segment and be
stripped. The solution using the Windows command window is to perform
a *binary* concatenation of the segments; if one doesn't, newlines will
be inserted between the segments, which is much more severe damage.

Richard.

_______________________________________________
Unicode mailing list
Unicode@unicode.org
http://unicode.org/mailman/listinfo/unicode
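The reader behaviour described in the message (a leading U+FEFF is stripped, a leading U+FFFE triggers byte reversal) and the segment problem can be sketched in Python. This is a minimal illustration, not how any particular tool works: `decode_utf16_signed` is a hypothetical helper, and it assumes big-endian when no signature is present (the default RFC 2781 gives for unmarked UTF-16; many Windows tools assume little-endian instead).

```python
# Hypothetical reader: a leading signature picks the byte order.
# Assumption: no signature means big-endian.

def decode_utf16_signed(data: bytes) -> str:
    """Decode UTF-16 bytes, treating a leading U+FEFF as a signature."""
    if data[:2] == b"\xfe\xff":
        # Reads as U+FEFF big-endian: taken as a signature and stripped.
        return data[2:].decode("utf-16-be")
    if data[:2] == b"\xff\xfe":
        # Would read as U+FFFE big-endian (a noncharacter), so the reader
        # byte-reverses and decodes the rest as little-endian.
        return data[2:].decode("utf-16-le")
    return data.decode("utf-16-be")


text = "ab\ufeffcd"                # U+FEFF used as a zero-width no-break space
data = text.encode("utf-16-be")    # 00 61 00 62 FE FF 00 63 00 64
segments = [data[:4], data[4:]]    # slice the file just before the ZWNBSP

# Decoding each segment as a separate file: the ZWNBSP is mistaken for
# a signature at the start of the second segment and stripped.
per_segment = "".join(decode_utf16_signed(s) for s in segments)

# Binary concatenation first, then decoding: the ZWNBSP survives.
binary_concat = decode_utf16_signed(b"".join(segments))

print(repr(per_segment))    # 'abcd'
print(repr(binary_concat))  # 'ab\ufeffcd'
```

In the Windows command window, a binary concatenation of segments can be done with, for example, `copy /b part1+part2 whole`.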