It is a pretty good assumption; but if BOMs are used on smaller fields the probability goes up. And to be perfectly reliable, you can't assume it.
That is one reason that the WORD JOINER was encoded, so that eventually we can use FEFF solely as a BOM.

Mark
—————
Γνῶθι σαυτόν — Θαλῆς
[For transliteration, see http://oss.software.ibm.com/cgi-bin/icu/tr]
http://www.macchiato.com

----- Original Message -----
From: "Doug Ewell" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: "Mark Davis" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Wednesday, April 10, 2002 22:35
Subject: Re: MS/Unix BOM FAQ again (small fix)

> Mark Davis <[EMAIL PROTECTED]> wrote:
>
> > - when one of the BOM-allowing UTFs starts with a BOM, you know the
> > encoding*, and you strip off the BOM when you get the content.
> >
> > *assuming that no UTF-16 file has U+0000 as the first character.
>
> In the real world, this is a pretty good assumption -- almost as good,
> in fact, as the one I've been stating for years: that no Unicode file
> will have a zero-width no-break space (intended as such) as the first
> character.
>
> -Doug Ewell
> Fullerton, California
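
The BOM check discussed in this thread, and the U+0000 caveat attached to it, can be illustrated with a minimal sketch. It is not taken from the FAQ or from any particular implementation: the function name `sniff_bom` and the returned encoding labels are hypothetical, and only the BOM constants from Python's standard `codecs` module are assumed. The UTF-32 signatures are tested first because the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).

```python
import codecs

# Longer signatures come first: FF FE 00 00 (UTF-32LE) starts with FF FE (UTF-16LE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, content) with the BOM stripped, or (None, data) if no BOM is found.

    Hypothetical helper: it assumes that a leading FF FE 00 00 is a UTF-32LE BOM,
    i.e. that no UTF-16 text starts with U+0000 right after its BOM.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, data[len(bom):]
    return None, data


# A UTF-16LE stream whose first character is U+0000 trips the assumption:
ambiguous = codecs.BOM_UTF16_LE + "\x00A".encode("utf-16-le")
print(sniff_bom(ambiguous)[0])
```

Under this ordering the last call reports UTF-32LE even though the bytes were produced as UTF-16LE, which is the case the footnote in the quoted message sets aside. The shorter the field being tagged, the less context there is to catch such a misreading.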

