I would like to add some information here without getting into the core of the discussion:
HTML recognizes far fewer "whitespace" characters than Java or Unicode does. Different people have
different ideas of which characters count as "whitespace".
Unicode's White_Space property (PropList.txt) contains 24 code points (Unicode 3.2) but not U+FEFF.
U+FEFF ZWNBSP is a format control (Cf), not any kind of space in the usual sense.
U+FEFF, like all Cf, is a Default_Ignorable_Code_Point (DerivedCoreProperties.txt). (That is,
sorting, searching, matching, etc. usually ignore it unless such code points are explicitly useful.)
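To make the "different sets" point concrete, here is a small sketch using only the Java standard library. It prints how Character.isWhitespace, Character.isSpaceChar and the Cf (FORMAT) general category classify a few code points, next to a hypothetical helper isHtmlAsciiWhitespace. That helper follows the modern WHATWG Infra/HTML definition of "ASCII whitespace" (TAB, LF, FF, CR, SPACE); this is an assumption for illustration and is not necessarily the set defined in HTML 4.01. Results for some code points can also depend on the Unicode version of the JDK in use.

```java
// Sketch: compare several "whitespace" definitions for a handful of code points.
public class WhitespaceSets {

    // Hypothetical helper: WHATWG "ASCII whitespace" (TAB, LF, FF, CR, SPACE).
    // Assumption: modern HTML Standard definition, not HTML 4.01.
    static boolean isHtmlAsciiWhitespace(int cp) {
        return cp == 0x09 || cp == 0x0A || cp == 0x0C || cp == 0x0D || cp == 0x20;
    }

    // Reports how each definition classifies one code point.
    static String describe(int cp) {
        return String.format("U+%04X html=%b isWhitespace=%b isSpaceChar=%b Cf=%b",
                cp,
                isHtmlAsciiWhitespace(cp),
                Character.isWhitespace(cp),            // Java's own definition
                Character.isSpaceChar(cp),             // Unicode Zs/Zl/Zp categories
                Character.getType(cp) == Character.FORMAT); // format control (Cf)
    }

    public static void main(String[] args) {
        // TAB, SPACE, NO-BREAK SPACE, LINE SEPARATOR, IDEOGRAPHIC SPACE, ZWNBSP
        int[] samples = {0x0009, 0x0020, 0x00A0, 0x2028, 0x3000, 0xFEFF};
        for (int cp : samples) {
            System.out.println(describe(cp));
        }
    }
}
```

In this sketch U+FEFF comes out as false for every whitespace predicate and true only for Cf, which matches the point that ZWNBSP is a format control rather than a space. U+00A0, by contrast, is a space character to isSpaceChar but not to Java's isWhitespace.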
RFC 2279 *is* being updated; see http://www.ietf.org/internet-drafts/draft-yergeau-rfc2279bis-03.txt
Version -04 is supposed to be made public shortly.
markus
--
Opinions expressed here may not reflect my company's positions unless otherwise noted.