On Mon, May 17, 2010 at 3:00 PM, Aaron Sherman <a...@ajs.com> wrote:
> FFFE and FEFF are used to manage byte-ordering, so they really shouldn't be
> part of a URI (URIs should exist in a context in which byte ordering is
> assured, would be my take).

Neither U+FFFE nor U+FFFF is a valid character, but U+FEFF is perfectly cromulent, if deprecated: it's the ZERO WIDTH NO-BREAK SPACE (U+2060 WORD JOINER is the modern replacement). The choice of byte-order mark protocol was well-considered: if U+FEFF is interpreted as a character instead of a BOM, it's a pretty harmless character.

> The Unicode spec says that FFFF is guaranteed not to be a valid Unicode
> character, but does not explain why. [
> http://unicode.org/charts/PDF/UFFF0.pdf]

The Unicode specification is a lot more than code charts. See section 15.8, "Noncharacters", for discussion of these code points. U+FFFF (and U+xFFFF for all valid values of x up through 0x10) is invalid so that it can be used as a sentinel value within application memory, for instance. U+FFFE, on the other hand, is illegal precisely because it's the byte-swapped form of the BOM: a decoder that reads U+FFFE at the start of a stream knows it has the byte order backwards.

-- 
Mark J. Reed <markjr...@gmail.com>
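
P.S. For anyone who wants to poke at this, here is a rough Python sketch of the two ideas above: spotting noncharacters, and using the BOM to guess UTF-16 byte order. The function names `is_noncharacter` and `utf16_byte_order` are made up for illustration, not from any library. The noncharacter check also covers U+xFFFE and the U+FDD0..U+FDEF block, which the standard reserves alongside U+xFFFF. The BOM check only looks at UTF-16; a UTF-32 little-endian stream starts with the same two bytes and would be misreported.

```python
import codecs


def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: U+xFFFE and U+xFFFF.
    return (cp & 0xFFFE) == 0xFFFE


def utf16_byte_order(data: bytes) -> str:
    """Guess UTF-16 byte order from a leading BOM, if there is one."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little-endian"
    return "unknown"


# Reading the big-endian BOM bytes with the wrong byte order yields
# U+FFFE, which is a noncharacter, so the mistake is detectable.
swapped = int.from_bytes(codecs.BOM_UTF16_BE, "little")
```

Here `swapped` is the code point you get by reading the BOM bytes backwards, and `is_noncharacter(swapped)` is true while `is_noncharacter(0xFEFF)` is false, which is the whole trick.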