2014-06-29 22:24 GMT+03:00 Asmus Freytag <[email protected]>:

> but things get harder the more I think:
>>
>> 3. When the above text says “surrogate code points”, does that mean
>> everything outside BMP? It reads so to me, but I’m surprised that
>> characters in BMP and outside BMP have such differences, so I’m doubting my
>> English skill.
>>
>
> No, those would be supplementary code points. Surrogates are values that
> are intended to be used in pairs as code units in UTF-16. Ill-formed data
> may contain unpaired values, those are referred to as Surrogate code points.
>

IIRC, after HTML parsing, validation, and DOM construction, no lone
surrogate code point can be encountered, since the presence of any
ill-formed data in Unicode text makes the whole text ill-formed. It is a
security recommendation for decoders to replace any unpaired surrogate code
point with U+FFFD, thus making the text well-formed. As a side effect, the
unpaired surrogate code point becomes visible (usually as a square-box
fallback glyph).

What is the consideration regarding U+FFFD in CSS?

Konstantin
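
P.S. To make the replacement step concrete, here is a minimal sketch in
Python of what such a decoder might do. The function name
replace_lone_surrogates and the choice of one U+FFFD per unpaired code unit
are my own assumptions for illustration, not taken from any particular
HTML or CSS implementation.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def replace_lone_surrogates(units):
    """Decode a sequence of UTF-16 code units into a str.

    Well-formed surrogate pairs become supplementary code points;
    every unpaired surrogate is replaced with U+FFFD (an assumed
    policy: one replacement character per unpaired code unit).
    """
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High surrogate followed by low surrogate: combine the pair.
            low = units[i + 1]
            out.append(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired high or low surrogate: substitute U+FFFD.
            out.append(REPLACEMENT)
            i += 1
        else:
            # Ordinary BMP code unit: passes through unchanged.
            out.append(u)
            i += 1
    return "".join(chr(cp) for cp in out)
```

For example, the code units 0041 D83D DE00 D800 0042 would come out as
"A", U+1F600, U+FFFD, "B": the pair is kept as one supplementary code
point, and only the unpaired D800 is replaced.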
_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode

