Re: Characters that should be displayed?

Konstantin Ritt Mon, 30 Jun 2014 09:09:12 -0700

2014-06-29 22:24 GMT+03:00 Asmus Freytag <[email protected]>:

> but things get harder the more I think:
>>
>> 3. When the above text says “surrogate code points”, does that mean
>> everything outside BMP? It reads so to me, but I’m surprised that
>> characters in BMP and outside BMP have such differences, so I’m doubting my
>> English skill.
>>
>
> No, those would be supplementary code points. Surrogates are values that
> are intended to be used in pairs as code units in UTF-16. Ill-formed data
> may contain unpaired values, those are referred to as Surrogate code points.
>
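
To make the distinction above concrete, here is a minimal Python sketch. The function names `classify` and `to_utf16_units` are hypothetical and not from any library. Surrogate code points occupy U+D800..U+DFFF, which is inside the BMP. Supplementary code points (U+10000..U+10FFFF) lie outside the BMP, and UTF-16 encodes each one as a pair of surrogate code units.

```python
def classify(cp: int) -> str:
    """Classify a code point by range (hypothetical helper for illustration)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= cp <= 0xDFFF:
        return "surrogate"       # inside the BMP, reserved for UTF-16 code units
    if cp >= 0x10000:
        return "supplementary"   # planes 1-16, outside the BMP
    return "bmp"


def to_utf16_units(cp: int) -> list[int]:
    """Encode one Unicode scalar value as UTF-16 code units (hypothetical helper)."""
    if classify(cp) == "surrogate":
        raise ValueError("surrogate code points are not scalar values")
    if cp < 0x10000:
        return [cp]                  # BMP scalar value: a single code unit
    v = cp - 0x10000                 # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)        # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits -> low (trail) surrogate
    return [high, low]
```

For example, `to_utf16_units(0x1F600)` gives `[0xD83D, 0xDE00]`, a well-formed high/low pair. A value from U+D800..U+DFFF that appears without its partner is what the thread calls an unpaired surrogate code point.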
>
IIRC, after HTML parsing, validation, and DOM construction, no unpaired
surrogate code point can remain, since the presence of any ill-formed data
in Unicode text makes the whole text ill-formed.
Decoders are advised, as a security measure, to replace each unpaired
surrogate code point with U+FFFD instead, which makes the text
well-formed. As a side effect, the unpaired surrogate code point becomes
visible (usually as a square-box fallback glyph).
What is the consideration regarding U+FFFD in CSS?
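
The replacement behaviour described above can be sketched in Python. This sketch assumes the input is already a sequence of 16-bit UTF-16 code units. `decode_utf16_units` is a hypothetical name. Emitting one U+FFFD per unpaired code unit is one common policy and is not necessarily what every decoder does.

```python
REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def decode_utf16_units(units: list[int]) -> str:
    """Decode UTF-16 code units, replacing each unpaired surrogate with U+FFFD."""
    out = []
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high (lead) surrogate
            if i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
                # Well-formed pair: combine into one supplementary code point.
                cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                out.append(chr(cp))
                i += 2
                continue
            out.append(REPLACEMENT)  # lead surrogate not followed by a trail
        elif 0xDC00 <= u <= 0xDFFF:
            out.append(REPLACEMENT)  # trail surrogate with no preceding lead
        else:
            out.append(chr(u))       # ordinary BMP code unit
        i += 1
    return "".join(out)
```

For instance, `decode_utf16_units([0x41, 0xD83D, 0x42])` returns `"A\uFFFDB"`. The lone high surrogate becomes U+FFFD, so the result is well-formed and can be encoded as UTF-8 without error.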



Konstantin

_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode
