Thank you all for the great feedback. I learned a lot.

I, however, still don’t get one thing. In the spec text:

Surrogate code points, private-use characters, and control characters are not 
given the Default_Ignorable_Code_Point property. To avoid security problems, 
such characters or code points, when not interpreted and not displayable by 
normal rendering, should be displayed in fallback rendering with a fallback 
glyph
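
For reference, the three classes named in the quoted text correspond to the General_Category values Cs, Co and Cc. Below is a minimal sketch of how a renderer might test for them, assuming Python's standard unicodedata module. The helper name needs_visible_fallback is hypothetical, and the module does not expose Default_Ignorable_Code_Point itself, so this only checks the category:

```python
import unicodedata

# General_Category values for the three classes named in the quoted text:
# Cs = surrogate, Co = private use, Cc = control.
FALLBACK_CATEGORIES = {"Cs", "Co", "Cc"}

def needs_visible_fallback(ch: str) -> bool:
    """Hypothetical helper: True if ch is a surrogate, private-use or control code point."""
    return unicodedata.category(ch) in FALLBACK_CATEGORIES

# U+200B (ZERO WIDTH SPACE) is a format character (Cf), so it is not in these classes.
for ch in ["\ud800", "\ue000", "\x00", "A", "\u200b"]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), needs_visible_fallback(ch))
```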

How could displaying a glyph for a missing PUA character help security? I can 
imagine the address bar having such security risks, but this is about rendering. 
I can imagine 0x00 leading to buffer overflow attacks, but it seems to me that 
preventing such characters from being inserted into the DOM is safer, though I 
admit that I’m not a security professional at all.

I understand that some here want to display them to help users identify broken 
characters, while others consider that it doesn’t help users at all. I tend to 
agree with the latter, but either way, it’s about helping users fix their 
documents.

Does anyone know what security risks the spec is talking about?

/koji


On Jul 1, 2014, at 1:33 AM, Philippe Verdy <[email protected]> wrote:

I generally agree with your comment.

As for your question, U+FFFD is not special in CSS: it's just a standard 
character that will be mapped to some symbol, either from any font or 
synthesized from an internal font (or collection of glyphs) of the renderer, 
according to the other styles. There's no guarantee that styles like italic or 
bold will look different; in fact there's no good way to exhibit alternatives 
if the renderer does not look up a matching font, but at least the renderer 
should size it according to the computed "font-size:" setting. That symbol is 
often (but not necessarily) a "white" question mark in a "black" diamond; in 
fact, replace "white" by the background color/image/shades, and "black" by the 
"color:" setting, just as when regular fonts map any other symbol.
This symbol should also have an inherited direction, not a strong LTR 
direction: it should not alter the direction of text on either side (or break 
runs of text) for Bidi rendering, but it may possibly be mirrored in resolved 
RTL runs, if this is appropriate for the chosen glyph (which is not always easy 
to determine if the symbol is chosen from a matching font in context). However, 
as the symbol to use is quite arbitrary, and should be distinctive enough from 
other characters, this mirroring is not really necessary, unless the symbol 
shows some explicit text in a specific style, which is something to avoid as 
the character is not specific to any script or language.


2014-06-30 17:59 GMT+02:00 Konstantin Ritt <[email protected]>:
2014-06-29 22:24 GMT+03:00 Asmus Freytag <[email protected]>:
but things get harder the more I think:

3. When the above text says “surrogate code points”, does that mean everything 
outside BMP? It reads so to me, but I’m surprised that characters in BMP and 
outside BMP have such differences, so I’m doubting my English skill.

No, those would be supplementary code points. Surrogates are values that are 
intended to be used in pairs as code units in UTF-16. Ill-formed data may 
contain unpaired values; those are referred to as surrogate code points.
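
The distinction above can be illustrated with a short sketch, assuming Python and its built-in "utf-16-be" codec. The helper names utf16_code_units and is_surrogate are made up for this example: a supplementary code point is a single code point that UTF-16 happens to encode as two surrogate code units, whereas a lone surrogate cannot be encoded at all.

```python
def utf16_code_units(text: str) -> list:
    """Return the UTF-16 code units of text (big-endian, no BOM)."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

def is_surrogate(cp: int) -> bool:
    """True if cp lies in the surrogate range D800..DFFF."""
    return 0xD800 <= cp <= 0xDFFF

supplementary = "\U0001F600"  # one code point outside the BMP
units = utf16_code_units(supplementary)
print([hex(u) for u in units])           # two code units, both in the surrogate range
print(is_surrogate(ord(supplementary)))  # the code point itself is not a surrogate

lone = "\ud83d"  # an unpaired surrogate code point
try:
    lone.encode("utf-16-be")
except UnicodeEncodeError as exc:
    print("ill-formed:", exc.reason)
```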


IIRC, after HTML parsing, validation and DOM construction, no lone surrogate 
code point can be encountered, since the presence of any ill-formed data in 
Unicode text makes the whole text ill-formed.
It's a security recommendation for decoders to replace any unpaired surrogate 
code point with U+FFFD instead, thus making the text well-formed. As a side 
effect, the unpaired surrogate code point becomes visible (usually as a square 
box fallback glyph).
What is the consideration regarding U+FFFD in CSS?
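
The replacement behaviour described above can be sketched with Python's built-in UTF-16 decoder, as an illustration only and not what any particular HTML parser does. With errors="replace" the decoder substitutes U+FFFD for the ill-formed sequence; the byte string below is made-up test data.

```python
# Ill-formed UTF-16 (big-endian): 'a', an unpaired high surrogate D800, 'b'.
ill_formed = bytes([0x00, 0x61, 0xD8, 0x00, 0x00, 0x62])

# Strict decoding rejects the whole text.
try:
    ill_formed.decode("utf-16-be")
except UnicodeDecodeError as exc:
    print("strict decoding fails:", exc.reason)

# Replacing instead yields well-formed text with a visible U+FFFD.
repaired = ill_formed.decode("utf-16-be", errors="replace")
print([f"U+{ord(c):04X}" for c in repaired])
```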


Konstantin


_______________________________________________
Unicode mailing list
[email protected]<mailto:[email protected]>
http://unicode.org/mailman/listinfo/unicode


