Re: Characters that should be displayed?

Jukka K. Korpela Sun, 29 Jun 2014 14:05:37 -0700

2014-06-29 21:44, Koji Ishii wrote:

The spec currently has the following text[2]:

Control characters (Unicode class Cc) other than tab (U+0009), line
feed (U+000A), and carriage return (U+000D) are ignored for the
purpose of rendering. (As required by [UNICODE], unsupported
Default_ignorable characters must also be ignored for rendering.)


and there’s a feedback saying that CSS should display visible glyphs
for these control characters.

That would change the identity of the characters. They are by definition “control characters”, i.e. they have no visible glyphs, but they may have control effects. However, it might be argued that rendering them somehow would not mean normal rendering but be a diagnostic indication of an error. Those characters are invalid in HTML and XML (except XML 1.1, but who uses it?).

However, the tradition of web browsers is permissive in order to be user-friendly. E.g., a stray control character somewhere might be interesting to a *developer* or maintainer to notice, so that he could analyze and fix the problem that caused it, but to a *user* (visitor), it would mostly be just disturbing. He can’t fix the problem, and it is mostly useless to him to see that the page has some control character in the source. So *developer tools* should indicate such problems or provide ways to detect them, but it seems correct to ignore them in normal rendering.

Since all major browsers do not display
them today, this is a breaking-change

Well, I would not take that as a strong argument. This would be a change in error processing. But it would not be useful, for other reasons.

I found the following text in Unicode 6.3, p. 185, "5.21 Ignoring
Characters in Processing”[3]:

Surrogate code points, private-use characters, and control
characters are not given the Default_Ignorable_Code_Point property.
To avoid security problems, such characters or code points, when
not interpreted and not displayable by normal rendering, should be
displayed in fallback rendering with a fallback glyph


By looking at this, my questions are as follows:

1. Should control characters that browsers do not interpret be
displayed in fallback rendering?

It is reasonable to interpret that there are no such control characters, because all control characters except those with special handling are interpreted as being invalid data and therefore ignored.
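As a rough illustration of the quoted rule (a sketch only, not text from the CSS spec), the following Python uses the standard `unicodedata` module to find class Cc characters other than tab, line feed, and carriage return. The names `RENDERED_CONTROLS`, `is_ignored_control`, and `strip_ignored_controls` are made up for this example, and dropping the characters from a string is only a stand-in for "ignored for the purpose of rendering".

```python
import unicodedata

# The three exceptions named in the quoted spec text:
# tab (U+0009), line feed (U+000A), carriage return (U+000D).
RENDERED_CONTROLS = {"\t", "\n", "\r"}


def is_ignored_control(ch):
    """Return True for a class Cc character that the quoted rule ignores."""
    return unicodedata.category(ch) == "Cc" and ch not in RENDERED_CONTROLS


def strip_ignored_controls(text):
    """Drop ignored control characters (a stand-in for 'not rendered')."""
    return "".join(ch for ch in text if not is_ignored_control(ch))
```

For example, `strip_ignored_controls("a\x01b\tc")` keeps the tab but drops U+0001.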


2. Should private-use characters
(U+E000-F8FF, 0F0000-0FFFFD, 100000-10FFFD) without glyphs be
displayed in fallback rendering?

They might be seen as “not displayable by normal rendering”, so yes. On the practical side, although Private Use characters should not be used in public information interchange, they are increasingly popular in “icon font” tricks. Whatever we think of such tricks, users should not be punished for them. If the trick fails (usually because a page uses a downloadable font for icon glyphs allocated to Private Use codepoints but something prevents the use of such a font), it is relevant to the user to know that there is *some* data, which can be crucial (e.g., an item in a navigation menu). So some dull fallback rendering is probably better than simply ignoring the characters.
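To make the ranges in the question concrete, here is a minimal Python sketch of a private-use test plus a "dull fallback". Everything named here is hypothetical: `has_glyph` stands for whatever font lookup an implementation really performs, and the choice of U+25A1 (WHITE SQUARE) as the fallback is arbitrary.

```python
# Private-use ranges as listed in the question above.
PRIVATE_USE_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)


def is_private_use(ch):
    """Return True if ch is a private-use code point."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PRIVATE_USE_RANGES)


def render_with_fallback(text, has_glyph, fallback="\u25A1"):
    """Replace private-use characters that have no glyph with a fallback.

    has_glyph is a hypothetical callable (character -> bool) standing in
    for a real font lookup.
    """
    return "".join(
        fallback if is_private_use(ch) and not has_glyph(ch) else ch
        for ch in text
    )
```

With an icon font that failed to load, `render_with_fallback("Home \ue001", lambda ch: not is_private_use(ch))` yields the text with a visible box in place of the missing icon, so the user at least sees that something is there.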

3. When the above text says “surrogate code points”, does that mean
everything outside BMP?

No, it means code points that do not represent *any* characters due to being in certain special areas in the coding space. They are invalid in HTML and in XML. If they appear in data, the reason is usually that UTF-16 encoded data containing non-BMP characters is being processed in a wrong way. At the level of interpreting a byte stream as a stream of characters, surrogate code *units* in UTF-16 should be processed and interpreted in pairs so that one pair is taken as one character. And when CSS gets at it, it only sees the character in the DOM.

It is adequate to ignore surrogate code points, since they are invalid and signalling them to users (as opposed to developers) would hardly do any good.
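The difference between surrogate code *units* and a non-BMP character can be seen in a small Python sketch. The helper names `utf16_code_units` and `is_surrogate` are invented for this example; the encoding calls are standard Python.

```python
def utf16_code_units(ch):
    """Return the UTF-16 code units (as integers) that encode ch."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_surrogate(cp):
    """Return True if the integer cp lies in the surrogate range D800-DFFF."""
    return 0xD800 <= cp <= 0xDFFF


# U+1F600 lies outside the BMP: one character, two surrogate code units.
units = utf16_code_units("\U0001F600")
assert len(units) == 2 and all(is_surrogate(u) for u in units)

# The character itself is not a surrogate code point.
assert not is_surrogate(ord("\U0001F600"))
```

A correctly decoded pair thus reaches later processing as a single character; only a lone, unpaired surrogate remains as a surrogate code point.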

4. Should every code point that are not
given the Default_Ignorable_Code_Point property and that without
interpretations nor glyphs displayed in fallback rendering? I could
not find such statement in Unicode spec, but there are some people
who believe so.

5. Is there anything else Unicode recommends to
display in fallback rendering, or not to display? This must be RTFM,
but pointing out where to read would be appreciated.

From the Unicode point of view, an implementation may decide what characters it supports. What it does to characters that it does not support seems to be generally up to the implementation to decide as regards rendering. Here, too, I would consider the practical impact on users. If a page contains characters that have no glyphs in the fonts that are used, then the page has data that is probably valid but cannot be rendered in a particular situation. Showing some indication of this is relevant, because the user knows he is missing something real, and he might be able to fix the situation in various ways (e.g., changing browser settings, downloading and installing extra fonts, or just switching to a different browser – browsers are known to differ in their abilities to use the fonts installed in a system).
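The positions taken in this reply could be summarized, very roughly, as a single decision function. This is only a sketch of one possible policy, not anything Unicode or CSS prescribes: `rendering_action` and `has_glyph` are hypothetical names, and the Default_Ignorable_Code_Point property is left out because Python's `unicodedata` module does not expose it.

```python
import unicodedata

# Controls with special handling: tab, line feed, carriage return.
INTERPRETED_CONTROLS = {"\t", "\n", "\r"}


def rendering_action(ch, has_glyph):
    """Classify ch as 'interpret', 'ignore', 'render', or 'fallback'.

    has_glyph is a hypothetical callable (character -> bool) standing in
    for a real font lookup.
    """
    category = unicodedata.category(ch)
    if category == "Cc":
        # Other control characters are treated as invalid data and ignored.
        return "interpret" if ch in INTERPRETED_CONTROLS else "ignore"
    if category == "Cs":
        # Lone surrogate code points are invalid; ignore them.
        return "ignore"
    if has_glyph(ch):
        return "render"
    # Private-use and other unsupported characters: show some indication.
    return "fallback"
```

Under this sketch, a private-use character without a glyph gets a fallback rendering, while a stray U+0001 is silently ignored.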


Yucca


_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode