On 8/18/2011 7:29 AM, Doug Ewell wrote:
Karl Pentzlin <karl dash pentzlin at acssoft dot de> wrote:

The quoted indicators for benefit were part of a concern of the German
NB regarding the Wingding/Webding proposals. The concern expressed in
WG2 N4085 is that some characters proposed there conform neither to
the policy statements by UTC or WG2, nor to the indicators of benefit
which the German NB would accept as an additional reason to encode
Wingding/Webding characters beyond the formal policies of UTC and WG2.
Nevertheless, N4085 is a German NB document, the criteria in question
are those suggested by the German NB and not WG2 (and the document makes
note of this distinction), and it is an error to portray this passage as
representing either a change or a lack of clarity in UTC or WG2 policy.


Karl makes no such claim. The document states that 2093-2096 appear to be in violation of the character-glyph model. I believe that's the section (or one of the sections) in the document that Karl summarizes here as "policy statements by UTC or WG2" - at least it would fit.

Anyway, it's more useful to focus on the actual concerns rather than on whether Karl summarized them correctly in his email.

The German NB introduces the concept of an "indicator" of "benefit [to] the user", and then defines it as:
- evidence of actual use
- evidence that it's likely a wrong character might be used for lack of an encoded character
- conformance to other standards
(I've slightly rephrased for clarity).

I have several problems with this approach.

First, these "indicators" are rather haphazardly compiled. Overwhelming evidence of plain-text use and conformance requirements are already recognized as valid reasons to encode characters (not just symbols); they do not, however, help in evaluating those proposals where more nuanced judgement is required. The remaining indicator, that the wrong character might be mistakenly used, is of overriding concern only in particular cases where questions of unification or disambiguation need to be decided.

Second, it's really unsatisfactory if each NB has its own criteria for when to add characters to the standard, and it's especially unsettling when such criteria seem to be applied ad hoc to a given repertoire. WG2 and Unicode have had lengthy discussions and reached broad consensus about the kinds of criteria to take into account when encoding characters in general and symbols in particular.

The result has been captured in a number of documents; for example, here's the original one from the UTC: http://unicode.org/pending/symbol-guidelines.html (with links to more recent versions).

Unlike the list in N4085, the criteria adopted by UTC and WG2 are not formulated as PASS / FAIL. Instead, they were carefully designed to be used in assigning weight for or against encoding a particular symbol as a character. This recognizes an important principle, one that has been notably absent in much recent discussion: it is generally not possible to create a set of criteria that can be applied mechanistically (or algorithmically). The decision to encode a character is, and remains, a judgement call. Some calls are easy because the evidence is overwhelming and direct; others are more difficult because the evidence may be uncertain or indirect, or because the nature of the proposed character may not be as well understood as one would ideally prefer.

Recognizing these inherent difficulties in the encoding work, and the need for a set of weighing factors instead of simplistic PASS / FAIL criteria, was one of the early breakthroughs in the work of WG2 and UTC. Accordingly, the documents speak not of criteria for "whether" to encode characters, but of criteria that "strengthen (resp. weaken) the case for encoding". That's a crucial difference.

While the details of these criteria (or factors) can and should be evaluated from time to time for continued appropriateness, the soundness of the general methodology is not in question, and UTC and WG2 should resist any attempt (direct or indirect) to abandon it in favor of an unworkable, simplistic, ad-hoc PASS / FAIL approach.
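To make the distinction concrete, here is a small illustrative sketch (not an actual UTC or WG2 tool; all criterion names and weights below are hypothetical) contrasting a mechanistic PASS / FAIL check with the weighing-of-factors approach described above:

```python
def pass_fail(criteria: dict) -> bool:
    # Mechanistic approach: every criterion must hold, or the proposal fails.
    return all(criteria.values())

def weigh_factors(factors: dict) -> float:
    # Weighing approach: each piece of evidence strengthens (positive weight)
    # or weakens (negative weight) the case; the total informs, but does not
    # replace, the final judgement call.
    return sum(factors.values())

# Hypothetical evidence for a proposed symbol, with hypothetical weights.
evidence = {
    "well_defined_semantics": 1.0,      # strengthens the case
    "part_of_notational_system": 0.5,   # strengthens the case
    "primarily_freestanding": -1.0,     # weakens the case
}

score = weigh_factors(evidence)
# A positive total suggests the case is strengthened overall; a committee
# would still weigh this against the quality of the evidence itself.
```

The point of the sketch is only that a single negative finding sinks a PASS / FAIL evaluation outright, whereas under the weighing approach it merely counts against an otherwise strong case.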

What are relevant criteria?

The document I cited lists the original set of criteria as follows:


         What criteria strengthen the case for encoding?

   The symbol:

     * is typically used as part of computer applications (e.g. CAD
       symbols)
     * has well defined user community / usage
     * always occurs together with text or numbers (unit, currency,
       estimated)
     * must be searchable or indexable
     * is customarily used in tabular lists as shorthand for
       characteristics (e.g. check mark, maru etc.)
     * is part of a notational system
     * has well-defined semantics
     * has semantics that lend themselves to computer processing
     * completes a class of symbols already in the standard
     * is letterlike (i.e. should vary with the surrounding font style)


         What criteria weaken the case for encoding?

   There is evidence that:

     * the symbol is primarily used freestanding (traffic signs)
     * the notational system is not widely used on computers (dance
       notation, traffic signs)
     * the symbol is part of a set undergoing rapid changes
     * the symbol is trademarked (unless requested by the owner)
       (logos, Der grüne Punkt, CE symbol, UL symbol, etc)
     * the symbol is purely decorative
     * it’s ok to ignore its identity in processing
     * font shifting is the preferred access and the user community is
       happy with it (logos, etc.)

   Or, conversely, there is not enough evidence for its usage or its
   user community.

These criteria as originally formulated don't spell out how to evaluate "widely used in plain text" or "required for compatibility with another standard or for round-trip mapping", because they were concerned with issues specific to symbols. Wide usage and compatibility requirements apply to characters of any kind, and tend to short-circuit detailed evaluation of a character's individual characteristics anyway.

The compatibility requirement is the primary factor that should apply to the characters 2093-2096 discussed in the German document. If one accepts the premise of encoding the Wingding/Webding sets as "compatibility sets", then the compatibility requirement covers all the characters in them; like other compatibility characters already encoded, they may violate some aspects of the character-glyph model.

A./
