On 8/18/2011 7:29 AM, Doug Ewell wrote:
Karl Pentzlin <karl dash pentzlin at acssoft dot de> wrote:

The quoted indicators for benefit were part of a concern of the German
NB regarding the Wingding/Webding proposals. The concern expressed in
WG2 N4085 is that some characters proposed there conform neither to
the policy statements by UTC or WG2, nor to the indicators of benefit
which the German NB would accept as an additional reason to encode
Wingding/Webding characters beyond the formal policies of UTC and WG2.
Nevertheless, N4085 is a German NB document, the criteria in question
are those suggested by the German NB and not WG2 (and the document makes
note of this distinction), and it is an error to portray this passage as
representing either a change or a lack of clarity in UTC or WG2 policy.


Karl makes no such claim. The document states that 2093-2096 appear to be in violation of the character-glyph model. I believe that's the section (or one of the sections) in the document that Karl summarizes here as "policy statements by UTC or WG2" - at least it would fit.

Anyway, it's more useful to focus on the actual concerns rather than on whether Karl summarized them correctly in his email.

The German NB introduces the concept of an "indicator" of "benefit [to] the user", and then defines it as:
- evidence of actual use
- evidence that it's likely a wrong character might be used for lack of an encoded character
- conformance to other standards
(I've slightly rephrased for clarity).

I have several problems with this approach.

First, these "indicators" are rather haphazardly compiled. Overwhelming evidence of plain-text use and conformance requirements are already recognized as valid reasons to encode characters (not just symbols); they do not, however, help in evaluating those proposals where more nuanced judgement is required. The remaining indicator, that the wrong character might be mistakenly used, is of overriding concern only in particular cases where questions of unification or disambiguation need to be decided.

Second, it's really unsatisfactory if each NB has its own criteria for when to add characters to the standard, and it's especially unsettling when such criteria seem to be applied ad hoc to a given repertoire. WG2 and Unicode have had lengthy discussions and reached broad consensus about the kinds of criteria to take into account when encoding characters in general and symbols in particular.

The result has been captured in a number of documents; for example, here's the original one from the UTC: http://unicode.org/pending/symbol-guidelines.html (with links to more recent versions).

Unlike the list in N4085, the criteria adopted by UTC and WG2 are not formulated as PASS / FAIL. Instead, they were carefully designed to be used in assigning weight for or against encoding a particular symbol as a character. This recognizes an important principle, one that has been notably absent in much recent discussion: it is generally not possible to create a set of criteria that can be applied mechanistically (or algorithmically). The decision to encode a character is, and remains, a judgement call. Some calls are easy because the evidence is overwhelming and direct; others are more difficult because the evidence may be uncertain or indirect, or because the nature of the proposed character may not be as well understood as one would ideally prefer.

Recognizing these inherent difficulties in the encoding work, and the need for a set of weighing factors instead of simplistic PASS / FAIL criteria, was one of the early breakthroughs in the work of WG2 and UTC. Accordingly, the documents speak not of criteria for "whether" to encode characters, but of criteria that "strengthen (resp. weaken) the case for encoding". That's a crucial difference.

While the details of these criteria (or factors) can and should be evaluated from time to time for continued appropriateness, the soundness of the general methodology is not in question, and UTC and WG2 should resist any attempt (direct or indirect) to abandon it in favor of an unworkable, simplistic, ad-hoc PASS / FAIL approach.
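To make the distinction concrete, here is a small illustrative sketch (not an actual UTC or WG2 tool; all criterion names and weights below are hypothetical) contrasting a mechanistic PASS / FAIL check with the weighing-of-factors approach described above:

```python
def pass_fail(criteria: dict) -> bool:
    # Mechanistic approach: every criterion must hold, or the proposal fails.
    return all(criteria.values())

def weigh_factors(factors: dict) -> float:
    # Weighing approach: each piece of evidence strengthens (positive weight)
    # or weakens (negative weight) the case; the total informs, but does not
    # replace, the final judgement call.
    return sum(factors.values())

# Hypothetical evidence for a proposed symbol, with hypothetical weights.
evidence = {
    "well_defined_semantics": 1.0,      # strengthens the case
    "part_of_notational_system": 0.5,   # strengthens the case
    "primarily_freestanding": -1.0,     # weakens the case
}

score = weigh_factors(evidence)
# A positive total suggests the case is strengthened overall; a committee
# would still weigh this against the quality of the evidence itself.
```

The point of the sketch is only that a single negative finding sinks a PASS / FAIL evaluation outright, whereas under the weighing approach it merely counts against an otherwise strong case.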

What are relevant criteria?

The document I cited lists the original set of criteria as follows:


         What criteria strengthen the case for encoding?

   The symbol:

     * is typically used as part of computer applications (e.g. CAD
       symbols)
     * has well defined user community / usage
     * always occurs together with text or numbers (unit, currency,
       estimated)
     * must be searchable or indexable
     * is customarily used in tabular lists as shorthand for
       characteristics (e.g. check mark, maru etc.)
     * is part of a notational system
     * has well-defined semantics
     * has semantics that lend themselves to computer processing
     * completes a class of symbols already in the standard
     * is letterlike (i.e. should vary with the surrounding font style)


         What criteria weaken the case for encoding?

   There is evidence that:

     * the symbol is primarily used freestanding (traffic signs)
     * the notational system is not widely used on computers (dance
       notation, traffic signs)
     * the symbol is part of a set undergoing rapid changes
     * the symbol is trademarked (unless requested by the owner)
       (logos, Der grüne Punkt, CE symbol, UL symbol, etc)
     * the symbol is purely decorative
     * it’s ok to ignore its identity in processing
     * font shifting is the preferred access and the user community is
       happy with it (logos, etc.)

   Or, conversely, there is not enough evidence for its usage or its
   user community.

These criteria as originally formulated don't spell out how to evaluate "widely used in plain text" or "required for compatibility with another standard or for round-trip mapping", because they were concerned with issues specific to symbols. Wide usage and compatibility requirements apply to characters of any kind, and tend to short-circuit detailed evaluation of a character's individual characteristics anyway.

The compatibility requirement is the primary factor that should apply to the characters 2093-2096 discussed in the German document. If one accepts the premise of encoding the Wingding/Webding sets as "compatibility sets", then the compatibility requirement covers all the characters in them; like other compatibility characters already encoded, they may violate some aspects of the character-glyph model.

A./
