Asmus Freytag wrote:

Nobody disputes that subheaders are informative. However, subheaders
do not define a character property.

Janusz was making the point that CLDR data sometimes treats them as such, or at least as a kind of supplementary property.

There are several good reasons:

1. They do not "classify" characters in a uniform way: For some ranges
they give the purpose for which the character was encoded (as in your
example), for others, they give the type of character (vowel,
consonant), and in some cases they are free of information
("Miscellaneous addition").

2. Even where they give the purpose for which the character was
encoded, they do not necessarily attest that the characters in that
range are never used for other purposes.

3. The information is purely editorial and, as such, is changed by the
editors as needed rather than assigned as the result of a vote in the
Unicode Technical Committee.

4. They appear to be more "formal" than they are, simply because they
are presented with semantic markup in the input file to the code chart
layout tool, and that file is rather structured only because it
describes a tabular presentation of data. However, see points (1)
through (3) on why this superficial appearance of formality is
misleading.

It seems that the main concern about using NamesList.txt to obtain information beyond what is available in other UCD sources is that people might treat that additional information as normative and immutable, when it is not.
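The kind of use described here, pulling subheader text out of NamesList.txt and attaching it to code points, can be sketched in a few lines of Python. This is a hypothetical illustration, not a UTC-provided tool: the sample text is hand-written and simplified rather than copied from the file, and the line conventions it assumes ("@@" for block headers, "@" followed by tabs for subheaders, a hex code point followed by a tab for character entries) should be checked against NamesList.html before relying on them. In keeping with the concern above, the function documents its output as informative only.

```python
import re

# Hand-written sample in the style of NamesList.txt (simplified; NOT a
# verbatim excerpt). Assumed conventions: "@@" lines start a block,
# "@" + TAB(s) + text lines are subheaders, and lines beginning with a
# hex code point and a TAB are character entries.
SAMPLE = (
    "@@\t0000\tC0 Controls and Basic Latin\t007F\n"
    "@\t\tC0 controls\n"
    "0000\t<control>\n"
    "\t= NULL\n"
    "@\t\tASCII punctuation and symbols\n"
    "0020\tSPACE\n"
    "0021\tEXCLAMATION MARK\n"
)

CHAR_LINE = re.compile(r"^([0-9A-F]{4,6})\t")
SUBHEAD_LINE = re.compile(r"^@\t+(.*)$")  # "@+" and "@@" lines do not match


def subheaders_by_code_point(text: str) -> dict[int, str]:
    """Map each listed code point to the nearest preceding subheader.

    The result is editorial, informative data only: subheaders are not a
    character property and may change between Unicode releases.
    """
    current = None
    result = {}
    for line in text.splitlines():
        if line.startswith("@@"):
            current = None  # a new block header resets the subheader
            continue
        m = SUBHEAD_LINE.match(line)
        if m:
            current = m.group(1).strip()
            continue
        m = CHAR_LINE.match(line)
        if m and current is not None:
            result[int(m.group(1), 16)] = current
    return result


if __name__ == "__main__":
    for cp, subhead in subheaders_by_code_point(SAMPLE).items():
        print(f"U+{cp:04X}\t{subhead}")
```

Annotation lines such as "= NULL" start with a tab, so the character-entry pattern skips them.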

It is understood that UTC members draw important distinctions between normative and informative material, and between material that is immutable and that which may change over time. For many purposes, these distinctions are crucial. However, there are uses for Unicode character data that do not depend on these distinctions. Often it is simply not a problem if, say, CAT FACE WITH WRY SMILE acquires a new informative cross-reference in one Unicode release, and that cross-reference suddenly changes or disappears in the next release.

My suggestion to assuage these fears is for UTC to add warnings to the file header (right below "This file is semi-automatically derived...") or to NamesList.html, or both, basically stating that any information in NamesList.txt beyond that which can be found in other UCD files is informative and subject to change without notice. Then the burden, if such it is, will be on users to heed these warnings.

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸
