On 12/27/2016 8:03 AM, Marcel Schneider wrote:
On 27/12/16 01:11, Richard Wordingham wrote:
On Sun, 25 Dec 2016 19:31:28 +0200, "Jukka K. Korpela" wrote:
[…]
If some graphic symbol is by convention used to represent a lacuna,
then the issue, as regards Unicode, is simply whether that symbol
exists as an encoded character or whether that graphic symbol needs
to be added to Unicode. But it would be a matter of encoding
graphic characters (irrespective of their meaning in some content),
not of encoding abstract ideas like “an unrecognized character”.
Unicode encodes pictograms, directives and abstract characters, not
glyphs. There are few, if any, characters that have no semantics,
though several characters can be ambiguous and context-sensitive as to
the semantics with which they occur. If it were just a matter of
appearance, then U+26C6 RAIN would be the character to use. It has the
graphic used for characters in damaged inscriptions.
As far as my present understanding of Unicode goes, I believe that the
“not encode glyphs but abstract characters” principle has a counterpart
that makes Unicode characters polysemic by design, as follows from
TUS 3.3, D2. This compromise led to abandoning the initially considered
policy of extensive disunification in favor of reasonable unifications that
provided an adequate benefit-cost ratio, as Mark Davis explained on this List:

http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0145.html

TUS 3.2, C4 and C5 (Conformance Requirements: Interpretation) seem to me
to specify that the meanings of a given character are free and may be
defined by any human convention, provided that they donʼt conflict with
the Unicode character properties of that character.

(Most) character properties can be adjusted, so the statement above would need to
be drawn much more narrowly.

The generic issue that Unicode runs into is that there are things like "letters" that have well-defined identities (the letter A) but, perhaps because of that, have a very wide-ranging set of real images; some of the fanciful ones may bear scant relation to the archetypal shape. However, because they are members of bounded and extremely well-known sets (alphabets), users are tolerant of artistic license. In addition, they are generally used in longer contexts (words), where their identity is reaffirmed, independent of their shape, by their occurring in the expected juxtapositions (and mostly not occurring in other, unexpected ones).

However, the conventions for where and when to use one of these letters are not fixed,
and neither are their phonetic equivalents.

Contrast that with many marks. The really common ones, like the period, are
well known enough that fonts can substitute small squares or other shapes without
impeding their use in normal text. However, outside standard sentence punctuation, they can be reused for many other purposes. Some such uses, like the Swedish use of ":" in the middle of an abbreviation, may be unusual enough not to be readily
catered to by all text-processing software (e.g. in word segmentation).

Nevertheless, the same thing applies as with letters: where and when to use one of these marks is not fixed as part of their encoding, and neither are their functions.

Many other "simple" marks (lines, circles, triangles, hooks, and squares, or groups
of them) are likewise subject to frequent reuse. Some of them may have been
incorrectly encoded more than once. Like the standard punctuation marks, both their precise shapes and their precise functions are subject to stylistic or other conventions.

When it comes to marks (or symbols) of less generic or more complex shapes, the presumption that the mark has only "one" shape may be more common, and examples of the mark being repurposed may be less common. Because such marks are less common, fewer readers will recognize all of their stylistic variations as being "the same thing". A variant form will more likely be understood as a related, but not identical, symbol. That in turn fuels the
misperception that Unicode somehow encodes symbols based on a single
conventional usage.

A./

