Re: a character for an unknown character

Asmus Freytag Tue, 27 Dec 2016 21:38:23 -0800

On 12/27/2016 8:03 AM, Marcel Schneider wrote:

On 27/12/16 01:11, Richard Wordingham wrote:

On Sun, 25 Dec 2016 19:31:28 +0200
"Jukka K. Korpela" wrote:

[…]

If some graphic symbol is by convention used to represent a lacuna,
then the issue, as regards to Unicode, is simply whether that symbol
exists as an encoded character or whether there is need to add that
graphic symbol to Unicode. But it would be a matter of encoding
graphic characters (irrespectively of their meaning in some content),
not about encoding abstract ideas like “an unrecognized character”.

Unicode encodes pictograms, directives and abstract characters, not
glyphs. There are few, if any characters, that have no semantics,
though several characters can be ambiguous and context-sensitive as to
what semantics they occur. If it was just a matter of appearance,
then U+26C6 RAIN would be the character to use. It has the graphic
used for characters in damaged inscriptions.

As far as my todayʼs understanding of Unicode goes, I believe that the
“not encode glyphs but abstract characters” principle has a counterpart
that makes Unicode characters polysemic by design, as results from
TUS 3.3, D2. This compromise led to abandon the initially considered
extensive disunification policy in favor of reasonable unifications that
provided a correct benefit-cost ratio, Mark Davis explained on this List:


http://www.unicode.org/mail-arch/unicode-ml/y2015-m06/0145.html

TUS 3.2, C4 and C5 (Conformance Requirements: Interpretation) seems to me
to be specifying that the meanings of a given character are free and may be
defined by any human convention, provided that they donʼt conflict with
the Unicode character properties of that character.

(Most) character properties can be adjusted, so the statement above
would need to be drawn much more narrowly.
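As a rough illustration of what "character properties" means in practice,
here is a minimal Python sketch that looks up the name and General
Category of a few characters mentioned in this thread. The helper name
describe() is made up for this example; the values reported come from
whichever Unicode version the local Python build bundles, which is one
way to see that property data is versioned rather than fixed forever.

```python
import unicodedata

def describe(ch):
    """Return (code point, character name, General Category) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.category(ch),
    )

# The Unicode version whose property data this Python build ships with.
print("Unicode data version:", unicodedata.unidata_version)

# A letter, two punctuation marks, and U+26C6 RAIN from the quoted message.
for ch in ("A", ".", ":", "\u26C6"):
    print(describe(ch))
```

Nothing in this data says where or for what purpose a character may be
used; it only records properties such as the name and category.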

The generic issue that Unicode runs into is that there are things like
"letters" that have well-defined identities (the letter A), but, perhaps
because of that, have a very wide-ranging set of real images - some of
the fanciful ones may bear scant relation to the archetypal shape.
However, because they are members of bounded and extremely well-known
sets (alphabets), users are tolerant of artistic license. In addition,
they are generally used in longer contexts (words) where their identity
is reaffirmed, independent of their shape, by occurring in the expected
juxtapositions (and mostly not occurring in other, unexpected ones).

However, the conventions for where and when to use one of these letters
are not fixed, not even their phonetic equivalents.

Contrast that with many marks. The really common ones, like the period,
are well-known enough that fonts can substitute small squares or other
shapes without impeding their use in normal text. However, outside
standard sentence punctuation, they can be re-used for many other
purposes. Some such uses, like the Swedish use of ":" in the middle of
an abbreviation, may be unusual enough to not readily be catered to by
all text-processing software (e.g. in word-segmentation).
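To make the word-segmentation point concrete, here is a small Python
sketch. It assumes a deliberately naive tokenizer built on the regex
\w+, which is not how any particular product segments text, and uses
the Swedish abbreviation "S:t" (for "Sankt") as an illustrative input.
The function names and the colon-aware pattern are assumptions made up
for this example.

```python
import re

# Naive tokenizer: a "word" is any run of word characters.
NAIVE_WORD = re.compile(r"\w+")

# Tailored tokenizer: also accepts a colon when it sits between
# two runs of word characters, as in Swedish abbreviations.
COLON_AWARE_WORD = re.compile(r"\w+(?::\w+)*")

def naive_words(text):
    """Split text into words, treating ':' as a separator."""
    return NAIVE_WORD.findall(text)

def colon_aware_words(text):
    """Split text into words, keeping word-internal ':' inside the word."""
    return COLON_AWARE_WORD.findall(text)

sample = "S:t Eriksgatan"
print(naive_words(sample))        # ['S', 't', 'Eriksgatan']
print(colon_aware_words(sample))  # ['S:t', 'Eriksgatan']
```

The tailored pattern is not free: it would also glue together strings
such as "12:30" or "note:see", so whether it is the right choice depends
on the language and the kind of text being processed.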

Nevertheless, the same thing applies as with letters: where and when to
use one of these marks is not fixed as part of their encoding, not even
their functions.

Many other "simple" marks: lines, circles, triangles, hooks, and
squares, or groups of them, are likewise subject to frequent reuse. Some
of them may have been incorrectly encoded more than once. Like the
standard punctuation marks, both their precise shapes and precise
functions are subject to stylistic or other conventions.

When it comes to marks (or symbols) of less generic or more complex
shapes, the presumption that the mark only has "one" shape may be more
common, and examples of the mark being repurposed may be less common.
Because such marks are not as common, fewer readers will recognize all
stylistic variations as being "the same thing". A variant form will be
more likely to be understood as a related, but not identical, symbol.
That in turn fuels the misperception that Unicode somehow encodes
symbols based on a single conventional usage.

A./
