On Sun, 25 Dec 2016 19:31:28 +0200 "Jukka K. Korpela" wrote:
> When it is not certain what character there is in some text to be
> encoded, there is a wide range of possible situations. For example,
> it might be a thing like “there is letter U or letter V, probably the
> latter” or “there is some Latin letter but no hint of what it might
> be” or even “there is an alphanumerical character” (though I find it
> difficult to imagine such a situation). Such things can hardly be
> described using new characters; rather, they need to be expressed
> using verbal descriptions (which are about the encoded text, not part
> of it) or some formal notations or both.

This does not appear to be the situation we are being asked about. I
suspect the context is rather that of a document damaged by fire or
mould.

> If some graphic symbol is by convention used to represent a lacuna,
> then the issue, as regards to Unicode, is simply whether that symbol
> exists as an encoded character or whether there is need to add that
> graphic symbol to Unicode. But it would be a matter of encoding
> graphic characters (irrespectively of their meaning in some content),
> not about encoding abstract ideas like “an unrecognized character”.

Unicode encodes pictograms, directives and abstract characters, not
glyphs. There are few characters, if any, that have no semantics,
though several characters are ambiguous, their semantics depending on
the context in which they occur.

If it were just a matter of appearance, then U+26C6 RAIN would be the
character to use. It has the graphic used for characters in damaged
inscriptions.

Of course, there is one character that is already widely used in this
rôle: U+003F QUESTION MARK. Some of its Unicode properties are not
suitable, and its informal 'unknown character' semantic conflicts with
its rôle as a punctuation mark.

If I understand correctly, these issues are already addressed by the
Leiden Conventions. Why do they not suffice?

Richard.

