U+FFFD REPLACEMENT CHARACTER �

On Wed, Dec 21, 2016 at 3:05 AM Philippe Verdy <[email protected]> wrote:
> there's a "replacement" control, whose rendering is undefined. It may
> represent any missing part covering more than one character, such as parts
> that have been burned, or overstruck. This Unicode character can act as a
> substitute but its rendering is purposely undefined. An application may
> show some greyed box there, but it should not be the tofu box used for
> characters not mapped in the specified fonts.
> Older encodings used the ASCII control "SUB" for representing this
> function. Some terminals displayed it as a filled box. Other documents have
> used the ASCII DEL control for the same purpose. However, for Unicode
> encodings ASCII controls should be avoided.
>
> This is not an Emoji, as Emojis have a clear visual representation and
> semantics (and often specific colors). But you're right, it should be a
> symbol in Unicode (like Emojis, but unlike ASCII controls)
>
> 2016-12-21 3:29 GMT+01:00 Martin Mueller <[email protected]>:
>
> > I’m new to this list. Please excuse my technical incompetence.
> >
> > Is there a Unicode character that says “I represent an alphanumerical
> > character, but I don’t know which”? This is a very common problem in the
> > transcription of historical texts where you have lacunas. Often, the
> > extent of the lacuna is known, and the alphabet is known as well. The
> > EEBO TCP transcriptions of English texts before 1700 are good examples.
> > They are SGML transcriptions, where missing stuff is represented by <gap/>
> > elements with attributes about this or that. This is efficient when it
> > comes to pages, very inefficient when it comes to individual characters.
> >
> > There is a Web character, a diamond with a question mark inside it, which
> > means “I may know what this character represents, but I can’t display it”.
> > Which is a very different message. On the other hand, if you extended the
> > use of that character, it probably wouldn’t create much ambiguity.
> >
> > In the TCP project, various code points from the Geometric Shapes block
> > were used to represent lacunae. The black circle (\u25cf) has been used as
> > the character for a missing character. This is OK and unambiguous in its
> > context. But it would be nice to have a special character for just that
> > purpose, and given the number of emoji, this doesn’t seem to be a
> > particularly frivolous request. Which alphabet, you might ask. But that
> > doesn’t really matter. There is a very high probability that the missing
> > character comes from the character set of the surrounding words. And if
> > that isn’t the case, the transcriber wouldn’t know it. S/he sees that
> > there is something, perhaps even that there is just one of it, but
> > doesn’t know which.
> >
> > Martin Mueller
> > Professor emeritus of English and Classics
> > Northwestern University
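For anyone who wants to check the two code points discussed in this thread, here is a minimal Python sketch. It assumes a transcription that marks each missing character with U+25CF, as described for the TCP texts above. The helper names `describe` and `lacuna_runs` and the sample string are hypothetical, made up for illustration and not taken from any TCP tooling.

```python
import unicodedata

# Code point the quoted message says the TCP transcriptions use for a
# missing character.
LACUNA = "\u25CF"


def describe(ch):
    """Return 'U+XXXX NAME' for a single character, per the Unicode database."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def lacuna_runs(text, marker=LACUNA):
    """Return (start_index, length) for each run of consecutive markers.

    The length of a run is the known extent of the lacuna in characters.
    """
    runs = []
    i = 0
    while i < len(text):
        if text[i] == marker:
            j = i
            while j < len(text) and text[j] == marker:
                j += 1
            runs.append((i, j - i))
            i = j
        else:
            i += 1
    return runs


if __name__ == "__main__":
    print(describe("\uFFFD"))
    print(describe(LACUNA))
    # Hypothetical sample: one lacuna of one character, one of two.
    print(lacuna_runs("wh\u25CFch w\u25CF\u25CFd"))
```

Because U+25CF is an ordinary symbol that can also occur as real content, a scan like this is only as reliable as the transcription convention behind it, which is the ambiguity the original question raises.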

