On Wed, 5 Nov 2003 12:24:00 +0100, "Philippe Verdy" wrote:
>
> The obliterated character needed for paleolitic studies, or to encode any
> texts in which the character is not recognizable already exists: isn't it
> the REPLACEMENT CHARACTER?

The problem of how to represent missing/obliterated characters in Unicode when transcribing manuscript/printed texts and inscriptions, etc. has always perplexed me.

U+FFFD [Replacement Character] is "used to replace an incoming character whose value is unknown or unrepresentable in Unicode", and is definitely not the correct character to use to represent a missing or obliterated character in a non-electronic source text.

For Chinese the standard glyph for a missing/obliterated/unclear ideograph is a full-width hollow square (i.e. the same size as a CJK ideograph). This glyph is very common in modern printed Chinese texts, from scholarly editions of ancient texts unearthed from 2,000-year-old tombs to popular typeset reprints of 19th-century novels. Several examples of the usage of this glyph in modern printed texts from the PRC can be found at http://uk.geocities.com/babelstone1357/CJK/missing.html

The problem is how to represent this glyph in electronic texts. Browsing the internet there seem to be two ways of representing this "missing ideograph" glyph, both unsatisfactory:

1. Using U+25A1 □ [WHITE SQUARE] (although I suppose any of the other white square graphic symbols encoded in Unicode, such as U+25A2, U+25FB or U+25FD, could also be used). The problems with this character are:
   a) it has the wrong character properties for use within running CJK text.
   b) with CJK fonts such as SimSun, U+25A1 is rendered the same height and width as a CJK ideograph, but with non-Chinese fonts such as Arial Unicode MS, U+25A1 may be rendered much smaller than a CJK ideograph, which looks totally wrong.

2. Using U+56D7 囗 [a CJK ideograph, rarely used other than as a radical = U+2F1E], which has the right character properties, and renders at the correct size; but the glyph shape may not be completely square depending upon the font style, and basically it is just the wrong character for the job.

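To make the "character properties" point a little more concrete, here is a minimal Python sketch that prints a few properties of the candidates discussed above, using only the standard unicodedata module. The choice of properties (general category and East Asian width) and the helper name describe are assumptions made for illustration; other properties, such as line-breaking class, also matter for running CJK text but are not exposed by this module.

```python
import unicodedata

# Candidate characters discussed in this message.
CANDIDATES = ["\u25A1", "\u56D7", "\uFFFD"]


def describe(ch):
    """Return a few Unicode properties of a single character.

    Hypothetical helper for illustration only.
    """
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),
        "east_asian_width": unicodedata.east_asian_width(ch),
    }


if __name__ == "__main__":
    for ch in CANDIDATES:
        print(describe(ch))
```

With the Unicode data shipped in current Python versions, U+25A1 is reported as a symbol (category So) with ambiguous East Asian width, whereas U+56D7 is reported as a letter (category Lo) with wide East Asian width, which is one way of seeing why the white square does not behave like an ideograph.
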
It would be extremely useful to have a dedicated Unicode character for "missing CJK ideograph" with the right character properties, and I have considered making a proposal for such a character. I have hesitated, though: if there really is such a great need for it (and I personally have web pages which transcribe texts with missing/obliterated ideographs where such a character is desperately needed), then why does it not already exist in Unicode or pre-existing Chinese encoding standards?

Andrew

