Re: Chinese FVS? (was: RE: Cuneiform Free Variation Selectors)

Andrew C. West Wed, 21 Jan 2004 08:12:29 -0800

On Tue, 20 Jan 2004 10:32:06 -0700, John Jenkins wrote:
> 
> 1)  U+9CE6 is a traditional Chinese character (a kind of swallow) 
> without a SC counterpart encoded.  However, applying the usual rules 
> for simplifications, it would be easy to derive a simplified form which 
> one could conceivably see in a book printed in the PRC.  Rather than 
> encode the simplified form, the UTC would prefer to represent the SC 
> form using U+9CE6 + a variation selector.
>


If a simplified form of a given CJK ideograph is used, then it deserves encoding
properly. There are newly-coined simplified forms in CJK-B and CJK-C, so why not
add newly used simplified forms to CJK-C or wherever if they are really needed?
To borrow Michael's term, this use of variation selectors is simply
pseudo-coding.

If a Chinese publishing house were going to print a book in simplified
characters that included a simplified form of U+9CE6, would they go to the
lengths of applying to Unicode to define an appropriate standardised variant for
U+9CE6, and then trying to create a font that implemented variation selectors?
Or would they simply use a font that mapped a simplified glyph form to U+9CE6
(or the PUA)? If it is so important to formally define the existence of a
simplified form of an existing character, then why not encode it properly?

> 2) Your best friend has the last name of "turtle," but he doesn't use 
> any of the encoded forms for the turtle character to represent it.  He 
> insists on writing it in yet another way and wants to be able to 
> include his name as he writes it in the source code he edits.  The UTC 
> ends up accommodating him using U+2A6C9 (which is the closest turtle to 
> his last name) + a variation selector.

1. Unicode Design Principle 3 : "The Unicode Standard encodes characters, not
glyphs."
This is a simple glyph variant. I insist on writing the "A" in my name with two
cross-bars. Will the UTC kindly accommodate me by providing an appropriate
standardised variant for U+0041? (In fact, come to think of it, I have
idiosyncratic ways of writing all of the letters in my name ...)

The plain fact of the matter is that the *character* turtle is already encoded,
and if someone wants to use a different glyph form for this character then he or
she should design their own font with the appropriate glyph mapped to U+9F9C or
U+9F9F.

2. Unicode does not encode private-use characters.
I can't find chapter and verse for it, but I was always under the impression
that Unicode did not encode private-use characters.

> 3)  You're editing a critical edition of an ancient MS, and you find 
> that your author, who talks a lot about handkerchiefs, uses U+5E28 
> quite a bit, but varies between the "ears-in" form and the "ears-out" 
> form almost at random.  Rather than lose the distinction which *may* be 
> meaningful, you (with the UTC's blessing) use U+5E28 for the ears-in 
> form (as Unicode uses) and U+5E28 + a variation selector for the 
> ears-out form.

This example actually opens up the biggest can of worms.

As someone who has a passion for transcribing ancient manuscripts, in Chinese
and other scripts, I fully appreciate the desire to be able to represent every
little idiosyncrasy of a manuscript or inscription in plain text Unicode. But
the simple fact of the matter is that you can't. My apologies for repeating
myself, but Unicode Design Principle 3 states that "The Unicode Standard encodes
characters, not glyphs." (and Section 2.2 of TUS elaborates on this statement).

Unless Unicode becomes a Glyph Encoding Standard instead of a Character Encoding
Standard, how on earth can the UTC allow VSs to be used for simple glyph
variants? And if it's OK for CJK ideographs, then why not for every other
Unicoded script?

Glyph variations are of paramount interest to textual scholars and epigraphers
of all scripts, not just Chinese. To take a random example from the Celtic
Inscribed Stones Project (CISP), this is a palaeographic description of a cross
slab at Kirk Maughold in the Isle of Man, inscribed [--]I IN CHRISTI NOMINE
CRUCIS CHRISTI IMAGENEM :

Kermode/1907, 112: `we have here the diamond-shaped O, the N like an H, and the
M like a double H, all characteristics of the Hiberno-Saxon manuscripts and
sculptured stones of the period. Other characteristic forms are the
square-shaped C and the peculiar G, the like of which I have not seen elsewhere.
But some of the letters are minuscules, as p, d, b, r, and a; while in the
contraction for CHRISTI, in each case the R differs from the ordinary small R in
CRUCIS, representing, in fact, the Greek Rho!'. 

[http://www.ucl.ac.uk/archaeology/cisp/database/stone/maugh_4.html]

If we go down the road of encoding epigraphic and palaeographic glyph variants
for CJK and other scripts I'm afraid that we'll soon find that 256 Variation
Selectors just isn't enough.
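
To make the figure of 256 concrete, here is a minimal Python sketch of how a
variation sequence looks in plain text. It assumes the two code point ranges
U+FE00..U+FE0F (VS1 to VS16) and U+E0100..U+E01EF (VS17 to VS256). The helper
names are hypothetical, and the pairing of U+5E28 with VS1 is purely
illustrative: it is not a standardised variant.

```python
import unicodedata

# The two ranges of variation selectors: VS1-VS16 and VS17-VS256.
VS_RANGES = [(0xFE00, 0xFE0F), (0xE0100, 0xE01EF)]


def variation_selectors():
    """Return every variation selector as a one-character string."""
    return [chr(cp) for lo, hi in VS_RANGES for cp in range(lo, hi + 1)]


def strip_variation_selectors(text):
    """Drop all variation selectors, leaving only the base characters."""
    selectors = set(variation_selectors())
    return "".join(ch for ch in text if ch not in selectors)


selectors = variation_selectors()
print(len(selectors))                  # total number of selectors available
print(unicodedata.name(selectors[0]))  # name of the first selector

# Hypothetical sequence for illustration only: U+5E28 followed by VS1.
base = "\u5E28"
hypothetical_variant = base + selectors[0]

# The sequence is two code points, but the underlying character is unchanged.
print(len(hypothetical_variant))
print(strip_variation_selectors(hypothetical_variant) == base)
```

Whatever a font does with such a sequence, a process that ignores the selector
still sees the same base character, and each base character can carry at most
256 such distinctions.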

Andrew
