On Mon, 30 Apr 2012 16:42:51 -0700
Ken Whistler <[email protected]> wrote:

> On 4/30/2012 3:33 PM, Richard Wordingham wrote:
> >> One is not compelled to construct U+3039 (〹) 'twenty' from two
> >> U+3038 (〸) 'ten', so a CUNEIFORM TWO U may well be missing.
> > It looks as though it is.
> 
> No, it isn't.
> 
> > It was present in Proposal N2664
> > (http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2664), as CUNEIFORM NUMERIC
> > SIGN NISH, but is missing from the next revision, Proposal N2698
> > (http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2698).  Between these two,
> > the sign for '30' changed from CUNEIFORM NUMERIC SIGN USHU2 to
> > CUNEIFORM SIGN U U U.  It could be an accidental omission of *SIGN
> > TWO U/SIGN MAN - the Unicode Cuneiform list does not appear to have
> > been archived, so I can't work out why it should have been
> > deliberately removed.
> 
> The document you are looking for is "Rationale for changes to N2664R",
> by Steve Tinney. L2/04-080.

Thanks - I don't know how I missed that one when I looked through the
registers.  Unfortunately, it doesn't explicitly state that CUNEIFORM
NUMERIC SIGN NISH had been removed.  I wonder how CUNEIFORM SIGN MIN
escaped the same fate!

Assuming that NUMERIC SIGN NISH was removed as being U + U, I am
still little the wiser as to which equivalent encoding one should
use.  The sequence <SIGN U, SIGN U> does not kern tightly, and <SIGN
U, SIGN U, SIGN U> looks nothing like <SIGN U U U>.  In one there are
clear gaps between the impressions, whereas in the other the
impressions touch. I've checked some drawings and they tend to show a
clear gap between an U impression at the end of one symbol and an U
impression at the start of the next symbol.

The note L2/04-080 recommends the sequence <SIGN U, CGJ, SIGN U>, but it
is not clear how that should help.  Referring to
http://www.unicode.org/faq/char_combmark.html, I read the following
questions (Q) and answers (A), to which I must regretfully add some
remarks (R). 

1. Q: Does U+034F COMBINING GRAPHEME JOINER affect display of combining
character sequences?

A: No. <snip> It does not impact cursive joining or ligation (contrast
U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER).

R: It does however affect how combining characters combine - TUS 6.1
Section 7.9 gives its effect between a ligature tie and a single
diacritic, and TUS Section 16.2 gives an example of its effects on
Hebrew marks and accents.  (The next answer but one actually gives
the Hebrew example!)

R: I'm confused by the display of <a,CGJ,umlaut>.  Is a Fraktur font
that, in its normal setting, displays it differently from <a,umlaut> in
breach of the Unicode standard?

2. Q: Does U+034F COMBINING GRAPHEME JOINER join graphemes?

A: No. <snip> It has no impact on line breaking, except that as for
other combining marks, it should be kept with its base when breaking a
line.

R: According to http://www.unicode.org/Public/UNIDATA/LineBreak.txt,
CGJ is of linebreak class GL.  Class GL prohibits line breaks
immediately before or afterwards, unless the preceding character is a
space or ZWSP.
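
The first remark above, that CGJ does affect how combining characters
combine, can at least be checked at the normalisation level.  The
following is only a minimal sketch using Python's standard unicodedata
module; the variable names are mine, and it says nothing about what a
font or rendering engine does with the sequences.

```python
import unicodedata

CGJ = "\u034F"        # COMBINING GRAPHEME JOINER (canonical combining class 0)
DIAERESIS = "\u0308"  # COMBINING DIAERESIS (canonical combining class 230)
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW (canonical combining class 220)

plain = "a" + DIAERESIS
with_cgj = "a" + CGJ + DIAERESIS

# Without CGJ, NFC composes the pair into the precomposed letter U+00E4.
composed = unicodedata.normalize("NFC", plain)

# With CGJ in between, the diaeresis is blocked from composing with 'a'.
blocked = unicodedata.normalize("NFC", with_cgj)

# Without CGJ, canonical ordering moves the class-220 mark before the
# class-230 mark.
reordered = unicodedata.normalize("NFD", "a" + DIAERESIS + DOT_BELOW)

# With CGJ between the two marks, the original order is preserved.
kept = unicodedata.normalize("NFD", "a" + DIAERESIS + CGJ + DOT_BELOW)

for label, text in [("composed", composed), ("blocked", blocked),
                    ("reordered", reordered), ("kept", kept)]:
    print(label, [f"U+{ord(ch):04X}" for ch in text])
```

This shows only that CGJ blocks canonical composition and reordering;
whether <a,CGJ,umlaut> may be displayed differently from <a,umlaut> is
a separate question that the sketch does not answer.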

I'd be inclined to go for <SIGN U, ZWJ, SIGN U>, but for one little
nagging worry.

In the final proposal for Cuneiform, it was proposed that the Cuneiform
symbols have line-breaking class ID.  This is quite reasonable for text
that is written without inter-word gaps, but some text does have
inter-word gaps, and this text benefits from the current line-breaking
class of AL (ordinary alphabetical and symbol characters).  AL v. ID is
tailorable, and, curiously, ZWJ has no effect on line breaking.

I have a natural preference for WORD JOINER over CGJ (and it can be
easier to enter in some word-processors).  Also, because CGJ between
characters of canonical combining class 0 frequently marks a syllable
boundary, I fear it might be a magnet for emergency line-breaking when
normal line-breaking fails.  So, please, which of the following
is suitable, and which is better:

<SIGN U, CGJ, ZWJ, SIGN U>
<SIGN U, WJ, ZWJ, SIGN U>
<SIGN U, ZWJ, WJ, SIGN U>
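
For anyone wanting to try these in a renderer, the three candidates can
be built by character name rather than typed by hand.  This is just a
sketch with Python's standard unicodedata module; the helper describe()
and the dictionary are my own illustration, and the script makes no
claim about which sequence displays or line-breaks correctly.

```python
import unicodedata

# Look the characters up by name to avoid mistyping code points.
SIGN_U = unicodedata.lookup("CUNEIFORM SIGN U")
CGJ = unicodedata.lookup("COMBINING GRAPHEME JOINER")
ZWJ = unicodedata.lookup("ZERO WIDTH JOINER")
WJ = unicodedata.lookup("WORD JOINER")

candidates = {
    "<SIGN U, CGJ, ZWJ, SIGN U>": SIGN_U + CGJ + ZWJ + SIGN_U,
    "<SIGN U, WJ, ZWJ, SIGN U>": SIGN_U + WJ + ZWJ + SIGN_U,
    "<SIGN U, ZWJ, WJ, SIGN U>": SIGN_U + ZWJ + WJ + SIGN_U,
}

def describe(text):
    """Return one 'U+XXXX NAME' string per code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for label, text in candidates.items():
    print(label)
    for line in describe(text):
        print("   ", line)
    # Check that canonical normalisation leaves the sequence untouched.
    stable = all(unicodedata.normalize(form, text) == text
                 for form in ("NFC", "NFD"))
    print("    unchanged by NFC/NFD:", stable)
```

The resulting strings can be pasted into a test document to compare
rendering and line breaking in whatever engine is to hand.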

Richard.

