On Mon, 30 Apr 2012 16:42:51 -0700 Ken Whistler wrote:
> On 4/30/2012 3:33 PM, Richard Wordingham wrote:
> >> One is not compelled to construct U+3039 (〹) 'twenty' from two
> >> U+3038 (〸) 'ten', so a CUNEIFORM TWO U may well be missing.
> > It looks as though it is.
>
> No, it isn't.
>
> > It was present in Proposal N2664
> > (http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2664), as CUNEIFORM NUMERIC
> > SIGN NISH, but is missing from the next revision, Proposal N2698
> > (http://std.dkuug.dk/jtc1/sc2/wg2/docs/n2698). Between these two,
> > the sign for '30' changed from CUNEIFORM NUMERIC SIGN USHU2 to
> > CUNEIFORM SIGN U U U. It could be an accidental omission of *SIGN
> > TWO U/SIGN MAN - the Unicode Cuneiform list does not appear to have
> > been archived, so I can't work out why it should have been
> > deliberately removed.
>
> The document you are looking for is "Rationale for changes to N2664R",
> by Steve Tinney. L2/04-080.

Thanks - I don't know how I missed that one when I looked through the registers. Unfortunately, it doesn't explicitly state that CUNEIFORM NUMERIC SIGN NISH had been removed. I wonder how CUNEIFORM SIGN MIN escaped the same fate!

Assuming that NUMERIC SIGN NISH was removed as being U + U, I am still little the wiser as to which equivalent encoding one should use. The sequence <SIGN U, SIGN U> does not kern tightly, and <SIGN U, SIGN U, SIGN U> looks nothing like <SIGN U U U>: in the former there are clear gaps between the impressions, whereas in the latter the impressions touch. I've checked some drawings, and they tend to show a clear gap between an U impression at the end of one symbol and an U impression at the start of the next symbol.

The note L2/04-080 recommends the sequence <SIGN U, CGJ, SIGN U>, but it is not clear how that should help. Referring to http://www.unicode.org/faq/char_combmark.html, I read the following questions (Q) and answers (A), to which I must regretfully add some remarks (R).

1. Q: Does U+034F COMBINING GRAPHEME JOINER affect display of combining character sequences?
A: No. <snip> It does not impact cursive joining or ligation (contrast U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER).
R: It does, however, affect how combining characters combine. Section 7.9 of TUS 6.1 gives its effect between a ligature tie and a single diacritic, and Section 16.2 gives an example of its effect on Hebrew marks and accents. (The next answer but one actually gives the Hebrew example!)
R: I'm confused by the display of <a, CGJ, umlaut>. Is a Fraktur font that, in its normal setting, displays it differently from <a, umlaut> in breach of the Unicode Standard?

2. Q: Does U+034F COMBINING GRAPHEME JOINER join graphemes?
A: No. <snip> It has no impact on line breaking, except that, as for other combining marks, it should be kept with its base when breaking a line.
R: According to http://www.unicode.org/Public/UNIDATA/LineBreak.txt, CGJ has line-break class GL. Class GL prohibits a line break immediately after the character, and also immediately before it unless the preceding character is a space or ZWSP.

I'd be inclined to go for <SIGN U, ZWJ, SIGN U>, but for one nagging worry. In the final proposal for Cuneiform, it was proposed that the Cuneiform symbols have line-breaking class ID. This is quite reasonable for text that is written without inter-word gaps, but some text does have inter-word gaps, and such text benefits from the current line-breaking class of AL (ordinary alphabetic and symbol characters). AL v. ID is tailorable, and, curiously, ZWJ has no effect on line breaking.

I have a natural preference for WORD JOINER over CGJ (and it can be easier to enter in some word processors). Also, because CGJ between characters of canonical combining class 0 frequently marks a syllable boundary, I fear it might become a magnet for emergency line breaks when normal line breaking fails.

So, please, which of the following are suitable, and which is best?

<SIGN U, CGJ, ZWJ, SIGN U>
<SIGN U, WJ, ZWJ, SIGN U>
<SIGN U, ZWJ, WJ, SIGN U>

Richard.
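The three candidate sequences can be spelled out code point by code point. What follows is a minimal sketch, assuming Python 3 and its standard unicodedata module. It prints each character's code point, general category, canonical combining class and name, and checks that NFC leaves each sequence unchanged. The dictionary name CANDIDATES and the helper describe() are illustrative only. Python's unicodedata does not expose line-breaking classes such as GL, AL or ID, so those would still have to be read from LineBreak.txt.

```python
import unicodedata

# Characters named in the message. SIGN U is looked up by its Unicode name
# rather than by a hard-coded code point.
SIGN_U = unicodedata.lookup("CUNEIFORM SIGN U")
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER
ZWJ = "\u200D"  # ZERO WIDTH JOINER
WJ = "\u2060"   # WORD JOINER

# The three candidate encodings asked about (hypothetical container name).
CANDIDATES = {
    "<SIGN U, CGJ, ZWJ, SIGN U>": SIGN_U + CGJ + ZWJ + SIGN_U,
    "<SIGN U, WJ, ZWJ, SIGN U>": SIGN_U + WJ + ZWJ + SIGN_U,
    "<SIGN U, ZWJ, WJ, SIGN U>": SIGN_U + ZWJ + WJ + SIGN_U,
}


def describe(seq):
    """List (code point, general category, combining class, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch),
         unicodedata.combining(ch), unicodedata.name(ch))
        for ch in seq
    ]


if __name__ == "__main__":
    for label, seq in CANDIDATES.items():
        # Compare the sequence with its NFC form; a joiner that normalization
        # moved or removed could not be relied on to survive text processing.
        unchanged = unicodedata.normalize("NFC", seq) == seq
        print(f"{label}  (NFC-stable: {unchanged})")
        for row in describe(seq):
            print("   ", *row)
```

The general category and combining class printed here do not settle the line-breaking question by themselves: CGJ is a nonspacing mark (Mn) while ZWJ and WJ are format characters (Cf), and their behaviour at a line break depends on the classes assigned in LineBreak.txt.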

