I want to correct some misperceptions about CGJ; it should not be used for ligatures.
From http://www.unicode.org/versions/Unicode4.0.0/ch15.pdf#G12985, down on page 392:

U+034F COMBINING GRAPHEME JOINER is used to indicate that adjacent characters are to be treated as a unit for the purposes of language-sensitive collation and searching. In language-sensitive collation and searching, the combining grapheme joiner should be ignored unless it specifically occurs within a tailored collation element mapping. Thus it is given a completely ignorable collation element in the default collation table, like NULL (see Unicode Technical Standard #10, "Unicode Collation Algorithm," and also ISO/IEC 14651). However, it can be entered into the tailoring rules for any given language, using the tailoring capabilities of the collation standards.

For rendering, the combining grapheme joiner is invisible. However, some older implementations may treat a sequence of grapheme clusters linked by combining grapheme joiners as a single unit for the application of enclosing combining marks. For more information on grapheme clusters, see Unicode Technical Report #29, "Text Boundaries." For more information on enclosing combining marks, see Section 3.11, Canonical Ordering Behavior.

The combining grapheme joiner must not be confused with the zero width joiner or the word joiner, which have very different functions. In particular, inserting a combining grapheme joiner between two characters should have no effect on their ligation or cursive joining behavior. Where the prevention of line breaking is the desired effect, the word joiner should be used.
For more information on the behavior of these characters in line breaking, see Unicode Standard Annex #14, "Line Breaking Properties."

Mark

----- Original Message -----
From: "Doug Ewell" <[EMAIL PROTECTED]>
To: "Unicode Mailing List" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Wednesday, November 24, 2004 22:09
Subject: Re: CGJ , RLM

> "kefas" <pmr at informatik dot uni dash frankfurt dot de> wrote:
>
> > 1. U+034F CGJ, Combining Grapheme Joiner, is
> > displayed as a tall rectangle in MSKLCexe-test and as
> > a capital square in OutlookExpress A<CGJ>E a<CGJ>e<CGJ>a<CGJ>e. But
> > CGJ "has no visible glyph"! Thus CGJ is not
> > implemented correctly in Arial Unicode MS. Or are the
> > editors not implemented correctly?
>
> U+034F was added to Unicode 3.2 in March 2002. Your copy of Arial
> Unicode MS may have been released before that date. Or it may be that
> Microsoft has chosen not to implement U+034F in this particular font,
> which is not the same as implementing it incorrectly.
>
> > Should A+CGJ+E
> > yield the Danish double letter a+(e-attached) ? Or
> > do I hope in vain.
>
> Someone, some day may choose to render A + CGJ + E as Æ. Don't be
> misled into thinking they are equivalent, however.
>
> > Is there a general rule how graphically to join 2
> > arbitrary characters? Normal tf looks already joined
> > to me, and causes me problems of recognizing t and f
> > as distinct letters. (I have astigmatism: cyl -3.0,
> > which is not that rare) m and rn look the same from
> > normal reading distance!. Some editors / some fonts
> > display an m with uneven spacing of legs, which looks
> > to me as if r+n is written. Any help in planning (you
> > font-designers)?
>
> There probably could not be a general rule about this, because it is too
> dependent on individual typeface designs. Sans-serif fonts like Arial
> will likely have many more "joined" combinations than serif fonts like
> Times, because the serifs interrupt the joining behavior. Whether the
> horizontal strokes on a "t" and an "f" line up with each other is also
> highly font-dependent. In many cases they do not.
>
> I think I have your astigmatism beat, at least in one eye.
>
> > 2. RLM, the Right to Left marker, seems to have no
> > effect yet. Hebrew bet+RLM+SPace should leave the
> > Cursor at Left and not 'jump' to the right of bet as
> > it does for good or worse for bet+SP. If this is a
> > correct expectation, then how can I tell (e.g. via
> > MSKLC.exe) to insert RLM+SPace on CAPS+SPace ?
>
> This may have more to do with the rendering engine than with the font.
>
> -Doug Ewell
> Fullerton, California
> http://users.adelphia.net/~dewell/
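
For anyone who wants to check the character-level facts in this thread, here is a minimal Python sketch using only the standard unicodedata module. It is not taken from the standard or from either message. The helper names describe and never_becomes_ae are made up for illustration, and the results reflect whatever version of the Unicode Character Database ships with your Python. It looks up the properties of CGJ, ZWJ and WJ, and checks that no normalization form turns A + CGJ + E into U+00C6.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER
ZWJ = "\u200D"  # ZERO WIDTH JOINER
WJ = "\u2060"   # WORD JOINER
AE = "\u00C6"   # LATIN CAPITAL LETTER AE


def describe(ch):
    """Hypothetical helper: (code point, name, general category, combining class)."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )


def never_becomes_ae(text):
    """Hypothetical helper: True if no normalization form maps text to U+00C6."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(form, text) != AE for form in forms)


# Three distinct characters with distinct properties and functions.
for ch in (CGJ, ZWJ, WJ):
    print(describe(ch))

# A + CGJ + E is a three-character sequence, not an encoding of the AE letter.
print(never_becomes_ae("A" + CGJ + "E"))
```

This only covers character properties and normalization. Collation tailoring, ligation, and whether a given font shows a glyph for U+034F are up to the collation implementation, the rendering engine, and the font, and cannot be tested this way.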

