Thank you, Ken, for the long and helpful explanation of which this is an extract.
...In Unicode 4.0, CGJ has been stripped of all interpretation except as an invisible mark which can be used to tailor collation (and searching), so as to distinguish digraphic units from sequences of the same characters.
One question arises. If CGJ is used as proposed, so that we have sequences such as patah CGJ hiriq and perhaps meteg CGJ vowel, does this imply that these sequences will necessarily be treated in collation as distinct from the simple sequences patah hiriq and meteg vowel (which normalisation would of course reorder)? This is a simple question. I'm not yet sure whether such a distinction would be desirable. On reflection, it would probably be better for meteg CGJ vowel to be collated the same as vowel meteg, as the distinction here is graphical but not semantic. As for patah CGJ hiriq, one advantage of collating this sequence the same as hiriq patah would be that existing texts which lack CGJ here would be collated together with those which have it, and perhaps users doing searches would not have to type the CGJ. But is this perhaps something that specific tailored collation rules can handle?
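To make the normalisation point concrete, here is a minimal Python sketch using only the standard `unicodedata` module. It shows that canonical reordering swaps patah (ccc 17) and hiriq (ccc 14), and meteg (ccc 22) and patah, but leaves them alone when CGJ (U+034F, ccc 0) sits between them. The `search_key` function is a hypothetical illustration of the tailoring asked about above: it simply treats CGJ as ignorable and then normalises. It is a crude stand-in, not the Unicode Collation Algorithm or any real tailoring syntax, and the base letters (lamed, bet) are arbitrary choices.

```python
import unicodedata

CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, ccc = 0
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, ccc = 14
PATAH = "\u05B7"  # HEBREW POINT PATAH, ccc = 17
METEG = "\u05BD"  # HEBREW POINT METEG, ccc = 22
BET = "\u05D1"    # arbitrary base letter for the examples
LAMED = "\u05DC"  # arbitrary base letter for the examples


def nfd(s: str) -> str:
    """Canonical decomposition plus canonical reordering of combining marks."""
    return unicodedata.normalize("NFD", s)


def search_key(s: str) -> str:
    """Hypothetical tailoring: ignore CGJ entirely, then normalise.

    With CGJ removed, normalisation reorders the marks, so text with and
    without CGJ yields the same key. Not UCA; illustration only.
    """
    return nfd(s.replace(CGJ, ""))


if __name__ == "__main__":
    # Without CGJ, marks with higher ccc move after marks with lower ccc.
    print(nfd(LAMED + PATAH + HIRIQ) == LAMED + HIRIQ + PATAH)  # True
    print(nfd(BET + METEG + PATAH) == BET + PATAH + METEG)      # True
    # CGJ has ccc 0, so it blocks the reordering.
    print(nfd(LAMED + PATAH + CGJ + HIRIQ) == LAMED + PATAH + CGJ + HIRIQ)  # True
    # Under the hypothetical key, CGJ and non-CGJ spellings match.
    print(search_key(LAMED + PATAH + CGJ + HIRIQ) == search_key(LAMED + HIRIQ + PATAH))  # True
```

A tailoring like this would make patah CGJ hiriq and hiriq patah match in searches, at the cost of losing the very distinction CGJ was inserted to make. Whether that trade-off is acceptable is the question above.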
-- Peter Kirk [EMAIL PROTECTED] http://web.onetel.net.uk/~peterkirk/