I was asked this and wasn't entirely sure about the answer, or even sure if I knew of a doc in which it was discussed. (Note: this is being asked in relation to Devanagari.)
<quote>
1. Are ZWJ and ZWNJ invisible in terms of searching, sorting, etc? In other words, the sequences <consonant> <virama> <consonant> and <consonant> <virama> <ZWJ|ZWNJ> <consonant> are semantically exactly equivalent. The ZWJ and ZWNJ are simply controlling the appearance on the screen. When searching for that cluster, I won't necessarily know whether the user inserted a ZWJ to keep them from combining or not, and I don't want to have to enter it into the search string. Does the Unicode standard say anything about this, or is it up to the application developer as to how his searching and sorting works?
</quote>

I didn't see anything like this mentioned in section 5.17 of TUS 3.0. ZW(N)J are mentioned in UTR18 in relation to the definition of grapheme clusters, so apparently it is assumed that a regular expression wildcard search can see these without distinguishing between them. But can they be ignored? I don't know.

Does anybody know what the answer is on this?

TIA

- Peter

---------------------------------------------------------------------------
Peter Constable
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>
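
P.S. To make the question concrete, here is a minimal Python sketch of what "invisible for searching" could mean in practice. It is only one possible application-level behaviour, not something I have found required by the standard. The helper names strip_joiners and find_ignoring_joiners are made up for illustration, and KA and SSA are just example consonants.

```python
# Illustrative sketch only: treats ZWJ/ZWNJ as ignorable when searching.
# Whether the Unicode Standard mandates this is exactly the open question.

ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER

KA = "\u0915"      # DEVANAGARI LETTER KA (example consonant)
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
SSA = "\u0937"     # DEVANAGARI LETTER SSA (example consonant)


def strip_joiners(text: str) -> str:
    """Return text with every ZWJ and ZWNJ removed (hypothetical helper)."""
    return text.replace(ZWJ, "").replace(ZWNJ, "")


def find_ignoring_joiners(haystack: str, needle: str) -> bool:
    """Substring search that disregards ZWJ/ZWNJ on both sides (hypothetical helper)."""
    return strip_joiners(needle) in strip_joiners(haystack)


# The three spellings of the same cluster from the question.
plain = KA + VIRAMA + SSA
with_zwj = KA + VIRAMA + ZWJ + SSA
with_zwnj = KA + VIRAMA + ZWNJ + SSA

# A naive code-point search for the plain cluster misses the joiner variants ...
naive_hits = [plain in s for s in (plain, with_zwj, with_zwnj)]

# ... whereas the joiner-insensitive search matches all three.
loose_hits = [find_ignoring_joiners(s, plain) for s in (plain, with_zwj, with_zwnj)]
```

With this approach the user never has to type a joiner into the search string, at the cost of being unable to search for one rendering specifically.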

