I was asked this and wasn't entirely sure about the answer, or even sure 
if I knew of a doc in which it was discussed. (Note: this is being asked 
in relation to Devanagari.)

<quote>
1. Are ZWJ and ZWNJ invisible in terms of searching, sorting, etc? In other
words, the sequence <consonant> <virama> <consonant> and <consonant>
<virama> <ZWJ|ZWNJ> <consonant> are semantically exactly equivalent. The
ZWJ and ZWNJ are simply controlling the appearance on the screen. When
searching for that cluster, I won't necessarily know whether the user
inserted a ZWJ to keep them from combining or not, and I don't want to
have to enter it into the search string. Does the Unicode standard say
anything about this, or is it up to the application developer as to how his
searching and sorting works?
</quote>

I didn't see anything like it mentioned in 5.17 of TUS3.0. ZW(N)J are 
mentioned in UTR18 in relation to the definition of grapheme clusters, so 
apparently it is assumed that a regular expression wildcard search can see 
these without distinguishing between them, but can they be ignored? I 
don't know.
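
To make the question concrete, here is a minimal sketch (in Python) of 
what "ignoring" them would mean if it is left to the application. The 
helper names are my own invention, and I am not claiming the standard 
prescribes this; it simply drops U+200C and U+200D from both strings 
before comparing. KA and SSA are just an example consonant pair.

```python
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER


def strip_joiners(text: str) -> str:
    """Return text with every ZWJ and ZWNJ removed.

    Assumption: the application has decided these two characters
    carry no weight for matching purposes.
    """
    return text.replace(ZWNJ, "").replace(ZWJ, "")


def contains_ignoring_joiners(haystack: str, needle: str) -> bool:
    """Substring search that treats ZWJ and ZWNJ as invisible (hypothetical helper)."""
    return strip_joiners(needle) in strip_joiners(haystack)


# Devanagari example: <consonant> <virama> <consonant>
KA = "\u0915"      # DEVANAGARI LETTER KA
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA
SSA = "\u0937"     # DEVANAGARI LETTER SSA

plain = KA + VIRAMA + SSA
with_zwj = KA + VIRAMA + ZWJ + SSA
with_zwnj = KA + VIRAMA + ZWNJ + SSA
```

With this, a search for the plain sequence finds the text whether or not 
the user typed a joiner, whereas a raw code point search for it would 
miss both joiner variants.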

Anybody know what the answer is on this?

TIA
- Peter


---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>

