I don't think that it is important that the user not be aware of the encoding, since it is only intended for Biblical scholars.
Jony

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Kenneth Whistler
> Sent: Saturday, July 26, 2003 3:50 AM
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
> Subject: Re: Yerushala(y)im - or Biblical Hebrew
>
> Peter wrote:
>
> > One thought: Ken has suggested CGJ be used to prevent reordering of
> > combining marks in fixed position classes such as the Hebrew vowels,
> > and also suggested that users should not need to be aware of the need
> > for CGJ for this purpose but that software can be implemented in a way
> > that hides that detail. I'm not sure how that will work,
>
> Details TBD, of course, but the essence of it is that you
> want the user experience of inserting patah + hiriq
> to correspond to the backing store insertion of <patah, CGJ,
> hiriq>, without making them explicitly have to know about or
> type a "CGJ" key. There are various input and editing
> strategies to accomplish this -- effectively the problem is
> similar to other needs to tuck hidden characters away in the
> backing store for bidirectional text.
>
> The situation for searching is a little different. While the
> editing tools may be smart about the Biblical Hebrew points,
> a typical query widget might not, so in that instance, you
> want a query on <patah, hiriq> to match the repository store
> instance of <patah, CGJ, hiriq>. Well, format controls and
> some other characters (including CGJ) are ordinarily supposed
> to be ignored for searching -- unless you have specialized
> tailorings for them. So the ordinary strategy would be to
> keep the repository normalized, and then before local
> comparison against the query string, strip out the CGJ for
> the match. The situation is more complicated if the query
> string doesn't use a CGJ *and* gets normalized. In that
> situation, you lose the distinction in order, of course, but
> the search strategy should be to strip out the CGJ locally
> and renormalize. That could result in false positive matches,
> of course, but at least you will find what you were looking for.
>
> > but it's making me wonder if
> > effectively we'd be looking at some amendment to the normalization
> > algorithms to insert CGJ in certain enumerated contexts.
>
> No.
>
> --Ken
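
For anyone who wants to try the search strategy Ken describes, here is a minimal sketch in Python using the standard unicodedata module. The helper names fold_for_search and matches are hypothetical, the choice of NFC is an assumption (the message only says "normalized"), and applying the same strip-and-renormalize step to both the query and the stored text is one possible reading of his description, not a definitive implementation.

```python
import unicodedata

CGJ = "\u034F"    # COMBINING GRAPHEME JOINER, combining class 0
LAMED = "\u05DC"  # HEBREW LETTER LAMED
PATAH = "\u05B7"  # HEBREW POINT PATAH, combining class 17
HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ, combining class 14


def fold_for_search(text: str) -> str:
    """Strip CGJ, then renormalize (hypothetical helper; NFC is assumed)."""
    return unicodedata.normalize("NFC", text.replace(CGJ, ""))


def matches(query: str, stored: str) -> bool:
    """Substring match that ignores CGJ and the original order of the points."""
    return fold_for_search(query) in fold_for_search(stored)


# Repository text: kept normalized, with CGJ preserving the patah-before-hiriq order.
stored = unicodedata.normalize("NFC", LAMED + PATAH + CGJ + HIRIQ)

# Query typed without CGJ; a query widget may or may not normalize it.
raw_query = LAMED + PATAH + HIRIQ
normalized_query = unicodedata.normalize("NFC", raw_query)

found_raw = matches(raw_query, stored)
found_normalized = matches(normalized_query, stored)

# The cost Ken mentions: the reversed order also matches (a false positive).
found_reversed = matches(LAMED + HIRIQ + PATAH, stored)
```

Because the fold discards the order of the two points, a caller that needs the distinction would have to re-check candidate hits against the unfolded stored text.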