On 08/02/2002 03:17:56 PM "Sean B. Palmer" wrote:

>If anyone has any comments on this, or any references to previous
>discussions, they would be gladly received.

Any discussion of encoding Latin digraphs as units rests on an untested assumption that there is some benefit to be gained. We have gone through several decades of English text processing without ever encoding English digraphs (th, ch, ph, wh, ff, gh, tt, ck, ou, ei, ie, ea, ee, oo, oa, etc., and arguably a...e, e...e, i...e, o...e, u...e as well) as single characters, and without ever feeling a need to. We have decades of experience with implementations of Latin script, and less with implementations of Indic scripts. Yet in those less familiar scripts, such as Thai, we encode some complex multigraphs (especially those representing vowels) as sequences of multiple characters, and nobody claims there is a problem that can only be solved by encoding them as units. Why is it, then, that for the script with which we have rather more experience, people feel that encoding digraphs is necessary?

(Those are my thoughts, at any rate.)



- Peter


---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <[EMAIL PROTECTED]>
