Re: Bangla: [ZWJ], [VIRAMA] and CV sequences

Peter Kirk Fri, 10 Oct 2003 04:51:13 -0700

On 09/10/2003 21:22, Gautam Sengupta wrote:

> ... Yes, but not just programmers who are concerned with how a Unicode text should be encoded, but also those who are going to have to process these texts for various purposes.
>
> Let us first introduce a small notational convention and then consider a rather minor example. Let the lowercase vowels henceforth denote *combining* vowels. In Bangla K+R+i and J+aa+I mean "I do" and "I go" respectively. Given these two forms as input, a morphological analyzer should ideally yield the following analyses: KRi = KR<VIRAMA> + I, JaaI = Jaa + I. (I am assuming orthographic - not phonemic/phonetic - input-output). In other words, the analyzer would have to insert an explicit virama after KR and somehow recognize the final <i> in KRi as <I>.
>
> Now let's consider the same pair of inputs in *my* representation. They would be K+R+VIRAMA+I and J+VIRAMA+AA+I. All that the morphological analyzer would have to do is chop off the rightmost <I>. The leftovers are exactly what we need: K+R+VIRAMA and J+VIRAMA+AA. Isn't it amazing how evidence from diverse fields of inquiry seem to converge on the *correct* solution?
>
> I hope this makes sense...
> -Gautam

It would surely be trivial for any morphological analyser to treat the combining vowel <i> as a ligature or contraction of <VIRAMA, I>, expand it into that sequence, and then analyse the expanded form. Any morphological analyser is going to have to deal with ligatures and contractions anyway. The expansion could be programmed as a morphophonemic contraction, even if that is not strictly correct linguistically.
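A rough sketch of that preprocessing step, in Python, might look like the following. The function names and the mapping table are my own illustrative assumptions, not part of any standard or library, and the expanded sequence is only an internal form for analysis, not a recommended encoding for stored text. It rewrites each combining vowel sign as <VIRAMA, independent vowel> and then removes a final <I>, which leaves K+R+VIRAMA and J+VIRAMA+AA for the two example words.

```python
# Hypothetical preprocessing for a morphological analyser: expand each
# Bengali combining vowel sign into <VIRAMA, independent vowel>.
# The expanded string is an internal analysis form only.

VIRAMA = "\u09CD"         # BENGALI SIGN VIRAMA
INDEPENDENT_I = "\u0987"  # BENGALI LETTER I

# Combining vowel sign -> independent vowel letter (illustrative subset).
SIGN_TO_INDEPENDENT = {
    "\u09BE": "\u0986",  # sign AA -> letter AA
    "\u09BF": "\u0987",  # sign I  -> letter I
    "\u09C0": "\u0988",  # sign II -> letter II
    "\u09C1": "\u0989",  # sign U  -> letter U
    "\u09C2": "\u098A",  # sign UU -> letter UU
}


def expand_vowel_signs(text: str) -> str:
    """Replace every combining vowel sign with VIRAMA + independent vowel."""
    return "".join(
        VIRAMA + SIGN_TO_INDEPENDENT[ch] if ch in SIGN_TO_INDEPENDENT else ch
        for ch in text
    )


def split_final_i(text: str) -> tuple[str, str]:
    """Expand the text, then split off a final independent I if present."""
    expanded = expand_vowel_signs(text)
    if expanded.endswith(INDEPENDENT_I):
        return expanded[:-1], INDEPENDENT_I
    return expanded, ""


K, R, J = "\u0995", "\u09B0", "\u099C"  # letters KA, RA, JA

kri = K + R + "\u09BF"            # K + R + combining i
jaai = J + "\u09BE" + "\u0987"    # J + combining aa + independent I

kri_stem, kri_suffix = split_final_i(kri)
jaai_stem, jaai_suffix = split_final_i(jaai)
```

Here `kri_stem` is K+R+VIRAMA and `jaai_stem` is J+VIRAMA+AA, with <I> as the suffix in both cases, so the analyser sees the same shapes it would see under the proposed encoding without the stored text having to change.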

--
Peter Kirk
[EMAIL PROTECTED] (personal)
[EMAIL PROTECTED] (work)
http://www.qaya.org/
