On 08/15/2011 01:48 AM, Richard Wordingham wrote:
<la, virama, candrabindu, la>

is strictly speaking *the* sequence recommended *across* Indic
scripts for representation of Sanskrit clusters involving a nasal
and non-nasal "semivowel".

However, people working with Indic rendering in a major operating
system support the concept (see
http://www.unicode.org/mail-arch/unicode-ml/y2011-m06/0153.html).

Thanks, that's useful as a reference - it helps me find it later.

You should also see Peter Constable's mail to me (and the list):

http://www.unicode.org/mail-arch/unicode-ml/y2011-m06/0143.html

"""As you and I independently commented, it makes sense to see the virama as being in a distribution class together with matras and, therefore, for the virama to precede candrabindu."""

The issues is on the relative ordering of candrabindu and virama.  For
a C1-conjoining form (i.e. C2 relatively unmodified),<la virama
candrabindu la>  is easier to handle.  For a C2-conjoining form,<la
candrabindu virama la>  is easier to work with.

Hmm -- perhaps you mean this is so because it would be possible to easily map Virama + LA to the C2-conjoining form? This is true enough, but it is advisable to have a single uniform representation across Indic scripts and that is LA + Virama + Candrabindu + LA (because of the reasons outlined by Peter and me in the previous mails I have linked to from the archives).

Further, the situation, especially with C2-conjoining scripts, is not all that simple. See the attachment: the underlying phonetic consonant cluster is nasal-V + V + LA. It would be represented in encoded form as: VA + Virama + Candrabindu + VA + Virama + LA. As you say it would indeed be *easier* (in some way) if the Candrabindu were to precede the virama but I am not sure that is *advisable*.

Vowels and the like already occur within Tai Tham and Khmer consonant
clusters with C2-conjoining forms.

I know that and that is why I distinguish "Indian" Indic scripts and "non-Indian" (i.e. South East Asian [SEA]) Indic (i.e. Brahmic) scripts, especially in Unicode. It seems that at least in Khmer (I didn't check the other charts/chapters) one vocalic R/L vowel is represented by the independent vowel presented as a sub-base (which you call C2, but as it is not a consonant it is not actually a "C"-2) form, but haven't you noticed that it is true of all Indic scripts so far encoded that the vowel sign for vocalic L/LL is the same as the independent vowel placed (albeit in a somewhat smaller size) below the base? It is simply a choice of encoding model -- in Indic (I'll stop calling it "Indian" Indic as I feel "Brahmic" is the coverall term and Indic is a specific term) all these vowel signs are encoded as separate characters whereas in SEA they are handled by a virama-like character like sub-base consonants.

 Normally the virama equivalent (CCC
9) occurs immediately before C2, but that can already be displaced in
normalised text, e.g. the Northern Thai loan word ᩈᩮᩥᩁ᩠᩺ᨷ (from English
'serve') normalised to ᩈᩮᩥᩁ᩠᩺ᨷ<U+1A48 TAI THAM LETTER HIGH SA, U+1A6E
TAI THAM VOWEL SIGN E, U+1A65 TAI THAM VOWEL SIGN I, U+1A41 TAI THAM
LETTER RA, U+1A60 TAI THAM SIGN SAKOT, U+1A7A TAI THAM SIGN RA HAAM,
U+1A37 TAI THAM LETTER BA>.  The rendering on p155 of Bunkhit
Watcharasat's 'Northern Thai Teach-Yourself Book' (Siamese:
ภาษาเมืองล้านนา ฉบับเรียนด้วยตนเอง) makes it clear that the ra haam
(vowel killer, here acting as a consonant killer) acts on the letter ra.

Hmmm -- I'm not sure I entirely grok the SEA situation with Thai/Tai Tham/Khmer etc, but I'm sure the handling of vowelless consonants and conjoining forms in those scripts does deviate from the *Indic* model. For example, see that stuff about the Balinese Surang and how it is handled...

I've seen a claim that vowels within Tibetan consonant stacks can be
handled sensibly within the confines of Unicode - I didn't investigate
it.

I don't understand what you mean by "vowels within Tibetan consonant stacks". I also don't know whether Tibetan language written in Tibetan script requires the conjoining forms of vowels but I do know (to an extent) that Sanskrit written in Tibetan doesn't require "conjoining" forms of vowels per se generated by a virama-like character.

I think an official ruling should cover all Indic scripts,
ideally even those encoded in writing order, such as Thai.  (I'm
presuming the Thai script's subscript consonants will be supported one
day, and Lao already has one unambiguously subscript consonant.)

I would prefer for a ruling to first focus on the (Indian) Indic scripts and its extension to SEA scripts be done on a script-by-script basis after examination of the existing model for those scripts. It would not be appropriate IMHO to hastily apply the (Indian) Indic model to SEA scripts.

--
Shriramana Sharma

<<attachment: asam'vlayaaya.png>>

Reply via email to