Re: Non-standard Tibetan stacks (was Re: Sanskrit nasalized L)

2011-09-02 Thread Christopher Fynn
You can find quite a few "non-standard" stacks (those used in Tibetan
abbreviations) in the book བསྡུ་ཡིག་གསེར་གྱི་ཨ་ལོང།  which is freely
available in PDF format from


- Chris

On 17/08/2011, Asmus Freytag  wrote:
> On 8/16/2011 3:32 PM, Andrew West wrote:
>> On 16 August 2011 18:19, Asmus Freytag  wrote:
>>>> "These stacks are highly unusual and are considered beyond the scope
>>>> of plain text rendering. They may be handled by higher-level
>>>> mechanisms".
>>> The question is: have any such "mechanisms" been defined and deployed by
>>> anyone?
>> In my opinion, until someone produces a scan of a Tibetan text with
>> multiple consonant-vowel sequences, and asks how they can represent it
>> in plain Unicode text there is no question to be answered.
>
> Thank you Andrew - that clarifies the issue for the non-specialist.
>
> A./
>
>>
>> Chris Fynn asked about certain non-standard stacks he was trying to
>> implement in the Tibetan Machine Uni font in an email to the Tibex
>> list on 2006-12-09, but these didn't involve multiple consonant-vowel
>> sequences (one stack sequence was <0F43 0FB1 0FB1 0FB2 0FB2 0F74 0F74
>> 0F71> which would be reordered to <0F42 0FB7 0FB1 0FB1 0FB2 0FB2 0F71
>> 0F74 0F74> by normalization which would display differently).
>>
>> Other non-standard stacks that I have seen involve horizontal
>> progression within the vertical stack (e.g. yang written horizontally
>> in a vertical stack).
>>
>> More recently, the user community needed help digitizing Tibetan texts
>> that used the superfixed letters U+0F88 and U+0F89 within non-standard
>> stacks, resulting in a proposal to encode additional letters
>> (http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3568.pdf).
>>
>> None of these non-standard stack use cases involved multiple
>> consonant-vowel sequences, and I'm not sure whether I have ever seen
>> an example of such a sequence.  I have learnt that there is little
>> point discussing a solution for a hypothetical problem, because when
>> the real problems arise they likely to be something different.
>>
>> Andrew
>>
>
>
>




Re: Non-standard Tibetan stacks (was Re: Sanskrit nasalized L)

2011-08-16 Thread Asmus Freytag

On 8/16/2011 3:32 PM, Andrew West wrote:

> On 16 August 2011 18:19, Asmus Freytag  wrote:
>>> "These stacks are highly unusual and are considered beyond the scope
>>> of plain text rendering. They may be handled by higher-level
>>> mechanisms".
>>
>> The question is: have any such "mechanisms" been defined and deployed by
>> anyone?
>
> In my opinion, until someone produces a scan of a Tibetan text with
> multiple consonant-vowel sequences, and asks how they can represent it
> in plain Unicode text there is no question to be answered.


Thank you Andrew - that clarifies the issue for the non-specialist.

A./



> Chris Fynn asked about certain non-standard stacks he was trying to
> implement in the Tibetan Machine Uni font in an email to the Tibex
> list on 2006-12-09, but these didn't involve multiple consonant-vowel
> sequences (one stack sequence was <0F43 0FB1 0FB1 0FB2 0FB2 0F74 0F74
> 0F71> which would be reordered to <0F42 0FB7 0FB1 0FB1 0FB2 0FB2 0F71
> 0F74 0F74> by normalization which would display differently).
>
> Other non-standard stacks that I have seen involve horizontal
> progression within the vertical stack (e.g. yang written horizontally
> in a vertical stack).
>
> More recently, the user community needed help digitizing Tibetan texts
> that used the superfixed letters U+0F88 and U+0F89 within non-standard
> stacks, resulting in a proposal to encode additional letters
> (http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3568.pdf).
>
> None of these non-standard stack use cases involved multiple
> consonant-vowel sequences, and I'm not sure whether I have ever seen
> an example of such a sequence.  I have learnt that there is little
> point discussing a solution for a hypothetical problem, because when
> the real problems arise they likely to be something different.
>
> Andrew






Non-standard Tibetan stacks (was Re: Sanskrit nasalized L)

2011-08-16 Thread Andrew West
On 16 August 2011 18:19, Asmus Freytag  wrote:
>> "These stacks are highly unusual and are considered beyond the scope
>> of plain text rendering. They may be handled by higher-level
>> mechanisms".
>
> The question is: have any such "mechanisms" been defined and deployed by
> anyone?

In my opinion, until someone produces a scan of a Tibetan text with
multiple consonant-vowel sequences, and asks how they can represent it
in plain Unicode text there is no question to be answered.

Chris Fynn asked about certain non-standard stacks he was trying to
implement in the Tibetan Machine Uni font in an email to the Tibex
list on 2006-12-09, but these didn't involve multiple consonant-vowel
sequences (one stack sequence was <0F43 0FB1 0FB1 0FB2 0FB2 0F74 0F74
0F71> which would be reordered to <0F42 0FB7 0FB1 0FB1 0FB2 0FB2 0F71
0F74 0F74> by normalization which would display differently).
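
The reordering described in the parenthesis can be checked with a few lines of Python. This is only a sketch using the standard `unicodedata` module; the helper name `code_points` is made up for illustration, and the result depends on the Unicode data shipped with the interpreter (the values involved have been stable for a long time). It says nothing about how a font would display either sequence.

```python
import unicodedata

# The stack sequence quoted above, written with explicit escapes.
original = "\u0F43\u0FB1\u0FB1\u0FB2\u0FB2\u0F74\u0F74\u0F71"


def code_points(text):
    """Return the string as space-separated four-digit hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# NFD decomposes U+0F43 into <0F42, 0FB7> and then sorts adjacent marks
# with non-zero combining classes into ascending order.
normalized = unicodedata.normalize("NFD", original)

print(code_points(original))
print(code_points(normalized))

# The subjoined letters have combining class 0 and stay in place, while
# U+0F71 (class 129) is moved ahead of U+0F74 (class 132).
for ch in "\u0FB1\u0FB2\u0F71\u0F74":
    print(f"U+{ord(ch):04X} ccc={unicodedata.combining(ch)}")
```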

Other non-standard stacks that I have seen involve horizontal
progression within the vertical stack (e.g. yang written horizontally
in a vertical stack).

More recently, the user community needed help digitizing Tibetan texts
that used the superfixed letters U+0F88 and U+0F89 within non-standard
stacks, resulting in a proposal to encode additional letters
(http://std.dkuug.dk/jtc1/sc2/wg2/docs/n3568.pdf).

None of these non-standard stack use cases involved multiple
consonant-vowel sequences, and I'm not sure whether I have ever seen
an example of such a sequence.  I have learnt that there is little
point discussing a solution for a hypothetical problem, because when
the real problems arise they are likely to be something different.

Andrew



Re: Sanskrit nasalized L

2011-08-16 Thread Philippe Verdy
2011/8/16 Asmus Freytag :
> On 8/16/2011 1:57 AM, Andrew West wrote:
>>
>> On 16 August 2011 02:59, Richard Wordingham
>>   wrote:
>>>
>>> All I've got to go on is the penultimate sentence in TUS 6.0 Section
>>> 10.2 - 'Rarely, stacks are seen that contain more than one such
>>> consonant-vowel combination in a vertical arrangement'.
>>
>> 
>>
>> Which is followed immediately by the caveat:
>>
>> "These stacks are highly unusual and are considered beyond the scope
>> of plain text rendering. They may be handled by higher-level
>> mechanisms".
>
> That's all well and good.
>
>
> The question is: have any such "mechanisms" been defined and deployed by
> anyone?

I had the same feeling when first reading this, because it does not
say whether the text fragments containing parts of the vertical stack
can actually be encoded, and how. For now, I suspect that they can
only be represented by graphics, and not by a series of UCS code
points (except possibly a "defective" one, i.e. without a base letter
for the lower parts of the stack).

So, for such uses, is there a possibility of encoding a null base
consonant to hold the lower parts in fragments whose layout would
be controlled by such a "higher-level mechanism"? Or can we use, for
example, a zero-width space?

-- Philippe.




Re: Sanskrit nasalized L

2011-08-16 Thread Asmus Freytag

On 8/16/2011 1:57 AM, Andrew West wrote:

> On 16 August 2011 02:59, Richard Wordingham
>   wrote:
>
>> All I've got to go on is the penultimate sentence in TUS 6.0 Section
>> 10.2 - 'Rarely, stacks are seen that contain more than one such
>> consonant-vowel combination in a vertical arrangement'.
>
> Which is followed immediately by the caveat:
>
> "These stacks are highly unusual and are considered beyond the scope
> of plain text rendering. They may be handled by higher-level
> mechanisms".


That's all well and good.


The question is: have any such "mechanisms" been defined and deployed by 
anyone?


A./


>> The Tibetan script doesn't have a combining virama.  I would expect the
>> natural coding to be something like letter-vowel-subjoined
>> letter-vowel, e.g..
>
> As the Unicode Standard explicitly states, non-standard stacks such as
> this (which really are highly unusual, and only occur in a few
> specific contexts) are outside the scope of plain text rendering, and
> are not defined by the standard.  It therefore makes no sense for you
> to try to specify character sequences for such non-standard stacks.
>
> Andrew







Re: Sanskrit nasalized L

2011-08-16 Thread Andrew West
On 16 August 2011 02:59, Richard Wordingham
 wrote:
>
> All I've got to go on is the penultimate sentence in TUS 6.0 Section
> 10.2 - 'Rarely, stacks are seen that contain more than one such
> consonant-vowel combination in a vertical arrangement'.



Which is followed immediately by the caveat:

"These stacks are highly unusual and are considered beyond the scope
of plain text rendering. They may be handled by higher-level
mechanisms".

> The Tibetan script doesn't have a combining virama.  I would expect the
> natural coding to be something like letter-vowel-subjoined
> letter-vowel, e.g.  U, U+0FB2 TIBETAN SUBJOINED LETTER RA, U+0F74 TIBETAN VOWEL SIGN U>.

As the Unicode Standard explicitly states, non-standard stacks such as
this (which really are highly unusual, and only occur in a few
specific contexts) are outside the scope of plain text rendering, and
are not defined by the standard.  It therefore makes no sense for you
to try to specify character sequences for such non-standard stacks.

Andrew



Re: Sanskrit nasalized L

2011-08-16 Thread Shriramana Sharma

On 08/16/2011 07:29 AM, Richard Wordingham wrote:

>>> The issue is on the relative ordering of candrabindu and virama.
>>> For a C1-conjoining form (i.e. C2 relatively unmodified), <la virama
>>> candrabindu la> is easier to handle.  For a C2-conjoining form, <la
>>> candrabindu virama la> is easier to work with.
>>
>> Hmm -- perhaps you mean this is so because it would be possible to
>> easily map Virama + LA to the C2-conjoining form?
>
> That's my motivation.


I'm thinking more on this topic. Will get back if my ideas change.


> This is not what I was talking about.  The best relevant examples in TUS
> 6.0 Section 11.4 are the words for "both" and "already".  The former
> actually has nikahit + coeng!


I think these examples are exactly the reason why one should not overly 
identify Khmer with the *Indian* Indic scripts, as the latter (in which I do 
not include Kharoshthi for this discussion) do not use subjoined 
consonants for final consonants.



> All I've got to go on is the penultimate sentence in TUS 6.0 Section
> 10.2 - 'Rarely, stacks are seen that contain more than one such
> consonant-vowel combination in a vertical arrangement'.
>
> The Tibetan script doesn't have a combining virama.  I would expect the
> natural coding to be something like letter-vowel-subjoined
> letter-vowel, e.g..


I'm not sure what such a stack of a consonant + vowel-sign pair with 
another such pair would signify...


--
Shriramana Sharma



Re: Sanskrit nasalized L

2011-08-15 Thread Richard Wordingham
On Mon, 15 Aug 2011 07:21:20 +0530
Shriramana Sharma  wrote:

> On 08/15/2011 01:48 AM, Richard Wordingham wrote:

> > The issue is on the relative ordering of candrabindu and virama.
> > For a C1-conjoining form (i.e. C2 relatively unmodified), <la virama
> > candrabindu la> is easier to handle.  For a C2-conjoining form, <la
> > candrabindu virama la> is easier to work with.
> 
> Hmm -- perhaps you mean this is so because it would be possible to 
> easily map Virama + LA to the C2-conjoining form?

That's my motivation.

> This is true
> enough, but it is advisable to have a single uniform representation
> across Indic scripts and that is LA + Virama + Candrabindu + LA
> (because of the reasons outlined by Peter and me in the previous
> mails I have linked to from the archives).

I can't think of any characters that can be viewed as decomposing in
some sense to consonant + virama.  There are quite a few
characters that are functional equivalents to virama + consonant, and
some of these should be folded with virama + consonant in some
applications.

>> 

> I know that and that is why I distinguish "Indian" Indic scripts and 
> "non-Indian" (i.e. South East Asian [SEA]) Indic (i.e. Brahmic)
> scripts, especially in Unicode. It seems that at least in Khmer (I
> didn't check the other charts/chapters) one vocalic R/L vowel is
> represented by the independent vowel presented as a sub-base (which
> you call C2,...

This is not what I was talking about.  The best relevant examples in TUS
6.0 Section 11.4 are the words for "both" and "already".  The former
actually has nikahit + coeng!

> Hmmm -- I'm not sure I entirely grok the SEA situation with Thai/Tai 
> Tham/Khmer etc, but I'm sure the handling of vowelless consonants and 
> conjoining forms in those scripts does deviate from the *Indic*
> model. For example, see that stuff about the Balinese Surang and how
> it is handled...

Consider it a generalisation of anusvara!  Limbu and Lepcha have an
array of final consonants, formally divorced from initial consonants.
Kharoshthi apparently used conjoining forms for final consonants, though
examples are few and TUS 5.0 says virama cannot follow a vowel.  In
the Kharoshthi script, the difference between a subscript MA and ANUSVARA
is slight to vanishing.

> > I've seen a claim that vowels within Tibetan consonant stacks can be
> > handled sensibly within the confines of Unicode - I didn't
> > investigate it.
 
> I don't understand what you mean by "vowels within Tibetan consonant 
> stacks".

All I've got to go on is the penultimate sentence in TUS 6.0 Section
10.2 - 'Rarely, stacks are seen that contain more than one such
consonant-vowel combination in a vertical arrangement'.

> I also don't know whether Tibetan language written in
> Tibetan script requires the conjoining forms of vowels but I do know
> (to an extent) that Sanskrit written in Tibetan doesn't require
> "conjoining" forms of vowels per se generated by a virama-like
> character.

The Tibetan script doesn't have a combining virama.  I would expect the
natural coding to be something like letter-vowel-subjoined
letter-vowel, e.g. .  A
formal analogue would be the Thai word , but it doesn't match visually - its
second vowel goes to the right of the consonants.
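
As a rough illustration of the letter-vowel-subjoined letter-vowel pattern, the sketch below builds one such sequence and checks that Unicode normalization leaves it alone. The base letter KA (U+0F40) is an assumption chosen only for the example; the subjoined RA and the vowel sign U follow the pattern described above. The helper name `is_stable` is also made up. The check says nothing about whether any font or shaping engine would render the stack as intended.

```python
import unicodedata

# Hypothetical example sequence: KA is an arbitrary base letter picked
# for illustration, followed by vowel sign U, subjoined RA, vowel sign U.
KA, VOWEL_U, SUBJOINED_RA = "\u0F40", "\u0F74", "\u0FB2"
stack = KA + VOWEL_U + SUBJOINED_RA + VOWEL_U


def is_stable(text):
    """True if none of the four normalization forms changes the text."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(form, text) == text for form in forms)


# The subjoined letter has combining class 0, so it acts as a barrier:
# the two vowel signs cannot be reordered across it.
print(unicodedata.combining(SUBJOINED_RA))
print(is_stable(stack))
```

So at the level of code points the two consonant-vowel pairs stay distinct; whether such a sequence should be used at all is the question under discussion.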

Richard.



Re: Sanskrit nasalized L

2011-08-15 Thread Philippe Verdy
Thai, Khmer, Lao and Tai Viet are already exceptions to the Unicode
character encoding model. This should remain confined to the native
scripts of this region of Indochina. For the rest, all Indic scripts
using the logical encoding order (including those of Burma and the
Philippines) should have the same coherent behavior.

So the Khmer Coeng is not a good example, as Khmer does not
behave as, and is not treated as, a regular Indic script, despite its
historic origin (there is likewise a split of representation
between Semitic scripts and Greek/Coptic, even though they share a common
historic origin). The split between Indochinese scripts and other
scripts of Brahmic origin is probably much more recent (and
justified by compatibility with legacy encodings), but it is
justifiable to put those Indochinese scripts in a class separate
from "Indic" scripts, within the same large "Brahmic" family.

The so-called "Unicode character model" already distinguishes
classes of alphabetic scripts, abjads, abugidas (Indic),
syllabaries, and sinographic scripts, within the phonographic family,
plus logographic scripts. This just adds another class for Indochinese
abugidas (using the visual encoding order), which should probably be
formalized officially.

Philippe.

2011/8/14 Richard Wordingham :
> On Fri, 24 Jun 2011 18:24:01 +0530
> Shriramana Sharma  wrote:
>
>> The point is that the sequence:
>>
>> 
>>
>> is strictly speaking *the* sequence recommended *across* Indic
>> scripts for representation of Sanskrit clusters involving a nasal and
>> non-nasal "semivowel".
>
> Could you please quote me chapter and verse for this from the TUS or
> other relevant ruling.  It contradicts TUS 6.0 Section 11.4 Ordering of
> Syllable Components (p367), which treats U+17D2 KHMER SIGN COENG and
> its following consonant (or independent vowel) as inseparable.
>
> It also creates the further oddity that when using a 'consonant sign'
> (Tibetan, possibly Myanmar, and Tai Tham) one would have the sequence
> .  (Alas, I don't have any relevant
> Sanskrit examples in those scripts.)
>
> The problem may be what is meant by an 'Indic script'?  Do you include
> Tibetan and Further Indian Indic scripts (e.g. Myanmar, Tai Tham and
> Khmer), or do you just mean Indian Indic scripts?
>
> Richard.
>
>




Re: Sanskrit nasalized L

2011-08-14 Thread Shriramana Sharma

On 08/15/2011 01:48 AM, Richard Wordingham wrote:



>>>> is strictly speaking *the* sequence recommended *across* Indic
>>>> scripts for representation of Sanskrit clusters involving a nasal
>>>> and non-nasal "semivowel".

>> However, people working with Indic rendering in a major operating
>> system support the concept (see
>> http://www.unicode.org/mail-arch/unicode-ml/y2011-m06/0153.html).
>
> Thanks, that's useful as a reference - it helps me find it later.


You should also see Peter Constable's mail to me (and the list):

http://www.unicode.org/mail-arch/unicode-ml/y2011-m06/0143.html

"""As you and I independently commented, it makes sense to see the 
virama as being in a distribution class together with matras and, 
therefore, for the virama to precede candrabindu."""



> The issue is on the relative ordering of candrabindu and virama.  For
> a C1-conjoining form (i.e. C2 relatively unmodified), <la virama
> candrabindu la> is easier to handle.  For a C2-conjoining form, <la
> candrabindu virama la> is easier to work with.


Hmm -- perhaps you mean this is so because it would be possible to 
easily map Virama + LA to the C2-conjoining form? This is true enough, 
but it is advisable to have a single uniform representation across Indic 
scripts and that is LA + Virama + Candrabindu + LA (because of the 
reasons outlined by Peter and me in the previous mails I have linked to 
from the archives).


Further, the situation, especially with C2-conjoining scripts, is not 
all that simple. See the attachment: the underlying phonetic consonant 
cluster is nasal-V + V + LA. It would be represented in encoded form as: 
VA + Virama + Candrabindu + VA + Virama + LA. As you say it would indeed 
be *easier* (in some way) if the Candrabindu were to precede the virama 
but I am not sure that is *advisable*.



> Vowels and the like already occur within Tai Tham and Khmer consonant
> clusters with C2-conjoining forms.


I know that and that is why I distinguish "Indian" Indic scripts and 
"non-Indian" (i.e. South East Asian [SEA]) Indic (i.e. Brahmic) scripts, 
especially in Unicode. It seems that at least in Khmer (I didn't check 
the other charts/chapters) one vocalic R/L vowel is represented by the 
independent vowel presented as a sub-base (which you call C2, but as it 
is not a consonant it is not actually a "C"-2) form, but haven't you 
noticed that it is true of all Indic scripts so far encoded that the 
vowel sign for vocalic L/LL is the same as the independent vowel placed 
(albeit in a somewhat smaller size) below the base? It is simply a 
choice of encoding model -- in Indic (I'll stop calling it "Indian" 
Indic as I feel "Brahmic" is the coverall term and Indic is a specific 
term) all these vowel signs are encoded as separate characters whereas 
in SEA they are handled by a virama-like character like sub-base consonants.



> Normally the virama equivalent (CCC
> 9) occurs immediately before C2, but that can already be displaced in
> normalised text, e.g. the Northern Thai loan word ᩈᩮᩥᩁ᩠᩺ᨷ (from English
> 'serve') normalised to ᩈᩮᩥᩁ᩠᩺ᨷ.  The rendering on p155 of Bunkhit
> Watcharasat's 'Northern Thai Teach-Yourself Book' (Siamese:
> ภาษาเมืองล้านนา ฉบับเรียนด้วยตนเอง) makes it clear that the ra haam
> (vowel killer, here acting as a consonant killer) acts on the letter ra.


Hmmm -- I'm not sure I entirely grok the SEA situation with Thai/Tai 
Tham/Khmer etc, but I'm sure the handling of vowelless consonants and 
conjoining forms in those scripts does deviate from the *Indic* model. 
For example, see that stuff about the Balinese Surang and how it is 
handled...



> I've seen a claim that vowels within Tibetan consonant stacks can be
> handled sensibly within the confines of Unicode - I didn't investigate
> it.


I don't understand what you mean by "vowels within Tibetan consonant 
stacks". I also don't know whether Tibetan language written in Tibetan 
script requires the conjoining forms of vowels but I do know (to an 
extent) that Sanskrit written in Tibetan doesn't require "conjoining" 
forms of vowels per se generated by a virama-like character.



> I think an official ruling should cover all Indic scripts,
> ideally even those encoded in writing order, such as Thai.  (I'm
> presuming the Thai script's subscript consonants will be supported one
> day, and Lao already has one unambiguously subscript consonant.)


I would prefer for a ruling to first focus on the (Indian) Indic scripts 
and its extension to SEA scripts be done on a script-by-script basis 
after examination of the existing model for those scripts. It would not 
be appropriate IMHO to hastily apply the (Indian) Indic model to SEA 
scripts.


--
Shriramana Sharma

Re: Sanskrit nasalized L

2011-08-14 Thread Richard Wordingham
On Sun, 14 Aug 2011 19:59:30 +0530
Shriramana Sharma  wrote:

> On 08/14/2011 06:02 PM, Richard Wordingham wrote:
> > On Fri, 24 Jun 2011 18:24:01 +0530
> > Shriramana Sharma  wrote:
> >
> >> The point is that the sequence:
> >>
> >> 
> >>
> >> is strictly speaking *the* sequence recommended *across* Indic
> >> scripts for representation of Sanskrit clusters involving a nasal
> >> and non-nasal "semivowel".
> >
> > Could you please quote me chapter and verse for this from the TUS or
> > other relevant ruling.
 
> 
> However, people working with Indic rendering in a major operating
> system support the concept (see 
> http://www.unicode.org/mail-arch/unicode-ml/y2011-m06/0153.html).

Thanks, that's useful as a reference - it helps me find it later.

> To make it official I'll submit a document for this matter to be 
> included in the published Standard.

To me, the issue is on the relative ordering of candrabindu and virama.

> For Indian Indic scripts we have attestations for both scripts using 
> C1-conjoining forms and C2-conjoining forms.

The issue is on the relative ordering of candrabindu and virama.  For
a C1-conjoining form (i.e. C2 relatively unmodified), <la virama
candrabindu la> is easier to handle.  For a C2-conjoining form, <la
candrabindu virama la> is easier to work with.
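
One reason the ordering has to be settled by convention is that normalization will not settle it. The sketch below, using Python's standard `unicodedata` module, compares the two Devanagari candidates discussed in this thread (LA + VIRAMA + CANDRABINDU + LA and LA + CANDRABINDU + VIRAMA + LA). The variable names are made up for illustration, and nothing here reflects how any particular rendering engine treats either sequence.

```python
import unicodedata

LA, VIRAMA, CANDRABINDU = "\u0932", "\u094D", "\u0901"

virama_first = LA + VIRAMA + CANDRABINDU + LA
candrabindu_first = LA + CANDRABINDU + VIRAMA + LA

# Canonical reordering only moves marks with non-zero combining classes.
# The virama has class 9 but the candrabindu has class 0, so the pair is
# never swapped in either direction.
print(unicodedata.combining(VIRAMA))
print(unicodedata.combining(CANDRABINDU))

# Both candidates survive normalization unchanged and remain different
# strings, i.e. they are not canonically equivalent.
for candidate in (virama_first, candrabindu_first):
    assert unicodedata.normalize("NFC", candidate) == candidate
    assert unicodedata.normalize("NFD", candidate) == candidate

distinct = unicodedata.normalize("NFC", virama_first) != unicodedata.normalize(
    "NFC", candrabindu_first
)
print(distinct)
```

Searching, sorting and font lookups would therefore see two different spellings unless one of them is recommended and the other avoided.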

Vowels and the like already occur within Tai Tham and Khmer consonant
clusters with C2-conjoining forms.  Normally the virama equivalent (CCC
9) occurs immediately before C2, but that can already be displaced in
normalised text, e.g. the Northern Thai loan word ᩈᩮᩥᩁ᩠᩺ᨷ (from English
'serve') normalised to ᩈᩮᩥᩁ᩠᩺ᨷ .  The rendering on p155 of Bunkhit
Watcharasat's 'Northern Thai Teach-Yourself Book' (Siamese:
ภาษาเมืองล้านนา ฉบับเรียนด้วยตนเอง) makes it clear that the ra haam
(vowel killer, here acting as a consonant killer) acts on the letter ra.
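
The displacement can be reproduced on just the tail of the word. In the sketch below the "typed" order (ra, ra haam, sakot, ba) is an assumption made for illustration, on the reading that ra haam belongs to ra; the archived text above no longer shows which order was originally entered. Only the standard `unicodedata` module is used.

```python
import unicodedata

RA = "\u1A41"       # TAI THAM LETTER RA
SAKOT = "\u1A60"    # TAI THAM SIGN SAKOT, the virama equivalent
RA_HAAM = "\u1A7A"  # TAI THAM SIGN RA HAAM
BA = "\u1A37"       # TAI THAM LETTER BA

# Assumed input order: ra haam written directly after the ra it acts on,
# then sakot immediately before the subjoined consonant.
typed = RA + RA_HAAM + SAKOT + BA

# Canonical ordering sorts by combining class, so sakot (class 9) is
# moved in front of ra haam (class 230).
normalized = unicodedata.normalize("NFD", typed)

print(unicodedata.combining(SAKOT), unicodedata.combining(RA_HAAM))
print([f"{ord(ch):04X}" for ch in typed])
print([f"{ord(ch):04X}" for ch in normalized])
```

After normalization the sakot no longer sits immediately before the consonant it subjoins, which is the displacement referred to above.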

I've seen a claim that vowels within Tibetan consonant stacks can be
handled sensibly within the confines of Unicode - I didn't investigate
it.

I think an official ruling should cover all Indic scripts,
ideally even those encoded in writing order, such as Thai.  (I'm
presuming the Thai script's subscript consonants will be supported one
day, and Lao already has one unambiguously subscript consonant.)

Richard.




Re: Sanskrit nasalized L

2011-08-14 Thread Shriramana Sharma

On 08/14/2011 06:02 PM, Richard Wordingham wrote:

> On Fri, 24 Jun 2011 18:24:01 +0530
> Shriramana Sharma  wrote:
>
>> The point is that the sequence:
>>
>>
>>
>> is strictly speaking *the* sequence recommended *across* Indic
>> scripts for representation of Sanskrit clusters involving a nasal and
>> non-nasal "semivowel".
>
> Could you please quote me chapter and verse for this from the TUS or
> other relevant ruling.


I'm sorry -- perhaps I should not have written so presumptuously. So far 
there is no such official mention.


However, people working with Indic rendering in a major operating system 
support the concept (see 
http://www.unicode.org/mail-arch/unicode-ml/y2011-m06/0153.html).


To make it official I'll submit a document for this matter to be 
included in the published Standard.



> It contradicts TUS 6.0 Section 11.4 Ordering of
> Syllable Components (p367), which treats U+17D2 KHMER SIGN COENG and
> its following consonant (or independent vowel) as inseparable.


That may be true for Khmer -- I am not a user of the script. I however 
can only speak for Vedic/Sanskrit writings in *Indian* Indic scripts (as 
you term them).



> It also creates the further oddity that when using a 'consonant sign'
> (Tibetan, possibly Myanmar, and Tai Tham) one would have the sequence
> .  (Alas, I don't have any relevant
> Sanskrit examples in those scripts.)


Yep -- it is highly unlikely that such samples exist in the first place. 
So it is not known what the "desired rendering is". If at all any 
rendering similar to the Indian Indic scripts is attested, then the 
section of Khmer which you quote may well have to be edited!


For Indian Indic scripts we have attestations for both scripts using 
C1-conjoining forms and C2-conjoining forms.



> The problem may be what is meant by an 'Indic script'?  Do you include
> Tibetan and Further Indian Indic scripts (e.g. Myanmar, Tai Tham and
> Khmer), or do you just mean Indian Indic scripts?


See above.

--
Shriramana Sharma



Re: Sanskrit nasalized L

2011-08-14 Thread Richard Wordingham
On Fri, 24 Jun 2011 18:24:01 +0530
Shriramana Sharma  wrote:

> The point is that the sequence:
> 
> 
> 
> is strictly speaking *the* sequence recommended *across* Indic
> scripts for representation of Sanskrit clusters involving a nasal and
> non-nasal "semivowel".

Could you please quote me chapter and verse for this from the TUS or
other relevant ruling.  It contradicts TUS 6.0 Section 11.4 Ordering of
Syllable Components (p367), which treats U+17D2 KHMER SIGN COENG and
its following consonant (or independent vowel) as inseparable.

It also creates the further oddity that when using a 'consonant sign'
(Tibetan, possibly Myanmar, and Tai Tham) one would have the sequence
.  (Alas, I don't have any relevant
Sanskrit examples in those scripts.)

The problem may be what is meant by an 'Indic script'?  Do you include
Tibetan and Further Indian Indic scripts (e.g. Myanmar, Tai Tham and
Khmer), or do you just mean Indian Indic scripts?

Richard.



Re: Sanskrit nasalized L

2011-07-23 Thread tulasi
Second from left is correct!

Before ISCII, individuals used to create TTF font files using ASCII codes.
Because there were not enough glyphs, that was the time frame when this
non-traditional art evolved.

Unicode Inc shall NOT repeat it as excuse.

While Indus like Sharmaji Maheshji et al seating why is this happening :)

Is this ISO guidelines?

Thanks,

Tulasi


From: Philippe Verdy 
Date: Fri, Jul 1, 2011 at 6:31 AM
Subject: Re: Sanskrit nasalized L
To: tulasi 
Cc: Unicode Discussion , unic...@yahoogroups.com


2011/6/28 tulasi :
> first-image
> line-1 - first & last word, CANDRABINDU is tied up with 1st letter.

Effectively, CANDRABINDU is tied to the first letter (using its
half-form). This suggests effectively the encoding as
 for the first letter, followed by a regular
 for the second one.

> it is not tied up with WWA (or VVA). probably computer print using ASCII
> based TTF fonts.

No such font exists. Indic fonts are either built on top of regular
Unicode assignments, or based on legacy ISCII, not ASCII. Yes, the
half-form of the first letter (on the left) is probably not the best
rendering, when a vertically stacked conjunct would be preferable
(using a full form of the first letter, and stacking the second letter
below it in its subjoined form). But the scanned JPEG shows that the
non-stacked side-by-side rendering of the conjunct is acceptable.

Without the CANDRABINDU, the two letters would become ,
which could also be rendered side-by-side (with the first LA using
its half-form, and the second one using the regular full form), or as
a stacked conjunct (with the first LA using its full form, and the
second one using the subjoined form), which is the preferred rendering.

> line-2 CANDRABINDU is not tied up with LLA but with vowel sign on top of
> GA.

No. Even if CANDRABINDU is visually touching/colliding with the vowel
sign on top of GA, it is still separate from it. The collision is
accidental due to the limited space to place it (clearly preferably on
top of the center of the first LA in its half form).

All 3 occurrences of candrabindu in the two scanned lines of text
place it over the half form of the first letter of a horizontally
rendered conjunct. It looks like the horizontal rendering is probably
the best choice here (and not a technical fallback) as it clearly
indicates with which letter of the conjunct the candrabindu is
associated.

> in all cases if a conjunct has a vowel sign like in 2nd image then
> CANDRABINDU moves right side of the vowel sign.
>
> question
> (a) then which one is incorrect in second image?

The first on the left side is effectively desired, the second one will
be incorrect for rendering the first nasalized halanted LA in
 but could be produced by
 for rendering the second full LA with
nasalisation.

And yes, stop calling conjuncts made of halanted LA plus full LA
"LLA": Shriramana is right. This conjunct (without candrabindu) must
be named "l_la.sub" (as it would be in standard glyph naming for
building OpenType feature tables) or just "L-LA" (preferable when
referencing the conjunct in a Unicode sequence alias name, where the
dot is not allowed). If we were naming the full conjunct with the
candrabindu in OpenType with "standard" glyph naming rules, we would
refer to it as "l_candrabindu_la.sub".

> From: Shriramana Sharma 
> Date: Sat, Jun 25, 2011 at 9:32 AM
> Subject: Re: Sanskrit nasalized L
> To: Unicode Discussion List 
>
>
> On 2011-Jun-23 10:27, tulasi wrote:
>>
>> It is natural to place CANDRABINDU on top of vertical bar.
>> So CANDRABINDU in this case shall seat on top of regular LA of LLA.

Not here. Clearly.

>> Moreover that LLA is an ugly one.

May be. But within texts with limited line height, it is common to
replace vertically stacked conjuncts with subjoined letters, by
conjuncts using half forms. Both are correct, and are a matter of
preference (or technical limitations in the renderers, or preference
to avoid extending the line height to save paper space).

>> Two-tire (one top of other) LLA is preferred choice (neutral).

I agree. But still, this still means that the candrabindu must lie
over the middle of the first letter, even if then it is ambiguous to
which one (first halanted LA in full form, or second LA subjoined at
the same horizontal position) it is bound semantically. So the
horizontally rendered conjunct becomes much less ugly as it clearly
allows to visually identify with which LA the candrabindu will be
linked.

> This is "tire"some! ;)
>
> Even if the top-bottom ligature-style L.LA is used, the candrabindu should
> go after LA + Virama and not at the end of the syllable as you are
> suggested.

I also agree with this encoding . Even if
some fonts are currently dropping the half-form of the first LA and
adopt the full form of the first LA

Re: Sanskrit nasalized L

2011-06-22 Thread tulasi
It is natural to place CANDRABINDU on top of vertical bar.
So CANDRABINDU in this case shall seat on top of regular LA of LLA. Moreover
that LLA is an ugly one. Two-tire (one top of other) LLA is preferred choice
(neutral).

CANDRABINDU takes center-place where a symbol (conjunct, consonant, vowel)
does not have vertical bar.

I shall question Unicode, Inc for flaws added to Nagari script.  :)

Tulasi


On Tue, Jun 21, 2011 at 10:48 PM, Peter Constable wrote:

> I see no necessity for ZWJ, but since ZWJ can be used to select “half”
> forms and this involves a half-form la, that’s a valid sequence.
>
> Peter
>
> From: unicode-bou...@unicode.org [mailto:unicode-bou...@unicode.org] On
> Behalf Of Vinodh Rajan
> Sent: Tuesday, June 21, 2011 10:11 AM
> To: Shriramana Sharma
> Cc: Unicode Discussion List
> Subject: Re: Sanskrit nasalized L
>
> ल्‍ँल
>
> LA + VIRAMA + ZWJ + CHANDRABINDU + LA
>
> (BTW Without ZWJ ल्ँल)
>
> V


Re: Sanskrit nasalized L

2011-06-21 Thread tulasi
LA + VIRAMA + LA => LLA
LLA + CANDRABINDU => LLACANDRABINDU
CANDRABINDU sits on top of conjunct LLA

so,
LA + VIRAMA + LA + CANDRABINDU => LLACANDRABINDU
is natural writing.

Tulasi


From: Vinodh Rajan 
Date: Tue, Jun 21, 2011 at 10:10 AM
Subject: Re: Sanskrit nasalized L
To: Shriramana Sharma 
Cc: Unicode Discussion List 


ल्‍ँल

LA + VIRAMA + ZWJ + CHANDRABINDU + LA

(BTW Without ZWJ ल्ँल)

V


On Tue, Jun 21, 2011 at 9:57 PM, Shriramana Sharma wrote:

> On Monday 20 June 2011 12:54 AM, Peter Constable wrote:
>
>> Here's a related text element:
>> [cid:image001.png@01CC2E7B.B6DA0BA0]
>>
>> That’s another that the original Uniscribe implementation didn’t allow but
>> that we’ve become aware of as needing to be supported.
>>
>
>
> LA + VIRAMA + CANDRABINDU + LA
>
> or
>
> LA + CANDRABINDU + VIRAMA + LA
>
> I personally feel the former is the "correct" one but would like to hear
> your views.
>
> --
> Shriramana Sharma
>
>