RE: Oriya: mba / mwa ?

2003-12-01 Thread Michael Everson
At 22:10 + 2003-12-01, [EMAIL PROTECTED] wrote:

> We should rejoice that these TDIL reports exist and urge the
> various authors to contribute to discussions on any edge-case
> issues.

Yes.

> Rather than revising history or revising encoding practices, maybe
> the TDIL reports could be revised where appropriate.

Yes.

--
Michael Everson * * Everson Typography *  * http://www.evertype.com


RE: Oriya: mba / mwa ?

2003-12-01 Thread jameskass
.
Michael Everson wrote,

> You should implement according to what is on page 238 of the Unicode 
> Standard, and if there are people in India who think otherwise they 
> had better argue their case convincingly to the UTC.
> 
> >I don't personally care which character is used.
> 
> I *do*. Someone at the TDIL has decided he's got a bright idea about 
> how to use WA, and that changes the traditional orthography.

The TDIL document was published in April of 2002.  At that time,
page 238 of TUS 4.0 did not exist.  The authors of the Oriya section
of the report really only had the sparse information on page 227 of 
TUS 3.0 upon which to expand.

Perhaps many of us on this list have, in the past, attempted to
extrapolate the direction the consortium might take -- only to
be surprised when a different path is chosen.

Other than the fine work by Maurice Bauhahn on Khmer, the existence
of these comprehensive TDIL reports written by technically-oriented
expert members of the script user communities who also are familiar
with computer encoding issues *and Unicode* appears to be unprecedented.

We should rejoice that these TDIL reports exist and urge the
various authors to contribute to discussions on any edge-case 
issues.

Rather than revising history or revising encoding practices, maybe 
the TDIL reports could be revised where appropriate.

Best regards,

James Kass
.



RE: Oriya: mba / mwa ?

2003-12-01 Thread Michael Everson
At 11:52 -0800 2003-12-01, Peter Constable wrote:

>> Well, Peter, it's right there on the page.
>
> What page?

Page 18 of Learn Oriya in 30 Days, what I have been quoting from.

>> KA with Virama + BA = KWA,
>> in Oriya and with Latin transliterations. It's a BA. I swear.
>
> And how do you know it's BA and not a distinct character that comes
> after LLA?

A distinct character coming after LLA that looks just like BA? I know 
it's a BA because I can *read*. The book has alphabet charts. No WA. 
No VA. As expected, because they are innovations. The book shows 
examples of the constituent parts of the conjuncts in their full 
form, and it's a BA.

>> The revisionism would be in deciding that the innovated WA was to be
>> used instead of BA. It isn't.
>
> But if there are people in India that think these conjuncts are 
> formed with WA, then there's an interop problem.

You should implement according to what is on page 238 of the Unicode 
Standard, and if there are people in India who think otherwise they 
had better argue their case convincingly to the UTC.

> I don't personally care which character is used.

I *do*. Someone at the TDIL has decided he's got a bright idea about 
how to use WA, and that changes the traditional orthography.

> I just need to worry about shipping an implementation that does one 
> thing and having users come back saying it doesn't do what they 
> expect, or it doesn't interoperate with other implementations they 
> need to work with.

Well, I hope you are taking on board what I have been saying.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


RE: Oriya: mba / mwa ?

2003-12-01 Thread Peter Constable
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On

> Well, Peter, it's right there on the page.

What page?


> KA with Virama + BA = KWA,
> in Oriya and with Latin transliterations. It's a BA. I swear.

And how do you know it's BA and not a distinct character that comes
after LLA?



> The revisionism would be in deciding that the innovated WA was to be
> used instead of BA. It isn't.

But if there are people in India that think these conjuncts are formed
with WA, then there's an interop problem. I don't personally care which
character is used. I just need to worry about shipping an implementation
that does one thing and having users come back saying it doesn't do what
they expect, or it doesn't interoperate with other implementations they
need to work with.
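
The interop concern in the paragraph above can be made concrete with a minimal sketch. It uses the code points cited elsewhere in this thread; the folding function `fold_subjoined_wa` is purely hypothetical (an assumption for illustration, not something proposed by anyone in the thread or defined by Unicode), and shows only one way an implementation might compare text that spells the conjunct differently.

```python
# Code points as cited in the thread.
MA, VIRAMA, BA, WA = "\u0B2E", "\u0B4D", "\u0B2C", "\u0B71"


def fold_subjoined_wa(text):
    """Hypothetical comparison key: treat VIRAMA + WA like VIRAMA + BA.

    Only a WA directly after a virama is rewritten, so a standalone
    (for example word-initial) WA is left untouched.
    """
    return text.replace(VIRAMA + WA, VIRAMA + BA)


one_implementation = MA + VIRAMA + BA     # conjunct spelled with BA
other_implementation = MA + VIRAMA + WA   # same conjunct spelled with WA

# Text stored by the two implementations does not match as-is ...
print(one_implementation == other_implementation)

# ... but does once both sides go through the hypothetical folding.
print(fold_subjoined_wa(one_implementation)
      == fold_subjoined_wa(other_implementation))
```

Folding at comparison time leaves stored data alone, which is why it is used here rather than rewriting text; whether any such folding is appropriate is exactly the open question in this exchange.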



> Um, I'll hunt them down shortly. Actually I haven't had an
> acknowledgement from the bookstore yet, which I figured I would just
> forward to you when it arrived.

Sounds great. Thanks.


Peter Constable




RE: Oriya: mba / mwa ?

2003-12-01 Thread Michael Everson
At 10:24 -0800 2003-12-01, Peter Constable wrote:

>> Your suggestion that NYA could be involved is less plausible.
>
> I didn't actually suggest it was nya; I merely pointed out that the same
> shape is used for more than /o/.

But many WAs have differently shaped O-parts. I think your 
observation was a bit superficial. In this case.

>> I cited examples already:
>>
>> k. + ba (wa) = kwa
>
> Your examples do not constitute clear evidence: the question is whether
> the characters underlying /kwa/ are k + ba or k + something else, and
> what you have written has to be taken either as presupposing the answer
> (thus not eligible as evidence) or as ambiguous -- either it is ba or it
> is wa.

Well, Peter, it's right there on the page. KA with Virama + BA = KWA, 
in Oriya and with Latin transliterations. It's a BA. I swear.

Perhaps I shall scan it for you. ;-)

>> I think we should avoid revisionist encodings, which will make it
>> impossible to deal with older data.
>
> Revisionist encodings? If the encoding is getting implemented for 
> the first time, one can hardly talk of revisionist encodings. But 
> this is a good question: are there Oriya implementation precedents? 
> How were these conjuncts handled in ISCII and is there an official 
> mapping between ISCII and Unicode for these sequences?

The revisionism would be in deciding that the innovated WA was to be 
used instead of BA. It isn't. WA is used word initially for foreign 
words. BA is used traditionally even when the reading rule says [w]. 
Did you read Tony Stone and my paper on VA and WA?

>>> I was hoping there might be some Indian -- Oriyan -- implementers or
>>> users lurking that might want to comment. If not, then there's not
>>> much more to say on this topic here. I'll try elsewhere;
>>
>> I did order dictionaries so that I can help you.
>
> Most kind. I asked for details about the dictionaries, but I don't think
> you replied to that.

Um, I'll hunt them down shortly. Actually I haven't had an 
acknowledgement from the bookstore yet, which I figured I would just 
forward to you when it arrived.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Oriya: mba / mwa ?

2003-12-01 Thread Peter Constable
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf

> Your suggestion that NYA could be involved is less plausible.

I didn't actually suggest it was nya; I merely pointed out that the same
shape is used for more than /o/.



> >I still haven't seen clear evidence; only an assertion of the former
> >based on a hypothesis that, granted, is certainly plausible.
> 
> I cited examples already:
> 
> k. + ba (wa) = kwa

Your examples do not constitute clear evidence: the question is whether
the characters underlying /kwa/ are k + ba or k + something else, and
what you have written has to be taken either as presupposing the answer
(thus not eligible as evidence) or as ambiguous -- either it is ba or it
is wa.


> I think we should avoid revisionist encodings, which will make it
> impossible to deal with older data.

Revisionist encodings? If the encoding is getting implemented for the
first time, one can hardly talk of revisionist encodings. But this is a
good question: are there Oriya implementation precedents? How were these
conjuncts handled in ISCII and is there an official mapping between
ISCII and Unicode for these sequences?

 
> >I was hoping there might be some Indian -- Oriyan -- implementers or
> >users lurking that might want to comment. If not, then there's not
> >much more to say on this topic here. I'll try elsewhere;
> 
> I did order dictionaries so that I can help you.

Most kind. I asked for details about the dictionaries, but I don't think
you replied to that.



Peter
 
Peter Constable
Globalization Infrastructure and Font Technologies
Microsoft Windows Division




RE: Oriya: mba / mwa ?

2003-12-01 Thread Michael Everson
At 22:12 -0800 2003-11-30, Peter Constable wrote:
> From: [EMAIL PROTECTED] on behalf of Michael Everson
>
>>> What I haven't seen is clear evidence that the wa-phallaa is
>>> considered to be related to nominal BA and not a distinct character
>>> falling after LA.
>>
>> WA has been added as a new independent letter, without a
>> decomposition to O+BA, although its graphic appearance and simple
>> phonetics shows us that it is an innovation based on that
>> combination.
>
> No, the graphic appearance and phonetics reassure us this is a 
> plausible hypothesis; they don't show us this must be how it is.

Your suggestion that NYA could be involved is less plausible.

>> If DBA = [dwa] surely OBA = [owa] > [wa]
>
> But there's that underlying assumption which is what I have been 
> questioning: is the written representation of /dwa/ really D.BA, or 
> should it be considered D.WA?

It is traditionally, yes.

> I still haven't seen clear evidence; only an assertion of the former 
> based on a hypothesis that, granted, is certainly plausible.

I cited examples already:

k. + ba (wa) = kwa
j. + ba (va) = jva
dh. + ba (wa) = dhwa
m. + ba = mba
r. + ba = rba
sh. + ba = shba

> But the more important question is how users and implementers, 
> particularly those in India, will expect these conjuncts to be 
> encoded, and that question remains. If I implement one thing and 
> others another, we've got a problem.

I think we should avoid revisionist encodings, which will make it 
impossible to deal with older data.

> I was hoping there might be some Indian -- Oriyan -- implementers or 
> users lurking that might want to comment. If not, then there's not 
> much more to say on this topic here. I'll try elsewhere;

I did order dictionaries so that I can help you.

> in the meantime, I've got another similar question coming (encode 
> based on sound or based on shapes?) involving some other conjuncts. 
> I just need to get something scanned first.

:-)
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


RE: Oriya: mba / mwa ?

2003-12-01 Thread Peter Constable
From: [EMAIL PROTECTED] on behalf of Michael Everson

>>What I haven't seen is clear evidence that the wa-phallaa is
>>considered to be related to nominal BA and not a distinct character
>>falling after LA.
>
>WA has been added as a new independent letter, without a
>decomposition to O+BA, although its graphic appearance and simple
>phonetics shows us that it is an innovation based on that
>combination.
 
No, the graphic appearance and phonetics reassure us this is a plausible hypothesis; 
they don't show us this must be how it is.
 
 
> If DBA = [dwa] surely OBA = [owa] > [wa]

But there's that underlying assumption which is what I have been questioning: is the 
written representation of /dwa/ really D.BA, or should it be considered D.WA? I still 
haven't seen clear evidence; only an assertion of the former based on a hypothesis 
that, granted, is certainly plausible. 
 
But the more important question is how users and implementers, particularly those in 
India, will expect these conjuncts to be encoded, and that question remains. If I 
implement one thing and others another, we've got a problem.
 
 
I was hoping there might be some Indian -- Oriyan -- implementers or users lurking 
that might want to comment. If not, then there's not much more to say on this topic 
here. I'll try elsewhere; in the meantime, I've got another similar question coming 
(encode based on sound or based on shapes?) involving some other conjuncts. I just 
need to get something scanned first.
 
 
 
Peter Constable



RE: Oriya: mba / mwa ?

2003-11-30 Thread Michael Everson
At 12:09 -0800 2003-11-30, Peter Constable wrote:

>>> But there's some confusion thrown into the mix, though, by the fact
>>> that they list the shape twice in their "alphabet" (their ordered
>>> list of consonants), one being where you'd expect to find a wa;
>>
>> Who lists, where?
>
> Lists in the two sources I had just mentioned: "Oriya Self-Taught" 
> and "Caattassaalli Paattha"

I have not seen those.

>> Compare these to the chart in N2525
>>
>> ya ra lla la VA WA
>
> Which tells us what? That both the dotted-ba (VA) and the WA are 
> attested as early as 1931, and considered by one source to be 
> ordered after la.

That VA and WA are two different characters (and they have been 
encoded so). That they both follow LA (VA follows LA anyway and the 
evidence in N2525 shows WA also following LA).

> What I haven't seen is clear evidence that the wa-phallaa is 
> considered to be related to nominal BA and not a distinct character 
> falling after LA.
WA has been added as a new independent letter, without a 
decomposition to O+BA, although its graphic appearance and simple 
phonetics shows us that it is an innovation based on that 
combination. If DBA = [dwa] surely OBA = [owa] > [wa]

My contention is that it IS an innovation; that syllables in -[wa] 
were normally written with -BA and that WA was invented to cater for 
the need for initial [wa] in Urdu and English words.
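
The statement that WA was encoded "without a decomposition to O+BA" can be checked against the character database. The following is a minimal sketch, assuming Python 3 and its standard `unicodedata` module; the helper name `canonical_parts` is made up for this illustration, and ORIYA LETTER RRA is included only as a contrasting letter that does have a canonical decomposition.

```python
import unicodedata

WA = "\u0B71"    # ORIYA LETTER WA
O = "\u0B13"     # ORIYA LETTER O
BA = "\u0B2C"    # ORIYA LETTER BA
RRA = "\u0B5C"   # ORIYA LETTER RRA, used here only for contrast


def canonical_parts(ch):
    """Hex code points that ch turns into under NFD (just ch if unchanged)."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


# WA carries no decomposition mapping, so normalization leaves it alone.
print(repr(unicodedata.decomposition(WA)), canonical_parts(WA))

# RRA, by contrast, decomposes into a base letter plus nukta.
print(repr(unicodedata.decomposition(RRA)), canonical_parts(RRA))

# Nothing in normalization turns the sequence O + BA into WA either.
print(unicodedata.normalize("NFC", O + BA) == WA)
```

The consequence for implementers is that the etymology argued for above lives only in the glyph shape and the history, not in any machine-readable relation between the characters.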
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Oriya: mba / mwa ?

2003-11-30 Thread Peter Constable
From: [EMAIL PROTECTED] on behalf of Michael Everson

>>Regardless of the etymology of that thing, though, what matters is
>>whether all of these should be encoded with BA, and I wouldn't find
>>it hard to go along with that: I've got a couple of sources ("Oriya
>>Self-Taught" and an Oriya booklet, "Caattassaalli Paattha") that
>>show a nominal form underlying this conjunct that looks like BA.

>>But there's some confusion thrown into the mix, though, by the fact
>>that they list the shape twice in their "alphabet" (their ordered
>>list of consonants), one being where you'd expect to find a wa;
>
>Who lists, where?
 
Lists in the two sources I had just mentioned: "Oriya Self-Taught" and "Caattassaalli 
Paattha"
 
 
>Compare these to the chart in N2525

>ya ra lla la VA WA

Which tells us what? That both the dotted-ba (VA) and the WA are attested as early as 
1931, and considered by one source to be ordered after la.
 
What I haven't seen is clear evidence that the wa-phallaa is considered to be related 
to nominal BA and not a distinct character falling after LA.
 
 
 
Peter Constable



RE: Oriya: mba / mwa ?

2003-11-30 Thread Michael Everson
At 00:38 -0800 2003-11-30, Peter Constable wrote:

>> Be thou not deceived by the glyph shapes. The etymology is O + BA =>
>> WA, not NYA + BA.
>
> (Or NYA + something else...) It would be just so cool if you 
> would provide references to accessible sources that present evidence 
> and analysis to support that statement. :-)

Your linguistics training is not enough to see this? Initial [wa] is 
required, the script uses subscript BA to represent it, so subscript 
BA is suffixed to independent O to permit it. And that makes sense, 
while suggesting that NYA has anything to do with it makes no sense.

> Regardless of the etymology of that thing, though, what matters is 
> whether all of these should be encoded with BA, and I wouldn't find 
> it hard to go along with that: I've got a couple of sources ("Oriya 
> Self-Taught" and an Oriya booklet, "Caattassaalli Paattha") that 
> show a nominal form underlying this conjunct that looks like BA.

That's the traditional orthography.

> But there's some confusion thrown into the mix, though, by the fact 
> that they list the shape twice in their "alphabet" (their ordered 
> list of consonants), one being where you'd expect to find a wa;

Who lists, where?

> and then there're sources like 
> http://www1.cs.columbia.edu/~deba/misc/vasa.shtml that have the 
> dotted ba form (0B35) as the second of these letters of the 
> "alphabet"; and then there's Mahapatra 1996 (in Bright & Daniels) 
> and the various other sources I have, including recent learning 
> books used for children, and the TDIL doc, that have the WA (U+0B71) 
> in that second place in the "alphabet".

On the page you cite, the first and second alphabets given read:

a aa i ii u uu r
e ai o au
ka kha ga gha nga
ca cha ja jha nya
tta ttha dda ddha nna
ta tha da dha na
pa pha BA bha ma
ya ra lla WA
sha ssa sa ha
anusvara visarga candrabindu
yya la ksha

a aa i ii u uu
r rr ll e ai o au (missing short vocalic l)
ka kha ga gha nga
ca cha ja jha nya
tta ttha dda ddha nna
ta tha da dha na
pa pha BA bha ma
ya ra lla VA sha
ssa sa ha anusvara visarga candrabindu
ksha rra rha yya la

Compare these to the chart in N2525

a aa i ii u uu
r rr l e ai o au (missing long vocalic ll)
ka kha ga gha nga
ca cha ja jha nya
tta ttha dda ddha nna
ta tha da dha na
pa pha BA bha ma
ya ra lla la VA WA
yya sa sha ssa ha ksha
anusvara visarga candrabindu
??a rra rha

Hm. I don't know what the first character in the last row is. It 
appears to be a nuktated CA. It is not in Unicode.
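
The point of comparing the three charts is which of VA and WA each one lists in the row that starts "ya ra lla". A small sketch of that comparison follows; the row strings are copied from the transcriptions in this message, while the dictionary labels and the helper name `va_wa_in_row` are assumptions made only for this illustration.

```python
# The "ya ra lla ..." row of each chart, as transcribed in this message.
ROWS = {
    "cited page, first alphabet": "ya ra lla WA",
    "cited page, second alphabet": "ya ra lla VA sha",
    "N2525 chart": "ya ra lla la VA WA",
}


def va_wa_in_row(row):
    """Return which of VA / WA appear in a chart row, in chart order."""
    return [name for name in row.split() if name in ("VA", "WA")]


for source, row in ROWS.items():
    print(f"{source}: {va_wa_in_row(row)}")
```

Only the N2525 row lists both letters; each alphabet on the cited page lists one of them, and in all three transcriptions BA appears separately in the "pa pha BA bha ma" row.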
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Oriya: mba / mwa ?

2003-11-30 Thread Peter Constable


From: [EMAIL PROTECTED] on behalf of Michael Everson

> Peter, I would take those TDIL publications with a very large grain
> of salt...
 
I didn't say that I accepted that doc unquestioned. But when they say conjuncts are 
made with WA and you come along and say, "It's BA, not WA," I need more than the word 
of Michael Everson to convince me I should simply disregard them. Just as it would take 
more than the word of Peter Constable for you to believe lots of assertions I might 
make. 
 
What would be convincing might be a specialist in the Oriya language explaining that 
the morphological processes or historical derivations that have led to sequences of C 
+ "wa" are such that the character underlying the rhyme must be BA. Or a range of 
sources that are in agreement on BA. Or, perhaps more than anything, would be an 
agreement amongst key parties that all of these things are going to get encoded as BA; 
since that is ultimately what will provide interoperability.
 


>Be thou not deceived by the glyph shapes. The etymology is O + BA =>
>WA, not NYA + BA.

(Or NYA + something else...) It would be just so cool if you would provide 
references to accessible sources that present evidence and analysis to support that 
statement. :-) 
 
Regardless of the etymology of that thing, though, what matters is whether all of 
these should be encoded with BA, and I wouldn't find it hard to go along with that: 
I've got a couple of sources ("Oriya Self-Taught" and an Oriya booklet, "Caattassaalli 
Paattha") that show a nominal form underlying this conjunct that looks like BA.  
 
But there's some confusion thrown into the mix, though, by the fact that they list the 
shape twice in their "alphabet" (their ordered list of consonants), one being where 
you'd expect to find a wa; and then there're sources like 
http://www1.cs.columbia.edu/~deba/misc/vasa.shtml that have the dotted ba form (0B35) 
as the second of these letters of the "alphabet"; and then there's Mahapatra 1996 (in 
Bright & Daniels) and the various other sources I have, including recent learning 
books used for children, and the TDIL doc, that have the WA (U+0B71) in that second 
place in the "alphabet". All of these things point to something in addition to BA that 
several describe as "wa" and seem to use as the component in these conjuncts. Yet 
because the first two of these use the same shape as BA and because M.E. tells me it's 
BA, perhaps that's enough to convince me that's the right thing to do...
 
On the other hand, maybe it seems less than completely settled to me.
 
What concerns me most is the teaching materials aimed at schoolchildren. However 
recent an innovation it might be, one gets the impression that kids are learning WA as 
part of their 'alphabet'. And if Oriya speakers grow up with the idea that this is the 
thing that forms their conjuncts, then I need to ask whether that's how they're going 
to expect to be able to encode their documents.
 
 

>I have just ordered two large Oriya dictionaries which should arrive
>in a fortnight.

I'd be interested in knowing what you found and where you found them.
 
 
Peter Constable



RE: Oriya: mba / mwa ?

2003-11-29 Thread Michael Everson
At 13:17 -0800 2003-11-29, Peter Constable wrote:

>> I think the TDIL chart is wrong.
>
> It seems reasonable that one should need extra persuasion to take 
> the word of an American living in Ireland over Indians. (Sorry.)

Peter, I would take those TDIL publications with a very large grain 
of salt. Textual evidence is not given and there's all sorts of 
stuff which really doesn't fit in well with the way we do things in 
Unicode. Like their *U+0B3A ORIYA INVISIBLE LETTER.

Just because it comes from India doesn't mean it's not revisionist.

>> Traditionally (as in Learn Oriya in 30 Days) subjoined BA is used in
>> this context although the reading rules say to pronounce it [w].
>
> So, you're saying that all of these should be encoded as C + virama + BA?

Yes, I am. KA + BA = KBA pronounced [kwa]. That's what Learn Oriya in 
30 days shows explicitly.

>> Now an original ligature of O and BA has been pressed into service
>
> I've seen elsewhere that you've described this as a ligature involving
> O, but are you sure it's that?

Yes, I am.

> Note that the same shape is used for NYA
> and NNA (e.g. conjuncts for NN.NNA and SS.NNA).

Be thou not deceived by the glyph shapes. The etymology is O + BA => 
WA, not NYA + BA.

>> The traditional BA should be used for that unless we have better
>> evidence than the TDIL newsletter that such should be the practice.
>
> I could be convinced of that; but if people in India aren't convinced of
> that, the boat may not float.

WA is an innovation, unattested in earlier Oriya. You won't find it 
in Learn Oriya in 30 Days, for instance. Yet syllables in -[wa] have 
been written in Oriya for a long time, with BA.

Note that a historical VA exists and predates the WA, and the TDIL 
does not take this into account. We did encode it however.

I have just ordered two large Oriya dictionaries which should arrive 
in a fortnight.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



RE: Oriya: mba / mwa ?

2003-11-29 Thread Peter Constable
> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On
> Behalf Of Michael Everson


> I think the TDIL chart is wrong.

It seems reasonable that one should need extra persuasion to take the
word of an American living in Ireland over Indians. (Sorry.)

 
> Traditionally (as in Learn Oriya in 30 Days) subjoined BA is used in
> this context although the reading rules say to pronounce it [w].

So, you're saying that all of these should be encoded as C + virama +
BA?


> Now an original ligature of O and BA has been pressed into service

I've seen elsewhere that you've described this as a ligature involving
O, but are you sure it's that? Note that the same shape is used for NYA
and NNA (e.g. conjuncts for NN.NNA and SS.NNA).


> The traditional BA should be used for that unless we have better
> evidence than the TDIL newsletter that such should be the practice.

I could be convinced of that; but if people in India aren't convinced of
that, the boat may not float.

 

Peter Constable





Re: Oriya: mba / mwa ?

2003-11-28 Thread Michael Everson
At 21:10 + 2003-11-28, [EMAIL PROTECTED] wrote:
> .
> Peter Constable wrote,
>
>> The question, then, is how "MBA" should be encoded: as <0B2E MA, 
>> 0B4D VIRAMA, 0B2C BA >, or as < 0B2E MA, 0B4D VIRAMA, 0B71 WA>?
>
> MA + VIRAMA + BA, according to TUS 4.0, page 238.

Heh. I wrote that.

Well, it just goes to show that my thinking is consistent on this point.
--
Michael Everson * * Everson Typography *  * http://www.evertype.com


Re: Oriya: mba / mwa ?

2003-11-28 Thread Michael Everson
At 11:34 -0800 2003-11-28, Peter Constable wrote:
> A similar issue to the nndda: starting on page 54 of the TDIL newsletter
> (http://tdil.mit.gov.in/ori-guru-telu.pdf) and continuing onto the next
> page, they list conjuncts that have BA or WA as the second element. I've
> shown those from the bottom of p. 54 in the attached image.
> The shape for the conjoined component is the same for both of these.
> They describe the first as involving BA, however, while all the others
> involve WA. The question, then, is how "MBA" should be encoded: as 
> <0B2E MA, 0B4D VIRAMA, 0B2C BA >, or as < 0B2E MA, 0B4D VIRAMA, 0B71 
> WA>?

I think the TDIL chart is wrong.

Traditionally (as in Learn Oriya in 30 Days) subjoined BA is used in 
this context although the reading rules say to pronounce it [w]. 
Examples from this book:

k. + ba (wa) = kwa
j. + ba (va) = jva
dh. + ba (wa) = dhwa
m. + ba = mba
r. + ba = rba
sh. + ba = shba
Now an original ligature of O and BA has been pressed into service as 
a syllable initial WA for foreign words, and encoded at U+0B71, but I 
do not think this should be used to form conjuncts in -[wa].

o + ba (wa) = wa (this is not an example in Learn Oriya in 30 Days)

The traditional BA should be used for that unless we have better 
evidence than the TDIL newsletter that such should be the practice.

[mba] and an eventual [mwa] would be encoded MBA and the reading rule 
would be learned.

So I don't think that ORIYA LETTER WA has a conjunct form identical 
with ORIYA LETTER BA in the same way that DDA and TA do.
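
The six book examples listed in this message can be spelled out as code point sequences under the C + VIRAMA + BA approach argued for here. This is a minimal sketch, assuming Python 3 and its standard `unicodedata` module; the mapping of the transliterations k, j, dh, m, r, sh to ORIYA LETTER KA, JA, DHA, MA, RA, SHA is an assumption based on the transliteration used in this thread, and `conjunct_with_ba` is a name made up for the illustration.

```python
import unicodedata

VIRAMA = unicodedata.lookup("ORIYA SIGN VIRAMA")
BA = unicodedata.lookup("ORIYA LETTER BA")

# (assumed first consonant, reading given in the message)
EXAMPLES = [
    ("KA", "kwa"),
    ("JA", "jva"),
    ("DHA", "dhwa"),
    ("MA", "mba"),
    ("RA", "rba"),
    ("SHA", "shba"),
]


def conjunct_with_ba(first_name):
    """Encode a conjunct as <first consonant, VIRAMA, BA>."""
    first = unicodedata.lookup(f"ORIYA LETTER {first_name}")
    return first + VIRAMA + BA


for name, reading in EXAMPLES:
    seq = conjunct_with_ba(name)
    print(reading, " ".join(f"{ord(c):04X}" for c in seq))
```

Every sequence ends in 0B4D 0B2C regardless of whether the reading has [w], [v] or [b]; under this approach the pronunciation is left to the reading rule rather than to the encoding.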
--
Michael Everson * * Everson Typography *  * http://www.evertype.com



Re: Oriya: mba / mwa ?

2003-11-28 Thread jameskass
.
Peter Constable wrote,

> The question, then, is how "MBA" should be encoded: as <
> 0B2E MA, 0B4D VIRAMA, 0B2C BA >, or as < 0B2E MA, 0B4D VIRAMA, 0B71 WA
> >?
> 

MA + VIRAMA + BA, according to TUS 4.0, page 238.
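
For reference, the two candidate sequences from the question can be written out and named as follows. This is a minimal sketch, assuming Python 3 and its standard `unicodedata` module; the variable and function names are made up for the illustration.

```python
import unicodedata

# Code points as given in the question.
MA, VIRAMA, BA, WA = "\u0B2E", "\u0B4D", "\u0B2C", "\u0B71"

mba_with_ba = MA + VIRAMA + BA   # the sequence TUS 4.0, page 238 is cited for
mba_with_wa = MA + VIRAMA + WA   # the alternative raised in the question


def describe(text):
    """Return 'U+XXXX NAME' for each code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, seq in (("with BA", mba_with_ba), ("with WA", mba_with_wa)):
    print(label, describe(seq))
```

The two strings differ only in their last code point, yet they are distinct sequences as far as any comparison or search is concerned.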

Best regards,

James Kass
.