RE: Oriya: mba / mwa ?
At 22:10 + 2003-12-01, [EMAIL PROTECTED] wrote: We should rejoice that these TDIL reports exist and urge the various authors to contribute to discussions on any edge-case issues. Yes. Rather than revising history or revising encoding practices, maybe the TDIL reports could be revised where appropriate. Yes. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Oriya: mba / mwa ?
. Michael Everson wrote, > You should implement according to what is on page 238 of the Unicode > Standard, and if there are people in India who think otherwise they > had better argue their case convincingly to the UTC. > > >I don't personally care which character is used. > > I *do*. Someone at the TDIL has decided he's got a bright idea about > how to use WA, and that changes the traditional orthography. The TDIL document was published in April of 2002. At that time, page 238 of TUS 4.0 did not exist. The authors of the Oriya section of the report really only had the sparse information on page 227 of TUS 3.0 upon which to expand. Perhaps many of us on this list have, in the past, attempted to extrapolate the direction the consortium might take -- only to be surprised when a different path is chosen. Other than the fine work by Maurice Bauhahn on Khmer, the existence of these comprehensive TDIL reports written by technically-oriented expert members of the script user communities who also are familiar with computer encoding issues *and Unicode* appears to be unprecedented. We should rejoice that these TDIL reports exist and urge the various authors to contribute to discussions on any edge-case issues. Rather than revising history or revising encoding practices, maybe the TDIL reports could be revised where appropriate. Best regards, James Kass .
RE: Oriya: mba / mwa ?
At 11:52 -0800 2003-12-01, Peter Constable wrote: > Well, Peter, it's right there on the page. What page? Page 18 of Learn Oriya in 30 Days, what I have been quoting from. > KA with Virama + BA = KWA, in Oriya and with Latin transliterations. It's a BA. I swear. And how do you know it's BA and not a distinct character that comes after LLA? A distinct character coming after LLA that looks just like BA? I know it's a BA because I can *read*. The book has alphabet charts. No WA. No VA. As expected, because they are innovations. The book shows examples of the constituent parts of the conjuncts in their full form, and it's a BA. > The revisionism would be in deciding that the innovated WA was to be used instead of BA. It isn't. But if there are people in India that think these conjuncts are formed with WA, then there's an interop problem. You should implement according to what is on page 238 of the Unicode Standard, and if there are people in India who think otherwise they had better argue their case convincingly to the UTC. I don't personally care which character is used. I *do*. Someone at the TDIL has decided he's got a bright idea about how to use WA, and that changes the traditional orthography. I just need to worry about shipping an implementation that does one thing and having users come back saying it doesn't do what they expect, or it doesn't interoperate with other implementations they need to work with. Well, I hope you are taking on board what I have been saying. -- Michael Everson * * Everson Typography * * http://www.evertype.com
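The interop worry in this exchange (one implementation storing C + VIRAMA + BA, another storing C + VIRAMA + WA for the same conjunct) can be made concrete with a small sketch. Nobody in the thread proposes the following; `fold_subjoined_wa` is a hypothetical helper, and folding WA to BA after a virama is an assumption made purely to illustrate how the two spellings might be compared, not a recommendation from the Unicode Standard.

```python
VIRAMA = "\u0B4D"  # ORIYA SIGN VIRAMA
BA = "\u0B2C"      # ORIYA LETTER BA
WA = "\u0B71"      # ORIYA LETTER WA

def fold_subjoined_wa(text: str) -> str:
    """Hypothetical comparison key: treat VIRAMA + WA as VIRAMA + BA.

    Only WA that directly follows a virama is folded; a syllable-initial
    WA (the use the thread describes for foreign words) is left alone.
    """
    return text.replace(VIRAMA + WA, VIRAMA + BA)

ma_ba = "\u0B2E" + VIRAMA + BA  # MA + VIRAMA + BA
ma_wa = "\u0B2E" + VIRAMA + WA  # MA + VIRAMA + WA

# The raw strings differ, so a plain search or sort treats them as distinct.
print(ma_ba == ma_wa)
# After folding, both spellings produce the same key.
print(fold_subjoined_wa(ma_ba) == fold_subjoined_wa(ma_wa))
```

A fold like this only papers over the difference for matching; it does not settle which sequence should be stored, which is the question being argued here.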
RE: Oriya: mba / mwa ?
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On > Well, Peter, it's right there on the page. What page? > KA with Virama + BA = KWA, > in Oriya and with Latin transliterations. It's a BA. I swear. And how do you know it's BA and not a distinct character that comes after LLA? > The revisionism would be in deciding that the innovated WA was to be > used instead of BA. It isn't. But if there are people in India that think these conjuncts are formed with WA, then there's an interop problem. I don't personally care which character is used. I just need to worry about shipping an implementation that does one thing and having users come back saying it doesn't do what they expect, or it doesn't interoperate with other implementations they need to work with. > Um, I'll hunt them down shortly. Actually I haven't had an > acknowledgement from the bookstore yet, which I figured I would just > forward to you when it arrived. Sounds great. Thanks. Peter Constable
RE: Oriya: mba / mwa ?
At 10:24 -0800 2003-12-01, Peter Constable wrote: > Your suggestion that NYA could be involved is less plausible. I didn't actually suggest it was nya; I merely pointed out that the same shape is used for more than /o/. But many WAs have differently shaped O-parts. I think your observation was a bit superficial. In this case. > I cited examples already: k. + ba (wa) = kwa Your examples do not constitute clear evidence: the question is whether the characters underlying /kwa/ are k + ba or k + something else, and what you have written has to be taken either as presupposing the answer (thus not eligible as evidence) or as ambiguous -- either it is ba or it is wa. Well, Peter, it's right there on the page. KA with Virama + BA = KWA, in Oriya and with Latin transliterations. It's a BA. I swear. Perhaps I shall scan it for you. ;-) > I think we should avoid revisionist encodings, which will make it impossible to deal with older data. Revisionist encodings? If the encoding is getting implemented for the first time, one can hardly talk of revisionist encodings. But this is a good question: are there Oriya implementation precedents? How were these conjuncts handled in ISCII and is there an official mapping between ISCII and Unicode for these sequences? The revisionism would be in deciding that the innovated WA was to be used instead of BA. It isn't. WA is used word initially for foreign words. BA is used traditionally even when the reading rule says [w]. Did you read Tony Stone and my paper on VA and WA? > >I was hoping there might be some Indian -- Oriyan -- implementers or >users lurking that might want to comment. If not, then there's not >much more to say on this topic here. I'll try elsewhere; I did order dictionaries so that I can help you. Most kind. I asked for details about the dictionaries, but I don't think you replied to that. Um, I'll hunt them down shortly. 
Actually I haven't had an acknowledgement from the bookstore yet, which I figured I would just forward to you when it arrived. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Oriya: mba / mwa ?
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf > Your suggestion that NYA could be involved is less plausible. I didn't actually suggest it was nya; I merely pointed out that the same shape is used for more than /o/. > >I still haven't seen clear evidence; only an assertion of the former > >based on a hypothesis that, granted, is certainly plausible. > > I cited examples already: > > k. + ba (wa) = kwa Your examples do not constitute clear evidence: the question is whether the characters underlying /kwa/ are k + ba or k + something else, and what you have written has to be taken either as presupposing the answer (thus not eligible as evidence) or as ambiguous -- either it is ba or it is wa. > I think we should avoid revisionist encodings, which will make it > impossible to deal with older data. Revisionist encodings? If the encoding is getting implemented for the first time, one can hardly talk of revisionist encodings. But this is a good question: are there Oriya implementation precedents? How were these conjuncts handled in ISCII and is there an official mapping between ISCII and Unicode for these sequences? > >I was hoping there might be some Indian -- Oriyan -- implementers or > >users lurking that might want to comment. If not, then there's not > >much more to say on this topic here. I'll try elsewhere; > > I did order dictionaries so that I can help you. Most kind. I asked for details about the dictionaries, but I don't think you replied to that. Peter Peter Constable Globalization Infrastructure and Font Technologies Microsoft Windows Division
RE: Oriya: mba / mwa ?
At 22:12 -0800 2003-11-30, Peter Constable wrote: From: [EMAIL PROTECTED] on behalf of Michael Everson What I haven't seen is clear evidence that the wa-phallaa is considered to be related to nominal BA and not a distinct character falling after LA. WA has been added as a new independent letter, without a decomposition to O+BA, although its graphic appearance and simple phonetics shows us that it is an innovation based on that combination. No, the graphic appearance and phonetics reassure us that this is a plausible hypothesis; they don't show us this must be how it is. Your suggestion that NYA could be involved is less plausible. > If DBA = [dwa] surely OBA = [owa] > [wa] But there's that underlying assumption which is what I have been questioning: is the written representation of /dwa/ really D.BA, or should it be considered D.WA? It is traditionally, yes. I still haven't seen clear evidence; only an assertion of the former based on a hypothesis that, granted, is certainly plausible. I cited examples already: k. + ba (wa) = kwa j. + ba (va) = jva dh. + ba (wa) = dhwa m. + ba = mba r. + ba = rba sh. + ba = shba But the more important question is how users and implementers, particularly those in India, will expect these conjuncts to be encoded, and that question remains. If I implement one thing and others another, we've got a problem. I think we should avoid revisionist encodings, which will make it impossible to deal with older data. I was hoping there might be some Indian -- Oriyan -- implementers or users lurking that might want to comment. If not, then there's not much more to say on this topic here. I'll try elsewhere; I did order dictionaries so that I can help you. in the meantime, I've got another similar question coming (encode based on sound or based on shapes?) involving some other conjuncts. I just need to get something scanned first. :-) -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Oriya: mba / mwa ?
From: [EMAIL PROTECTED] on behalf of Michael Everson >>What I haven't seen is clear evidence that the wa-phallaa is >>considered to be related to nominal BA and not a distinct character >>falling after LA. > >WA has been added as a new independent letter, without a >decomposition to O+BA, although its graphic appearance and simple >phonetics shows us that it is an innovation based on that >combination. No, the graphic appearance and phonetics reassure us that this is a plausible hypothesis; they don't show us this must be how it is. > If DBA = [dwa] surely OBA = [owa] > [wa] But there's that underlying assumption which is what I have been questioning: is the written representation of /dwa/ really D.BA, or should it be considered D.WA? I still haven't seen clear evidence; only an assertion of the former based on a hypothesis that, granted, is certainly plausible. But the more important question is how users and implementers, particularly those in India, will expect these conjuncts to be encoded, and that question remains. If I implement one thing and others another, we've got a problem. I was hoping there might be some Indian -- Oriyan -- implementers or users lurking that might want to comment. If not, then there's not much more to say on this topic here. I'll try elsewhere; in the meantime, I've got another similar question coming (encode based on sound or based on shapes?) involving some other conjuncts. I just need to get something scanned first. Peter Constable
RE: Oriya: mba / mwa ?
At 12:09 -0800 2003-11-30, Peter Constable wrote: >>But there's some confusion thrown into the mix, though, by the fact >>that they list the shape twice in their "alphabet" (their ordered >>list of consonants), one being where you'd expect to find a wa; > >Who lists, where? Lists in the two sources I had just mentioned: "Oriya Self-Taught" and "Caattassaalli Paattha" I have not seen those. >Compare these to the chart in N2525 ya ra lla la VA WA Which tells us what? That both the dotted-ba (VA) and the WA are attested as early as 1931, and considered by one source to be ordered after la. That VA and WA are two different characters (and they have been encoded so). That they both follow LA (VA follows LA anyway and the evidence in N2525 shows WA also following LA). What I haven't seen is clear evidence that the wa-phallaa is considered to be related to nominal BA and not a distinct character falling after LA. WA has been added as a new independent letter, without a decomposition to O+BA, although its graphic appearance and simple phonetics shows us that it is an innovation based on that combination. If DBA = [dwa] surely OBA = [owa] > [wa] My contention is that it IS an innovation; that syllables in -[wa] were normally written with -BA and that WA was invented to cater for the need for initial [wa] in Urdu and English words. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Oriya: mba / mwa ?
From: [EMAIL PROTECTED] on behalf of Michael Everson >>Regardless of the etymology of that thing, though, what matters is >>whether all of these should be encoded with BA, and I wouldn't find >>it hard to go along with that: I've got a couple of sources ("Oriya >>Self-Taught" and an Oriya booklet, "Caattassaalli Paattha") that >>show a nominal form underlying this conjunct that looks like BA. >>But there's some confusion thrown into the mix, though, by the fact >>that they list the shape twice in their "alphabet" (their ordered >>list of consonants), one being where you'd expect to find a wa; > >Who lists, where? Lists in the two sources I had just mentioned: "Oriya Self-Taught" and "Caattassaalli Paattha" >Compare these to the chart in N2525 >ya ra lla la VA WA Which tells us what? That both the dotted-ba (VA) and the WA are attested as early as 1931, and considered by one source to be ordered after la. What I haven't seen is clear evidence that the wa-phallaa is considered to be related to nominal BA and not a distinct character falling after LA. Peter Constable
RE: Oriya: mba / mwa ?
At 00:38 -0800 2003-11-30, Peter Constable wrote: >Be thou not deceived by the glyph shapes. The etymology is O + BA => WA, not NYA + BA. (Or NYA + something else...) It would be just so cool if you would provide references to accessible sources that present evidence and analysis to support that statement. :-) Your linguistics training is not enough to see this? Initial [wa] is required, the script uses subscript BA to represent it, so subscript BA is suffixed to independent O to permit it. And that makes sense, while suggesting that NYA has anything to do with it makes no sense. Regardless of the etymology of that thing, though, what matters is whether all of these should be encoded with BA, and I wouldn't find it hard to go along with that: I've got a couple of sources ("Oriya Self-Taught" and an Oriya booklet, "Caattassaalli Paattha") that show a nominal form underlying this conjunct that looks like BA. That's the traditional orthography. But there's some confusion thrown into the mix, though, by the fact that they list the shape twice in their "alphabet" (their ordered list of consonants), one being where you'd expect to find a wa; Who lists, where? and then there're sources like http://www1.cs.columbia.edu/~deba/misc/vasa.shtml that have the dotted ba form (0B35) as the second of these letters of the "alphabet"; and then there's Mahapatra 1996 (in Bright & Daniels) and the various other sources I have, including recent learning books used for children, and the TDIL doc, that have the WA (U+0B71) in that second place in the "alphabet". 
On the page you cite, the first and second alphabets given read: a aa i ii u uu r e ai o au ka kha ga gha nga ca cha ja jha nya tta ttha dda ddha nna ta tha da dha na pa pha BA bha ma ya ra lla WA sha ssa sa ha anusvara visarga candrabindu yya la ksha a aa i ii u uu r rr ll e ai o au (missing short vocalic l) ka kha ga gha nga ca cha ja jha nya tta ttha dda ddha nna ta tha da dha na pa pha BA bha ma ya ra lla VA sha ssa sa ha anusvara visarga candrabindu ksha rra rha yya la Compare these to the chart in N2525 a aa i ii u uu r rr l e ai o au (missing long vocalic ll) ka kha ga gha nga ca cha ja jha nya tta ttha dda ddha nna ta tha da dha na pa pha BA bha ma ya ra lla la VA WA yya sa sha ssa ha ksha anusvara visarga candrabindu ??a rra rha Hm. I don't know what the first character in the last row is. It appears to be a nuktated CA. It is not in Unicode. -- Michael Everson * * Everson Typography * * http://www.evertype.com
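The comparison of the three orderings above can be checked mechanically. The sketch below copies only the tail of each consonant list as transliterated in this message (from "ya" to "ha") and reports which letter immediately precedes VA and WA in each; the dictionary keys and the `predecessors` helper are names invented for illustration.

```python
# Tails of the three consonant orders, as transliterated in the message above.
ORDERS = {
    "vasa.shtml, first list": ["ya", "ra", "lla", "WA", "sha", "ssa", "sa", "ha"],
    "vasa.shtml, second list": ["ya", "ra", "lla", "VA", "sha", "ssa", "sa", "ha"],
    "N2525": ["ya", "ra", "lla", "la", "VA", "WA", "yya", "sa", "sha", "ssa", "ha"],
}

def predecessors(order, letters=("VA", "WA")):
    """Map each of the given letters to the entry listed just before it."""
    return {
        name: order[i - 1]
        for i, name in enumerate(order)
        if name in letters and i > 0
    }

for source, order in ORDERS.items():
    print(source, predecessors(order))
```

On these lists, only the N2525 chart contains both VA and WA, and it is also the only one that places "la" before them; the two web-page lists each have a single letter in that slot, directly after "lla".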
RE: Oriya: mba / mwa ?
From: [EMAIL PROTECTED] on behalf of Michael Everson > Peter, I would take those TDIL publications with a very large grain > of salt... I didn't say that I accepted that doc unquestioned. But when they say conjuncts are made with WA and you come along and say, "It's BA, not WA," I need more than the word of Michael Everson to convince I should simply disregard them. Just as it would take more than the word of Peter Constable for you to believe lots of assertions I might make. What would be convincing might be a specialist in the Oriya language explaining that the morphological processes or historical derivations that have led to sequences of C + "wa" are such that the character underlying the rhyme must be BA. Or a range of sources that are in agreement on BA. Or, perhaps more than anything, would be an agreement amongst key parties that all of these things are going to get encoded as BA; since that is ultimately what will provide interoperability. >Be thou not deceived by the glyph shapes. The etymology is O + BA => >WA, not NYA + BA. (Or NYA + something else...) It would be just so cool if you would provide references to accessible sources that present evidence and analysis to support that statement. :-) Regardless of the etymology of that thing, though, what matters is whether all of these should be encoded with BA, and I wouldn't find it hard to go along with that: I've got a couple of sources ("Oriya Self-Taught" and an Oriya booklet, "Caattassaalli Paattha") that show a nominal form underlying this conjunct that looks like BA. 
But there's some confusion thrown into the mix, though, by the fact that they list the shape twice in their "alphabet" (their ordered list of consonants), one being where you'd expect to find a wa; and then there're sources like http://www1.cs.columbia.edu/~deba/misc/vasa.shtml that have the dotted ba form (0B35) as the second of these letters of the "alphabet"; and then there's Mahapatra 1996 (in Bright & Daniels) and the various other sources I have, including recent learning books used for children, and the TDIL doc, that have the WA (U+0B71) in that second place in the "alphabet". All of these things point to something in addition to BA that several describe as "wa" and seem to use as the component in these conjuncts. Yet because the first two of these use the same shape as BA and because M.E. tells me it's BA, perhaps that's enough to convince me that's the right thing to do... On the other hand, maybe it seems less than completely settled to me. What concerns me most is the teaching materials aimed at schoolchildren. However recent an innovation it might be, one gets the impression that kids are learning WA as part of their 'alphabet'. And if Oriya speakers grow up with the idea that this is the thing that forms their conjuncts, then I need to ask whether that's how they're going to expect to be able to encode their documents. >I have just ordered two large Oriya dictionaries which should arrive >in a fortnight. I'd be interested in knowing what you found and where you found them. Peter Constable
RE: Oriya: mba / mwa ?
At 13:17 -0800 2003-11-29, Peter Constable wrote: > I think the TDIL chart is wrong. It seems reasonable that one should need extra persuasion to take the word of an American living in Ireland over Indians. (Sorry.) Peter, I would take those TDIL publications with a very large grain of salt. Textual evidence is not given and there's all sorts of stuff which really doesn't fit in well with the way we do things in Unicode. Like their *U+0B3A ORIYA INVISIBLE LETTER. Just because it comes from India doesn't mean it's not revisionist. > Traditionally (as in Learn Oriya in 30 Days) subjoined BA is used in > this context although the reading rules say to pronounce it [w]. So, you're saying that all of these should be encoded as C + virama + BA? Yes, I am. KA + BA = KBA pronounced [kwa]. That's what Learn Oriya in 30 Days shows explicitly. > Now an original ligature of O and BA has been pressed into service I've seen elsewhere that you've described this as a ligature involving O, but are you sure it's that? Yes, I am. Note that the same shape is used for NYA and NNA (e.g. conjuncts for NN.NNA and SS.NNA). Be thou not deceived by the glyph shapes. The etymology is O + BA => WA, not NYA + BA. >The traditional BA should be used for that unless we have better >evidence than the TDIL newsletter that such should be the practice. I could be convinced of that; but if people in India aren't convinced of that, the boat may not float. WA is an innovation, unattested in earlier Oriya. You won't find it in Learn Oriya in 30 Days, for instance. Yet syllables in -[wa] have been written in Oriya for a long time, with BA. Note that a historical VA exists and predates the WA, and the TDIL does not take this into account. We did encode it however. I have just ordered two large Oriya dictionaries which should arrive in a fortnight. -- Michael Everson * * Everson Typography * * http://www.evertype.com
RE: Oriya: mba / mwa ?
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On > Behalf Of Michael Everson > I think the TDIL chart is wrong. It seems reasonable that one should need extra persuasion to take the word of an American living in Ireland over Indians. (Sorry.) > Traditionally (as in Learn Oriya in 30 Days) subjoined BA is used in > this context although the reading rules say to pronounce it [w]. So, you're saying that all of these should be encoded as C + virama + BA? > Now an original ligature of O and BA has been pressed into service I've seen elsewhere that you've described this as a ligature involving O, but are you sure it's that? Note that the same shape is used for NYA and NNA (e.g. conjuncts for NN.NNA and SS.NNA). > The traditional BA should be used for that unless we have better > evidence than the TDIL newsletter that such should be the practice. I could be convinced of that; but if people in India aren't convinced of that, the boat may not float. Peter Constable
Re: Oriya: mba / mwa ?
At 21:10 + 2003-11-28, [EMAIL PROTECTED] wrote: . Peter Constable wrote, The question, then, is how "MBA" should be encoded: as <0B2E MA, 0B4D VIRAMA, 0B2C BA >, or as < 0B2E MA, 0B4D VIRAMA, 0B71 WA>? MA + VIRAMA + BA, according to TUS 4.0, page 238. Heh. I wrote that. Well, it just goes to show that my thinking is consistent on this point. -- Michael Everson * * Everson Typography * * http://www.evertype.com
Re: Oriya: mba / mwa ?
At 11:34 -0800 2003-11-28, Peter Constable wrote: A similar issue to the nndda: starting on page 54 of the TDIL newsletter (http://tdil.mit.gov.in/ori-guru-telu.pdf) and continuing onto the next page, they list conjuncts that have BA or WA as the second element. I've shown those from the bottom of p. 54 in the attached image. The shape for the conjoined component is the same for both of these. They describe the first as involving BA, however, while all the others involve WA. The question, then, is how "MBA" should be encoded: as <0B2E MA, 0B4D VIRAMA, 0B2C BA >, or as < 0B2E MA, 0B4D VIRAMA, 0B71 WA>? I think the TDIL chart is wrong. Traditionally (as in Learn Oriya in 30 Days) subjoined BA is used in this context although the reading rules say to pronounce it [w]. Examples from this book: k. + ba (wa) = kwa j. + ba (va) = jva dh. + ba (wa) = dhwa m. + ba = mba r. + ba = rba sh. + ba = shba Now an original ligature of O and BA has been pressed into service as a syllable initial WA for foreign words, and encoded at U+0B71, but I do not think this should be used to form conjuncts in -[wa]. o + ba (wa) = wa (this is not an example in Learn Oriya in 30 Days) The traditional BA should be used for that unless we have better evidence than the TDIL newsletter that such should be the practice. [mba] and an eventual [mwa] would be encoded MBA and the reading rule would be learned. So I don't think that ORIYA LETTER WA has a conjunct form identical with ORIYA LETTER BA in the same way that DDA and TA do. -- Michael Everson * * Everson Typography * * http://www.evertype.com
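The position argued in this message, that every one of the listed conjuncts is stored as C + VIRAMA + BA and the [w] or [b] reading is supplied by a reading rule rather than by the encoding, might be sketched as follows. The mapping from the romanized first elements to code points is an assumption (in particular, "sh" is taken to be U+0B36 ORIYA LETTER SHA), and `FIRST`, `READING`, and `encode_conjunct` are illustrative names only.

```python
VIRAMA = "\u0B4D"  # ORIYA SIGN VIRAMA
BA = "\u0B2C"      # ORIYA LETTER BA

# Assumed code points for the first element of each example conjunct.
FIRST = {
    "k": "\u0B15",   # ORIYA LETTER KA
    "j": "\u0B1C",   # ORIYA LETTER JA
    "dh": "\u0B27",  # ORIYA LETTER DHA
    "m": "\u0B2E",   # ORIYA LETTER MA
    "r": "\u0B30",   # ORIYA LETTER RA
    "sh": "\u0B36",  # ORIYA LETTER SHA (assumed; the book's "sh" is not identified)
}

# Readings as given in the examples quoted from Learn Oriya in 30 Days.
READING = {"k": "kwa", "j": "jva", "dh": "dhwa", "m": "mba", "r": "rba", "sh": "shba"}

def encode_conjunct(first: str) -> str:
    """Encode the conjunct as C + VIRAMA + BA, whatever its pronunciation."""
    return FIRST[first] + VIRAMA + BA

for key, reading in READING.items():
    code_points = " ".join(f"{ord(ch):04X}" for ch in encode_conjunct(key))
    print(f"{key}. + ba = {reading}: <{code_points}>")
```

Under this view the second and third code points are identical in all six rows; only the first consonant varies, and pronunciation never changes the stored sequence.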
Re: Oriya: mba / mwa ?
. Peter Constable wrote, > The question, then, is how "MBA" should be encoded: as < > 0B2E MA, 0B4D VIRAMA, 0B2C BA >, or as < 0B2E MA, 0B4D VIRAMA, 0B71 WA > >? > MA + VIRAMA + BA, according to TUS 4.0, page 238. Best regards, James Kass .
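For readers who want to inspect the two candidate sequences from the original question, here is a minimal Python sketch using the standard `unicodedata` module. The variable and function names are illustrative; the code points are the ones given in the question.

```python
import unicodedata

MA, VIRAMA, BA, WA = "\u0B2E", "\u0B4D", "\u0B2C", "\u0B71"

candidates = {
    "MA + VIRAMA + BA": MA + VIRAMA + BA,
    "MA + VIRAMA + WA": MA + VIRAMA + WA,
}

def describe(text: str) -> list[str]:
    """List each code point with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

for label, sequence in candidates.items():
    print(label)
    for entry in describe(sequence):
        print("   ", entry)

# WA has no decomposition, so normalization leaves both sequences as they
# are and does not make them equivalent.
nfc = {label: unicodedata.normalize("NFC", seq) for label, seq in candidates.items()}
print(nfc["MA + VIRAMA + BA"] == nfc["MA + VIRAMA + WA"])
```

Because nothing in normalization relates the two, whichever sequence an implementation stores is the one other implementations have to expect, which is why the choice matters for interchange.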