[Bug 35990] Schwa syncope rule in devanagari transliteration

bugzilla-daemon Sun, 22 Jul 2012 20:17:03 -0700

https://bugzilla.wikimedia.org/show_bug.cgi?id=35990


--- Comment #6 from Siddhartha Ghai <[email protected]> 2012-07-23 03:16:55 UTC ---
(In reply to comment #5)
> (In reply to comment #4)
> > All, what should be implemented? Should we still add viram by default and
> > have consonant (only for 'a') to remove it? Or should we have '~' to be
> > typed when viram is required?
> 
> To the best of my understanding, the idea of a transliteration mapping is that
> only the actual sounds are typed and the viram should be used as little as
> possible.

Yes, ideally one should only have to write what one speaks. However, mapping
schwa syncope for all cases would be quite a headache. I originally opened this
bug for schwa syncope at word endings, since that is the most problematic case.
Correcting schwa syncope within words is a problem at an entirely different
level of difficulty. As can be seen in the Wikipedia article [1] (see the
section "Common transcription and diction errors"), the problem of syncope
within words is much greater than at word ends. I like the current system for
handling ् within words. If we completed the consonants by default (i.e. if ्
were not added by default), then writing a lot of words would become a problem,
since rakshhA would become राक्षा (i.e. r+a would become equivalent to the
current r+A). Similarly, wherever the schwa is pronounced, typing an a in
between (as is natural) would produce an unintuitive ा. So although the current
handling of schwa syncope within words is imperfect, it is better than the
other option.
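
To illustrate the difference, here is a minimal Python sketch. It is not the
Narayam rule format; the three-letter consonant map and both function names are
made up for illustration, and the second function simply assumes, as described
above, that a typed a after a completed consonant would behave like the current
A.

```python
VIRAMA = "\u094d"  # ्
AA = "\u093e"      # ा

# Toy consonant map, assumed for illustration only.
CONSONANTS = {
    "k": "\u0915",  # क
    "m": "\u092e",  # म
    "l": "\u0932",  # ल
}


def virama_by_default(keys):
    """Current behaviour: a consonant gets a virama, a following a/A removes it."""
    out = []
    for key in keys:
        if key in CONSONANTS:
            out += [CONSONANTS[key], VIRAMA]
        elif key in ("a", "A") and out and out[-1] == VIRAMA:
            out.pop()            # the vowel replaces the default virama
            if key == "A":
                out.append(AA)   # only the long vowel needs a sign
        else:
            out.append(key)
    return "".join(out)


def inherent_by_default(keys):
    """Hypothetical alternative: consonants are complete, so a typed a adds a sign."""
    out = []
    for key in keys:
        if key in CONSONANTS:
            out.append(CONSONANTS[key])
        elif key in ("a", "A") and out and out[-1] in CONSONANTS.values():
            out.append(AA)       # assumption: a now acts like the current A
        else:
            out.append(key)
    return "".join(out)


print(virama_by_default("kamal"))    # trailing virama is still present
print(inherent_by_default("kamal"))  # every typed a has turned into a long vowel sign
```

With the first function, typing kamal gives कमल् (only the word-final ् is
wrong); with the second, it gives कामाल.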

However, I do believe we need a fix for schwa syncope at word ends. A word may
be terminated by a space, a tab, a newline, a dot, comma, semicolon, colon,
single quote, double quote, dash, equals sign, plus sign, any kind of braces, a
slash, a vertical pipe, a greater-than or less-than sign, or any of the other
symbols and numerals available on the keyboard. We basically need one rule that
handles a word being terminated in any of these ways and defaults to removing
the ्. However, the ् shouldn't be removed if it was explicitly added (by
pressing ~) before any of these keys is pressed. The problem lies in telling
the implicitly added ् apart from the explicitly added ् once the next key is
pressed. I tried to resolve this in
https://gerrit.wikimedia.org/r/#change,3514 patchset 3 by increasing the
key buffer to 2 to detect the ~ keystroke. However, I was unsuccessful for some
reason :( and had to undo it (see diff [2]). I don't know why that rule didn't
work, but if it can be made to work with some modification or correction, the
only further change needed would be adding the various possible word endings
to the rule.
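
The behaviour I have in mind can be sketched roughly as follows. This is plain
Python rather than a Narayam rule, and it is only a sketch under assumptions:
the key maps are a toy subset, type_keys and the implicit flag are hypothetical
names, and every key that is not a mapped letter is treated as a word
terminator. It tracks the implicit/explicit distinction with a flag instead of
a key buffer, so it says nothing about why the patchset 3 rule failed.

```python
VIRAMA = "\u094d"  # ्

# Toy key maps, assumed for illustration; a real rule set is far larger.
CONSONANTS = {
    "k": "\u0915",  # क
    "m": "\u092e",  # म
    "r": "\u0930",  # र
    "l": "\u0932",  # ल
}
# Vowel keys typed after a consonant -> vowel sign.
# "a" is the inherent vowel, so it only removes the virama.
MATRAS = {"a": "", "A": "\u093e", "i": "\u093f", "u": "\u0941"}


def type_keys(keys):
    out = []          # output characters so far
    implicit = False  # True while the last character is a default-added virama
    for key in keys:
        if key in CONSONANTS:
            out.extend([CONSONANTS[key], VIRAMA])
            implicit = True
        elif key == "~" and implicit:
            # The user asked for the virama: keep it and mark it explicit.
            implicit = False
        elif key in MATRAS and implicit:
            out.pop()                    # the vowel replaces the default virama
            if MATRAS[key]:
                out.append(MATRAS[key])
            implicit = False
        else:
            # Any other key ends the word (in this toy version that also covers
            # word-initial vowels and a stray ~, which pass through unchanged).
            if implicit:
                out.pop()                # drop only the implicit virama
            out.append(key)
            implicit = False
    return "".join(out)


print(type_keys("kamal "))   # word-final virama removed
print(type_keys("kamal~ "))  # explicit virama kept
print(type_keys("karm "))    # virama inside the conjunct is untouched
```

Here kamal followed by a space gives कमल, kamal~ followed by a space keeps the
final ् (कमल्), and karm followed by a space gives कर्म, since only the ् at
the end of the word is dropped.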

[1] http://en.wikipedia.org/wiki/Schwa_deletion_in_Indo-Aryan_languages
[2]
https://gerrit.wikimedia.org/r/#/c/3514/3..4/resources/ext.narayam.rules.hi.js

