https://bugzilla.wikimedia.org/show_bug.cgi?id=35990
--- Comment #6 from Siddhartha Ghai <[email protected]> 2012-07-23 03:16:55 UTC ---
(In reply to comment #5)
> (In reply to comment #4)
> > All, what should be implemented? Should we still add viram by default and
> > have consonant (only for 'a') to remove it? Or should we have '~' to be
> > typed when viram is required?
>
> To the best of my understanding, the idea of a transliteration mapping is that
> only the actual sounds are typed and the viram should be used as little as
> possible.

Yes, ideally one should only have to write what one speaks. However, mapping schwa syncope for all cases would be quite a headache. I originally filed this bug for schwa syncope at word endings, since that is the most problematic case. Correcting schwa syncope within words is a problem at an entirely different level of difficulty. As can be seen in the Wikipedia article [1] (see the section "Common transcription and diction errors"), the problem of syncope within words is much greater than at word ends.

I like the current system for handling ् within words. If we complete the consonants by default (i.e. ् isn't added by default), then writing a lot of words becomes a problem, since rakshhA would become राक्षा (i.e. r+a would become equivalent to the current r+A). Similarly, wherever the schwa is pronounced, typing an a in between (as is natural) would produce an unintuitive ा in between. So although the current handling of schwa syncope within words is imperfect, it is better than the other option.

However, I do believe we need to find a fix for schwa syncope at word ends. Words may end with a space, a tab, a newline, dot, comma, semicolon, colon, single quote, double quote, dash, equals sign, plus sign, any kind of brace, a slash, a vertical pipe, a greater-than or less-than sign, or any of the other symbols and numerals available on the keyboard.
We basically need one rule that handles a word being terminated in any of these ways and defaults to removing the ्. However, the ् shouldn't be removed if it was explicitly added (by pressing ~) before any of these keys was pressed. The problem lies in telling the implicitly added ् apart from the explicitly added ् once the next key is pressed.

I tried to resolve this in https://gerrit.wikimedia.org/r/#change,3514 patchset 3 by increasing the keybuffer to 2 to detect the ~ keystroke. However, I was unsuccessful for some reason :( and had to undo the change (see diff [2]). I don't know why that rule didn't work, but if it can be made to work with some modification or correction, the only further change needed would be adding the various possible word endings to the rule.

[1] http://en.wikipedia.org/wiki/Schwa_deletion_in_Indo-Aryan_languages
[2] https://gerrit.wikimedia.org/r/#/c/3514/3..4/resources/ext.narayam.rules.hi.js
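
For illustration only, here is a minimal sketch of the word-ending behaviour described above. It does not use the Narayam rule format or its keybuffer. Instead, it tracks whether the trailing ् was added implicitly or confirmed with ~. The function name `transliterate`, the tiny key tables, and the treatment of every character other than an ASCII letter or ~ as a word terminator are all assumptions made for this example, not part of ext.narayam.rules.hi.js.

```javascript
// Hypothetical subset of a transliteration table (assumption for this sketch;
// NOT the real mapping from ext.narayam.rules.hi.js).
const CONSONANTS = { k: "क", m: "म", n: "न", r: "र", l: "ल" };
const MATRAS = { A: "ा", i: "ि", u: "ु" };
const VIRAMA = "्";

// Processes keystrokes one at a time and returns the resulting text.
function transliterate(keys) {
  let out = "";
  // "none":     the output does not end in a virama
  // "implicit": the virama was added automatically after a consonant
  // "explicit": the user confirmed the virama by pressing ~
  let virama = "none";

  for (const key of keys) {
    if (key in CONSONANTS) {
      // Current behaviour: every consonant gets a virama by default.
      out += CONSONANTS[key] + VIRAMA;
      virama = "implicit";
    } else if (key === "~" && virama === "implicit") {
      // The output is unchanged; only the bookkeeping changes.
      virama = "explicit";
    } else if (key === "a" && virama !== "none") {
      // 'a' drops the virama, leaving the inherent vowel.
      out = out.slice(0, -1);
      virama = "none";
    } else if (key in MATRAS && virama !== "none") {
      // A vowel sign replaces the virama.
      out = out.slice(0, -1) + MATRAS[key];
      virama = "none";
    } else {
      // Assumption: anything that is not an ASCII letter or ~ ends the word.
      const endsWord = !/[A-Za-z~]/.test(key);
      if (endsWord && virama === "implicit") {
        out = out.slice(0, -1); // drop only the implicitly added virama
      }
      out += key; // unmapped keys pass through unchanged
      virama = "none";
    }
  }
  return out;
}

console.log(transliterate("kamal "));  // "कमल "  (implicit virama dropped)
console.log(transliterate("kamal~ ")); // "कमल् " (explicit virama kept)
```

With this bookkeeping, adding more word endings would not require more rules, since the terminator check is a single test. Whether the same state can be expressed in a Narayam rule with a larger keybuffer is exactly the open question above.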
