jayvdb added subscribers: valhallasw, Andre_Engels. jayvdb added a comment.
The dictionary duplicate keys come from https://phabricator.wikimedia.org/rPWBO91ed5b405f03fe632eca8258d6f2c190ce0b5260 (2009); however, they were mostly implicit in the code before that. Interestingly, before that changeset the first value of a duplicate key was used; after it, the last value is used.

Note there appear to have been (I'm 99% sure) some problems in the previous code (using `==` instead of `in` for multiple characters), such as:

    if char == u"ҼҾ": return u"Ts"
    if char == u"ҽҿ": return u"ts"
    ...
    if char == u"ҰӸ": return u"U"
    if char == u"ұӹ": return u"u"

The ұӹ combination above was added by https://phabricator.wikimedia.org/rPWBO172007a84109c7b1e61a0b6297b28b95dfcacfb6 (March 2007), and could have had some special meaning, but that has since been lost. As a result the duplicate key values are identical, and the mapping should be:

    "Ұ": "U"
    "ұ": "u"

The Җ/җ duplicates also have identical values, so they can be solved easily by removing the duplicates, and the mapping will be:

    "Җ": "Zhj"
    "җ": "zhj"

The ҿ problem is very interesting. In 2012, https://phabricator.wikimedia.org/rPWBOc1d467139d9435dc8f31c0c3dc7a54816ffaa177 by @valhallasw introduced the `"ҿ": "ä"` mapping as a fix for https://sourceforge.net/p/pywikipediabot/bugs/1428/ and https://sourceforge.net/p/pywikipediabot/support-requests/33/ . To my untrained non-Cyrillic mind, that mapping doesn't look right, and we should return to this mapping:

    "Ҿ": "Ts"
    "ҿ": "ts"

And we could revisit what the problematic `u"ä": u"ä"` should have been.

However, the other duplicate keys have different values, so we need to determine which transliteration is desirable. (I haven't analysed them yet.)
Ӣ/ӣ

    u"Ӣ": u"Y", u"ӣ": u"y"
    u"Ӣ": u"Ii", u"ӣ": u"ii"

Ӯ/ӯ

    u"Ӯ": u"U", u"ӯ": u"u"
    u"Ӯ": u"Û", u"ӯ": u"û"

TASK DETAIL
  https://phabricator.wikimedia.org/T115929

To: jayvdb
Cc: Andre_Engels, valhallasw, Aklapper, pywikibot-bugs-list, jayvdb, Jay8g
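
For anyone wanting to reproduce the two behaviours discussed in this comment, here is a minimal Python 3 sketch. It is not the pywikibot code: `old_style`, `fixed_style`, `find_duplicate_keys` and `SAMPLE` are hypothetical names made up for illustration, and the sample table only copies a few of the pairs quoted above. It shows why `==` against a two-character string can never match a single character, and one possible way to list duplicate keys in a dict literal along with whether their values conflict.

```python
import ast


def old_style(char):
    # Comparing one character with a two-character string is never true,
    # so this branch is unreachable for single-character input.
    if char == "ҽҿ":
        return "ts"
    return None


def fixed_style(char):
    # `in` tests membership, so either character matches.
    if char in "ҽҿ":
        return "ts"
    return None


def find_duplicate_keys(source):
    """Return {key: [values, in source order]} for string keys that are
    repeated inside a dict literal in `source`."""
    duplicates = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        seen = {}
        for key_node, value_node in zip(node.keys, node.values):
            # Only plain string keys with constant values are considered.
            if (isinstance(key_node, ast.Constant)
                    and isinstance(key_node.value, str)
                    and isinstance(value_node, ast.Constant)):
                seen.setdefault(key_node.value, []).append(value_node.value)
        duplicates.update({k: v for k, v in seen.items() if len(v) > 1})
    return duplicates


# Hypothetical excerpt, built from pairs quoted in this comment.
SAMPLE = '''
table = {
    "Ӣ": "Y", "ӣ": "y",
    "Ұ": "U", "ұ": "u",
    "Ӣ": "Ii", "ӣ": "ii",
    "Ұ": "U", "ұ": "u",
}
'''

dupes = find_duplicate_keys(SAMPLE)
# Keys whose repeated values differ need a human decision.
conflicting = {k: v for k, v in dupes.items() if len(set(v)) > 1}

# A dict literal silently keeps the last value given for a repeated key.
namespace = {}
exec(SAMPLE, namespace)

for key, values in sorted(dupes.items()):
    status = "conflict" if key in conflicting else "identical"
    print(key, values, status, "-> kept:", namespace["table"][key])
```

Duplicates reported as identical can simply be deleted; those reported as a conflict are the ones where a transliteration still has to be chosen.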
