jayvdb added subscribers: valhallasw, Andre_Engels.
jayvdb added a comment.

The duplicate dictionary keys come from 
https://phabricator.wikimedia.org/rPWBO91ed5b405f03fe632eca8258d6f2c190ce0b5260 
(2009), although most of them were already implicit in the code before that.  
Interestingly, before that changeset the first value of a duplicate key 
was used; after it, the last value of the duplicate key is used.
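
A minimal sketch of why the winner flipped, assuming the old code was a chain of `if` tests and the new code is a dict literal. `first_match` and `table` are hypothetical names, and the order of the two Ӣ values here is chosen only for illustration:

```python
# Hypothetical stand-in for the pre-2009 if-chain: the first test that
# matches returns, so a later test for the same character is dead code.
def first_match(char):
    if char == u"Ӣ":
        return u"Y"
    if char == u"Ӣ":  # never reached
        return u"Ii"
    return char


# Hypothetical stand-in for the dict literal: Python accepts the repeated
# key without any warning and keeps only the last value.
table = {
    u"Ӣ": u"Y",
    u"Ӣ": u"Ii",  # silently replaces the entry above
}
```

With these definitions, `first_match(u"Ӣ")` gives `u"Y"` while `table[u"Ӣ"]` gives `u"Ii"`, and `table` holds a single entry.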

Note that there appear to have been (I'm 99% sure) some problems in the previous 
code (using `==` instead of `in` to test against multiple characters), such as:

  if char == u"ҼҾ":
      return u"Ts"
  if char == u"ҽҿ":
      return u"ts"
  ...
  if char == u"ҰӸ":
      return u"U"
  if char == u"ұӹ":
      return u"u"
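
To illustrate the suspected problem: a single character can never be equal to a two-character string, so a test like the ones above would never fire. This is a sketch with hypothetical function names (`buggy`, `fixed`), not the original code:

```python
def buggy(char):
    # Compares one character against a two-character string: always False
    # when char is a single character.
    if char == u"ҼҾ":
        return u"Ts"
    return None


def fixed(char):
    # Membership test: True when char is either of the two letters.
    # Caveat: `in` on strings is a substring test, so an empty string or the
    # whole string u"ҼҾ" would also match; callers are assumed to pass
    # exactly one character.
    if char in u"ҼҾ":
        return u"Ts"
    return None
```

Here `buggy(u"Ҽ")` returns `None`, while `fixed(u"Ҽ")` and `fixed(u"Ҿ")` both return `u"Ts"`.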

The ұӹ combination above was added by 
https://phabricator.wikimedia.org/rPWBO172007a84109c7b1e61a0b6297b28b95dfcacfb6 
(March 2007). It may once have had some special meaning, but that has since 
been lost. The duplicate keys now have identical values, so the mapping should be:

"Ұ": "U"
"ұ": "u"

The Җ/җ duplicates also have identical values, so they can be resolved simply 
by removing the duplicates, leaving this mapping:

"Җ": "Zhj" 
"җ": "zhj"

The ҿ problem is very interesting.  In 2012, 
https://phabricator.wikimedia.org/rPWBOc1d467139d9435dc8f31c0c3dc7a54816ffaa177 
by @valhallasw introduced the `"ҿ": "ä"` mapping, as a fix for 
https://sourceforge.net/p/pywikipediabot/bugs/1428/ & 
https://sourceforge.net/p/pywikipediabot/support-requests/33/ .  To my 
untrained non-Cyrillic mind, that mapping doesn't look good, and we should 
return to this mapping:

"Ҿ": "Ts"
"ҿ": "ts"

And we could revisit what the problematic `u"ä": u"ä"` should have been.

However, the other duplicate keys have different values, so we need to 
determine which transliteration is desirable. (I haven't analysed them yet.)

Ӣ/ӣ

u"Ӣ": u"Y", u"ӣ": u"y"
u"Ӣ": u"Ii", u"ӣ": u"ii"

Ӯ/ӯ

u"Ӯ": u"U", u"ӯ": u"u"
u"Ӯ": u"Û", u"ӯ": u"û"


TASK DETAIL
  https://phabricator.wikimedia.org/T115929

To: jayvdb
Cc: Andre_Engels, valhallasw, Aklapper, pywikibot-bugs-list, jayvdb, Jay8g
