jayvdb added a comment. In https://phabricator.wikimedia.org/T115929#1923905, @valhallasw wrote:
> > The ұӹ combination above was added by
> > https://phabricator.wikimedia.org/rPWBO172007a84109c7b1e61a0b6297b28b95dfcacfb6
> > (March 2007), and could have some special meaning, but it has since been
> > lost, and means the duplicate key values are identical, and the mapping
> > should be:
> >
> >     "Ұ": "U"
> >     "ұ": "u"
>
> that would make ұӹ show up as uu rather than u (because the Ұ and ӹ get
> transliterated independently)

That is what is happening currently:

    >>> from pywikibot import config
    >>> config.transliterate = True
    >>> from pywikibot.userinterfaces.terminal_interface_unix import UnixUI
    >>> ui = UnixUI()
    >>> ui.transliteration_target = 'ascii'
    >>> ui.output(u'ұӹ\n')
    uu
    ...
    >>> from pywikibot.userinterfaces.transliteration import transliterator
    >>> t = transliterator()
    >>> t = transliterator('ascii')
    >>> t.transliterate(u'ӹ', prev=u'ұ')
    u'u'

We would need to add some voodoo in `transliterator.transliterate` to detect this sequence and emit only a single `u`, if that is desirable. However, I suspect that `ұӹ` is a combination that should be reduced to a single `u`, as `ұӹ` has only 45 hits on Google, five of which are this pywikibot file, and many others look like SEO attempts.

TASK DETAIL
  https://phabricator.wikimedia.org/T115929

To: jayvdb
Cc: Andre_Engels, valhallasw, Aklapper, pywikibot-bugs-list, jayvdb
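The "voodoo" discussed in the comment above (emitting a single `u` for `ұӹ`) could be approached with a greedy longest-match lookup over a table that contains a two-character key, rather than by threading `prev` through per-character calls. This is a minimal sketch under stated assumptions: the `TABLE` contents and both function names are hypothetical, the `"ӹ": "u"` entry is assumed only so the per-character version reproduces the `uu` seen in the REPL session, and none of this is pywikibot's actual `transliterator` implementation.

```python
# Hypothetical table: illustrative values only, not pywikibot's real data.
TABLE = {
    "Ұ": "U",
    "ұ": "u",
    "ӹ": "u",    # assumed, so per-character output matches the "uu" above
    "ұӹ": "u",   # the two-character sequence to collapse into one "u"
}


def transliterate_per_char(text, table=TABLE):
    """Map each character independently, ignoring its neighbours."""
    return "".join(table.get(ch, ch) for ch in text)


def transliterate_longest_match(text, table=TABLE):
    """Greedy longest match: try multi-character keys before single ones."""
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No table entry at any length: pass the character through as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


print(transliterate_per_char("ұӹ"))       # uu
print(transliterate_longest_match("ұӹ"))  # u
```

The longest-match loop keeps the decision in one place: a digraph wins whenever it is present, and isolated `ұ` or `Ұ` still fall back to their single-character entries.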
