jayvdb added a comment.

In https://phabricator.wikimedia.org/T115929#1923905, @valhallasw wrote:

> > The ұӹ combination above was added by 
> > https://phabricator.wikimedia.org/rPWBO172007a84109c7b1e61a0b6297b28b95dfcacfb6
> > (March 2007) and could have had some special meaning, but that meaning 
> > has since been lost. As things stand, the duplicate keys' values are 
> > identical, and the mapping should be:
> > 
> > "Ұ": "U"
> > "ұ": "u"
>
> That would make ұӹ show up as uu rather than u (because the ұ and ӹ get 
> transliterated independently).


That is what is happening currently.

  >>> from pywikibot import config
  >>> config.transliterate = True
  
  >>> from pywikibot.userinterfaces.terminal_interface_unix import UnixUI
  >>> ui = UnixUI()
  >>> ui.transliteration_target = 'ascii'
  >>> ui.output(u'ұӹ\n')
  uu
  ...
  >>> from pywikibot.userinterfaces.transliteration import transliterator
  >>> t = transliterator('ascii')
  >>> t.transliterate(u'ӹ', prev=u'ұ')
  u'u'
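
The `uu` result is what a plain per-character table lookup produces. Below is a minimal sketch of that behaviour, assuming a hypothetical two-entry table in which both letters map to `u` (consistent with the output above). `TABLE` and `transliterate_text` are illustrative names, not pywikibot API, and the real pywikibot table is far larger.

```python
# Hypothetical mapping, NOT pywikibot's actual table: both letters map to
# "u", consistent with the session output above.
TABLE = {"ұ": "u", "ӹ": "u"}


def transliterate_text(text, default="?"):
    """Transliterate each character independently of its neighbours.

    ASCII characters pass through unchanged; unknown non-ASCII
    characters become `default`.
    """
    return "".join(
        TABLE.get(ch, ch if ch.isascii() else default) for ch in text
    )


print(transliterate_text("ұӹ"))  # uu
```

Because each character is looked up on its own, the pair can never come out as a single `u` without some extra context.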

We would need to add some voodoo in `transliterator.transliterate` to detect 
this sequence and only emit a single `u`, if that is desirable.
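
A minimal sketch of what such special-casing might look like, using the previous character as context, in the spirit of the `prev` argument seen in the session above. The table, `transliterate_char` and `transliterate_text` are hypothetical illustrations, not pywikibot's implementation; a real change would need to fit into `transliterator.transliterate` itself.

```python
# Hypothetical single-character table, NOT pywikibot's actual data.
CHARS = {"ұ": "u", "ӹ": "u"}


def transliterate_char(char, prev="-", default="?"):
    """Transliterate one character, using the previous one as context.

    Special case: when ӹ directly follows ұ, the ұ has already produced
    "u", so ӹ contributes nothing and the pair yields a single "u".
    """
    if char == "ӹ" and prev == "ұ":
        return ""
    if char.isascii():
        return char
    return CHARS.get(char, default)


def transliterate_text(text):
    """Feed each character to transliterate_char with its predecessor."""
    out = []
    prev = "-"
    for char in text:
        out.append(transliterate_char(char, prev=prev))
        prev = char
    return "".join(out)


print(transliterate_text("ұӹ"))  # u
print(transliterate_text("ӹ"))   # u
```

The look-behind only suppresses ӹ in this one sequence; ӹ on its own, or ұ followed by anything else, still maps to `u`.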

However, I suspect that `ұӹ` is not a combination that needs to be reduced to a 
single `u`: `ұӹ` has only 45 hits on Google, five of those are this pywikibot 
file, and many of the others look like SEO attempts.


TASK DETAIL
  https://phabricator.wikimedia.org/T115929

To: jayvdb
Cc: Andre_Engels, valhallasw, Aklapper, pywikibot-bugs-list, jayvdb



_______________________________________________
pywikibot-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/pywikibot-bugs
