jayvdb added a comment.
Python issue 10254 is triggered within `Link.__init__`, where it calls
`t = unicodedata.normalize('NFC', t)`. That call dates back to 2006, with
https://phabricator.wikimedia.org/rPWBOed5e739587b0bbc8374ff573edff7d5cdb6c7e3a,
when MediaWiki was at version 1.5.
RHEL Python 2.6.6 does not include the fix for `unicodedata.normalize`, yet
our RHEL users are not complaining, so this bug is evidently not affecting
normal use. This is most likely because:
1. they are not using languages which have this bug, and/or
2. they are using MediaWiki versions whose API provides titles which do not
need to be normalised.
We can feature-detect one or both of those to prevent this bug from occurring.
It could be that MediaWiki versions 1.14+ do not need `unicodedata.normalize`
at all, in which case we can simply remove this line.
http://bugs.python.org/issue10254 refers to three example strings that cause
the problem:
1. u'Li\u030dt-s\u1e73\u0301' = Li̍t-sṳ́
<https://hak.wikipedia.org/wiki/Li%CC%8Dt-s%E1%B9%B3%CC%81>
2. u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917' = मार्क ज़ुकेरबर्ग
<https://hi.wikipedia.org/wiki/%E0%A4%AE%E0%A4%BE%E0%A4%B0%E0%A5%8D%E0%A4%95_%E0%A4%9C%E0%A4%BC%E0%A5%81%E0%A4%95%E0%A5%87%E0%A4%B0%E0%A4%AC%E0%A4%B0%E0%A5%8D%E0%A4%97>
3. u'\u0915\u093f\u0930\u094d\u0917\u093f\u091c\u093c\u0938\u094d\u0924\u093e\u0928' = किर्गिज़स्तान
<https://hi.wikipedia.org/wiki/%E0%A4%95%E0%A4%BF%E0%A4%B0%E0%A5%8D%E0%A4%97%E0%A4%BF%E0%A4%9C%E0%A4%BC%E0%A4%B8%E0%A5%8D%E0%A4%A4%E0%A4%BE%E0%A4%A8>
(api langlinks from en.wp
<https://en.wikipedia.org/w/api.php?action=query&prop=langlinks&titles=Kyrgyzstan&lllang=hi>)
Python issue 10254 is entirely about strings which `unicodedata.normalize`
should return **unmodified**, i.e. no normalisation is necessary. On affected
Python versions it instead returns an incorrectly normalised string, or
crashes on 2.7.1!
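Since all three example strings are already in NFC, one way to feature-detect
the broken implementation might be to check that they come back unchanged.
The following is only a rough sketch: `KNOWN_NFC_STRINGS`,
`normalize_is_reliable` and `safe_nfc` are hypothetical names, not existing
pywikibot code, and a hard interpreter crash (as reported for 2.7.1) could not
be caught this way.

```python
import unicodedata

# The three example strings from Python issue 10254.
# Each is already in NFC, so a correct normalize() returns it unmodified.
KNOWN_NFC_STRINGS = (
    u'Li\u030dt-s\u1e73\u0301',
    u'\u092e\u093e\u0930\u094d\u0915 '
    u'\u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917',
    u'\u0915\u093f\u0930\u094d\u0917\u093f\u091c\u093c\u0938\u094d\u0924\u093e\u0928',
)


def normalize_is_reliable():
    """Return True if NFC normalisation leaves the known strings unmodified.

    Hypothetical helper: it only detects the 'incorrectly normalised'
    symptom, not an interpreter crash.
    """
    return all(unicodedata.normalize('NFC', s) == s
               for s in KNOWN_NFC_STRINGS)


# Evaluate once at import time so the check is not repeated per title.
NORMALIZE_OK = normalize_is_reliable()


def safe_nfc(text):
    """Normalise to NFC only when the implementation passed the check.

    Assumption: skipping normalisation is acceptable on affected
    interpreters, e.g. because the API already returns normalised titles.
    """
    if NORMALIZE_OK:
        return unicodedata.normalize('NFC', text)
    return text
```

With something like this, `Link.__init__` could call `safe_nfc(t)` in place of
the direct `unicodedata.normalize('NFC', t)` call; whether silently skipping
normalisation is acceptable would still need to be decided.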
TASK DETAIL
https://phabricator.wikimedia.org/T102461
To: jayvdb
Cc: Ricordisamoa, gerritbot, Aklapper, jayvdb, pywikibot-bugs-list, Anshoe,
Malyacko, P.Copp