jayvdb added a comment.

Python issue 10254 is triggered within `Link.__init__`, where it calls
`t = unicodedata.normalize('NFC', t)`. That call dates back to 2006 with
https://phabricator.wikimedia.org/rPWBOed5e739587b0bbc8374ff573edff7d5cdb6c7e3a,
when MediaWiki was at version 1.5.

As RHEL Python 2.6.6 does not include the fix for `unicodedata.normalize`, yet 
our RHEL users are not complaining, this bug is obviously not affecting normal 
use.  This is most likely because:

1. they are not using languages whose titles trigger this bug, and/or
2. they are using MediaWiki versions whose API provides titles which do not
need to be normalised.

We can feature detect one or both of those, to prevent this bug from occurring.

It could be that MediaWiki versions 1.14+ do not need `unicodedata.normalize`
at all, which means we can simply remove this line.
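If removing the call outright turns out to be too aggressive, a version gate is one alternative. The following is only a sketch: `normalize_title`, `NORMALIZED_TITLES_SINCE` and the `mw_version` tuple are hypothetical names, and the 1.14 cut-off is the unverified guess from the paragraph above, not a confirmed fact.

```python
import unicodedata

# Assumption: MediaWiki 1.14+ already returns normalised titles.
# This cut-off is the unverified guess discussed above.
NORMALIZED_TITLES_SINCE = (1, 14)


def normalize_title(title, mw_version):
    """Normalise a title to NFC only for sites older than the cut-off.

    mw_version is a hypothetical (major, minor) tuple, e.g. (1, 13).
    """
    if mw_version >= NORMALIZED_TITLES_SINCE:
        # Trust the API and avoid calling unicodedata.normalize at all.
        return title
    return unicodedata.normalize('NFC', title)
```

With this shape, users on newer sites never reach the buggy code path, while older sites keep the existing behaviour.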

http://bugs.python.org/issue10254 refers to three example strings that cause 
the problem:

1. u'Li\u030dt-s\u1e73\u0301' = Li̍t-sṳ́
   <https://hak.wikipedia.org/wiki/Li%CC%8Dt-s%E1%B9%B3%CC%81>
2. u'\u092e\u093e\u0930\u094d\u0915 \u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917' = मार्क ज़ुकेरबर्ग
   <https://hi.wikipedia.org/wiki/%E0%A4%AE%E0%A4%BE%E0%A4%B0%E0%A5%8D%E0%A4%95_%E0%A4%9C%E0%A4%BC%E0%A5%81%E0%A4%95%E0%A5%87%E0%A4%B0%E0%A4%AC%E0%A4%B0%E0%A5%8D%E0%A4%97>
3. u'\u0915\u093f\u0930\u094d\u0917\u093f\u091c\u093c\u0938\u094d\u0924\u093e\u0928' = किर्गिज़स्तान
   <https://hi.wikipedia.org/wiki/%E0%A4%95%E0%A4%BF%E0%A4%B0%E0%A5%8D%E0%A4%97%E0%A4%BF%E0%A4%9C%E0%A4%BC%E0%A4%B8%E0%A5%8D%E0%A4%A4%E0%A4%BE%E0%A4%A8>
   (api langlinks from en.wp
   <https://en.wikipedia.org/w/api.php?action=query&prop=langlinks&titles=Kyrgyzstan&lllang=hi>)

Python issue 10254 is entirely about strings which `unicodedata.normalize`
should return **unmodified**.  That is, no normalisation is necessary, but the
affected versions return an incorrectly normalised string, or crash on 2.7.1!
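Because a correct implementation must return these strings unchanged, they could double as a runtime feature test. The following is a minimal sketch, not existing pywikibot code: `ISSUE_10254_SAMPLES` and `normalize_is_safe` are hypothetical names, and the samples are the three strings listed above.

```python
import unicodedata

# The three strings from http://bugs.python.org/issue10254 listed above.
# Each is already in NFC, so a correct normalize() returns it unchanged.
ISSUE_10254_SAMPLES = (
    u'Li\u030dt-s\u1e73\u0301',
    u'\u092e\u093e\u0930\u094d\u0915 '
    u'\u091c\u093c\u0941\u0915\u0947\u0930\u092c\u0930\u094d\u0917',
    u'\u0915\u093f\u0930\u094d\u0917\u093f\u091c\u093c\u0938\u094d\u0924\u093e\u0928',
)


def normalize_is_safe():
    """Return True if NFC normalisation leaves every sample unchanged."""
    for sample in ISSUE_10254_SAMPLES:
        try:
            if unicodedata.normalize('NFC', sample) != sample:
                return False
        except Exception:
            # Treat any Python-level error as an affected interpreter.
            return False
    return True
```

A caller could run this check once and skip normalisation when it returns False. It only detects wrong results or Python-level exceptions; if the crash on 2.7.1 takes down the interpreter itself, the check cannot guard against that.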


TASK DETAIL
  https://phabricator.wikimedia.org/T102461


To: jayvdb
Cc: Ricordisamoa, gerritbot, Aklapper, jayvdb, pywikibot-bugs-list, Anshoe, 
Malyacko, P.Copp


