whym added a comment.

The source data is extracted from http://www.unicode.org/Public/6.0.0/ucd/UnicodeData.txt.

I think it is reasonable to ask the API to expose whether $wgFixArabicUnicode is true or not (that is, to normalize or not).

The normalization procedure can be replicated in Python using the source data. MediaWiki applies normalization selectively, as shown in this excerpt from ./maintenance/language/generateNormalizerDataAr.php:

if ( ( $code >= 0xFB50 && $code <= 0xFDFF ) # Arabic presentation forms A
   || ( $code >= 0xFE70 && $code <= 0xFEFF ) # Arabic presentation forms B
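As a rough sketch of what replicating this in Python could look like: the standard unicodedata module can apply NFKC compatibility normalization, restricted to the same two code-point ranges as the PHP check above. (This is my assumption of how to mirror the behaviour, not the exact procedure generateNormalizerDataAr.php uses; that script builds its mapping directly from the decomposition fields of UnicodeData.txt.)

```python
import unicodedata

def normalize_arabic(text):
    """Apply NFKC normalization only to Arabic presentation forms A/B,
    leaving all other characters (including other compatibility
    characters) untouched -- mirroring the range check in
    generateNormalizerDataAr.php."""
    out = []
    for ch in text:
        code = ord(ch)
        if (0xFB50 <= code <= 0xFDFF       # Arabic presentation forms A
                or 0xFE70 <= code <= 0xFEFF):  # Arabic presentation forms B
            out.append(unicodedata.normalize('NFKC', ch))
        else:
            out.append(ch)
    return ''.join(out)

# U+FEDF ARABIC LETTER LAM INITIAL FORM normalizes to U+0644 ARABIC LETTER LAM,
# while characters outside the two ranges are passed through unchanged.
```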

As for an existing package that does the same, I'm not sure. We'd probably want something like a language-specific version of Python's unicodedata module.


TASK DETAIL
https://phabricator.wikimedia.org/T94826


To: whym
Cc: whym, Dalba, Ladsgroup, jayvdb, Xqt, StudiesWorld, pywikibot-bugs-list, Aklapper, XZise, Mdupont
