whym added a comment.

The source data is extracted from http://www.unicode.org/Public/6.0.0/ucd/UnicodeData.txt.

I think it is reasonable to ask the API to expose whether $wgFixArabicUnicode is true or not (that is, to normalize or not).

The normalization procedure can be replicated in Python using the source data. MediaWiki applies normalization selectively, as shown in this excerpt from ./maintenance/language/generateNormalizerDataAr.php:

if ( ( $code >= 0xFB50 && $code <= 0xFDFF ) # Arabic presentation forms A
   || ( $code >= 0xFE70 && $code <= 0xFEFF ) # Arabic presentation forms B
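As a rough sketch of what replicating this in Python could look like: the standard unicodedata module can apply NFKC compatibility normalization, restricted to the same two code-point ranges as the PHP check above. (This is my assumption of how to mirror the behaviour, not the exact procedure generateNormalizerDataAr.php uses; that script builds its mapping directly from the decomposition fields of UnicodeData.txt.)

```python
import unicodedata

def normalize_arabic(text):
    """Apply NFKC normalization only to Arabic presentation forms A/B,
    leaving all other characters (including other compatibility
    characters) untouched -- mirroring the range check in
    generateNormalizerDataAr.php."""
    out = []
    for ch in text:
        code = ord(ch)
        if (0xFB50 <= code <= 0xFDFF       # Arabic presentation forms A
                or 0xFE70 <= code <= 0xFEFF):  # Arabic presentation forms B
            out.append(unicodedata.normalize('NFKC', ch))
        else:
            out.append(ch)
    return ''.join(out)

# U+FEDF ARABIC LETTER LAM INITIAL FORM normalizes to U+0644 ARABIC LETTER LAM,
# while characters outside the two ranges are passed through unchanged.
```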

As for an existing package that does the same, I'm not sure. We'd probably want something like a language-specific version of Python's unicodedata module.


TASK DETAIL
https://phabricator.wikimedia.org/T94826


To: whym
Cc: whym, Dalba, Ladsgroup, jayvdb, Xqt, StudiesWorld, pywikibot-bugs-list, Aklapper, XZise, Mdupont
