| whym added a comment. |
The source data is extracted from http://www.unicode.org/Public/6.0.0/ucd/UnicodeData.txt.
I think it is reasonable to ask the API to expose whether $wgFixArabicUnicode is true or not (that is, to normalize or not).
The normalization procedure can be replicated in Python using the source data. MediaWiki selectively applies normalization, as shown in the PHP code of ./maintenance/language/generateNormalizerDataAr.php:
if ( ( $code >= 0xFB50 && $code <= 0xFDFF ) # Arabic presentation forms A
	|| ( $code >= 0xFE70 && $code <= 0xFEFF ) # Arabic presentation forms B
As for an existing package to do the same - I'm not sure. We'd probably want something like a language-specific version of unicodedata.
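Lacking a ready-made package, one rough way to approximate the selective normalization in Python is to apply NFKC only to characters inside the two presentation-form ranges from the PHP condition above, using the stdlib unicodedata module. This is a sketch, not MediaWiki's actual table: the real generateNormalizerDataAr.php output may differ in detail for individual code points, and `normalize_arabic` is a hypothetical helper name.

```python
import unicodedata

# Ranges MediaWiki treats specially (per generateNormalizerDataAr.php):
# Arabic Presentation Forms-A (U+FB50..U+FDFF) and -B (U+FE70..U+FEFF).
ARABIC_PRESENTATION_RANGES = ((0xFB50, 0xFDFF), (0xFE70, 0xFEFF))

def normalize_arabic(text: str) -> str:
    """Apply NFKC compatibility normalization only to Arabic
    presentation-form characters; leave everything else untouched."""
    out = []
    for ch in text:
        code = ord(ch)
        if any(lo <= code <= hi for lo, hi in ARABIC_PRESENTATION_RANGES):
            # NFKC folds a presentation form to its canonical letters,
            # e.g. U+FEFB (LAM-ALEF isolated form) -> U+0644 U+0627.
            out.append(unicodedata.normalize('NFKC', ch))
        else:
            out.append(ch)
    return ''.join(out)

print(normalize_arabic('\uFEFB'))  # → 'لا' (U+0644 U+0627)
```

The decompositions the stdlib uses come from the same UnicodeData.txt source (field 5, the decomposition mapping), so for most presentation-form characters this should agree with what the MediaWiki generator produces.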
