https://bugzilla.wikimedia.org/show_bug.cgi?id=28427
Bug #: 28427
Summary: rewrite quickIsNFCVerify() to use preg_match() with an offset to accommodate larger files
Product: MediaWiki
Version: 1.18-svn
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: Normal
Component: General/Unknown
AssignedTo: [email protected]
ReportedBy: [email protected]
Classification: Unclassified
Broken out from bug 28146, which started with a narrower focus and was resolved
by a correspondingly narrow fix.
Per notes & patches on that bug, the preg_match_all() in
UtfNormal::quickIsNFCVerify uses a lot of memory for mixed ASCII/non-ASCII
strings such as one finds in languages using Latin scripts with accented or
other non-ASCII letters.
As a result, we hit memory limits on largeish input strings much sooner than we
really ought to.
Rewriting the function so that it works through the string in chunks as it's
splitting should avoid that huge memory bump, but my initial tests were too
slow using preg_match and an offset, and still slowish using
preg_replace_callback.
includes/normal/UtfNormalMemStress.php can be used to stress-test this.
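To illustrate the chunked idea only, here is a minimal sketch in Python rather than PHP. It is not the UtfNormal code: the pattern and the name iter_runs are assumptions made up for this example. It splits a byte string into ASCII and non-ASCII runs by matching from a moving offset, so only one run is held at a time instead of the full match array that a match-everything call builds. It says nothing about speed, which was the sticking point in the tests mentioned above.

```python
import re

# Hypothetical pattern: one run of ASCII bytes, or one run of non-ASCII bytes.
# This is an illustrative stand-in, not the regex used by quickIsNFCVerify().
_RUN = re.compile(rb"[\x00-\x7f]+|[\x80-\xff]+")


def iter_runs(data: bytes):
    """Yield (is_ascii, run) pairs one at a time, scanning from a moving offset."""
    offset = 0
    while offset < len(data):
        # Pattern.match(string, pos) anchors the match at `pos`, playing the
        # role of the offset argument to PHP's preg_match().
        m = _RUN.match(data, offset)
        # Every byte falls in one of the two classes, so a match always exists.
        run = m.group()
        yield run[0] < 0x80, run
        offset = m.end()


# Mixed ASCII/non-ASCII input of the kind described above.
sample = "déjà vu".encode("utf-8")
for is_ascii, run in iter_runs(sample):
    print("ascii" if is_ascii else "non-ascii", run)
```

Because iter_runs is a generator, a caller can check each run and discard it before the next one is produced, which is the property that should keep peak memory flat on large inputs.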