https://bugzilla.wikimedia.org/show_bug.cgi?id=28427

             Bug #: 28427
           Summary: rewrite quickIsNFCVerify() to use preg_match() with an
                    offset to accommodate larger files
           Product: MediaWiki
           Version: 1.18-svn
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: Normal
         Component: General/Unknown
        AssignedTo: [email protected]
        ReportedBy: [email protected]
    Classification: Unclassified


Broken out from bug 28146, which started with a narrower focus and was solved
by a correspondingly narrower fix.

Per notes & patches on that bug, the preg_match_all() in
UtfNormal::quickIsNFCVerify uses a lot of memory for mixed ASCII/non-ASCII
strings such as one finds in languages using Latin scripts with accented or
other non-ASCII letters.

This results in hitting memory limits on largeish input strings much sooner
than we ought to.

Rewriting the function so that it works through the string in chunks as it
splits it should avoid that huge memory bump, but in my initial tests this was
too slow using preg_match() with an offset, and still slowish using
preg_replace_callback().
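
To illustrate the chunked approach, here is a minimal sketch in Python rather
than PHP, where a compiled pattern's match(data, pos) plays roughly the role of
preg_match() with an offset. It is not the UtfNormal code: the names TOKEN,
scan_utf8 and replace_invalid are hypothetical, the 4096-byte cap on ASCII runs
is an arbitrary assumption, and it only separates ASCII runs, well-formed UTF-8
sequences and stray bytes (no NFC check). It shows the memory shape only, one
token at a time instead of one big match array, and says nothing about speed.

```python
import re

# One token per match: a bounded ASCII run, one well-formed UTF-8 sequence,
# or a single byte that fits neither (treated as invalid).
TOKEN = re.compile(
    rb"""
      (?P<ascii>[\x00-\x7f]{1,4096})
    | (?P<utf8>
          [\xc2-\xdf][\x80-\xbf]
        | \xe0[\xa0-\xbf][\x80-\xbf]
        | [\xe1-\xec\xee\xef][\x80-\xbf]{2}
        | \xed[\x80-\x9f][\x80-\xbf]
        | \xf0[\x90-\xbf][\x80-\xbf]{2}
        | [\xf1-\xf3][\x80-\xbf]{3}
        | \xf4[\x80-\x8f][\x80-\xbf]{2}
      )
    | (?P<bad>.)
    """,
    re.VERBOSE | re.DOTALL,
)

REPLACEMENT = "\ufffd".encode("utf-8")  # U+FFFD REPLACEMENT CHARACTER


def scan_utf8(data: bytes):
    """Yield (kind, start, end) for each token, advancing an offset."""
    pos = 0
    end = len(data)
    while pos < end:
        # The final alternative matches any single byte, so this never fails.
        m = TOKEN.match(data, pos)
        yield m.lastgroup, m.start(), m.end()
        pos = m.end()


def replace_invalid(data: bytes) -> bytes:
    """Copy valid tokens through and swap each stray byte for U+FFFD."""
    out = bytearray()
    for kind, start, stop in scan_utf8(data):
        if kind == "bad":
            out += REPLACEMENT
        else:
            out += data[start:stop]
    return bytes(out)
```

For example, replace_invalid(b"caf\xc3\xa9 \xff ok") keeps the "é" sequence
intact and replaces only the stray 0xFF byte. Only the current match object is
alive at any point, which is the property the rewrite is after.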

includes/normal/UtfNormalMemStress.php can be used to stress-test this.

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
You are on the CC list for the bug.
