https://bugzilla.wikimedia.org/show_bug.cgi?id=28146
--- Comment #7 from Brion Vibber <[email protected]> 2011-04-01 21:13:06 UTC ---
Created attachment 8362
  --> https://bugzilla.wikimedia.org/attachment.cgi?id=8362
Work in progress test patch (requires PHP 5.3)

I did a quick try at serially running preg_match(), bumping the offset after each match, and found it far too slow: it ran for at least several minutes on a large German test set and never reached completion.

Redoing it to use preg_replace_callback(), with the loop body moved into an anonymous function for convenience, works, but there is still a major performance regression on the German test set (from 14 MB/sec to 0.5 MB/sec).

Russian, Japanese, and Korean are slowed down much less, from about 2.2 MB/sec to about 1.9 MB/sec.

This is likely because splitting the text into ASCII and non-ASCII sections is much more expensive for German, which like most European languages mixes ASCII and non-ASCII Latin characters together. The other scripts are mostly large non-ASCII blocks, so there are fewer pieces to split apart.

Per-loop overhead seems to be a lot higher with preg_replace_callback() (and much more so with serial preg_match()) than with preg_match_all() + foreach... but the giant array will also be very inefficient for European languages, because many of the chunks will be very short strings, which probably contributes to running out of memory.
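For reference, here is a minimal sketch of the three looping strategies compared in this comment. It is not the attached patch: the pattern CHUNK_PATTERN, the processChunk() stand-in, and all function names are hypothetical, chosen only to show how each approach walks the ASCII and non-ASCII runs. The anonymous function is the part that needs PHP 5.3.

```php
<?php
// Hypothetical pattern (an assumption, not taken from the patch):
// alternating runs of ASCII bytes and non-ASCII bytes.
const CHUNK_PATTERN = '/[\x00-\x7f]+|[\x80-\xff]+/';

// Hypothetical stand-in for the per-chunk work: wrap non-ASCII runs in
// brackets and pass ASCII runs through unchanged.
function processChunk($chunk) {
    return ord($chunk[0]) >= 0x80 ? '[' . $chunk . ']' : $chunk;
}

// Strategy A: preg_match_all() + foreach.
// Builds the complete array of chunks before any of them is processed.
function viaMatchAll($text) {
    preg_match_all(CHUNK_PATTERN, $text, $matches);
    $out = '';
    foreach ($matches[0] as $chunk) {
        $out .= processChunk($chunk);
    }
    return $out;
}

// Strategy B: serial preg_match(), bumping the offset after each match.
// Every byte belongs to one of the two classes, so each match starts
// exactly at $offset and the offset can advance by the match length.
function viaSerialMatch($text) {
    $out = '';
    $offset = 0;
    $len = strlen($text);
    while ($offset < $len && preg_match(CHUNK_PATTERN, $text, $m, 0, $offset)) {
        $out .= processChunk($m[0]);
        $offset += strlen($m[0]);
    }
    return $out;
}

// Strategy C: preg_replace_callback() with an anonymous function.
// No array of chunks is kept; the callback runs once per chunk.
function viaReplaceCallback($text) {
    return preg_replace_callback(CHUNK_PATTERN, function ($m) {
        return processChunk($m[0]);
    }, $text);
}

// Number of chunks the pattern produces for a given string.
function countChunks($text) {
    return preg_match_all(CHUNK_PATTERN, $text, $m);
}
```

All three functions return the same string for the same input; they differ only in how the chunks are iterated. countChunks() illustrates the point about mixed text: a short German phrase with scattered umlauts splits into more chunks than a Russian phrase of similar length, which breaks only at the spaces.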
