https://bugzilla.wikimedia.org/show_bug.cgi?id=36839
--- Comment #10 from Brad Jorsch <[email protected]> 2012-05-15 16:40:04 UTC ---

I think I might have figured this out.

In a post on enwiki from May 11,[1] we are told that Roan changed the "PCRE recursion limit" from the default of 100k to 1k. I assume this refers to PHP's "pcre.recursion_limit" setting,[2] which indeed has a default of 100000.

One thing the recursion limit affects is how many times a regex subexpression like "(x)+" can match. It seems that each match by the "+" there uses up 2 of the recursion limit; with a value of 1024, it can match at most 511 times. If it would need to match 512 times, preg_match will return false instead. You can test this easily enough if you have a recent-enough command-line PHP:

  php -r 'ini_set("pcre.recursion_limit", 1024); var_dump(preg_match("/(x)+/", str_repeat("x", 511)));'
  php -r 'ini_set("pcre.recursion_limit", 1024); var_dump(preg_match("/(x)+/", str_repeat("x", 512)));'

The first will succeed, while the second will fail. But if you bump the 1024 to 1026, the second will start working.

So what seems to be going on is this: The API uses the methods in WebRequest to get the parameters from the client, all of which seem to come down to getGPCVal. For any parameter that exists in $_GET (even if it is overridden by $_POST), getGPCVal passes the value through Language::checkTitleEncoding to make sure it's valid UTF-8. Because of the low recursion limit, the regex in Language::checkTitleEncoding that checks whether the value is valid UTF-8 will now decide it is ''not'' valid whenever the value is more than 511 characters long. It then treats the value as being in the fallback 8-bit encoding (windows-1252 for most languages), which gives the familiar "è" mojibake.

If I'm right, the fix for this bug would be to revert Roan's change to the "pcre.recursion_limit" setting (and fix whatever PageTriage's problem is in some other way), or at least turn it up to something more reasonable than 1024.
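Part of why this is so sneaky: when PCRE gives up, preg_match() returns false rather than 0, and a caller that only checks whether the result is truthy cannot tell "not valid UTF-8" apart from "the regex engine hit its limit". A minimal sketch of telling the two apart follows. describePregMatch() is a hypothetical helper for illustration only (it is not MediaWiki code); preg_last_error() and the PREG_*_ERROR constants are standard PHP. It assumes a build where the recursion limit is actually enforced: on PHP 7 and later the PCRE JIT (pcre.jit) is on by default and does not apply this limit, so the sketch switches it off, and newer PCRE2-based builds may count the limit differently, so the exact 511/512 threshold is not guaranteed.

```php
<?php
// Hypothetical helper (not MediaWiki code): report whether preg_match()
// matched, did not match, or gave up with an error. preg_match() returns
// 1 or 0 on success and false on failure, so "=== 0" and "=== false" must
// be checked separately; a plain truthiness test lumps them together.
function describePregMatch(string $pattern, string $subject): string
{
    $result = preg_match($pattern, $subject);
    if ($result === 1) {
        return 'match';
    }
    if ($result === 0) {
        return 'no match';
    }
    // false: the engine stopped early, so the subject was never fully tested.
    $error = preg_last_error();
    if ($error === PREG_RECURSION_LIMIT_ERROR) {
        return 'error: recursion limit';
    }
    if ($error === PREG_BACKTRACK_LIMIT_ERROR) {
        return 'error: backtrack limit';
    }
    return 'error: code ' . $error;
}

// Assumption: the JIT ignores pcre.recursion_limit, so turn it off (PHP 7+;
// ini_set() just returns false where the directive does not exist).
ini_set('pcre.jit', '0');
ini_set('pcre.recursion_limit', '1024');

foreach ([511, 512] as $n) {
    // Prints one line per length; the exact threshold depends on the PCRE build.
    echo $n, ': ', describePregMatch('/(x)+/', str_repeat('x', $n)), "\n";
}
```

On a build that behaves like the one described above, I'd expect the 512 line to report the recursion-limit error rather than "no match", which is exactly the distinction a UTF-8 check needs before falling back to an 8-bit encoding.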
I'd expect this is causing problems in other areas of the code, too.

[1]: https://en.wikipedia.org/w/index.php?title=Wikipedia_talk:Database_reports&diff=491927371&oldid=491919743
[2]: http://us.php.net/manual/en/pcre.configuration.php#ini.pcre.recursion-limit

_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
