https://bugzilla.wikimedia.org/show_bug.cgi?id=46773
--- Comment #11 from Antoine "hashar" Musso <[email protected]> ---
Created attachment 12734
  --> https://bugzilla.wikimedia.org/attachment.cgi?id=12734&action=edit
PCRE unit tests without and with unicode mode

The root cause is that PCRE does not look up Unicode character properties by
default, so it does not recognize word boundaries in various scripts.

To make PCRE match those word boundaries, we need to run it in Unicode mode
using the 'u' regex modifier. That makes PCRE look up character properties in
a huge table, which might be a bit slow.

So that is definitely doable, but we have to look at the performance impact.

The change https://gerrit.wikimedia.org/r/71718 adds a lame test in MediaWiki
core which shows the problem.

$ php phpunit.php --testdox includes/bug46773Test.php
PHPUnit 3.7.21 by Sebastian Bergmann.

Configuration read from /Users/amusso/projects/mediawiki/core/tests/phpunit/suite.xml

bug46773
 [ ] Regex boundaries devanagari
 [x] Regex boundaries devanagari in unicode mode
 [x] Media wiki test case parent setup called
$

(An 'x' denotes a passing test.)

Attached is the --tap output of the test.
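
For illustration, here is a minimal sketch of the difference the 'u' modifier makes to \b. It is not the test from the Gerrit change: the helper name matchesWholeWord() and the sample string are made up for this example. The Devanagari word is chosen to consist only of base letters, since the handling of combining marks differs between PCRE versions. The result without 'u' may also depend on the PHP version and locale in use.

```php
<?php
// Hypothetical helper (not part of MediaWiki): reports whether $word occurs
// in $subject delimited by \b word boundaries on both sides.
function matchesWholeWord($word, $subject, $unicode) {
    // With 'u', PHP asks PCRE to treat pattern and subject as UTF-8 and to
    // use Unicode character properties for \w and \b.
    $modifiers = $unicode ? 'u' : '';
    $pattern = '/\b' . preg_quote($word, '/') . '\b/' . $modifiers;
    return preg_match($pattern, $subject) === 1;
}

// Sample data, assumed for this sketch only.
$subject = 'salt: नमक ok';
$word = 'नमक';

// Byte mode: the UTF-8 bytes of the word are not word characters in the
// default tables, so no boundary is expected next to the space.
var_dump(matchesWholeWord($word, $subject, false));

// Unicode mode: the Devanagari letters count as word characters, so the
// boundaries on both sides of the word are found.
var_dump(matchesWholeWord($word, $subject, true));
```

Timing both variants over a large batch of real page text would be one way to get a first number for the performance impact mentioned above.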
