ID: 49036
Updated by: [email protected]
Reported By: gvdefence-ncr at yahoo dot it
-Status: Feedback
+Status: Bogus
Bug Type: PCRE related
Operating System: WXP
PHP Version: 5.2.10
New Comment:

http://uk.php.net/manual/en/regexp.reference.backslash.php clearly explains it:

  \w  any "word" character
  \W  any "non-word" character

Each pair of escape sequences partitions the complete set of characters into two disjoint sets. Any given character matches one, and only one, of each pair.

A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.


Previous Comments:
------------------------------------------------------------------------

[2009-07-23 17:59:28] gvdefence-ncr at yahoo dot it

What's a locale?
That [\W] is identical to [^A-Za-z0-9_] is not only Microsoft's idea. \W matches any non-word character, which is the same as [^\w], which is the same as [^A-Za-z0-9_]
[http://en.wikipedia.org/wiki/Regular_expression#POSIX_character_classes]
Many other websites that discuss regular expressions say the same thing as Microsoft and Wikipedia; you can search Google.
Sorry, but this is a bug!
BTW: the PHP manual is completely useless regarding regular expression syntax. It does not help in any way, which is why I added the Microsoft documentation link.

------------------------------------------------------------------------

[2009-07-23 17:41:11] [email protected]

PCRE is locale-aware, so there are some exceptions. What locale are you using? Also, we use PCRE, which is not the Microsoft regexp syntax. I suggest you read the PHP manual instead.

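The partition property in the quoted manual passage can be checked directly. The following is a minimal sketch, not part of the original thread: the helper name classifyBytes is hypothetical, and it assumes byte-wise matching (no /u modifier). Which bytes above 127 land in the "word" set depends on the LC_CTYPE locale in effect, so the sketch reports only the counts rather than a fixed list.

```php
<?php
// Hypothetical helper (not from the report): classify each byte 0..255
// as "word" or "non-word" according to PCRE, without the /u modifier.
function classifyBytes(): array
{
    $word = [];
    $nonWord = [];
    for ($code = 0; $code < 256; $code++) {
        $byte = chr($code);
        // /D makes $ match only at the very end, so "\n" is treated like any other byte.
        $isWord    = preg_match('/^\w$/D', $byte) === 1;
        $isNonWord = preg_match('/^\W$/D', $byte) === 1;
        // The manual says exactly one of the pair must match.
        if ($isWord === $isNonWord) {
            throw new RuntimeException("byte $code matched both classes or neither");
        }
        if ($isWord) {
            $word[] = $code;
        } else {
            $nonWord[] = $code;
        }
    }
    return ['word' => $word, 'nonWord' => $nonWord];
}

$classes = classifyBytes();
echo count($classes['word']), " word bytes, ",
     count($classes['nonWord']), " non-word bytes\n";
```

Running this under different locales (for example after a setlocale(LC_CTYPE, ...) call) is one way to see whether accented bytes move from the non-word set to the word set on a given system.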
------------------------------------------------------------------------

[2009-07-23 17:34:25] gvdefence-ncr at yahoo dot it

Description:
------------
According to regexp syntax, [\W] is identical to [^A-Za-z0-9_] (see: http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx), but preg_replace does not seem to work the same way.

Reproduce code:
---------------
<?php
// According to regexp syntax, [\W] is identical to [^A-Za-z0-9_]
// (see: http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx)
$result1 = preg_replace('/[^A-Za-z0-9_]*/', '', "test àèìòù test");
$result2 = preg_replace('/[\W]*/', '', "test àèìòù test");
echo "<pre>" . $result1 . "</pre>"; // ok, it shows: "testtest"
echo "<pre>" . $result2 . "</pre>"; // wrong, it shows: "testàèìòùtest"
?>

Expected result:
----------------
testtest
testtest

Actual result:
--------------
testtest
testàèìòùtest

------------------------------------------------------------------------
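
If the goal is a result that does not depend on the active locale, one option is to avoid \W and spell out the intended character class. The sketch below is an illustration, not something proposed in the thread: both function names are hypothetical, the second function assumes the subject is valid UTF-8, and it assumes a PCRE build with Unicode support so that the /u modifier and \p{...} properties are available.

```php
<?php
// Hypothetical helpers, not part of the original report.

// Keep only ASCII letters, digits and underscore.
// Matching is byte-wise, so the locale does not affect the result.
function stripNonAsciiWord(string $s): string
{
    return preg_replace('/[^A-Za-z0-9_]+/', '', $s);
}

// Keep Unicode letters, Unicode digits and underscore.
// Assumes $s is valid UTF-8; /u makes PCRE work on code points.
function stripNonUnicodeWord(string $s): string
{
    return preg_replace('/[^\p{L}\p{N}_]+/u', '', $s);
}

// "test àèìòù test", written with escapes so the file encoding does not matter.
$subject = "test \u{E0}\u{E8}\u{EC}\u{F2}\u{F9} test";

echo stripNonAsciiWord($subject), "\n";   // testtest
echo stripNonUnicodeWord($subject), "\n"; // testàèìòùtest
```

The first function reproduces the "Expected result" from the report; the second keeps the accented letters deliberately instead of as a side effect of the locale.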
