ID: 49036
User updated by: gvdefence-ncr at yahoo dot it
Reported By: gvdefence-ncr at yahoo dot it
Status: Bogus
Bug Type: PCRE related
Operating System: WXP
PHP Version: 5.2.10
New Comment:
OK, I read the documentation at http://perldoc.perl.org/perlre.html#Regular-Expressions, but this is still an issue (almost a bug) in a web environment, because my regexp would change behaviour depending on the locale settings of the server. How do I make sure all servers running my web application have the same locale settings? Thanks anyway for the explanation. I'm going to add a comment to the preg_replace PHP documentation. PHP forever!

Previous Comments:
------------------------------------------------------------------------

[2009-07-23 20:05:18] [email protected]

We use Perl Compatible Regular Expressions, *NOT* POSIX regular expressions.

------------------------------------------------------------------------

[2009-07-23 18:54:58] gvdefence-ncr at yahoo dot it

To me this only means that the PHP documentation is wrong too.

1st) There is a paradox: if [\w] (I found the same issue with \W) matches depending on the locale setting, then [A-Za-z0-9_] (which is the same as [\w]) should behave the same way and also match accented characters like àèìòù depending on the locale setting. Since that does not happen, this latter behaviour would be the bug.
2nd) I wonder how to notify all the websites on the internet (including Wikipedia) that PHP regular expression syntax differs from the common-sense standard of the rest of the world!

PS: I adore PHP, just trying to help. Bye!

------------------------------------------------------------------------

[2009-07-23 18:14:04] [email protected]

http://uk.php.net/manual/en/regexp.reference.backslash.php clearly explains it:

  \w  any "word" character
  \W  any "non-word" character

Each pair of escape sequences partitions the complete set of characters into two disjoint sets. Any given character matches one, and only one, of each pair.

A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word".
The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.

------------------------------------------------------------------------

[2009-07-23 17:59:28] gvdefence-ncr at yahoo dot it

What's a locale? That [\W] is identical to [^A-Za-z0-9_] is not only Microsoft's idea. \W matches any non-word character; it is the same as [^\w], which is the same as [^A-Za-z0-9_] [http://en.wikipedia.org/wiki/Regular_expression#POSIX_character_classes]. Many other websites about regular expressions say the same thing as Microsoft and Wikipedia; you can search on Google. Sorry, but this is a bug!

BTW: the PHP manual is completely useless regarding regular expression syntax; it does not help in any way, which is why I added the Microsoft documentation link.

------------------------------------------------------------------------

[2009-07-23 17:41:11] [email protected]

PCRE is locale-aware, so there are some exceptions. What locale are you using? Also, we use PCRE, which is not the Microsoft regexp syntax; I suggest you read the PHP manual instead.

------------------------------------------------------------------------

The remainder of the comments for this report is too long. To view the rest of the comments, please view the bug report online at http://bugs.php.net/49036
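The practical point of the thread is that a shorthand class such as \w takes its meaning from the engine's character tables or flags (in PHP's PCRE, possibly from the server's locale), while a spelled-out class such as [A-Za-z0-9_] does not. The sketch below illustrates that distinction using Python's re module rather than PHP's PCRE. This is an assumption made for illustration only: Python's \w on str patterns is Unicode-aware by default rather than locale-driven, and the sample string and function names are hypothetical.

```python
import re

# Hypothetical sample input: ASCII letters, accented Latin letters, underscores, digits.
SAMPLE = "caffè_àèìòù_123"


def word_runs_default(text: str) -> list[str]:
    # Python's re (not PHP's PCRE): in a str pattern, \w is Unicode-aware
    # by default, so accented letters such as "è" count as word characters.
    return re.findall(r"\w+", text)


def word_runs_ascii_flag(text: str) -> list[str]:
    # re.ASCII restricts \w to the ASCII set [A-Za-z0-9_].
    return re.findall(r"\w+", text, flags=re.ASCII)


def word_runs_explicit(text: str) -> list[str]:
    # A spelled-out class does not depend on any flag or character table.
    return re.findall(r"[A-Za-z0-9_]+", text)


if __name__ == "__main__":
    print(word_runs_default(SAMPLE))     # ['caffè_àèìòù_123']
    print(word_runs_ascii_flag(SAMPLE))  # ['caff', '_', '_123']
    print(word_runs_explicit(SAMPLE))    # ['caff', '_', '_123']
```

If the goal is identical behaviour on every server, spelling the class out explicitly avoids depending on whatever the engine considers a "letter" in the current environment.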
