ID:               49036
 Updated by:       [email protected]
 Reported By:      gvdefence-ncr at yahoo dot it
-Status:           Feedback
+Status:           Bogus
 Bug Type:         PCRE related
 Operating System: WXP
 PHP Version:      5.2.10
 New Comment:

http://uk.php.net/manual/en/regexp.reference.backslash.php clearly
explains it:

\w
    any "word" character
\W
    any "non-word" character

Each pair of escape sequences partitions the complete set of characters
into two disjoint sets. Any given character matches one, and only one,
of each pair.
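
The partition property can be checked directly. The following is a minimal sketch, not part of the manual: it tests every single-byte value against \w and \W without the /u modifier and counts how many bytes match exactly one of the two. The variable names are illustrative only.

```php
<?php
// Sketch: verify that \w and \W split the 256 single-byte values
// into two disjoint sets, whatever locale is currently active.
$exactlyOne = 0; // bytes matched by exactly one of \w and \W

for ($i = 0; $i < 256; $i++) {
    $byte = chr($i);
    // \A and \z anchor the pattern to the whole one-byte subject.
    $isWord    = preg_match('/\A\w\z/', $byte) === 1;
    $isNonWord = preg_match('/\A\W\z/', $byte) === 1;
    if ($isWord !== $isNonWord) {
        $exactlyOne++;
    }
}

echo "bytes matching exactly one of \\w and \\W: $exactlyOne\n";
```

Which bytes land on the \w side may change with the locale, but the count reported here should stay at 256.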

A "word" character is any letter or digit or the underscore character,
that is, any character which can be part of a Perl "word". The
definition of letters and digits is controlled by PCRE's character
tables, and may vary if locale-specific matching is taking place. For
example, in the "fr" (French) locale, some character codes greater than
128 are used for accented letters, and these are matched by \w. 
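
To illustrate the locale dependence, here is a hedged sketch that pins LC_CTYPE to the "C" locale before matching. It assumes the accented letters from the report are the single bytes 0xE0, 0xE8, 0xEC, 0xF2 and 0xF9 (ISO-8859-1); the report does not state the actual encoding, so that choice is an assumption.

```php
<?php
// Byte escapes keep the subject independent of the script's file encoding.
// Assumption: ISO-8859-1 bytes stand in for the accented letters.
$subject = "test \xE0\xE8\xEC\xF2\xF9 test";

// In the "C" locale only A-Z, a-z, 0-9 and _ count as word characters.
setlocale(LC_CTYPE, 'C');

$explicit  = preg_replace('/[^A-Za-z0-9_]+/', '', $subject);
$shorthand = preg_replace('/\W+/', '', $subject);

var_dump($explicit === 'testtest');  // explicit class strips the high bytes
var_dump($shorthand === $explicit);  // \W agrees while the "C" locale is active
```

Under a locale whose character tables classify those bytes as letters, the second pattern would be expected to leave them in place, which matches the behaviour described in the report.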


Previous Comments:
------------------------------------------------------------------------

[2009-07-23 17:59:28] gvdefence-ncr at yahoo dot it

What's a locale?

That [\W] is identical to [^A-Za-z0-9_] is not only Microsoft's idea.

\W matches any non-word character, which is the same as [^\w], which is
the same as [^A-Za-z0-9_]
[http://en.wikipedia.org/wiki/Regular_expression#POSIX_character_classes]

Many other websites that discuss regular expressions say the same thing
as Microsoft and Wikipedia; you can search on Google.

Sorry, but this is a bug!


BTW: the PHP manual is completely useless regarding regular expression
syntax; it does not help in any way, which is why I added the Microsoft
documentation link.

------------------------------------------------------------------------

[2009-07-23 17:41:11] [email protected]

PCRE is locale-aware, so there are some exceptions. What locale are you
using?

Also, we use PCRE, which does not follow the Microsoft regexp syntax. I
suggest you read the PHP manual instead.

------------------------------------------------------------------------

[2009-07-23 17:34:25] gvdefence-ncr at yahoo dot it

Description:
------------
According to regexp syntax, [\W] is identical to [^A-Za-z0-9_] (see:
http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx)

But preg_replace does not seem to work the same way.


Reproduce code:
---------------
<?php
   // According to regexp syntax, [\W] is identical to [^A-Za-z0-9_]
   // (see: http://msdn.microsoft.com/en-us/library/1400241x(VS.85).aspx)

   $result1 = preg_replace('/[^A-Za-z0-9_]*/', '', "test àèìòù test");
   $result2 = preg_replace('/[\W]*/', '', "test àèìòù test");

   echo "<pre>" . $result1 . "</pre>"; // ok, it shows: "testtest"
   echo "<pre>" . $result2 . "</pre>"; // wrong, it shows: "testàèìòùtest"
?>

Expected result:
----------------
testtest

testtest

Actual result:
--------------
testtest

testàèìòùtest


------------------------------------------------------------------------

