Re: [PHP-DEV] preg_replace does not replace all occurrences

2011-03-15 Thread Richard Quadling
On 14 March 2011 20:36, Hannes Landeholm landeh...@gmail.com wrote:
 What is more likely to be wrong? Your understanding of a specific
 regex pattern (which happens to be full of escapes making it
 incredibly hard to read) or the implementation of preg_replace?

 ~Hannes

 On 14 March 2011 16:18, Martin Scotta martinsco...@gmail.com wrote:

 I chose the simplest example to show the preg_replace behavior, there are
 better (and safer) ways to escape slash characters.
 Anyways, *is this the expected preg_replace behavior?*

  Martin

 <?php
 function test($str) {
    static $re = '/(^|[^\\\\])\'/';
    static $change = '$1\\\'';

    echo $str, PHP_EOL,
        preg_replace($re, $change, $str), PHP_EOL, PHP_EOL;
 }

 test("str '' str"); // bug?
 test("str \\'\\' str"); // ok
 test("'str'"); // ok
 test("\'str\'"); // ok

 
 Expected:

 str '' str
 str \'\' str

 str \'\' str
 str \'\' str

 'str'
 \'str\'

 \'str\'
 \'str\'

 
 Result:

 str '' str
 str \'' str

 str \'\' str
 str \'\' str

 'str'
 \'str\'

 \'str\'
 \'str\'


  Martin Scotta

 --
 PHP Internals - PHP Runtime Development Mailing List
 To unsubscribe, visit: http://www.php.net/unsub.php



Did no one see why the regex was wrong?

RegexBuddy (a windows app) explains regexes VERY VERY well.


-- 
Richard Quadling
Twitter : EE : Zend
@RQuadling : e-e.com/M_248814.html : bit.ly/9O8vFY




Re: [PHP-DEV] preg_replace does not replace all occurrences

2011-03-15 Thread Richard Quadling
On 15 March 2011 10:32, Richard Quadling rquadl...@gmail.com wrote:
 Did no one see why the regex was wrong?

 RegexBuddy (a windows app) explains regexes VERY VERY well.

The important bit (where the problem lies with regard to the regex) is ...

Match a single character NOT present in the list below «[^\\\\]»
A \ character «\\»
A \ character «\\»


The issue is the word _single_.




-- 
Richard Quadling



Re: [PHP-DEV] preg_replace does not replace all occurrences

2011-03-15 Thread Ben Schmidt

static $re = '/(^|[^\\\\])\'/';


Did no one see why the regex was wrong?


I saw what the regex was. I didn't think like you that it was 'wrong'.

Once you unescape the characters in the PHP single-quoted string above
(where two backslashes count as one, and backslash-quote counts as a
quote), the actual pattern that reaches the preg_replace function is:

   /(^|[^\\])'/


RegexBuddy (a windows app) explains regexes VERY VERY well.


What kind of patterns? Does it support PCRE ones?


The important bit (where the problem lies with regard to the regex) is
...

Match a single character NOT present in the list below «[^\\\\]»
 A \ character «\\»
 A \ character «\\»


This is not the case.

1. As above, the pattern reaching preg_replace is /(^|[^\\])'/

2. PCRE, unlike many other regular expression implementations, allows
backslash-escaping inside character classes (square brackets). So the
doubled backslash only actually counts as a single backslash character
to be excluded from the set of characters the atom will match.

There is no error here. (And even if there were two backslashes being
excluded, of course, it wouldn't hurt anything or change the meaning of
the pattern.)
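
To illustrate point 2, here is a minimal sketch. The helper name is made up for this example and is not from the thread. It anchors the character class so that it must match the whole subject, then tries a lone backslash and an ordinary character:

```php
<?php
// Hypothetical helper, not from the thread: does the class [^\\] accept $s?
function class_accepts(string $s): int {
    // The PHP literal '/^[^\\\\]$/' reaches PCRE as /^[^\\]$/
    return preg_match('/^[^\\\\]$/', $s);
}

echo class_accepts('\\'), PHP_EOL; // 0: a single backslash is the one excluded character
echo class_accepts('a'), PHP_EOL;  // 1: any other single character is accepted
```

So the doubled backslash inside the brackets excludes exactly one backslash character, as described above.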


The issue is the word _single_.


I don't think anybody thought otherwise.

The problem was that, to a casual observer, the pattern seems to mean a
quote which doesn't already have a backslash before it. I believe this
was its intent. (And the replacement added the 'missing' backslash.)

But the pattern doesn't mean that. It actually means a character which
isn't a backslash, followed by a quote. This is subtly different.

And it's most noticeable when two quotes follow each other in the
subject string. In

   str''str

first the pattern matches r' (non-backslash followed by quote), and
then it keeps searching from that point, i.e. it searches 'str. Since
this isn't the beginning of the string, and there is no quote following
a non-backslash character, there are no further matches.
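
The walk-through above can be checked with preg_match_all, which reports every non-overlapping match. This is only a sketch; the variable names are illustrative, and PREG_OFFSET_CAPTURE is used to show where each match starts:

```php
<?php
// The thread's original pattern; PCRE sees /(^|[^\\])'/
$re = '/(^|[^\\\\])\'/';

// PREG_OFFSET_CAPTURE records where each match starts in the subject.
$count = preg_match_all($re, "str''str", $matches, PREG_OFFSET_CAPTURE);

echo $count, PHP_EOL; // 1: only one match is found
foreach ($matches[0] as [$text, $offset]) {
    echo $offset, ': ', $text, PHP_EOL; // 2: r'
}
```

The single match consumes both the r and the first quote, so the second quote has no unconsumed character in front of it for the group to match.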

Now, here is a pattern which actually means a quote which doesn't
already have a backslash before it. It achieves this by means of a
lookbehind assertion which, even when searching the string after the
first match, 'str, still 'looks back' on the earlier part of the
string, recognises that the second quote is not preceded by a backslash,
and matches a second time:

   /(^|(?<!\\))'/

As a PHP single-quoted string this is:

   '/(^|(?<!\\\\))\'/'
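
A minimal sketch of the negative lookbehind form, written (?<!\\) in PCRE, applied to the test strings from the start of the thread. The function name is hypothetical; the replacement string is the one from the original test script:

```php
<?php
// Hypothetical wrapper around the lookbehind pattern discussed above.
function escape_quotes(string $str): string {
    // PCRE sees /(^|(?<!\\))'/ : a quote with no backslash immediately before it.
    static $re = '/(^|(?<!\\\\))\'/';
    // PCRE sees $1\' : the (empty) group, then a backslash and the quote.
    static $change = '$1\\\'';
    return preg_replace($re, $change, $str);
}

echo escape_quotes("str '' str"), PHP_EOL;     // str \'\' str
echo escape_quotes("str \\'\\' str"), PHP_EOL; // str \'\' str (already escaped, unchanged)
echo escape_quotes("'str'"), PHP_EOL;          // \'str\'
```

Because the assertion consumes nothing, both quotes in the first string are matched independently.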

Hope this helps,

Ben.







Re: [PHP-DEV] preg_replace does not replace all occurrences

2011-03-15 Thread Ben Schmidt

Now, here is a pattern which actually means a quote which doesn't
already have a backslash before it which is achieved by means of a
lookbehind assertion, which, even when searching the string after the
first match, 'str, still 'looks back' on the earlier part of the
string to recognise the second quote is not preceded by a backslash and
match a second time:

/(^|(?<!\\))'/

As a PHP single-quoted string this is:

'/(^|(?<!\\\\))\'/'


And I should mention, as Martin did, that this actually isn't a good
idea. There are better/safer ways to escape quotes. In particular,
consider how this subject string

   str\\'; delete from users;

will not have the quote escaped, because it is preceded by *two*
backslashes. To match more carefully, you have to be careful to 'eat
backslashes in pairs'. Someone gave a pattern that attempted to do
something like that in an earlier post, too.
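
One way to 'eat backslashes in pairs' might look like the sketch below. It is an illustration only, not the pattern from the earlier post, and not a recommended way to escape input; the function name is made up. It starts where no backslash precedes, consumes backslashes two at a time, and escapes the quote only if it follows directly:

```php
<?php
// Hypothetical helper: escape quotes preceded by an even number of backslashes.
function escape_unescaped_quotes(string $str): string {
    // PCRE sees /(?<!\\)((?:\\{2})*)'/ :
    //   (?<!\\)       start where no backslash precedes the run
    //   ((?:\\{2})*)  consume backslashes two at a time (zero or more pairs)
    //   '             then the quote, so an even number of backslashes precedes it
    static $re = '/(?<!\\\\)((?:\\\\{2})*)\'/';
    // Put the pairs back unchanged, then add the escaping backslash.
    static $change = '$1\\\'';
    return preg_replace($re, $change, $str);
}

// Two backslashes before the quote: the quote is NOT escaped yet, so escape it.
echo escape_unescaped_quotes("str\\\\'; delete from users;"), PHP_EOL; // str\\\'; delete from users;
// One backslash before the quote: already escaped, left alone.
echo escape_unescaped_quotes("it\\'s"), PHP_EOL;                       // it\'s
```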

Ben.







Re: [PHP-DEV] preg_replace does not replace all occurrences

2011-03-15 Thread Dave Ingram
On 03/15/11 12:41, Ben Schmidt wrote:
 [snip]

 Hope this helps,

 Ben.

As an outsider in this discussion, I'd just like to applaud you for one
of the best, in-depth, most patient and most thorough explanations I
have ever seen on a mailing list.


Dave




Re: [PHP-DEV] preg_replace does not replace all occurrences

2011-03-15 Thread Richard Quadling
On 15 March 2011 12:41, Ben Schmidt mail_ben_schm...@yahoo.com.au wrote:
    static $re = '/(^|[^\\\\])\'/';

 Did no one see why the regex was wrong?

 I saw what the regex was. I didn't think like you that it was 'wrong'.

 Once you unescape the characters in the PHP single-quoted string above
 (where two backslashes count as one, and backslash-quote counts as a
 quote), the actual pattern that reaches the preg_replace function is:

   /(^|[^\\])'/

 RegexBuddy (a windows app) explains regexes VERY VERY well.

 What kind of patterns? Does it support PCRE ones?


Yep and MANY other flavours (C#, C++, Delphi, Groovy, Java,
JavaScript, MySQL, ...)

If I say ...

<?php
echo '/(^|[^\\\\])\'/';
?>

I get ...

/(^|[^\\])'/


which is explained as ...



(^|[^\\])'

Options: case insensitive; ^ and $ match at line breaks

Match the regular expression below and capture its match into
backreference number 1 «(^|[^\\])»
   Match either the regular expression below (attempting the next
alternative only if this one fails) «^»
  Assert position at the beginning of a line (at beginning of the
string or after a line break character) «^»
   Or match regular expression number 2 below (the entire group fails
if this one fails to match) «[^\\]»
  Match any character that is NOT a \ character «[^\\]»
Match the character “'” literally «'»

And that certainly makes a LOT more sense.

Decoding regexes and handling the escaping needed for the language is
a real headache sometimes.

Just imagine creating regex code for use by client side JavaScript using PHP.

8 \ in a row for a single \ wouldn't be impossible.
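
As a rough sketch of how the layers stack up (variable names are illustrative): a regex that matches one backslash needs two, a JavaScript string literal holding that regex needs four, and a PHP single-quoted literal emitting those four needs eight. json_encode() can take care of the JavaScript string layer:

```php
<?php
// Goal: emit JavaScript that builds a regex matching ONE literal backslash.
// Regex text needed:       \\      (2 characters)
// As a JS string literal:  "\\\\"  (4 backslashes between the quotes)
// As a PHP single-quoted literal emitting those 4: 8 backslashes.
$js_by_hand = 'var re = new RegExp("' . '\\\\\\\\' . '");';

// json_encode() adds the JavaScript string layer, so only the regex
// layer and the PHP layer are written by hand (4 backslashes).
$js_encoded = 'var re = new RegExp(' . json_encode('\\\\') . ');';

echo $js_by_hand, PHP_EOL; // var re = new RegExp("\\\\");
echo $js_encoded, PHP_EOL; // var re = new RegExp("\\\\");
```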

Sorry for the confusion.


-- 
Richard Quadling



Re: [PHP-DEV] preg_replace does not replace all occurrences

2011-03-14 Thread Richard Quadling
On 14 March 2011 15:18, Martin Scotta martinsco...@gmail.com wrote:
 <?php
 function test($str) {
    static $re = '/(^|[^\\\\])\'/';
    static $change = '$1\\\'';

    echo $str, PHP_EOL,
        preg_replace($re, $change, $str), PHP_EOL, PHP_EOL;
 }

 test("str '' str"); // bug?
 test("str \\'\\' str"); // ok
 test("'str'"); // ok
 test("\'str\'"); // ok


Your regex is ...



(^|[^\\\\])'

Options: case insensitive; ^ and $ match at line breaks

Match the regular expression below and capture its match into
backreference number 1 «(^|[^\\\\])»
   Match either the regular expression below (attempting the next
alternative only if this one fails) «^»
  Assert position at the beginning of a line (at beginning of the
string or after a line break character) «^»
   Or match regular expression number 2 below (the entire group fails
if this one fails to match) «[^\\\\]»
  Match a single character NOT present in the list below «[^\\\\]»
 A \ character «\\»
 A \ character «\\»
Match the character “'” literally «'»



I think [^\\\\] is wrong and you want it to be ...

(?<!\\\\)

or

(?<!\\{2})


With that, the output is ...

str '' str
str \'\' str

str \'\' str
str \\'\\' str

'str'
\'str\'

\'str\'
\\'str\\'



-- 
Richard Quadling



Re: [PHP-DEV] preg_replace does not replace all occurrences

2011-03-14 Thread Ben Schmidt

On 15/03/11 2:18 AM, Martin Scotta wrote:

I chose the simplest example to show the preg_replace behavior,


You've GOT to be kidding. The SIMPLEST?!

How about an example that doesn't require escaping ALL the interesting
characters involved?

Here's a modified version that I think is quite a bit simpler:

<?php
function test($str) {
   static $re = '/(^|[^a])b/';
   static $change = '$1ab';

   echo $str, PHP_EOL; // input
   echo preg_replace($re, $change, $str), PHP_EOL, PHP_EOL; // output
}

test("str bb str"); // bug?
test("str abab str"); // ok
test("b str b"); // ok
test("ab str ab"); // ok
?>

The way I interpret it, it should put an 'a' before every 'b' that is
not already preceded by an 'a'.

But the buggy case gives 'str abb str' rather than the expected
'str abab str'.

It does look like a bug to me.

Ben.







Re: [PHP-DEV] preg_replace does not replace all occurrences

2011-03-14 Thread Ben Schmidt

On 15/03/11 5:38 AM, Ben Schmidt wrote:

It does look like a bug to me.


Actually, no it doesn't.

The behaviour is correct.

Matches cannot overlap. Since the character preceding 'b' is part of the
match, there is only one match in the string 'str bb str'. The match is
' b'. After that match, the

You actually want an assertion. I think this:

static $re = '/(^|(?<!a))b/';
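
For comparison, a short sketch running both forms on the 'bug?' subject from the simplified example. The function names are made up here; the lookbehind is written (?<!a) in PCRE:

```php
<?php
// Consuming form: the character before 'b' is part of the match.
function add_a_consuming(string $s): string {
    return preg_replace('/(^|[^a])b/', '$1ab', $s);
}

// Assertion form: the preceding character is inspected but not consumed.
function add_a_lookbehind(string $s): string {
    return preg_replace('/(^|(?<!a))b/', '$1ab', $s);
}

echo add_a_consuming('str bb str'), PHP_EOL;    // str abb str
echo add_a_lookbehind('str bb str'), PHP_EOL;   // str abab str
echo add_a_lookbehind('str abab str'), PHP_EOL; // str abab str (unchanged)
```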

Ben.







Re: [PHP-DEV] preg_replace does not replace all occurrences

2011-03-14 Thread Hannes Landeholm
What is more likely to be wrong? Your understanding of a specific
regex pattern (which happens to be full of escapes making it
incredibly hard to read) or the implementation of preg_replace?

~Hannes
