ID:               23946
 User updated by:  stuge-phpbugs at cdy dot org
 Reported By:      stuge-phpbugs at cdy dot org
 Status:           Verified
 Bug Type:         *Regular Expressions
 Operating System: Linux
 PHP Version:      4.3.2
 New Comment:

$ echo xyz|perl -e 'while(<STDIN>) { $_=~ s/z?$/a/;print}'
xya
$ echo xyz|perl -e 'while(<STDIN>) { $_=~ s/z?$/a/g;print}'
xyaa
a$ echo xyz|sed -e 's/z\?$/a/g'
xya
$ echo xyz|sed -e 's/z\?$/a/'
xya

perl and sed both behave the way I would expect.
[ep]reg_replace() don't. :)

A comment from php.net/manual/en/pcre.pattern.modifiers.php:
/g is default in PHP. You don't need to set it.

preg_replace() similarity with perl may not be desired, I only included
it because it showed the exact same behaviour while using a different
regex code base, indicating the same algorithm.

I believe Andrei's comment in 23903 is a bit quick, at least if this is
a dupe, which I would say it is.

Again, I believe the error to be that the regexp is run one extra
time after all of the input string has already been matched and
processed inside [ep]reg_replace().
Because of this, any regexp that matches the empty string will cause
one extra replacement to occur at the end of the input string.

At least inside ereg_replace() the loop should just quit if the entire
string has been processed after the first iteration.
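
To make the suspected extra iteration concrete, here is a minimal
sketch of a regexec()-driven replace loop. It is not the actual code
from ext/standard/reg.c: the function name sketch_replace and the
stop_at_end flag are hypothetical, the caller-supplied output buffer is
a simplification, and it assumes a POSIX <regex.h>. With stop_at_end
set, the loop condition is the one from my proposed patch.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT PHP's implementation: replace every match of
 * `pattern` in `string` with `repl`, writing the result to `out`.
 * If stop_at_end is nonzero, the loop quits once the whole input has
 * been consumed (the do/while condition from the proposed patch).
 * Returns 0 on success, -1 on a regcomp()/regexec() error. */
int sketch_replace(const char *pattern, const char *repl,
                   const char *string, int stop_at_end,
                   char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m;
    size_t pos = 0, len = 0, rlen = strlen(repl);
    int err;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    do {
        err = regexec(&re, &string[pos], 1, &m, pos ? REG_NOTBOL : 0);
        if (err && err != REG_NOMATCH) {
            regfree(&re);
            return -1;
        }
        if (!err) {
            /* copy the text before the match, then the replacement */
            size_t pre = (size_t)m.rm_so;
            assert(len + pre + rlen + 2 <= outsz);
            memcpy(out + len, string + pos, pre);
            len += pre;
            memcpy(out + len, repl, rlen);
            len += rlen;
            if (m.rm_so == m.rm_eo) {
                /* empty match: stop at end of input, otherwise copy
                 * one character and step over it */
                if (string[pos + (size_t)m.rm_eo] == '\0')
                    break;
                out[len++] = string[pos + (size_t)m.rm_eo];
                pos += (size_t)m.rm_eo + 1;
            } else {
                pos += (size_t)m.rm_eo;
            }
        } else {
            /* no further match: append the rest of the input */
            size_t rest = strlen(string + pos);
            assert(len + rest + 1 <= outsz);
            memcpy(out + len, string + pos, rest);
            len += rest;
        }
    } while (!err && (!stop_at_end || string[pos]));

    out[len] = '\0';
    regfree(&re);
    return 0;
}
```

Calling sketch_replace("z?$", "a", "xyz", 0, out, sizeof out) leaves
"xyaa" in out, because regexec() runs once more on the empty remainder
and matches the empty string there. With stop_at_end set to 1 the same
call leaves "xya".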

I'll try my patch and recomment.


Previous Comments:
------------------------------------------------------------------------

[2003-06-03 12:59:57] [EMAIL PROTECTED]

Verified with 4.3.2 (and 4.3.1 and 4.2.3). Similar to #23903, but here
I see why you want to use [ep]reg_replace.

------------------------------------------------------------------------

[2003-06-01 22:38:09] stuge-phpbugs at cdy dot org

Verified with 4.3.0 and 4.3.2rc2, but I haven't tried 4.3.2. NEWS for
4.3.2 does not mention any *reg_replace changes. Also reproducible
with eregi_replace().

<?php
  $f="xyz";
  $g=ereg_replace("z?$","a",$f);
  $h=preg_replace("/z?$/","a",$f);
  print $f; /* xyz */
  print " ".$g; /* xyaa, I want xya */
  print " ".$h; /* xyaa, I want xya */
?>

At least in CVS php4/ext/standard/reg.c I believe the problem is that
regexec() on line 306 will be called a second time after the first
iteration has already matched to the end of the string.

Matching from the end of the string should only be allowed on the very
first iteration.

If this analysis is correct, the following patch might work, but I
haven't tried it.

--- reg.c.org   2003-06-02 05:23:51.000000000 +0200
+++ reg.c       2003-06-02 05:26:06.000000000 +0200
@@ -302,7 +302,7 @@
 
        err = pos = 0;
        buf[0] = '\0';
-       while (!err) {
+       do {
                err = regexec(&re, &string[pos], re.re_nsub+1, subs, (pos ? REG_NOTBOL : 0));
 
                if (err && err != REG_NOMATCH) {
@@ -396,7 +396,7 @@
                        /* stick that last bit of string on our output */
                        strcat(buf, &string[pos]);
                }
-       }
+       } while(!err && string[pos]);
 
        /* don't want to leak memory .. */
        efree(subs);


------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=23946&edit=1
