[PHP-I18N] mb_ereg_replace bug?

Ezra Gilbert Tue, 30 Nov 2004 11:43:20 -0800

Has anyone noticed an issue with mb_ereg_replace when the pattern string
contains a # character?


The following problem is seen with php-4.2.2 with mbstring/mbregex enabled.

In (1) below, ereg_replace has no problem matching a pattern containing a #.
In (2), mb_ereg_replace ignores the #52 in the pattern and replaces only the
@ with test.
To fix the problem, we need to escape # as \#, as in (3).  I didn't think
# had special significance in POSIX regex, and it worked fine in (1) with
ereg_replace.

1)
$s = 'blah @#52 blah';
print("s: $s \n ");
$s = ereg_replace('@#52','test',$s);
print("s: $s \n ");

s: blah @#52 blah
s: blah test blah

----------------------
2)
$s = 'blah @#52 blah';
print("s mb: $s \n ");
$s = mb_ereg_replace('@#52','test',$s);
print("s mb: $s \n ");

s mb: blah @#52 blah
s mb: blah test#52 blah

----------------------
3)
$s = 'blah @#52 blah';
print("s mb\: $s \n");
$s = mb_ereg_replace('@\#52','test',$s);
print("s mb\: $s \n");

s mb\: blah @#52 blah
s mb\: blah test blah
----------------------
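
The escaping step from (3) can be wrapped in a small helper so that literal
patterns do not have to be edited by hand. This is only a sketch:
escape_hash_for_mbregex() is a made-up name, not part of mbstring, and it
assumes the pattern contains no # that is already escaped.

```php
<?php
// Hypothetical helper (not an mbstring function): backslash-escape every '#'
// so it is matched literally, as in case (3).
// Assumption: the pattern has no '#' that is already written as '\#'.
function escape_hash_for_mbregex($pattern) {
    // In a single-quoted PHP string, '\#' is a backslash followed by '#'.
    return str_replace('#', '\#', $pattern);
}

$pattern = escape_hash_for_mbregex('@#52');
echo $pattern, "\n"; // prints @\#52

// Use it the same way as in (3), when mbstring/mbregex is available.
if (function_exists('mb_ereg_replace')) {
    echo mb_ereg_replace($pattern, 'test', 'blah @#52 blah'), "\n";
}
```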

The problem comes up when trying to create the following function:
    function html_special_decode($s) {
      $s = mb_ereg_replace('&gt;', '>', $s);
      $s = mb_ereg_replace('&lt;', '<', $s);
      $s = mb_ereg_replace('&quot;', '"', $s);
      $s = mb_ereg_replace('&#39;', '\'', $s);
      $s = mb_ereg_replace('&amp;', '&', $s);
      return $s;
    }
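
Since all five entities are fixed ASCII strings, one possible way around the
problem is to skip the regex engine and use str_replace. The sketch below
assumes the input is UTF-8 (where no byte of a multibyte character falls in
the ASCII range, so a byte-wise replace cannot split a character); the
function name html_special_decode_plain is made up for illustration.

```php
<?php
// Hypothetical variant of html_special_decode() that avoids regex patterns,
// so '#' in '&#39;' needs no escaping. Assumes UTF-8 input.
function html_special_decode_plain($s) {
    // Same order as the original function: '&amp;' is decoded last so that
    // '&amp;lt;' becomes '&lt;' and is not decoded a second time.
    $entities = array('&gt;', '&lt;', '&quot;', '&#39;', '&amp;');
    $chars    = array('>',    '<',    '"',      "'",     '&');
    // With array arguments, str_replace applies the pairs one after another.
    return str_replace($entities, $chars, $s);
}

echo html_special_decode_plain('&lt;b&gt;it&#39;s&lt;/b&gt;'), "\n"; // prints <b>it's</b>
```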


-Ezra

"Renato De Giovanni" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> > It's probable that it's a PHP...erm..."fact of life" right now. I ran
> > into similar problems with iso-8859-7 and -9, using both
> > htmlspecialchars and htmlentities with the (optional) 3rd parameter.
> > Things worked unpredictably. In the PHP build I have now (4.4ish, from
> > recent CVS), htmlspecialchars actually prints out a PHP error message
> > (E_WARNING, I believe) that:
> >
> > "ISO-8859-7 is not supported by htmlspecialchars(); assuming ISO-8859-1"
> >
> > So I wouldn't be surprised if you were running into this problem,
> > which wasn't officially recognized until after 4.2 was released. Look
> > at bugs.php.net for related bugs...it's the only good way to keep up on
> > the issue, which seems to be evolving...
> >
> > Cheers,
> > spud.
>
> Ok, so it's a known "missing feature".
>
> Meanwhile, it's possible to replace:
>
> $s = htmlspecialchars($s, ENT_COMPAT, 'UTF-8');
>
> with:
>
> mb_regex_encoding('UTF-8');
> $s = mb_ereg_replace('&', '&amp;', $s);
> $s = mb_ereg_replace('>', '&gt;', $s);
> $s = mb_ereg_replace('<', '&lt;', $s);
> $s = mb_ereg_replace('"', '&quot;', $s);
>
> ...which should decrease performance considerably, but I see no other
> workaround.
>
> Thanks,
> --
> Renato

-- 
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
