From:             giacomoread at hotmail dot com
Operating system: All
PHP version:      5.2.3
PHP Bug Type:     Scripting Engine problem
Bug description:  preg_replace crashes with large input

Description:
------------
I found a similar bug which was closed with status bogus. Unacceptable!
Nothing in the documentation states any limit on the input to
preg_replace, and no portable workarounds are documented. Stating that 'it is
just a stack overflow' just to keep the bug count down is more than a
little unprofessional. A scripting language should either handle the
workaround internally or document the input limits, NOT cause segfaults.
This is a bug whether the PHP community is willing to accept it or not.

Reproduce code:
---------------
function parse($html, &$title, &$text, &$anchors)
{
  $pstring1 = "'[^']*'";
  $pstring2 = '"[^"]*"';
  $pnstring = "[^'\">]";
  $pintag   = "(?:$pstring1|$pstring2|$pnstring)*";
  $pattrs   = "(?:\\s$pintag){0,1}";

  $pcomment = enclose("<!--", "-", "->");
  $pscript  = enclose("<script$pattrs>", "<", "\\/script>");
  $pstyle   = enclose("<style$pattrs>", "<", "\\/style>");
  $pexclude = "(?:$pcomment|$pscript|$pstyle)";

  $ptitle   = enclose("<title$pattrs>", "<", "\\/title>");
  $panchor  = "<a(?:\\s$pintag){0,1}>";
  $phref    = "href\\s*=[\\s'\"]*([^\\s'\">]*)";

  $html = preg_replace("/$pexclude/iX", " ", $html);

  if ($title !== false)
    $title = preg_match("/$ptitle/iX", $html, $title)
             ? $title[1] : '';

  if ($text !== false)
  {
    $text = preg_replace("/<$pintag>/iX",   " ", $html);
    $text = preg_replace("/\\s+|&nbsp;/iX", " ", $text);
  }

  if ($anchors !== false)
  {
    preg_match_all("/$panchor/iX", $html, $anchors);
    $anchors = $anchors[0];

    reset($anchors);
    while (list($i, $x) = each($anchors))
      $anchors[$i] =
        preg_match("/$phref/iX", $x, $x) ? $x[1] : '';

    $anchors = array_unique($anchors);
  }
}

function enclose($start, $end1, $end2)
{
  return "$start((?:[^$end1]|$end1(?!$end2))*)$end1$end2";
}
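
A possible variant of enclose(), offered only as a sketch and not as part
of the original reproduce code: the inner loop is made possessive, so PCRE
can consume each run of ordinary characters in one step instead of one
group iteration per character. The name enclose_possessive is hypothetical.
The assumption, which I have not verified against PHP 5.2.3, is that this
lowers the recursion depth enough to avoid the stack overflow on large
inputs. The pattern should match the same text as the original, because
the loop can never consume the start of the closing delimiter.

```php
<?php
// Hypothetical variant of enclose() from the report. The "++" and "*+"
// quantifiers are possessive: once matched, PCRE does not backtrack into
// them, so it has no need to remember a backtrack point per character.
function enclose_possessive($start, $end1, $end2)
{
  return "$start((?:[^$end1]++|$end1(?!$end2))*+)$end1$end2";
}

// Same comment pattern as $pcomment in the report, built with the variant.
$pcomment = enclose_possessive("<!--", "-", "->");

// A comment with a long body, standing in for a large downloaded page.
$html = "a<!-- " . str_repeat("x", 50000) . " -->b";
$out  = preg_replace("/$pcomment/iX", " ", $html);

var_dump($out);
```

Whether this helps depends on the PCRE build; it reduces the work per
character but does not remove the underlying limit.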

Expected result:
----------------
The code should split the HTML pages into title, text and links. It works
fine until large pages are downloaded. Then it segfaults, with gdb showing
the crash inside preg_replace.
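
For illustration, here is a minimal sketch of the kind of guard a script
could use instead of crashing. It assumes that lowering the
pcre.recursion_limit ini setting (available since PHP 5.2.0) makes PCRE
stop with an error before the C stack is exhausted. The function name
safe_preg_replace and the limit of 10000 are hypothetical; a safe value
depends on the stack size of the platform.

```php
<?php
// Hypothetical wrapper, not part of the original report.
// Returns the replaced string, or false if PCRE reported an error.
function safe_preg_replace($pattern, $replacement, $subject)
{
  // Assumption: 10000 is low enough to fail cleanly rather than overflow
  // the stack. The value is illustrative only.
  $old = ini_set('pcre.recursion_limit', '10000');

  $result = preg_replace($pattern, $replacement, $subject);
  $error  = preg_last_error();

  // Restore the previous limit if ini_set() succeeded.
  if ($old !== false)
    ini_set('pcre.recursion_limit', $old);

  // preg_replace() returns NULL when the PCRE engine gives up.
  if ($result === null && $error !== PREG_NO_ERROR)
    return false;

  return $result;
}

// Usage: a small input goes through unchanged in behaviour.
$clean = safe_preg_replace('/b+/', '-', 'abbbc');
var_dump($clean);
```

With a guard like this the caller can test for false and skip or split the
oversized page, rather than losing the whole process.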


-- 
Edit bug report at http://bugs.php.net/?id=41896&edit=1