From:             giacomoread at hotmail dot com
Operating system: All
PHP version:      5.2.3
PHP Bug Type:     Scripting Engine problem
Bug description:  preg_replace crashes with large input
Description:
------------
I found a similar bug which was closed with status "Bogus". Unacceptable! Nothing in the documentation states limits on the input to preg_replace, and no portable workarounds are documented. Stating that "it is just a stack overflow" just to keep the bug count down is more than a little unprofessional. A scripting language should either handle the workaround internally or document its input limits, NOT cause segfaults. This is a bug whether the PHP community is willing to accept it or not.

Reproduce code:
---------------
<?php
// Splits an HTML page into its title, visible text and link targets.
// Pass false for $title, $text or $anchors to skip that part.
function parse($html, &$title, &$text, &$anchors)
{
    $pstring1 = "'[^']*'";                         // single-quoted attribute value
    $pstring2 = '"[^"]*"';                         // double-quoted attribute value
    $pnstring = "[^'\">]";                         // any other character inside a tag
    $pintag   = "(?:$pstring1|$pstring2|$pnstring)*";
    $pattrs   = "(?:\\s$pintag){0,1}";             // optional attribute list
    $pcomment = enclose("<!--", "-", "->");
    $pscript  = enclose("<script$pattrs>", "<", "\\/script>");
    $pstyle   = enclose("<style$pattrs>", "<", "\\/style>");
    $pexclude = "(?:$pcomment|$pscript|$pstyle)";
    $ptitle   = enclose("<title$pattrs>", "<", "\\/title>");
    $panchor  = "<a(?:\\s$pintag){0,1}>";
    $phref    = "href\\s*=[\\s'\"]*([^\\s'\">]*)";

    // Remove comments, scripts and styles.
    $html = preg_replace("/$pexclude/iX", " ", $html);

    if ($title !== false)
        $title = preg_match("/$ptitle/iX", $html, $title) ? $title[1] : '';

    if ($text !== false) {
        $text = preg_replace("/<$pintag>/iX", " ", $html);  // strip remaining tags
        $text = preg_replace("/\\s+| /iX", " ", $text);     // collapse whitespace
    }

    if ($anchors !== false) {
        preg_match_all("/$panchor/iX", $html, $anchors);
        $anchors = $anchors[0];
        // Replace each <a ...> tag with its href value ('' if none).
        foreach ($anchors as $i => $x)
            $anchors[$i] = preg_match("/$phref/iX", $x, $m) ? $m[1] : '';
        $anchors = array_unique($anchors);
    }
}

// Builds "<start>(body)<end1><end2>", where the body may contain $end1
// only when it is not followed by $end2.
function enclose($start, $end1, $end2)
{
    return "$start((?:[^$end1]|$end1(?!$end2))*)$end1$end2";
}

Expected result:
----------------
The code should split HTML pages into title, text and links. It works fine until large pages are downloaded; then it segfaults, and gdb puts the blame on preg_replace.
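A minimal sketch of the kind of workaround the report asks to have documented; it is not taken from the report or the PHP manual. The enclose() pattern repeats a group containing an alternation once per character, and on PCRE builds that recurse on the C stack each repetition can cost a stack frame, so a long comment or script block can exhaust the stack. The hypothetical enclose_unrolled() below matches the same strings in the "unrolled loop" form, which enters the group only at each $end1 character instead of at every character. This reduces recursion depth but does not eliminate it. The hypothetical checked_preg_replace() treats a NULL return from preg_replace() as an error and reports preg_last_error() (available since PHP 5.2.0). The pcre.recursion_limit value of 10000 is an assumed figure to tune against the process stack size. Lowering it is intended to make PCRE stop with PREG_RECURSION_LIMIT_ERROR before the stack runs out.

```php
<?php
// Assumed value: keep PCRE's recursion limit well below what the C stack can hold,
// so a too-deep match fails with an error instead of crashing the process.
ini_set('pcre.recursion_limit', '10000');

// Hypothetical helper: same language as enclose(), written as an "unrolled loop".
// Runs of non-$end1 characters are matched by a single-character repeat; the group
// is entered only when an $end1 character appears that is not followed by $end2.
function enclose_unrolled($start, $end1, $end2)
{
    return "$start([^$end1]*(?:$end1(?!$end2)[^$end1]*)*)$end1$end2";
}

// Hypothetical wrapper: preg_replace() returns NULL when PCRE aborts
// (for example on the recursion or backtrack limit); preg_last_error() says why.
function checked_preg_replace($pattern, $replacement, $subject)
{
    $result = preg_replace($pattern, $replacement, $subject);
    if ($result === null) {
        throw new RuntimeException('preg_replace failed, preg_last_error() = ' . preg_last_error());
    }
    return $result;
}

$pcomment = enclose_unrolled("<!--", "-", "->");
echo checked_preg_replace("/$pcomment/", " ", "a<!-- x - y -->b"), "\n"; // prints "a b"
```

Catching the exception lets a crawler skip or split an oversized page instead of losing the whole process.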
-- Edit bug report at http://bugs.php.net/?id=41896&edit=1 --