 ID:               36112
 Updated by:       [EMAIL PROTECTED]
 Reported By:      pornel at despammed dot com
-Status:           Open
+Status:           Closed
 Bug Type:         Documentation problem
 PHP Version:      Irrelevant
 Assigned To:      colder
 New Comment:

This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation better.

I simply removed the example for now.


Previous Comments:
------------------------------------------------------------------------

[2006-03-12 17:06:18] [EMAIL PROTECTED]

There are a lot of inconsistencies in this example:

1) About @<script[^>]*?>.*?</script>@si :
   a) the first ? is useless.

2) About @<[\/\!]*?[^<>]*?>@si :
   a) / and ! don't have to be escaped.
   b) [\/\!]*? is useless, as it's already matched by [^<>]*?.
   c) the ? of [^<>]*? is useless.
   d) the PCRE_DOTALL modifier is useless, there is no dot.
   e) the PCRE_CASELESS modifier is useless.
   f) what is the point of avoiding "<" in a tag?

3) About @([\r\n])[\s]+@ :
   a) no need to put \s in a char class.
   b) every \r\n will be changed to \r, as \s matches \n.

I think the whole example has to be reconsidered, because there are
already functions to do some of the job, like strip_tags() and
html_entity_decode().

------------------------------------------------------------------------

[2006-01-20 23:54:03] pornel at despammed dot com

Description:
------------
The code on http://uk.php.net/preg_replace:

$search = array ('@<script[^>]*?>.*?</script>@si', // Strip out javascript
                 '@<[\/\!]*?[^<>]*?>@si',          // Strip out HTML tags

doesn't work as advertised. For example it will leave the contents of:

<script>xxx</script >

and worse, it will output valid script tags if given:

<<>script>evil<<>/script>

If these patterns were used on some website (for stripping markup from
users' comments, for example), they'd allow an XSS attack.
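
The two inputs in the description can be checked directly. The following is a minimal sketch, assuming PHP 7 or later with the PCRE extension. The helper name strip_with_patterns is hypothetical, and the empty replacement string is an assumption, since the report quotes only the search patterns and not the replacement array.

```php
<?php
// The two patterns quoted in the report.
$search = array(
    '@<script[^>]*?>.*?</script>@si', // meant to strip out javascript
    '@<[\/\!]*?[^<>]*?>@si',          // meant to strip out HTML tags
);

// Hypothetical helper: applies the patterns in order, replacing each match with ''.
function strip_with_patterns(string $html, array $patterns): string
{
    return preg_replace($patterns, '', $html);
}

// Case 1: a space before the closing '>' defeats the first pattern,
// so the script body is left behind once the tags are removed.
$spaced = strip_with_patterns('<script>xxx</script >', $search);

// Case 2: only the '<>' decoys match the second pattern, and removing
// them reassembles a working script element.
$nested = strip_with_patterns('<<>script>evil<<>/script>', $search);

var_dump($spaced); // string(3) "xxx"
var_dump($nested); // string(21) "<script>evil</script>"
```

Both results agree with the behaviour described in the report: the first input keeps the script contents, and the second produces a valid script element.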

Since it's near impossible to properly parse HTML with regular
expressions I suggest:

* renaming the example from 'Convert HTML to text' to 'Remove HTML markup'
* adding replacement of '<' as '>'
* suggesting use of more robust methods, like strip_tags, nl2br,
  htmlspecialchars or the DOM interface.

------------------------------------------------------------------------
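
As a rough illustration of the last suggestion, the sketch below escapes markup characters instead of trying to match tags. It assumes the goal is to display user-supplied text safely inside an HTML page. escape_comment is a hypothetical helper name, while strip_tags() and htmlspecialchars() are standard PHP functions.

```php
<?php
// Hypothetical helper: drop tags where possible, then escape whatever is left.
function escape_comment(string $input): string
{
    $text = strip_tags($input);
    // Escaping is the step that matters: after it, no raw '<' or '>' remains,
    // whatever strip_tags() did or did not remove.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$evil = '<<>script>evil<<>/script>';

// Escaping alone already neutralises the input from the report.
$escaped = htmlspecialchars($evil, ENT_QUOTES, 'UTF-8');
var_dump($escaped); // "&lt;&lt;&gt;script&gt;evil&lt;&lt;&gt;/script&gt;"

// Combined helper: the result contains no literal angle brackets.
$safe = escape_comment($evil);
var_dump(strpbrk($safe, '<>')); // bool(false)
```

The design choice here is to treat escaping as the safety guarantee and tag stripping as cosmetic, so the outcome does not depend on how any tag-matching step handles malformed markup.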
