ID:          36112
 Updated by:  [EMAIL PROTECTED]
 Reported By: pornel at despammed dot com
-Status:      Open
+Status:      Closed
 Bug Type:    Documentation problem
 PHP Version: Irrelevant
 Assigned To: colder
 New Comment:

This bug has been fixed in the documentation's XML sources. Since the
online and downloadable versions of the documentation need some time
to get updated, we would like to ask you to be a bit patient.

Thank you for the report, and for helping us make our documentation
better.

I simply removed the example for now.


Previous Comments:
------------------------------------------------------------------------

[2006-03-12 17:06:18] [EMAIL PROTECTED]

There are a lot of inconsistencies in this example:

1) About @<script[^>]*?>.*?</script>@si :
   a) the first ? is useless.

2) About @<[\/\!]*?[^<>]*?>@si :
   a) / and ! don't have to be escaped. 
   b) [\/\!]*? is useless, as it's already matched by [^<>]*?. 
   c) the ? of [^<>]*? is useless.
   d) the PCRE_DOTALL modifier is useless, there is no dot.
   e) the PCRE_CASELESS modifier is useless.
   f) what is the point of avoiding "<" in a tag?

3) About @([\r\n])[\s]+@ :
   a) no need to put \s in a char class.
   b) every \r\n will be changed to \r, as \s matches \n.
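
Point 3b can be checked with a short sketch. The replacement '\1' is an
assumption here, since the replacement array is not quoted in this
report, and the sample input is made up for illustration:

```php
<?php
// Pattern 3 from the list above. The '\1' replacement is assumed,
// not taken from this report.
$pattern = '@([\r\n])[\s]+@';

// A Windows-style line break (CR LF) between two letters.
$input = "a\r\nb";

// ([\r\n]) captures the CR, [\s]+ then swallows the LF,
// so only the captured CR is put back.
$out = preg_replace($pattern, '\1', $input);

var_dump($out === "a\rb"); // bool(true)
```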

I think the whole example has to be reconsidered, because there are
already functions to do some of the job, like strip_tags() and
html_entity_decode().

------------------------------------------------------------------------

[2006-01-20 23:54:03] pornel at despammed dot com

Description:
------------
The code on http://uk.php.net/preg_replace:

$search = array ('@<script[^>]*?>.*?</script>@si', // Strip out javascript
                 '@<[\/\!]*?[^<>]*?>@si',          // Strip out HTML tags

doesn't work as advertised. For example, it will leave the
contents of:
<script>xxx</script       >
and worse, it will output valid script tags if given:
<<>script>evil<<>/script>

If these patterns were used on a website (for stripping
markup from users' comments, for example), they would allow
an XSS attack.
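
Both cases can be reproduced with a short sketch. It uses only the two
patterns quoted above; the empty-string replacements are an assumption,
since the replacement array is not quoted in this report:

```php
<?php
// The two patterns quoted in the report.
$search = array(
    '@<script[^>]*?>.*?</script>@si', // meant to strip JavaScript
    '@<[\/\!]*?[^<>]*?>@si',          // meant to strip HTML tags
);
// Assumed replacements: drop whatever matched.
$replace = array('', '');

// Case 1: whitespace before the closing ">" defeats the first pattern,
// so only the tags are removed and the script body survives.
$spaced = preg_replace($search, $replace, '<script>xxx</script       >');
echo $spaced, "\n"; // xxx

// Case 2: only the inner "<>" pairs match the second pattern;
// removing them assembles a valid script element.
$nested = preg_replace($search, $replace, '<<>script>evil<<>/script>');
echo $nested, "\n"; // <script>evil</script>
```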


Since it's nearly impossible to parse HTML properly with regular
expressions, I suggest:
* renaming the example from 'Convert HTML to text' to 'Remove HTML
  markup'
* adding a replacement of '<' with '&lt;'
* suggesting the use of more robust methods, like strip_tags, nl2br,
  htmlspecialchars or the DOM interface.
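
As a rough sketch of the escaping and strip_tags suggestions, assuming
the text is meant to be echoed into a UTF-8 HTML page (the sample inputs
are made up for illustration):

```php
<?php
$input = '<<>script>evil<<>/script>';

// Escape instead of strip: every < and > becomes an entity, so nothing
// in the input can be interpreted as markup when it is echoed.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n"; // &lt;&lt;&gt;script&gt;evil&lt;&lt;&gt;/script&gt;

// strip_tags() removes well-formed tags but keeps the text between them.
$plain = strip_tags('<b>bold</b> text');
echo $plain, "\n"; // bold text
```

Escaping keeps the user's text intact and leaves no decision about
what counts as a tag, which is why it is the safer default for
untrusted input.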




------------------------------------------------------------------------

