ID:               21027
 Updated by:       [EMAIL PROTECTED]
 Reported By:      [EMAIL PROTECTED]
-Status:           Open
+Status:           Feedback
 Bug Type:         Scripting Engine problem
 Operating System: All
 PHP Version:      4.3.0RC3
 New Comment:

What is wrong with that output of your test script?

Derick


Previous Comments:
------------------------------------------------------------------------

[2002-12-15 10:08:17] [EMAIL PROTECTED]

htmlspecialchars() handles the '&' char incorrectly: it doesn't care whether
the '&' is already part of an entity or not. This produces very "funny"
results when the function is called several times on the same string. For
example:

echo htmlspecialchars(htmlspecialchars(htmlspecialchars(
     htmlspecialchars(htmlspecialchars('text & text')))));

will produce:
text &amp;amp;amp;amp;amp; text
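The growth in the example above can be reproduced with a loop: each call re-escapes the '&' of the previous "&amp;", so one more "amp;" appears per call. This is a minimal sketch assuming a current PHP rather than the 4.3.0RC3 from the report; the second half also assumes PHP 5.2.3 or later, where htmlspecialchars() accepts a fourth parameter, $double_encode, that leaves existing entities alone.

```php
<?php
// Apply htmlspecialchars() five times, printing each intermediate result.
$s = 'text & text';
for ($i = 1; $i <= 5; $i++) {
    $s = htmlspecialchars($s);
    echo $i, ': ', $s, "\n";
}
// After five calls $s is: text &amp;amp;amp;amp;amp; text

// Same loop with $double_encode = false (PHP >= 5.2.3 only):
// an '&' that already starts an entity such as &amp; is not escaped again.
$t = 'text & text';
for ($i = 1; $i <= 5; $i++) {
    $t = htmlspecialchars($t, ENT_COMPAT, 'UTF-8', false);
}
echo $t, "\n"; // text &amp; text
```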

The most correct behaviour would be to check whether the '&' is followed by
a valid entity as described in the HTML specification. However, that can be
quite hard to do, because there are lots of entities. So another approach is
also possible (it should be faster, but is dirtier): just check whether the
'&' starts something shaped like an entity, whatever its name. Here are 2
regular expressions which implement correct '&' handling:

1. This is the correct way to handle entities:
// The x modifier makes PCRE ignore the line breaks inside the pattern;
// '#' must then be written as '\#', or it would start a comment.
$str = preg_replace('/\&(?!((\#\d{1,5})|(\#(x|X)[\dA-Fa-f]{1,4})|[aA]acute|[aA]circ|acute|(ae|AE)lig|
[aA]grave|alefsym|[aA]lpha|amp|an[dg]|[aA]ring|asymp|[aA]tilde|[aA]uml|
bdquo|[bB]eta|brvbar|bull|cap|[cC]cedil|cedil|cent|[cC]hi|circ|clubs|cong|
copy|crarr|cup|curren|[dD]agger|d[aA]rr|deg|[dD]elta|diams|divide|[eE]acute|
[eE]circ|[eE]grave|empty|e[mn]sp|[eE]psilon|equiv|[eE]ta|eth|ETH|[eE]uml|
euro|exist|fnof|forall|frac1[24]|frac34|frasl|[gG]amma|g[et]|h[aA]rr|hearts|
hellip|[iI]acute|[iI]circ|iexcl|[iI]grave|image|infin|int|[iI]ota|iquest|
isin|[iI]uml|[kK]appa|[lL]ambda|lang|laquo|l[aA]rr|lceil|ldquo|le|lfloor|
lowast|loz|lrm|lsa?quo|lt|macr|mdash|micro|middot|minus|[mM]u|nabla|nbsp|
ndash|n[ei]|not(in)?|nsub|[nN]tilde|[nN]u|[oO]acute|[oO]circ|(oe|OE)lig|
[oO]grave|oline|[oO]mega|[oO]micron|oplus|or|ord[fm]|[oO]slash|[oO]tilde|
otimes|[oO]uml|par[at]|permil|perp|[pP]hi|[pP]i|piv|plusmn|pound|[pP]rime|
pro[dp]|[pP]si|quot|radic|rang|raquo|r[aA]rr|rceil|rdquo|real|reg|rfloor|
[rR]ho|rlm|rsaquo|rsquo|sbquo|[sS]caron|sdot|sect|shy|[sS]igma|sigmaf|sim|
spades|sube?|sum|sup[123e]?|szlig|[tT]au|there4|[tT]heta|thetasym|thinsp|
thorn|THORN|tilde|times|trade|[uU]acute|u[aA]rr|[uU]circ|[uU]grave|uml|
upsih|[uU]psilon|[uU]uml|weierp|[xX]i|[yY]acute|yen|[yY]uml|[zZ]eta|zwn?j);)/x', '&amp;', $str);


2. This is less correct, but still a better way to handle them:
$str = preg_replace('/&(?!(([A-Za-z_:][A-Za-z0-9\.\-_:]*)|(#\d+)|(#(x|X)[\dA-Fa-f]+));)/', '&amp;', $str);
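A full htmlspecialchars() replacement built on the second regular expression could look like the sketch below. The name escape_once() is hypothetical (it is not a PHP built-in), and the sketch assumes PHP 4's default ENT_COMPAT behaviour, escaping <, > and double quotes but not single quotes. The '&' step runs first so that the '&' characters introduced by &lt;, &gt; and &quot; are not touched.

```php
<?php
// Hypothetical helper, not part of PHP: like htmlspecialchars(), but an '&'
// that already starts something shaped like &name; &#123; or &#x7B; is kept.
function escape_once($str)
{
    $str = preg_replace(
        '/&(?!(([A-Za-z_:][A-Za-z0-9\.\-_:]*)|(#\d+)|(#(x|X)[\dA-Fa-f]+));)/',
        '&amp;',
        $str
    );
    // The other special characters cannot be mistaken for an entity.
    return str_replace(array('<', '>', '"'), array('&lt;', '&gt;', '&quot;'), $str);
}

$once  = escape_once('text & text');
$twice = escape_once($once);
echo $once, "\n";  // text &amp; text
echo $twice, "\n"; // text &amp; text
```

Calling it repeatedly is harmless, which is the behaviour the report asks for. The "less correct" part is visible too: an unknown name such as &bogus; is also left unescaped, because only the shape of the reference is checked.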


A good thing about the second regexp is that, if htmlspecialchars() were
implemented this way, the function could be used to handle XML entities as
well.
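The XML point can be illustrated by comparing the two approaches on a custom entity reference. This is a minimal sketch: escape_amp_strict() and escape_amp_generic() are hypothetical names, and the strict version uses a deliberately abbreviated whitelist (a handful of names plus numeric references) instead of the report's full HTML entity list.

```php
<?php
// Approach 1, abbreviated: only whitelisted names and numeric references survive.
function escape_amp_strict($str)
{
    return preg_replace(
        '/&(?!(#\d{1,5}|#[xX][\dA-Fa-f]{1,4}|amp|lt|gt|quot|nbsp|copy);)/',
        '&amp;',
        $str
    );
}

// Approach 2: any reference shaped like &name; &#123; or &#x7B; survives.
function escape_amp_generic($str)
{
    return preg_replace(
        '/&(?!(([A-Za-z_:][A-Za-z0-9\.\-_:]*)|(#\d+)|(#(x|X)[\dA-Fa-f]+));)/',
        '&amp;',
        $str
    );
}

$xml = '&chapter1; &amp; &#169;';
echo escape_amp_strict($xml), "\n";  // &amp;chapter1; &amp; &#169;
echo escape_amp_generic($xml), "\n"; // &chapter1; &amp; &#169;
```

The strict version escapes the XML-defined &chapter1; because it is not an HTML entity, while the generic version keeps it, including names containing ':' such as &ns:item;.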

------------------------------------------------------------------------

