ID: 25707
Updated by: [EMAIL PROTECTED]
Reported By: Bjorn dot Victor at it dot uu dot se
-Status: Bogus
+Status: Verified
Bug Type: Feature/Change Request
Operating System: Solaris 8
PHP Version: 4.3.3
New Comment:
html_entity_decode(htmlentities("&lt;")) returns "<", but IMHO it
should return the original "&lt;".
The unhtmlentities() function given on
http://www.php.net/html_entity_decode works like it should (in my
eyes).
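A minimal sketch of a table-driven decoder in the spirit of that
unhtmlentities() function. The function name is taken from the manual
page; the body below is an assumption about how such a function can be
written, not a copy of the manual's code. It relies on strtr() with an
array argument, which scans the input once and does not rescan text it
has already substituted, so "&amp;lt;" is decoded exactly one level.

```php
<?php
// Sketch only: the body is an assumed implementation, not the manual's code.
function unhtmlentities(string $string): string
{
    // get_html_translation_table() maps character => entity;
    // flipping it gives entity => character.
    $table = array_flip(get_html_translation_table(HTML_ENTITIES));

    // strtr() with an array makes a single pass over the input and never
    // re-examines text it has already replaced.
    return strtr($string, $table);
}

var_dump(unhtmlentities('&amp;lt;')); // string(4) "&lt;"
```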
Previous Comments:
------------------------------------------------------------------------
[2003-10-01 03:31:42] Bjorn dot Victor at it dot uu dot se
Sorry, this is not an RTFM error, and has nothing to do with the
optional parameters of the function. I have changed the summary to
refer to "lt", to avoid confusion with ENT_QUOTES etc - believe me, I
tried this before looking at the source and figuring out what the error
really was.
The current code works like this: iterate over the 6 "basic_entities",
replacing each entity in the string with its character. "&amp;" is the
first item in basic_entities, which is good when you're doing
htmlentities (the reverse operation).
Given a string "&amp;lt;", it will first become "&lt;", and then
(because "&lt;" is handled after "&amp;"), "<".
Consider doing "&amp;" last, e.g. by traversing basic_entities
backwards:
"&amp;lt;" becomes "&lt;", which is the expected result.
------------------------------------------------------------------------
[2003-09-30 15:00:59] [EMAIL PROTECTED]
RTFM: http://www.php.net/html_entity_decode
(the 2nd optional parameter..)
------------------------------------------------------------------------
[2003-09-30 14:52:20] Bjorn dot Victor at it dot uu dot se
Description:
------------
Symptom:
html_entity_decode("&amp;quot;") returns '"', while the expected value
would be "&quot;". Corresponding (wrong) behaviour for "&amp;" followed
by "lt;", "gt;" etc.
Another example is html_entity_decode(htmlentities("&lt;")) which
returns "<" rather than "&lt;" as expected.
As a result, html_entity_decode cannot be used as the inverse of
htmlentities.
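A small round-trip check makes the inverse property concrete. This is a
sketch: round_trips() and the sample strings are hypothetical, chosen
to include input that already looks like an entity. On a PHP version
affected by this bug the entity-like samples would be expected to
report a mismatch; no output is shown because it depends on the
version in use.

```php
<?php
// Hypothetical helper: true if decoding undoes encoding for this input.
function round_trips(string $original): bool
{
    return html_entity_decode(htmlentities($original)) === $original;
}

// Assumed sample inputs; the entity-like ones are the interesting cases.
$samples = ['<', '&lt;', '&quot;', 'a & b', '"quoted"'];

foreach ($samples as $s) {
    printf("%-10s %s\n", $s, round_trips($s) ? 'ok' : 'MISMATCH');
}
```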
Diagnosis:
The function (php_unescape_html_entities in ext/standard/html.c)
replaces each entity in basic_entities with its corresponding
character, but starts by replacing "&amp;" with "&", the resulting
string being "&quot;", which is then replaced by '"'.
Solution:
php_unescape_html_entities in ext/standard/html.c traverses the
basic_entities from the wrong end; it must replace "&amp;" *last*, not
*first*.
Reproduce code:
---------------
print html_entity_decode("&amp;quot;&amp;lt;&amp;gt;");
Expected result:
----------------
&quot;&lt;&gt;
Actual result:
--------------
"<>
------------------------------------------------------------------------