Edit report at https://bugs.php.net/bug.php?id=65072&edit=1

 ID:                 65072
 Updated by:         ni...@php.net
 Reported by:        kstirn at gmail dot com
 Summary:            strlen result wrong after passing through
                     html_entity_decode
-Status:             Open
+Status:             Not a bug
 Type:               Bug
 Package:            Unknown/Other Function
 Operating System:   Windows 7
 PHP Version:        5.4.16
 Block user comment: N
 Private report:     N

 New Comment:

html_entity_decode('&nbsp;') returns a U+00A0 NO-BREAK SPACE, which encoded in 
UTF-8 is "\xc2\xa0". strlen() returns the length in bytes (not code points), so 
strlen("\xc2\xa0") will be 2, not 1.

If you want the output in a different encoding (e.g. ISO-8859-1), you can 
specify it via the last argument to the function (the encoding parameter).
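
A minimal sketch of the difference, not part of the original report. It assumes 
the mbstring extension is available for mb_strlen(); the variable names and the 
ENT_QUOTES flag are illustrative choices only.

```php
<?php
// U+00A0 NO-BREAK SPACE takes two bytes in UTF-8 and one byte in ISO-8859-1.
$utf8   = html_entity_decode('X&nbsp;X', ENT_QUOTES, 'UTF-8');
$latin1 = html_entity_decode('X&nbsp;X', ENT_QUOTES, 'ISO-8859-1');

echo strlen($utf8), "\n";             // 4: byte count, "X" + "\xc2\xa0" + "X"
echo strlen($latin1), "\n";           // 3: byte count, "X" + "\xa0" + "X"
echo mb_strlen($utf8, 'UTF-8'), "\n"; // 3: code point count (assumes mbstring)
echo bin2hex($utf8), "\n";            // 58c2a058
```

Counting with mb_strlen() keeps the string in UTF-8, whereas passing 
'ISO-8859-1' changes the bytes that html_entity_decode() returns.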


Previous Comments:
------------------------------------------------------------------------
[2013-06-20 16:47:46] kstirn at gmail dot com

Description:
------------
If you run the string "X&nbsp;X" through html_entity_decode and then check its 
length, strlen returns 4 instead of 3.

PHP 5.3.x gives correct result.

PHP 5.4.x and 5.5.x give incorrect result.

Possibly the same problem with other HTML entities, such as &raquo; etc...

Test script:
---------------
<?php
echo 'strlen(a) = ' .strlen("X X") . "<br>\n";
echo 'strlen(b) = ' .strlen(html_entity_decode('X&nbsp;X')) . "<br>\n";
?>

Expected result:
----------------
strlen(a) = 3
strlen(b) = 3

Actual result:
--------------
strlen(a) = 3
strlen(b) = 4


------------------------------------------------------------------------


