ID: 43957
Updated by: [EMAIL PROTECTED]
Reported By: [EMAIL PROTECTED]
-Status: Open
+Status: Assigned
Bug Type: Unknown/Other Function
Operating System: linux debian 4.0
PHP Version: 5.2.5
-Assigned To:
+Assigned To: rasmus
New Comment:
I see the bug in the code. I still think this function needs to die,
but I guess we have to continue supporting it. I'll fix it.
Previous Comments:
------------------------------------------------------------------------
[2008-01-29 02:35:03] [EMAIL PROTECTED]
Description:
------------
utf8_decode() outputs a random character when supplied with bad input.
When the input contains invalid sequences, utf8_decode() usually replaces
the sequence with the character "?". But when a lone high-bit character
is at the end of the input, the output seems to be a random character.
Reproduce code:
---------------
for($a=0;$a<20;$a++)printf("%02x ",utf8_decode(chr(0xE0)));
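A note on reading the output of the line above: printf() receives the
string returned by utf8_decode(), and %x formats an integer, so the string
is first converted to a number. The sketch below uses only sprintf() and
ord() with hand-picked sample strings (assumptions for illustration, not
values taken from the report). This conversion is a likely reason why the
reported values all fall between 00 and 09 rather than showing the raw byte.

```php
<?php
// A non-numeric string converts to integer 0, so %02x prints "00".
$fromNonNumeric = sprintf("%02x", "?");

// A single digit character converts to its numeric value.
$fromDigit = sprintf("%02x", "9");

// ord() yields the byte value itself, which is what "3f" refers to.
$fromByte = sprintf("%02x", ord("?"));

echo $fromNonNumeric, " ", $fromDigit, " ", $fromByte, "\n";
```

Wrapping the call in ord() or bin2hex() would show the byte actually
returned, instead of its integer conversion.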
Expected result:
----------------
3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f 3f
(utf8_decode() returns a question mark)
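The expected behaviour can be sketched in userland PHP. This is only an
illustration under stated assumptions: utf8_to_latin1_sketch() is a
hypothetical helper, not PHP's actual utf8_decode() implementation, and it
skips details such as overlong-encoding checks. The point it shows is the
bounds check that emits "?" when a lead byte at the end of the input is
missing its continuation bytes.

```php
<?php
// Hypothetical helper for illustration; not PHP's internal implementation.
function utf8_to_latin1_sketch(string $in): string
{
    $out = '';
    $len = strlen($in);
    $i = 0;
    while ($i < $len) {
        $b = ord($in[$i]);
        if ($b < 0x80) {                 // plain ASCII byte
            $out .= chr($b);
            $i++;
            continue;
        }
        // Work out how many continuation bytes the lead byte announces.
        if (($b & 0xE0) === 0xC0) {
            $need = 1; $cp = $b & 0x1F;
        } elseif (($b & 0xF0) === 0xE0) {
            $need = 2; $cp = $b & 0x0F;
        } elseif (($b & 0xF8) === 0xF0) {
            $need = 3; $cp = $b & 0x07;
        } else {                         // stray continuation or invalid lead
            $out .= '?';
            $i++;
            continue;
        }
        // Truncated sequence at the end of input: never read past the end.
        if ($i + $need > $len - 1) {
            $out .= '?';
            break;
        }
        $valid = true;
        for ($k = 1; $k <= $need; $k++) {
            $c = ord($in[$i + $k]);
            if (($c & 0xC0) !== 0x80) {  // not a 10xxxxxx continuation byte
                $valid = false;
                break;
            }
            $cp = ($cp << 6) | ($c & 0x3F);
        }
        if (!$valid) {                   // replace the lead byte and resync
            $out .= '?';
            $i++;
            continue;
        }
        // Code points outside ISO-8859-1 cannot be represented.
        $out .= ($cp <= 0xFF) ? chr($cp) : '?';
        $i += $need + 1;
    }
    return $out;
}

echo bin2hex(utf8_to_latin1_sketch(chr(0xE0))), "\n";
```

With this sketch, a lone 0xE0 byte at the end of the input always maps to
a single "?" because the length check happens before any further byte is
read.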
Actual result:
--------------
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
or
09 09 09 09 09 09 09 09 09 09 09 09 09 09 09 09 09 09 09 09
or
05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05 05
or
02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02
or some other random value
It seems to differ more between individual runs:
$ for a in `seq 1 20`; do php -r 'printf("%02x ",utf8_decode(chr(0xE0)));'; done
08 00 00 02 00 00 00 00 00 05 00 00 00 05 00 00 07 00 09 00
------------------------------------------------------------------------