 ID:               49528
 Updated by:       moriyo...@php.net
 Reported By:      moriyo...@php.net
-Status:           Closed
+Status:           Open
 Bug Type:         mbstring related
 Operating System: *
 PHP Version:      5.3SVN-2009-09-11 (SVN)
 Assigned To:      moriyoshi
 New Comment:

I left this open because the patch has not been merged into 5.2 yet.


Previous Comments:
------------------------------------------------------------------------

[2009-09-11 14:34:30] j...@php.net

Fixed -> closed (?) (or did you leave this open just for fun?)

------------------------------------------------------------------------

[2009-09-11 08:22:20] s...@php.net

Automatic comment from SVN on behalf of moriyoshi
Revision: http://svn.php.net/viewvc/?view=revision&revision=288260
Log:
- Fix bug #49528 (UTF-16 strings prefixed by BOM wrongly converted).

------------------------------------------------------------------------

[2009-09-11 08:21:16] j...@php.net

Moriyoshi probably added this report as a reminder for himself.

------------------------------------------------------------------------

[2009-09-11 08:18:38] sjo...@php.net

It can be argued that the BOM character U+FEFF should never be converted, as it is not a real character. I don't think it is the task of mb_convert_encoding to detect the byte order and interpret the BOM.

------------------------------------------------------------------------

[2009-09-11 07:45:05] moriyo...@php.net

Description:
------------
The first character of a UTF-16 string prefixed by "\xff\xfe" (little-endian BOM) is converted to the wrong Unicode code point. Moreover, the resulting string contains the BOM itself, which it should not.

Reproduce code:
---------------
<?php
var_dump(bin2hex(mb_convert_encoding("\xff\xfe\x01\x02\x03\x04", "UCS-2", "UTF-16")));
?>

Expected result:
----------------
string(8) "02010403"

Actual result:
--------------
string(12) "feffff010403"

------------------------------------------------------------------------
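The expected result follows from three steps. First, read the "\xff\xfe" BOM as "little-endian". Second, decode the remaining bytes as the code units 0x0201 and 0x0403. Third, write those units out as UCS-2 with the BOM dropped.

The sketch below shows this BOM-aware conversion in plain PHP without mbstring. It rests on a few assumptions:

* The output is big-endian UCS-2, as the expected result "02010403" implies.
* BOM-less input is treated as big-endian, following the usual UTF-16 default.
* `utf16_to_ucs2be` is a hypothetical helper. It is not part of mbstring and is not the fix committed in revision 288260.
* Surrogate code units are copied through unchanged rather than rejected.

```php
<?php
// Hypothetical helper (not an mbstring API): convert a UTF-16 byte string,
// optionally prefixed by a BOM, to big-endian UCS-2 with the BOM removed.
function utf16_to_ucs2be(string $bytes): string
{
    // Assumption: UTF-16 without a BOM is read as big-endian.
    $littleEndian = false;

    if (strncmp($bytes, "\xff\xfe", 2) === 0) {
        // Little-endian BOM: remember the byte order and strip the BOM.
        $littleEndian = true;
        $bytes = substr($bytes, 2);
    } elseif (strncmp($bytes, "\xfe\xff", 2) === 0) {
        // Big-endian BOM: just strip it.
        $bytes = substr($bytes, 2);
    }

    $out = '';
    // Walk the input two bytes at a time; a trailing odd byte is ignored.
    for ($i = 0; $i + 1 < strlen($bytes); $i += 2) {
        $unit = $littleEndian
            ? (ord($bytes[$i]) | (ord($bytes[$i + 1]) << 8))
            : ((ord($bytes[$i]) << 8) | ord($bytes[$i + 1]));
        // 'n' packs a 16-bit unsigned value in big-endian byte order.
        $out .= pack('n', $unit);
    }
    return $out;
}

var_dump(bin2hex(utf16_to_ucs2be("\xff\xfe\x01\x02\x03\x04")));
// string(8) "02010403"
```

The sketch interprets the BOM and then discards it, which is the behaviour the report asks for. Under sjo...'s view, U+FEFF would instead be left uninterpreted, so this sketch only illustrates one side of that discussion.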