ID: 44014
Comment by: d_kelsey at uk dot ibm dot com
Reported By: michael202 at gmx dot de
Status: No Feedback
Bug Type: mbstring related
Operating System: Win XP
PHP Version: 5.2.5
Assigned To: hirokawa
New Comment:
My understanding of UTF-16 is that the BOM is mandatory. With mbstring
I have found that if I pass a UTF-16 string to mb_convert_encoding for
conversion, for example to UTF-8, it treats the BOM as UTF-16 data and
converts it.
mbstring doesn't generate a BOM when converting to UTF-16, so if the BOM
is mandatory, as I thought, it isn't generating valid UTF-16 bytes.
I see that mbstring effectively uses UTF-16BE when you specify UTF-16.
If mbstring doesn't support the BOM then UTF-16 cannot be handled properly.
Should this at least be documented, with a recommendation to use
UTF-16BE as the encoding so that you are explicit about what is
supported?
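A possible workaround, sketched under the assumption that the input may
or may not start with a BOM: check the first two bytes yourself, strip
them, and pass an explicit byte order to mb_convert_encoding. The helper
names utf16_to_utf8_with_bom and utf8_to_utf16le_with_bom are
hypothetical and are not part of mbstring.

```php
<?php
// Hypothetical helper (not part of mbstring): detect and strip a UTF-16
// BOM, then convert with an explicit byte order.
function utf16_to_utf8_with_bom($bytes)
{
    $bom = substr($bytes, 0, 2);
    if ($bom === "\xFF\xFE") {
        // Little endian BOM
        return mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16LE');
    }
    if ($bom === "\xFE\xFF") {
        // Big endian BOM
        return mb_convert_encoding(substr($bytes, 2), 'UTF-8', 'UTF-16BE');
    }
    // No BOM: assume big endian, matching what the report describes
    // for plain 'UTF-16'.
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16BE');
}

// Hypothetical helper: produce little endian UTF-16 and prepend the BOM
// by hand, since the report says mbstring does not emit one.
function utf8_to_utf16le_with_bom($utf8)
{
    return "\xFF\xFE" . mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');
}

$utf16 = "\xFF\xFE" . "M\x00o\x00"; // BOM + 'Mo', little endian
echo utf16_to_utf8_with_bom($utf16) . "\n"; // -> Mo
```

Naming the byte order explicitly in both directions avoids depending on
how a given mbstring version interprets plain 'UTF-16'.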
Previous Comments:
------------------------------------------------------------------------
[2008-02-24 01:00:00] php-bugs at lists dot php dot net
No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".
------------------------------------------------------------------------
[2008-02-16 12:17:13] [EMAIL PROTECTED]
The Unicode BOM is not supported by the encoding conversion functions
in mbstring.
Big endian is the default for UTF-16. Please specify 'UTF-16LE'
if you need little endian format.
Try:
<?php
$utf16 = chr(0).chr(0x4d).chr(0).chr(0x6f); //'Mo'
$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16');
echo($utf8 . "\n"); // -> Mo
?>
or
<?php
$utf16 = chr(0x4d).chr(0).chr(0x6f).chr(0); //'Mo'
$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE');
echo($utf8 . "\n"); // -> Mo
?>
------------------------------------------------------------------------
[2008-02-05 05:10:37] [EMAIL PROTECTED]
Assigned to the mbstring maintainer.
------------------------------------------------------------------------
[2008-02-01 12:08:07] michael202 at gmx dot de
Description:
------------
mb_convert_encoding 'destroys' first character when
converting from UTF16 to UTF8
(iconv works).
Reproduce code:
---------------
$utf16 = chr(0xFF).chr(0xFE).chr(0x4d).chr(0).chr(0x6f).chr(0); //'Mo'
$utf8 = mb_convert_encoding($utf16, 'UTF-8', 'UTF-16');
echo($utf8 . "\n"); // -> ´++´¢ìo
$utf8 = iconv('UTF-16', 'UTF-8', $utf16);
echo($utf8 . "\n"); // -> Mo
Expected result:
----------------
mb: (BOM8)Mo
iconv: Mo
(BOM8) is a placeholder
Actual result:
--------------
mb: (BOM8)´¢ìo (copied from cmd shell)
iconv: Mo
(BOM8) is a placeholder
------------------------------------------------------------------------