Edit report at http://bugs.php.net/bug.php?id=34776&edit=1

 ID:                 34776
 Comment by:         me+phpbugs at ryanmccue dot info
 Reported by:        narzeczony at zabuchy dot net
 Summary:            mb_convert_encoding() - wrong convertion from UTF-16
                     (problem with BOM)
 Status:             No Feedback
 Type:               Bug
 Package:            mbstring related
 Operating System:   Linux, Windows
 PHP Version:        5.0.5
 Block user comment: N
 Private report:     N

 New Comment:

Alternatively:

Reproduce code:
---------------
echo bin2hex(mb_convert_encoding("\xfe\xff\x22\x1e", 'UTF-8', 'UTF-16'));

Expected result:
----------------
e2889e

Actual result:
--------------
efbbbfe2889e


Previous Comments:
------------------------------------------------------------------------
[2011-04-06 15:20:14] me+phpbugs at ryanmccue dot info

We're also able to reproduce this, with a much smaller test case:

Reproduce code:
---------------
mb_convert_encoding("\xfe\xff\x22\x1e", 'UTF-8', 'UTF-16');

Expected result:
----------------
\xe2\x88\x9e

Actual result:
--------------
\xef\xbb\xbf\xe2\x88\x9e
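Until the conversion itself drops the BOM, one possible stopgap is to strip the transcoded BOM (U+FEFF, bytes EF BB BF in UTF-8) from the result. This is only a minimal sketch: the helper name utf16_to_utf8_no_bom() is made up for illustration and is not part of mbstring, and it only addresses the leading BOM, not any byte-order problem.

```php
<?php
// Hypothetical helper (not an mbstring function): convert UTF-16 input to
// UTF-8 and remove a leading transcoded BOM if the conversion left one in.
function utf16_to_utf8_no_bom($s)
{
    $out = mb_convert_encoding($s, 'UTF-8', 'UTF-16');

    // "\xEF\xBB\xBF" is U+FEFF encoded as UTF-8.
    if (strncmp($out, "\xEF\xBB\xBF", 3) === 0) {
        // Cast because substr() can return false on older PHP versions
        // when nothing follows the BOM.
        $out = (string) substr($out, 3);
    }

    return $out;
}

echo bin2hex(utf16_to_utf8_no_bom("\xfe\xff\x22\x1e")), "\n";
```

On a build that already strips the BOM, the check simply does not match and the result is returned unchanged.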

------------------------------------------------------------------------
[2008-02-18 17:20:00] jdephix at polenord dot com

I forgot to add that I did manage to deal with the UTF-16BE file by
reversing everything.

$s = file_get_contents($fileUTF16BE);
$s = mb_convert_encoding($s, 'UTF-8', "UTF-16LE");
// some operations on $s
file_put_contents($anotherUTF16BEfile, mb_convert_encoding($s,
    'UTF-16LE', "UTF-8"));

I need to specify "UTF-16LE" in order to be sure I work with "UTF-16BE".

------------------------------------------------------------------------
[2008-02-18 17:16:32] jdephix at polenord dot com

UTF-16LE and UTF-16BE seem mixed up when using mb_convert_encoding.

I want to read the content of a file in UTF-16BE (starts with \xFE\xFF)
and convert it into UTF-8:

$s = file_get_contents($fileUTF16BE);
$s = mb_convert_encoding($s, 'UTF-8', "UTF-16BE");
// some operations on $s
file_put_contents($anotherUTF16BEfile, mb_convert_encoding($s,
    'UTF-16BE', "UTF-8"));

The second file is in Little Endian (starts with \xFF\xFE)!

I have to specify LE if I want BE:

file_put_contents($anotherUTF16BEfile, mb_convert_encoding($s,
    'UTF-16LE', "UTF-8"));

How come it's reversed?
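One way to take the guesswork out of the output side is to write the BOM by hand and then look at the first two bytes of what was produced. The sketch below assumes that converting to 'UTF-16BE' emits no BOM of its own; the helper name utf8_to_utf16be_with_bom() is hypothetical.

```php
<?php
// Hypothetical helper: encode UTF-8 text as UTF-16BE and prepend the
// big-endian BOM (FE FF) explicitly, so the byte order is unambiguous.
function utf8_to_utf16be_with_bom($utf8)
{
    return "\xFE\xFF" . mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
}

$bytes = utf8_to_utf16be_with_bom("A");

// The first two bytes identify the byte order: feff = BE, fffe = LE.
echo bin2hex(substr($bytes, 0, 2)), "\n";

// Dump the payload as well; on a build where 'UTF-16BE' behaves as
// documented, "A" should come out as 00 41 rather than 41 00.
echo bin2hex(substr($bytes, 2)), "\n";
```

Running the same bin2hex() check on the first bytes of the input file would also confirm whether it really is big endian before any conversion takes place.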

------------------------------------------------------------------------
[2006-06-23 16:11:32] markl at lindenlab dot com

There are two problems when mb_convert_encoding is converting from
UTF-16:

1) It includes the (transcoded) BOM in the result, rather than
   stripping it.

2) If the source UTF-16 string was little endian, then the second
   character of the conversion will be wrong; it is converted as if
   the character code had 0xFF00 ORed into it.

Problem 1 occurs with any UTF-16 variant (though it is arguably
correct behavior for UTF-16LE and UTF-16BE). Problem 2 only occurs
when the source encoding is given as plain UTF-16.



This PHP program demonstrates this all clearly:

<?php

// Print each byte of $s as two hex digits, then var_dump() the string.
function dump($s)
{
        for ($i = 0; $i < strlen($s); ++$i) {
                echo substr(dechex(256 + ord(substr($s, $i, 1))), 1, 2), ' ';
        }
        var_dump($s);
}

$utf16le = "\xFF\xFE\x41\x00\x42\x00\x43\x00";
$utf16be = "\xFE\xFF\x00\x41\x00\x42\x00\x43";
        // These strings are both valid UTF-16; the BOM at the start
        // indicates the endianness. We don't expect the BOM to be
        // included in a conversion.

echo "The UTF-16LE and UTF-16BE sequences:\n";
dump($utf16le);
dump($utf16be);
echo "\n";

$encodings = array("ascii", "iso-8859-1", "utf-8", "utf-16",
        "utf-16le", "utf-16be");

// Convert both inputs to each target encoding, always from plain "utf-16".
foreach ($encodings as $enc) {
        echo "Converting to $enc:\n";
        dump(mb_convert_encoding($utf16le, $enc, "utf-16"));
        dump(mb_convert_encoding($utf16be, $enc, "utf-16"));
        echo "\n";
}
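A possible workaround for both problems is to avoid the plain "utf-16" source encoding altogether: read the BOM yourself, strip it, and pass the explicit variant. This is a minimal sketch, not a fix for mbstring. The helper name utf16_bom_to_utf8() is hypothetical, and the big-endian fallback for input without a BOM is an assumption of this example.

```php
<?php
// Hypothetical helper: choose the byte order from the BOM ourselves,
// strip the BOM, and convert with an explicit UTF-16LE/UTF-16BE name.
function utf16_bom_to_utf8($s)
{
    $bom = substr($s, 0, 2);

    if ($bom === "\xFF\xFE") {
        // Little-endian BOM: convert the remainder as UTF-16LE.
        return mb_convert_encoding((string) substr($s, 2), 'UTF-8', 'UTF-16LE');
    }

    if ($bom === "\xFE\xFF") {
        // Big-endian BOM: convert the remainder as UTF-16BE.
        return mb_convert_encoding((string) substr($s, 2), 'UTF-8', 'UTF-16BE');
    }

    // No BOM present: this sketch assumes big endian.
    return mb_convert_encoding($s, 'UTF-8', 'UTF-16BE');
}

// Same test strings as in the program above.
echo utf16_bom_to_utf8("\xFF\xFE\x41\x00\x42\x00\x43\x00"), "\n";
echo utf16_bom_to_utf8("\xFE\xFF\x00\x41\x00\x42\x00\x43"), "\n";
```

If the explicit variants behave as documented, both calls should yield the same three-letter string with no BOM in front.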

------------------------------------------------------------------------
[2005-10-15 01:00:03] php-bugs at lists dot php dot net

No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

------------------------------------------------------------------------


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    http://bugs.php.net/bug.php?id=34776

