 ID:               34776
 Updated by:       [EMAIL PROTECTED]
 Reported By:      narzeczony at zabuchy dot net
-Status:           Open
+Status:           Feedback
 Bug Type:         mbstring related
 Operating System: Linux, Windows
 PHP Version:      5.0.5
 New Comment:

Please try using this CVS snapshot:

  http://snaps.php.net/php5-latest.tar.gz

For Windows:

  http://snaps.php.net/win32/php5-win32-latest.zip


Previous Comments:
------------------------------------------------------------------------

[2005-10-07 16:36:23] narzeczony at zabuchy dot net

The same example with iconv instead of mb_convert_encoding works
perfectly, but I guess that doesn't close the bug in
mb_convert_encoding :).

Another problem exists when converting to 'UTF-16' (using
mb_convert_encoding): the BOM is not added. Again, iconv works well in
this case.

------------------------------------------------------------------------

[2005-10-07 12:43:32] [EMAIL PROTECTED]

ah, mbstring has a weird parameter order (dest, src) instead of (src,
dest)... did you try to use iconv perhaps?

------------------------------------------------------------------------

[2005-10-07 12:33:45] narzeczony at zabuchy dot net

I'm not specifying which endianness mb_convert_encoding should use to
convert to ISO. Look:

  $utf16LE2iso = mb_convert_encoding($utf16LE,'ISO-8859-1','UTF-16');

I'm converting from UTF-16 (LE or BE) to ISO-8859-1. It looks like
mb_convert_encoding checks the BOM and chooses the right encoding (if
you remove the BOM, the string won't be converted properly for one
endianness). The only problem is that the BOM is not ignored.

The first two lines, with the endianness specified:

  $utf16LE = mb_convert_encoding($iso_8859_1,'UTF-16LE','ISO-8859-1');
  $utf16BE = mb_convert_encoding($iso_8859_1,'UTF-16BE','ISO-8859-1');

are just for convenient UTF-16 string creation, please ignore them.

------------------------------------------------------------------------

[2005-10-07 11:57:10] [EMAIL PROTECTED]

I think this is correct, as you are not supposed to supply a BOM if you
specify which endianness your UTF-16 stream is in.

------------------------------------------------------------------------

[2005-10-07 11:52:16] narzeczony at zabuchy dot net

There is also a small typo in the documentation, but I don't want to
open another bug. On http://ie.php.net/mbstring this section is
repeated twice:

  Name in the IANA character set registry: UTF-16BE
  Underlying character set: Unicode
  Description: See above.
  Additional note: In contrast to UTF-16, strings are always assumed to
  be in big endian form.

One should be about UTF-16BE and the other about UTF-16LE.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/34776

-- 
Edit this bug report at http://bugs.php.net/?id=34776&edit=1
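
The BOM handling discussed in the comments can be sidestepped in user code. The following is only a sketch of a possible workaround, not a fix for mbstring and not something proposed in the report. The helper names utf16_to_latin1() and latin1_to_utf16_with_bom() are hypothetical. The first helper inspects the leading two bytes, strips a BOM if one is present, and passes an explicit 'UTF-16LE' or 'UTF-16BE' source encoding to mb_convert_encoding(), so the result does not depend on how plain 'UTF-16' treats the BOM. Treating input without a BOM as big endian is an assumption made here.

```php
<?php
// Hypothetical helpers, not part of mbstring: a workaround sketch only.

/**
 * Convert a UTF-16 string to ISO-8859-1, consuming a leading BOM if present.
 * Assumption: input without a BOM is treated as big endian.
 */
function utf16_to_latin1($utf16)
{
    $source = 'UTF-16BE';
    if (strncmp($utf16, "\xFF\xFE", 2) === 0) {
        // Little-endian BOM: switch the source encoding and drop the BOM.
        $source = 'UTF-16LE';
        $utf16 = (string) substr($utf16, 2);
    } elseif (strncmp($utf16, "\xFE\xFF", 2) === 0) {
        // Big-endian BOM: keep the default and drop the BOM.
        $utf16 = (string) substr($utf16, 2);
    }
    // Note the mbstring parameter order: (string, to, from).
    return mb_convert_encoding($utf16, 'ISO-8859-1', $source);
}

/**
 * Convert ISO-8859-1 to big-endian UTF-16 and prepend the BOM explicitly.
 */
function latin1_to_utf16_with_bom($latin1)
{
    return "\xFE\xFF" . mb_convert_encoding($latin1, 'UTF-16BE', 'ISO-8859-1');
}

// Usage: round-trip an ISO-8859-1 string through both byte orders.
$iso = "caf\xE9";
$le  = "\xFF\xFE" . mb_convert_encoding($iso, 'UTF-16LE', 'ISO-8859-1');
$be  = latin1_to_utf16_with_bom($iso);

var_dump(utf16_to_latin1($le) === $iso);
var_dump(utf16_to_latin1($be) === $iso);
```

Because the endianness is always passed explicitly, this sketch never relies on the BOM detection of the 'UTF-16' encoding name that the report is about.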
