ID:               45993
 Comment by:       mtrojan at transline dot de
 Reported By:      mtrojan at transline dot de
 Status:           To be documented
 Bug Type:         mbstring related
 Operating System: Windows XP
 PHP Version:      5.2.6
 New Comment:

Of course, comparing the beginning of a file with the UTF-16 BOM can be
used to detect UTF-16 encoding. But what do you do with UTF-16 encoded
files that have no BOM?
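One possible answer, offered only as a sketch: for text that is mostly ASCII or Latin-1, BOM-less UTF-16 can often be guessed from where the NUL bytes fall. In UTF-16LE such characters are stored as <byte> 0x00, so NULs cluster at odd offsets; in UTF-16BE they cluster at even offsets. The function name guess_utf16_without_bom() and the 0.3 threshold are hypothetical choices, not part of mbstring, and the code assumes PHP 7.1 or later (nullable return type). It will not help for text that is mostly CJK or other non-Latin scripts, where neither byte of a code unit is usually zero.

```php
<?php
// Hypothetical helper, NOT part of mbstring: a NUL-byte heuristic for
// UTF-16 data without a BOM. Assumes mostly ASCII/Latin-1 text.
// Returns 'UTF-16LE', 'UTF-16BE', or null when it cannot decide.
function guess_utf16_without_bom(string $content, float $threshold = 0.3): ?string
{
    $len = strlen($content);
    if ($len < 2 || $len % 2 !== 0) {
        return null; // UTF-16 data always has an even number of bytes
    }
    $evenNul = 0; // NULs in the first byte of each 16-bit code unit
    $oddNul = 0;  // NULs in the second byte of each 16-bit code unit
    for ($i = 0; $i < $len; $i += 2) {
        if ($content[$i] === "\0") {
            $evenNul++;
        }
        if ($content[$i + 1] === "\0") {
            $oddNul++;
        }
    }
    $units = $len / 2;
    if ($oddNul / $units >= $threshold && $evenNul < $oddNul) {
        return 'UTF-16LE'; // high byte (zero) comes second
    }
    if ($evenNul / $units >= $threshold && $oddNul < $evenNul) {
        return 'UTF-16BE'; // high byte (zero) comes first
    }
    return null;
}

$sample = mb_convert_encoding('Hello world', 'UTF-16LE', 'UTF-8');
var_dump(guess_utf16_without_bom($sample)); // string(8) "UTF-16LE"
```

A heuristic like this can only narrow the choice; the result should still be treated as a guess, not a detection.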


Previous Comments:
------------------------------------------------------------------------

[2008-11-08 02:20:46] [EMAIL PROTECTED]

mb_detect_encoding does not support detection of the UTF-16 encodings
(UTF-16/UTF-16BE). Because UTF-16 is not a byte-stream encoding like
UTF-8 (it uses 16-bit code units), it cannot be detected the same way
as the other byte-stream encodings.
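A short sketch of why this is hard, assuming a reasonably recent PHP with mbstring: the same bytes can be valid in several encodings at once. Every byte sequence is valid ISO-8859-1, and almost any even-length byte sequence is also valid UTF-16LE, so a validity check alone cannot tell them apart. This also suggests that mb_check_encoding() returning 1 in the original report does not by itself prove the file is UTF-16LE.

```php
<?php
// Four bytes: 'h', NUL, 'i', NUL.
$bytes = "h\0i\0";

// Valid as ISO-8859-1 (every byte is a Latin-1 character, including NUL)
// and valid as UTF-16LE (two code units: U+0068 'h', U+0069 'i').
var_dump(mb_check_encoding($bytes, 'ISO-8859-1')); // bool(true)
var_dump(mb_check_encoding($bytes, 'UTF-16LE'));   // bool(true)

// Even plain ASCII "hi" is valid UTF-16LE: bytes 0x68 0x69 form the
// single code unit U+6968, a CJK ideograph.
var_dump(mb_check_encoding('hi', 'UTF-16LE'));     // bool(true)
```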

A file encoded in UTF-16 can easily be detected by its BOM (byte order
mark), for example:

if (strncmp($content, "\xFF\xFE", 2) === 0) {
  echo 'UTF-16LE';   // FF FE: little-endian BOM
} else if (strncmp($content, "\xFE\xFF", 2) === 0) {
  echo 'UTF-16BE';   // FE FF: big-endian BOM
}
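Building on the BOM check above, a minimal sketch of how the detected byte order might then be used to convert the content to UTF-8. The function name utf16_bom_to_utf8() is hypothetical (not part of mbstring); only mb_convert_encoding() is a real mbstring call. It assumes PHP 7.1 or later and that the PHP source file itself is saved as UTF-8.

```php
<?php
// Hypothetical helper, NOT part of mbstring: detect a UTF-16 BOM,
// strip it, and convert the remaining bytes to UTF-8.
// Returns null when no UTF-16 BOM is present.
function utf16_bom_to_utf8(string $content): ?string
{
    $prefix = substr($content, 0, 2);
    if ($prefix === "\xFF\xFE") {
        $from = 'UTF-16LE'; // FF FE: little-endian byte order
    } elseif ($prefix === "\xFE\xFF") {
        $from = 'UTF-16BE'; // FE FF: big-endian byte order
    } else {
        return null;
    }
    // Drop the two BOM bytes so they do not become U+FEFF in the output.
    return mb_convert_encoding(substr($content, 2), 'UTF-8', $from);
}

$le = "\xFF\xFE" . mb_convert_encoding('Grüße', 'UTF-16LE', 'UTF-8');
var_dump(utf16_bom_to_utf8($le) === 'Grüße'); // bool(true)
```

Stripping the BOM explicitly keeps the behavior independent of how a given mbstring version treats a leading U+FEFF.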

------------------------------------------------------------------------

[2008-10-26 23:01:49] [EMAIL PROTECTED]

Assigned to the mbstring maintainer.

------------------------------------------------------------------------

[2008-09-04 11:47:39] mtrojan at transline dot de

Description:
------------
mb_detect_encoding does not seem to recognize UTF-16 encoded files
properly. Even when mb_check_encoding confirms that a file is valid
UTF-16LE, mb_detect_encoding does not detect the same file as UTF-16
and returns ISO-8859-1 instead. Enabling or disabling strict mode has
no effect on the result.

Reproduce code:
---------------
$content = file_get_contents($src_path);

$encodings = array('UTF-16', 'UTF-16LE', 'UTF-16BE', 'UTF-8',
                   'UNICODE', 'ISO-8859-1');

$enc = mb_detect_encoding($content, $encodings);
print "encoding: $enc\n";

print 'checked: ' . intval(mb_check_encoding($content, 'UTF-16LE'));

Expected result:
----------------
encoding: UTF-16LE
checked: 1

Actual result:
--------------
encoding: ISO-8859-1
checked: 1


------------------------------------------------------------------------
