ID:               36994
 User updated by:  ynynmzvqofeaz at mailinator dot com
 Reported By:      ynynmzvqofeaz at mailinator dot com
 Status:           Bogus
 Bug Type:         mbstring related
 Operating System: Linux
 PHP Version:      4.4.2
 Assigned To:      hirokawa
 New Comment:

So, to use my example from before: why do both
 $string = "testö"
in a UTF-8 text file, and 
 $string = "testö"
in an ISO-8859-1 file (converted using iconv) return "UTF-8" from
mb_detect_encoding(), even when strict mode is on?


Previous Comments:
------------------------------------------------------------------------

[2006-04-17 15:41:03] [EMAIL PROTECTED]

It is not a bug; it is the specified behavior.
You should use 'strict' mode in mb_detect_encoding()
if you need a correct result.

mb_detect_encoding() treats the string as a byte stream.
{0x61,0x63,0x63,0x65,0x6e,0x74,0x75,0xe9} is accepted as a
UTF-8 byte stream in non-strict mode: the final 0xe9 is treated
as the first byte of an (incomplete) multibyte character.

{0x61,0x63,0x63,0x65,0x6e,0x74,0x75,0xe9,0x65} is not a valid
UTF-8 byte stream, because 0xe9 0x65 is an invalid byte sequence in
UTF-8 (0x65 is not a continuation byte).
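
To make the byte-level distinction above concrete, here is a minimal pure-PHP sketch (PHP 7+). The function utf8_status() is hypothetical and not part of mbstring: it reports whether a byte string is complete UTF-8, valid except for an incomplete sequence at the very end (the case non-strict detection accepts), or invalid. It is simplified: it only checks RFC 3629 lead-byte and continuation-byte ranges, and does not reject overlong 3- and 4-byte forms or surrogates.

```php
<?php
// Hypothetical helper (not part of mbstring): classify a byte string as
// 'valid' UTF-8, 'truncated' (valid except for an incomplete multibyte
// sequence at the very end), or 'invalid'.
// Simplified: checks lead/continuation byte ranges only; it does not
// reject overlong 3/4-byte forms or UTF-16 surrogates.
function utf8_status(string $s): string
{
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $need = 0;              // ASCII byte
        } elseif ($b >= 0xC2 && $b <= 0xDF) {
            $need = 1;              // lead byte of a 2-byte sequence
        } elseif ($b >= 0xE0 && $b <= 0xEF) {
            $need = 2;              // lead byte of a 3-byte sequence, e.g. 0xE9
        } elseif ($b >= 0xF0 && $b <= 0xF4) {
            $need = 3;              // lead byte of a 4-byte sequence
        } else {
            return 'invalid';       // stray continuation byte or forbidden lead byte
        }
        for ($k = 1; $k <= $need; $k++) {
            if ($i + $k >= $len) {
                return 'truncated'; // the string ends in the middle of a character
            }
            $c = ord($s[$i + $k]);
            if ($c < 0x80 || $c > 0xBF) {
                return 'invalid';   // expected a continuation byte (0x80-0xBF)
            }
        }
        $i += $need + 1;
    }
    return 'valid';
}

echo utf8_status("accentu\xE9"), "\n";   // truncated
echo utf8_status("accentu\xE9e"), "\n";  // invalid
echo utf8_status("test\xC3\xB6"), "\n";  // valid
```

The first two calls use the two byte streams quoted above; the third is "testö" encoded as UTF-8.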

If you need incomplete multibyte characters to be excluded from
detection, use the 'strict' option, e.g.:
echo mb_detect_encoding($s1, 'UTF-8, ISO-8859-1', true);
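
A minimal sketch of the strict versus non-strict comparison, assuming PHP 7+ with the mbstring extension loaded. Here $s1 is assumed to hold the ISO-8859-1 bytes of "accentué" from the example above, and the candidate list is passed as an array. Non-strict results may differ between PHP versions, so the script simply prints both; strict mode should rule out UTF-8 because of the incomplete trailing sequence.

```php
<?php
// Assumes the mbstring extension is loaded.
$s1 = "accentu\xE9";              // ISO-8859-1 bytes for "accentué"
$order = ['UTF-8', 'ISO-8859-1']; // candidate encodings, tried in order

// Non-strict: how a trailing lead byte is handled depends on the PHP version.
var_dump(mb_detect_encoding($s1, $order));

// Strict: the incomplete multibyte sequence disqualifies UTF-8,
// so detection falls through to ISO-8859-1.
var_dump(mb_detect_encoding($s1, $order, true));
```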


------------------------------------------------------------------------

[2006-04-10 11:53:02] ynynmzvqofeaz at mailinator dot com

Ignore the last comment. Do this:

Create two files with the following content, and name them test_iso1
and test_utf8:
<?php
echo mb_detect_encoding("testä");
?>

Make sure the encoding is correct:
$ file test_iso1
should return iso-8859-1
$ file test_utf8
should return utf-8

If they do not return the correct encoding, use iconv to convert them,
e.g.
$ iconv -f utf-8 -t iso-8859-1 test_iso1 >test_iso1.fixed
or
$ iconv -f iso-8859-1 -t utf-8 test_utf8 >test_utf8.fixed

Now run each script. The test_iso1 script should print ISO-8859-1,
and the test_utf8 script should print UTF-8.
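
Because the result above depends on how the editor saved each file, the same test can be written with the bytes spelled out as escape sequences. This is a minimal sketch, assuming PHP 7+ with mbstring; "ä" is 0xE4 in ISO-8859-1 and 0xC3 0xA4 in UTF-8. The explicit candidate list ['UTF-8', 'ISO-8859-1'] is an assumption, since the original scripts rely on the default detection order.

```php
<?php
// Assumes the mbstring extension is loaded.
$samples = [
    'iso-8859-1' => "test\xE4",     // "testä" as ISO-8859-1 bytes
    'utf-8'      => "test\xC3\xA4", // "testä" as UTF-8 bytes
];
$order = ['UTF-8', 'ISO-8859-1'];

// Print the raw bytes next to the non-strict and strict detection results.
foreach ($samples as $label => $bytes) {
    printf(
        "%-10s %-14s non-strict: %s, strict: %s\n",
        $label,
        bin2hex($bytes),
        var_export(mb_detect_encoding($bytes, $order), true),
        var_export(mb_detect_encoding($bytes, $order, true), true)
    );
}
```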

Workaround: append an extra character to the end of the string, and
then remove it(!)
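
The workaround can be sketched as a helper that runs detection on a copy of the string with one ASCII byte appended, so the original string never has to be trimmed afterwards. detect_with_sentinel() is a hypothetical name, and the sketch assumes PHP 7+ with mbstring. The appended 'x' turns a trailing lead byte such as 0xE4 from an incomplete sequence into an invalid one, which, per the explanation in this thread, non-strict detection rejects; passing true for strict avoids the trick altogether.

```php
<?php
// Hypothetical helper for the "append a character" workaround.
// Assumes the mbstring extension is loaded.
function detect_with_sentinel(string $s, array $order)
{
    // Detect on a copy with an ASCII byte appended: a trailing lead byte
    // (e.g. 0xE4) is then followed by a non-continuation byte, making the
    // UTF-8 sequence invalid instead of merely incomplete. $s is unchanged.
    return mb_detect_encoding($s . 'x', $order);
}

$order = ['UTF-8', 'ISO-8859-1'];
var_dump(detect_with_sentinel("test\xE4", $order));
```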

------------------------------------------------------------------------

[2006-04-10 07:02:09] ynynmzvqofeaz1 at mailinator dot com

*sigh*

<?php
echo mb_detect_encoding("testä");
?>

If "file" reports the script as UTF-8, convert it with
iconv -f utf-8 -t iso-8859-1 INFILE > OUTFILE

------------------------------------------------------------------------

[2006-04-06 11:49:43] ynynmzvqofeaz at mailinator dot com

Description:
------------
mb_detect_encoding() returns the wrong result when the text ends with
an accented character.
See http://www.php.net/manual/en/function.mb-detect-encoding.php#55228



------------------------------------------------------------------------

