Edit report at https://bugs.php.net/bug.php?id=51563&edit=1
ID: 51563
Comment by: jmichae3 at yahoo dot com
Reported by: zdenis at free dot fr
Summary: Incorrect result
Status: Assigned
Type: Bug
Package: mbstring related
Operating System: Windows
PHP Version: 5.3.2
Assigned To: moriyoshi
Block user comment: N
Private report: N
New Comment:
I am getting Russian spam in my email forms. Strangely enough,
mb_detect_encoding() reports my form mail content string as ASCII, even though
the characters are around the Unicode Ѐ (U+0400) range.
This prevents me from detecting foreign-language characters in my form mail.
Please fix.
my code is
//detect foreign languages
$arr[0] = "ASCII";
$arr[1] = "US-ASCII";
if (false===mb_detect_encoding($comment,$arr,true)) {
echo "<div style='color:red;'>ERRORB:".mb_detect_encoding($comment)."</div>";
return true; //error
}
Using the following string, which I generated from charmap:
ÐÏÐγÏÐÐÐЫЫÐÏÐÐÑмдп
I get ASCII as the result of that last mb_detect_encoding($comment) call.
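As a possible workaround for the form check described in this comment, a byte-level test can flag non-ASCII input without relying on mb_detect_encoding() at all. This is only a sketch: has_non_ascii() is a hypothetical helper name, not an mbstring function, and it assumes that any byte above 0x7F should count as "foreign".

```php
<?php
// Hypothetical helper (not part of mbstring): returns true when $text
// contains at least one byte outside the 7-bit ASCII range 0x00-0x7F.
function has_non_ascii($text)
{
    // Without the /u modifier PCRE matches byte by byte, so the check
    // does not depend on which encoding the string actually uses.
    return preg_match('/[^\x00-\x7F]/', $text) === 1;
}
```

With this, a check such as if (has_non_ascii($comment)) { ... } could stand in for the mb_detect_encoding() test on the form content.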
Previous Comments:
------------------------------------------------------------------------
[2010-04-15 16:06:23] zdenis at free dot fr
Description:
------------
When using mb_detect_encoding(), the detected charset depends on how many é
characters (or any other characters above 127) are present in the string, so
the result is inconsistent and sometimes wrong.
Test script:
---------------
// little example
php -r "echo mb_detect_encoding(\"é\", 'UTF-8,ISO-8859-1');"
php -r "echo mb_detect_encoding(\"éé\", 'UTF-8,ISO-8859-1');"
// real life example
php -r "echo mb_detect_encoding(\"Produit commandé\", 'UTF-8,ISO-8859-1');"
php -r "echo mb_detect_encoding(\"Société\", 'UTF-8,ISO-8859-1');"
Expected result:
----------------
ISO-8859-1
ISO-8859-1
ISO-8859-1
ISO-8859-1
Actual result:
--------------
UTF-8
ISO-8859-1
UTF-8
ISO-8859-1
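For the two-candidate case in the test script, one way to sidestep detection is to validate the bytes as UTF-8 and fall back to ISO-8859-1 otherwise. The sketch below assumes the input can only be one of those two encodings; guess_utf8_or_latin1() is a hypothetical name, and this is not how mb_detect_encoding() works internally. It relies on preg_match() with the /u modifier failing on invalid UTF-8; mb_check_encoding($bytes, 'UTF-8') would be the mbstring counterpart.

```php
<?php
// Hypothetical helper: assumes $bytes is either UTF-8 or ISO-8859-1.
function guess_utf8_or_latin1($bytes)
{
    // An empty pattern with /u matches any well-formed UTF-8 string and
    // makes preg_match() return false when the subject is not valid UTF-8.
    if (preg_match('//u', $bytes) === 1) {
        return 'UTF-8'; // pure ASCII also lands here, since ASCII is valid UTF-8
    }
    // Every byte sequence is valid ISO-8859-1, so it serves as the fallback.
    return 'ISO-8859-1';
}
```

Under that assumption, a lone 0xE9 byte (é in ISO-8859-1) is not well-formed UTF-8, so it falls through to ISO-8859-1 no matter how many times it occurs in the string.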
------------------------------------------------------------------------