ID:               24309
User updated by:  jc at mega-bucks dot co dot jp
Reported By:      jc at mega-bucks dot co dot jp
-Status:          Bogus
+Status:          Open
Bug Type:         mbstring related
Operating System: Linux
-PHP Version:     4.3.1
+PHP Version:     4.3.3RC1
New Comment:

Bug still occurs with the newest version.


Previous Comments:
------------------------------------------------------------------------

[2003-06-26 12:05:04] [EMAIL PROTECTED]

Thank you for taking the time to report a problem with PHP.
Unfortunately you are not using a current version of PHP --
the problem might already be fixed. Please download a new
PHP version from http://www.php.net/downloads.php

If you are able to reproduce the bug with one of the latest
versions of PHP, please change the PHP version on this bug report
to the version you tested and change the status back to "Open".
Again, thank you for your continued support of PHP.

PHP 4.3.2 was released a while ago.

------------------------------------------------------------------------

[2003-06-24 02:52:51] jc at mega-bucks dot co dot jp

Description:
------------
I've just run into a strange "bug".

I have a form on my web site that takes input from the user and then
uses it to search a PostgreSQL database. The form is set to EUC-JP,
but this weekend a user submitted a query that PostgreSQL rejected
because it "contains invalid EUC-JP" characters. Luckily the error was
logged and I was able to track it down.

I thought that maybe the user had entered some bad characters in the
form or used some strange encoding, so I decided to check that the
submitted form data really is EUC-JP by using mb_detect_encoding().

Unfortunately, mb_detect_encoding() says that the invalid string *is*
in EUC-JP!?

The query string, as it appears in the URL, is:

search_words=%B7%F6%BA%7E

In the script that parses this query I have put the following:

$words = $_GET["search_words"];
$enc = mb_detect_encoding($words);
echo "encoding is $enc and the query is ($words)";die;

The result is:

encoding is EUC-JP and the query is (喧?)

As you can see, the query string is *not* a valid EUC-JP sequence:
%B7%F6 is a complete two-byte character, but the lead byte %BA is
followed by %7E, which is not a valid trail byte.

Reproduce code:
---------------
$words = $_GET["search_words"];
$enc = mb_detect_encoding($words);
echo "encoding is $enc and the query is ($words)";die;

Expected result:
----------------
SJIS (?) or Undefined.

The documentation for mb_detect_encoding() does not specify what it
returns when it is passed a character sequence whose encoding cannot
be detected.

In the above case the character sequence is valid SJIS, I believe ...

Actual result:
--------------
EUC-JP

------------------------------------------------------------------------
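
For anyone who wants to confirm independently of mbstring that the submitted bytes are not well-formed EUC-JP, the following is a minimal sketch of a byte-level check. The function name is_wellformed_eucjp() is hypothetical and not part of PHP or mbstring. It assumes the usual EUC-JP layout (ASCII bytes, 0x8E plus one byte for half-width katakana, 0x8F plus two bytes for JIS X 0212, and lead/trail pairs in 0xA1-0xFE). It only checks byte structure, not whether each code point is actually assigned.

```php
<?php
// Hypothetical helper: structural EUC-JP check, independent of mbstring.
// Returns true only if every byte belongs to a complete EUC-JP code unit.
function is_wellformed_eucjp($bytes)
{
    $len = strlen($bytes);
    $i = 0;
    while ($i < $len) {
        $lead = ord($bytes[$i]);
        if ($lead <= 0x7F) {            // single-byte ASCII range
            $i += 1;
            continue;
        }
        if ($lead === 0x8E) {           // half-width katakana: one trail byte
            $trail = 1;
            $max = 0xDF;
        } elseif ($lead === 0x8F) {     // JIS X 0212: two trail bytes
            $trail = 2;
            $max = 0xFE;
        } elseif ($lead >= 0xA1 && $lead <= 0xFE) {  // JIS X 0208 pair
            $trail = 1;
            $max = 0xFE;
        } else {
            return false;               // byte cannot start a character
        }
        if ($i + $trail >= $len) {
            return false;               // sequence is truncated
        }
        for ($k = 1; $k <= $trail; $k++) {
            $t = ord($bytes[$i + $k]);
            if ($t < 0xA1 || $t > $max) {
                return false;           // trail byte out of range
            }
        }
        $i += 1 + $trail;
    }
    return true;
}

// The query string from the report, decoded to raw bytes B7 F6 BA 7E.
var_dump(is_wellformed_eucjp(rawurldecode('%B7%F6%BA%7E'))); // bool(false)
```

The check fails at the third byte: 0xBA opens a two-byte character, but 0x7E is outside the 0xA1-0xFE trail range. A check like this could be run on form input before it is passed to the database, whatever mb_detect_encoding() reports.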