I've just run into a strange "bug". I have a form on my web site that
takes input from the user and then uses that to do a search of a
postgresql database.

The form is set to be EUC-JP, but this weekend a user submitted a query
that postgres reject because it "contains invalid EUC-JP" characters.
Luckily the error was logged and I was able to track it down.

I thought that maybe the user had entered some bad characters in the
form or used some strange encoding so I should better check to make sure
that the encoding of the submitted form data really is EUC-JP using
mb_detect_encoding(). But unfortunately mb_detect_encoding() says that
the invalid string *is* in EUC-JP!?

The query string is as it appears in the URL is:

search_words=%B7%F6%BA%7E

In the script that parses this query I have put the following:

$words = $_GET["search_words"];
$enc = mb_detect_encoding($aI["search_words"]);
echo "encoding is $enc and the query is ($words)";die;

The result is:

encoding is EUC-JP and the query is (喧?)

As you can see the query string is *not* a valid EUC-JP sequence ...

Is there any way for me to check the user input to make sure it is a
valid EUC-JP sequence before passing it off to my database?

I though that mb_detect_encoding() would be the function to use but it
fails in this case.

Any suggestions?

Thanks,

Jen-Christian Imbeault



-- 
PHP Internationalization Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to