[PHP] mbstring: Japanese conversion not working for me
I am hoping that someone can help me get Japanese user input into my PostgreSQL database... I am still having trouble, but I am sure it is something simple, probably something about mbstring.

I tried to insert the user input directly into my DB, but I got the following error:

    Warning: PostgreSQL query failed: ERROR: Invalid EUC_JP character sequence found (0x8140) in /www/htdocs/test.php on line 31

So I assumed that I should first convert the user input into EUC-JP. I wrote some PHP code to do this, but my original user input gets mangled. I am including the output from my test code, and the code itself, at the end of this email. Can someone spot what my error/problem is?

Jc

PHP output as shown in the browser (N6.2):

    $_POST["textfield"] : 111$B$"$$$&$($*!!4A;z$R$i$,$J(B1235
    CONVERT "auto" to EUC-JP : 111$B!"!V!"!"!"%r!"%#!"%'!#!#%(%A%5yh%a!"r&%c!"%O(B1235

    mb_internal_encoding() : EUC-JP
    mb_detect_order(): ASCII, JIS, UTF-8, EUC-JP, SJIS

    mb_http_input() : FALSE
    mb_http_input():

    ---

    input encoding is: pass

PHP code I wrote:

    <?php
    $input = $_POST["textfield"];

    // convert the user input into EUC-JP (detect user's encoding
    // automatically)
    $new = mb_convert_encoding($input, "EUC-JP", "auto");

    // output original input and converted input to screen
    // this doesn't work as the original input gets mangled
    echo("<PRE>");
    echo("<BR>\$_POST[\"textfield\"] : ".$input."<BR>");
    echo("CONVERT \"auto\" to EUC-JP : ".$new."<BR>");

    // the following outputs mbstring settings to the screen
    echo("<BR>mb_internal_encoding() : ".mb_internal_encoding()."<BR>");
    echo("mb_detect_order(): ".implode(", ", mb_detect_order())."<BR>");

    // What is the value of the HTTP input conversion?
    // I get a return of "false" which is really strange;
    // it should be "auto" since that is the setting in my
    // php.ini file, and also what phpinfo() says
    $enc = mb_http_input("P");
    if ($enc == false) echo("<BR>mb_http_input() : FALSE<BR>");
    echo("mb_http_input(): $enc<BR>");
    echo("<BR>---<BR>");

    // Here I check to see what encoding PHP thinks the
    // input is in. I get a value of "pass" which is also
    // strange since it is not a valid return value!
    $interenc = mb_internal_encoding();
    $inputenc = mb_convert_variables($interenc, "", $input);
    echo("<BR>input encoding is: ".$inputenc."<BR>");
    echo("</PRE>");
    ?>

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
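The byte sequence 0x8140 in the error message is the Shift_JIS code for the full-width (ideographic) space, which suggests the form data reached PHP as Shift_JIS. A minimal sketch of converting one form value before the INSERT, assuming the mbstring extension is loaded: instead of "auto" it lists the candidate encodings explicitly and asks for strict detection, so input it cannot identify is rejected rather than converted into garbage. The helper name `to_euc_jp` is hypothetical.

```php
<?php
// Hypothetical helper: normalise one form value to EUC-JP before it is sent
// to a database whose encoding is EUC_JP. Assumes the mbstring extension.
function to_euc_jp($input)
{
    // Explicit, ordered candidate list instead of "auto".
    $candidates = array('ASCII', 'JIS', 'UTF-8', 'EUC-JP', 'SJIS');

    // With strict = true, false is returned when the string is not valid
    // in any of the listed encodings.
    $from = mb_detect_encoding($input, $candidates, true);
    if ($from === false) {
        return false; // unknown encoding: reject instead of storing garbage
    }
    return mb_convert_encoding($input, 'EUC-JP', $from);
}

// Shift_JIS bytes for hiragana A followed by the full-width space (0x8140)
$sjis = "\x82\xa0\x81\x40";
var_dump(bin2hex(to_euc_jp($sjis)));
```

Returning false instead of a best guess lets the calling script show the user an error instead of sending bytes that PostgreSQL will refuse anyway.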
Re: [PHP] mbstring: Japanese conversion not working for me
Hi!

As for working out which language/charset your user is requesting (in terms of his/her browser settings), you might use this function:

    // this function remains unchanged. It returns an array
    // 0 : negotiated charset
    // 1 : negotiated lancode
    function negotiated_langset()
    {
        // process charset request header and build charset request array
        $headchrreq = explode(',', $_SERVER['HTTP_ACCEPT_CHARSET']);
        $i = 0;
        while ($i < count($headchrreq)) {
            $chunk = explode(';', $headchrreq[$i]);
            $charsets[$i] = trim($chunk[0]);
            $i++;
        }
        // process language request header and build language request array
        $headlanreq = explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']);
        $i = 0;
        while ($i < count($headlanreq)) {
            $chunk = explode(';', $headlanreq[$i]);
            $language[$i] = substr(trim($chunk[0]), 0, 2);
            $i++;
        }
        // start negotiation
        $i = 0;
        while (isset($language[$i]) && !$this->has_content($language[$i])) {
            $i++;
        }
        // did we get anything?
        if (isset($language[$i])) {
            $lancode = $language[$i];
            if ($i == 0) {
                $charset = $charsets[0];
            } else {
                // default charset when first choice unavailable
                $charset = 'ISO-8859-1';
            }
        } else {
            // default on nothing found
            $charset = 'ISO-8859-1';
            $lancode = 'en';
        }
        $result[0] = $charset;
        $result[1] = $lancode;
        return $result;
    }

*NOTE* The !$this->has_content($language[$i]) call goes to a local function of yours that returns true/false, depending on whether you have content available for that language.

The charset request will default to ISO-8859-1 when your first language choice is unavailable, because the headers carry no information about which charset goes with the later language choices. You might want this to become a UTF-8 value.

The function does not negotiate the charset against content availability (as usually you will not have separate content editions for different charsets in your content repository). But you can easily add the code needed to do it, if that is your case.
If you do, please share the result :)

Bye,
Alberto
Kiev

-_=}{=_-@-_=}{=_--_=}{=_-@-_=}{=_--_=}{=_-@-_=}{=_--_=}{=_- LoRd, CaN yOu HeAr Me, LiKe I'm HeArInG yOu? lOrD i'M sHiNiNg... YoU kNoW I AlMoSt LoSt My MiNd, BuT nOw I'm HoMe AnD fReE tHe TeSt, YeS iT iS ThE tEsT, yEs It Is tHe TeSt, YeS iT iS ThE tEsT, yEs It Is...
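The function above reads `$_SERVER` and calls a `has_content()` method of an enclosing class, which makes it awkward to try out on its own. A sketch of the same logic as free functions that take the header strings and an availability callback instead; `parse_accept_header()` and `negotiate_langset()` are hypothetical names. Like the original, it ignores `q=` weights, keeps header order, and only trusts the first charset for the first language.

```php
<?php
// Hypothetical standalone variant of negotiated_langset(): the headers and the
// content-availability check are passed in instead of read from $_SERVER/$this.
function parse_accept_header($header)
{
    // "en-US,en;q=0.8" -> array('en-US', 'en'); q-values are dropped,
    // header order is kept, as in the original function.
    $items = array();
    foreach (explode(',', (string) $header) as $part) {
        $chunk = explode(';', $part);
        $value = trim($chunk[0]);
        if ($value !== '') {
            $items[] = $value;
        }
    }
    return $items;
}

function negotiate_langset($accept_charset, $accept_language, $has_content)
{
    $charsets = parse_accept_header($accept_charset);
    $languages = array();
    foreach (parse_accept_header($accept_language) as $lang) {
        $languages[] = substr($lang, 0, 2); // two-letter code, e.g. "ja"
    }

    foreach ($languages as $i => $lang) {
        if ($has_content($lang)) {
            // Only the first language choice is matched to the first charset.
            $charset = ($i == 0 && isset($charsets[0])) ? $charsets[0] : 'ISO-8859-1';
            return array($charset, $lang);
        }
    }
    return array('ISO-8859-1', 'en'); // nothing matched
}

$has = function ($lang) { return in_array($lang, array('ja', 'en')); };
print_r(negotiate_langset('Shift_JIS,utf-8;q=0.7', 'ja,en;q=0.5', $has));
```

If the Accept-Charset header is missing or empty, this sketch also falls back to ISO-8859-1 for the charset.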
Re: [PHP] mbstring: Japanese conversion not working for me
Thanks for the ideas, Alberto, but what I really want is to understand how to use the mbstring library, not how to implement a new solution. mbstring is supposed to do everything I want; I guess I am just not quite understanding how to use it yet.

Also, regarding some things you said:

> Now, let's make sure we have a clear background:
> 1) Japanese chars come in three flavours:
>    a) ISO-2022-JP (the one you are using yourself)
>    b) SHIFT-JIS
>    c) EUC-JP
> 2) your database setting requires your input in c) style, while you present its values in a) style.

1) is true but irrelevant. I am assuming that mbstring can automatically detect and convert the user's input.

> *Database configuration*
> if possible, turn your Postgres configuration into one that will accept all three charsets

Impossible, though it would be nice. Postgres can only accept one charset for its input, not multiple.

> *charset forcing*
> have your input page always delivered in standard format,

My page is always in the same charset; the problem is that the user input might not be...

Jc
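To make the "three flavours" concrete, the sketch below prints the bytes of a single character, hiragana あ, in each encoding, assuming the mbstring extension is available. The ISO-2022-JP form is the two 7-bit bytes `$"` wrapped in the escape sequences ESC $ B and ESC ( B, which is why runs like `$B$"$$$&$($*...(B` in the test output earlier in the thread look the way they do (ISO-2022-JP text whose ESC bytes were lost). The function name `jp_flavours` is hypothetical.

```php
<?php
// Assumes the mbstring extension. The sample character is given as raw
// UTF-8 bytes so the result does not depend on how this file is saved.
function jp_flavours($utf8)
{
    $out = array();
    foreach (array('ISO-2022-JP', 'SJIS', 'EUC-JP') as $enc) {
        // Hex dump of the same character in each Japanese encoding
        $out[$enc] = bin2hex(mb_convert_encoding($utf8, $enc, 'UTF-8'));
    }
    return $out;
}

print_r(jp_flavours("\xe3\x81\x82")); // HIRAGANA LETTER A
```

The same character occupies different bytes in each encoding, so the database can only validate input correctly if it arrives in the one encoding it was configured for.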
Re: [PHP] mbstring: Japanese conversion not working for me
Hi!

Jean-Christian Imbeault wrote:
> Impossible, though it would be nice. Postgres can only accept one charset for its input, not multiple.

I hope you mean one charset per language. Otherwise I can just cancel Postgres from my list of usable engines. But yes, it can't be just one.

>> *charset forcing*
>> have your input page always delivered in standard format,
>
> My page is always in the same charset, the problem is that the user input might not be ...

You mean that browsers will mix charsets for Japanese? You explicitly declare charset=mycharset in the page headers and the damned thing returns input in charset=hischarset??? Now that's an awful surprise to me. What browser does that?

Bye,
Alberto
Kiev
Re: [PHP] mbstring: Japanese conversion not working for me
Hi!

Jean-Christian Imbeault wrote:
> My page is always in the same charset, the problem is that the user input might not be ...

Okay, I went through a bit of documentation on the Japanese multibyte problem and got a surface understanding of it. Yes, since characters take different numbers of bytes in each encoding, I see why the browser could end up mixing up the input. If you find any interesting site explaining how to handle this, please share. I'll be extending an existing content repository to add Chinese text management in the winter, so I'd better start worrying about it. Thanks in advance.

As for your problem, I am afraid you had better turn to a Japanese programmers' mailing list. That's if you speak Japanese yourself, but you seem to, so...

Bye,
Alberto
Kiev
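One commonly suggested way to find out how a browser actually encoded a submission, rather than guessing from the bytes of free-form text, is to put a hidden field holding a known non-ASCII character into the form and compare what comes back against that character's known byte sequences. A sketch of the comparison step; the function name `detect_form_encoding` and a hidden field called `enc_hint` containing あ are hypothetical. It uses hard-coded byte strings, so it needs no extension.

```php
<?php
// Hypothetical helper for the "hint field" technique: the form carries a hidden
// field (e.g. name="enc_hint") whose value is the single character あ. The bytes
// that come back show how the browser encoded the whole submission.
function detect_form_encoding($hint)
{
    $known = array(
        'SJIS'        => "\x82\xa0",          // あ in Shift_JIS
        'EUC-JP'      => "\xa4\xa2",          // あ in EUC-JP
        'UTF-8'       => "\xe3\x81\x82",      // あ in UTF-8
        'ISO-2022-JP' => "\x1b\$B\$\"\x1b(B", // あ in ISO-2022-JP (escape sequences)
    );
    foreach ($known as $enc => $bytes) {
        if ($hint === $bytes) {
            return $enc;
        }
    }
    return false; // unknown: reject rather than guess
}

var_dump(detect_form_encoding("\x82\xa0"));
```

Once the encoding is known, the other fields of the same request could be converted with `mb_convert_encoding($value, 'EUC-JP', $enc)`, since a browser encodes all fields of one submission the same way.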
Re: [PHP] mbstring: Japanese conversion not working for me
Alberto Serra wrote:
> I hope you mean one charset per language. Otherwise I can just cancel Postgres from my list of usable engines. But yes, it can't be just one.

I'm no pgsql expert, but I think that yes, it will only accept input in one charset. For charsets that use only 8 bits, I think you can insert data that is in more than one charset. But for charsets that use more than 8 bits, I think pgsql actually checks that the input is in the charset the DB expects it to be in.

Jc
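The rejected 0x8140 fits this picture: it is not a valid EUC-JP sequence, so the server refuses it, while in a single-byte charset such as ISO-8859-1 every byte is some character and nothing can be rejected. A sketch of running the same kind of check in PHP before the query, so the script can report which fields are bad instead of failing on the INSERT. It assumes a PHP build whose mbstring provides `mb_check_encoding()`; `invalid_fields` is a hypothetical name.

```php
<?php
// Assumes mbstring with mb_check_encoding(). Returns the names of form fields
// whose bytes are not valid in the database's encoding.
function invalid_fields(array $fields, $db_encoding)
{
    $bad = array();
    foreach ($fields as $name => $value) {
        if (!mb_check_encoding((string) $value, $db_encoding)) {
            $bad[] = $name;
        }
    }
    return $bad;
}

$form = array(
    'name'    => "\xa1\xa1", // full-width space in EUC-JP: valid
    'comment' => "\x81\x40", // the same character in Shift_JIS: invalid as EUC-JP
);
print_r(invalid_fields($form, 'EUC-JP'));
print_r(invalid_fields($form, 'ISO-8859-1')); // single-byte: every byte passes
```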
Re: [PHP] mbstring: Japanese conversion not working for me
Alberto Serra wrote:
> I'll be extending an existing content-repository to add Chinese text management in the winter so I'll better start to worry about it.

Worry, and worry a lot...

> As for your problem, I am afraid you would better turn to a Japanese programmers' mailing list. That's if you speak Japanese yourself, but you seem to do, so...

Nope, I don't speak Japanese... so this mailing list is all I can turn to :(

Jc