[PHP] mbstring: Japanese conversion not working for me

2002-07-08 Thread Jean-Christian Imbeault
I am hoping that someone can help me get Japanese user input into my pgsql
database ...

I am still having trouble but am sure it is a simple thing; something
about mbstring probably.

I tried to directly insert user input into my DB but I got the following
error:

Warning:  PostgreSQL query failed:  ERROR:  Invalid EUC_JP character
sequence found (0x8140) in /www/htdocs/test.php on line 31
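Postgres's complaint can be reproduced by hand. In EUC-JP, bytes below 0x80 are ASCII. A two-byte JIS X 0208 character uses a lead byte and a trail byte in 0xA1-0xFE. Half-width katakana are 0x8E followed by 0xA1-0xDF, and JIS X 0212 characters are 0x8F followed by two bytes in 0xA1-0xFE. The byte 0x81 fits none of these patterns. The pair 0x81 0x40 is in fact the Shift_JIS encoding of the full-width (ideographic) space, which suggests the browser submitted Shift_JIS rather than EUC-JP. The checker below is a hypothetical pure-PHP illustration of those byte ranges, written for this explanation. It is not PostgreSQL's own validation code.

```php
<?php
// Hypothetical helper (illustration only, not PostgreSQL's code): returns the
// offset of the first byte that does not start a well-formed EUC-JP
// character, or -1 if the whole string is valid EUC-JP.
function first_invalid_euc_jp_offset(string $s): int
{
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $b = ord($s[$i]);
        if ($b < 0x80) {                       // ASCII: one byte
            $i += 1;
            continue;
        }
        if ($b === 0x8E) {                     // SS2: half-width katakana, 2 bytes
            $need = [[0xA1, 0xDF]];
        } elseif ($b === 0x8F) {               // SS3: JIS X 0212, 3 bytes
            $need = [[0xA1, 0xFE], [0xA1, 0xFE]];
        } elseif ($b >= 0xA1 && $b <= 0xFE) {  // JIS X 0208: 2 bytes
            $need = [[0xA1, 0xFE]];
        } else {
            return $i;                         // e.g. 0x81, a Shift_JIS lead byte
        }
        // check each required trail byte against its allowed range
        foreach ($need as $k => [$lo, $hi]) {
            $pos = $i + 1 + $k;
            if ($pos >= $len) {
                return $i;                     // sequence truncated
            }
            $t = ord($s[$pos]);
            if ($t < $lo || $t > $hi) {
                return $i;
            }
        }
        $i += 1 + count($need);
    }
    return -1;
}

// "\xA1\xA1" is the ideographic space in EUC-JP; "\x81\x40" is the same
// character in Shift_JIS.
var_dump(first_invalid_euc_jp_offset("111\xA1\xA1"));   // int(-1)
var_dump(first_invalid_euc_jp_offset("111\x81\x40"));   // int(3)
```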

So I assumed that I should first convert user input into EUC-JP.
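When the source encoding is known, the conversion is a single mb_convert_encoding() call. The test script's "auto" source relies on detection instead. The sketch below assumes the mbstring extension is available and that the input is Shift_JIS, which is what the 0x8140 byte pair in the error message points to. sjis_to_euc_jp() is a hypothetical helper name. It refuses to convert bytes that are not valid Shift_JIS rather than guessing.

```php
<?php
// Minimal sketch, assuming mbstring and a Shift_JIS source (hypothetical helper).
function sjis_to_euc_jp(string $input): ?string
{
    if (!mb_check_encoding($input, 'SJIS')) {
        return null;                        // not Shift_JIS after all: do not guess
    }
    $out = mb_convert_encoding($input, 'EUC-JP', 'SJIS');
    // double-check the result before it goes anywhere near the database
    return mb_check_encoding($out, 'EUC-JP') ? $out : null;
}

echo bin2hex(sjis_to_euc_jp("111\x81\x40")), "\n";   // 313131a1a1
```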

I wrote some PHP code to do this but my original user input gets mangled.

I am including the output from my test code and the code at the end of
this email.

Can someone spot what my error/problem is?

Jc


PHP output as shown in the browser (Netscape 6.2)


$_POST["textfield"]  : 111あいうえお　漢字ひらがな1235
CONVERT "auto" to EUC-JP : 111$B!"!V!"!"!"%r!"%#!"%'!#!#%(%A%5yh%a!"r&%c!"%O(B1235

mb_internal_encoding()   : EUC-JP
mb_detect_order(): ASCII, JIS, UTF-8, EUC-JP, SJIS

mb_http_input()  : FALSE
mb_http_input():

---

input encoding is: pass



PHP code I wrote:
-

<?php

$input = $_POST["textfield"];

// convert the user input into EUC-JP (detect user's encoding
// automatically)

$new   = mb_convert_encoding($input, "EUC-JP", "auto");

// output original input and converted input to screen
// this doesn't work as the original input gets mangled

echo("<pre>");
echo("<br>\$_POST[\"textfield\"]  : ".$input."<br>");
echo("CONVERT \"auto\" to EUC-JP : ".$new."<br>");

// the following outputs mbstring settings to the screen
echo("<br>mb_internal_encoding()   : ".mb_internal_encoding()."<br>");
echo("mb_detect_order(): ".implode(", ", mb_detect_order())."<br>");

// What is the value of the HTTP input conversion?
// I get a return of "false", which is really strange:
// it should be "auto" since that is the setting in my
// php.ini file, and also what phpinfo() says

$enc = mb_http_input("P");
if ($enc == false) echo("<br>mb_http_input()  : FALSE<br>");
echo("mb_http_input(): $enc<br>");

echo("<br>---<br>");

// Here I check to see what encoding PHP thinks the
// input is in. I get a value of "pass", which is also
// strange since it is not a valid return value!

$interenc = mb_internal_encoding();
$inputenc = mb_convert_variables($interenc, "", $input);

echo("<br>input encoding is: ".$inputenc."<br>");

echo("</pre>");
?>


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


Re: [PHP] mbstring: Japanese conversion not working for me

2002-07-08 Thread Alberto Serra

Hi!

To work out which languages and charsets your user is requesting (based 
on his/her browser settings), you might use this function:

 // this function remains unchanged. It returns an array
 //  0 : negotiated charset
 //  1 : negotiated lancode
 function negotiated_langset() {
   // process charset request header and build charset request array
   $headchrreq = explode(',', isset($_SERVER['HTTP_ACCEPT_CHARSET'])
                              ? $_SERVER['HTTP_ACCEPT_CHARSET'] : '');
   $charsets = array();
   $i = 0;
   while ($i < count($headchrreq)) {
     $chunk = explode(';', $headchrreq[$i]);
     $charsets[$i] = trim($chunk[0]);
     $i++;
   }

   // process language request header and build language request array
   $headlanreq = explode(',', isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])
                              ? $_SERVER['HTTP_ACCEPT_LANGUAGE'] : '');
   $language = array();
   $i = 0;
   while ($i < count($headlanreq)) {
     $chunk = explode(';', $headlanreq[$i]);
     $language[$i] = substr(trim($chunk[0]), 0, 2);
     $i++;
   }

   // start negotiation: skip requested languages we have no content for
   $i = 0;
   while (isset($language[$i]) && !$this->has_content($language[$i])) {
     $i++;
   }

   // did we get anything?
   if (isset($language[$i])) {
     $lancode = $language[$i];
     if ($i == 0 && $charsets[0] != '') {
       $charset = $charsets[0];
     } else {
       // default charset when first choice unavailable
       $charset = 'ISO-8859-1';
     }
   } else {
     // default on nothing found
     $charset = 'ISO-8859-1';
     $lancode = 'en';
   }

   $result[0] = $charset;
   $result[1] = $lancode;

   return $result;
 }

*NOTE* the !$this->has_content($language[$i]) call goes to a method of 
your own that returns true/false, depending on whether you have content 
available for this language.

The charset will default to ISO-8859-1 when your first language choice 
is unavailable, because the headers carry no charset data for further 
languages. You might want this default to become UTF-8.

The function does not negotiate the charset against content availability 
(usually you will not have separate content editions for different 
charsets in your content repository). But you can easily add the code 
needed to do it, if that is your case.

If you do please share the result :)
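The function above splits each header entry on ';' and keeps only the first part, so the quality values ("q=") that browsers send are ignored and entries are taken in header order. A minimal sketch of a q-value-aware parser follows. parse_accept_header() is a hypothetical name, tokens are lowercased on the assumption that charset and language names compare case-insensitively, and q=0 entries are dropped because they mean "not acceptable". Its result could feed the negotiation loop in place of the plain $charsets / $language arrays.

```php
<?php
// Hypothetical helper: parse an Accept-Charset / Accept-Language style header
// into tokens ordered by descending q-value (ties keep header order).
function parse_accept_header(string $header): array
{
    $items = [];
    foreach (explode(',', $header) as $pos => $part) {
        $fields = explode(';', $part);
        $token = strtolower(trim($fields[0]));
        if ($token === '') {
            continue;                       // empty entry, e.g. from an empty header
        }
        $q = 1.0;                           // default quality when no q= is given
        foreach (array_slice($fields, 1) as $param) {
            $kv = explode('=', trim($param), 2);
            if (count($kv) === 2 && strtolower(trim($kv[0])) === 'q') {
                $q = (float) trim($kv[1]);
            }
        }
        if ($q > 0) {                       // q=0 means "not acceptable"
            $items[] = [$token, $q, $pos];
        }
    }
    // highest q first; equal q keeps original header position
    usort($items, fn($a, $b) => [$b[1], $a[2]] <=> [$a[1], $b[2]]);
    return array_column($items, 0);
}

echo implode(', ', parse_accept_header('Shift_JIS,utf-8;q=0.7,*;q=0.3')), "\n";
// shift_jis, utf-8, *
```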


Bye,
Alberto
Kiev

-_=}{=_-@-_=}{=_--_=}{=_-@-_=}{=_--_=}{=_-@-_=}{=_--_=}{=_-

LoRd, CaN yOu HeAr Me, LiKe I'm HeArInG yOu?
lOrD i'M sHiNiNg...
YoU kNoW I AlMoSt LoSt My MiNd, BuT nOw I'm HoMe AnD fReE
tHe TeSt, YeS iT iS
ThE tEsT, yEs It Is
tHe TeSt, YeS iT iS
ThE tEsT, yEs It Is...






Re: [PHP] mbstring: Japanese conversion not working for me

2002-07-08 Thread Jean-Christian Imbeault

Thanks for the ideas, Alberto, but what I really want is to understand how 
to use the mbstring library, not how to implement a new solution. 
mbstring is supposed to do everything I want; I guess I just don't 
quite understand how to use it yet.

Also, regarding some things you said:

 Now, let's make sure we have a clear background:
   1) Japanese chars come in three flavours:
   a) ISO-2022-JP (the one you are using yourself)
   b) SHIFT-JIS
   c) EUC-JP
   2) your database setting requires input in c) style while
  you present it with values in a) style.


1) is true but irrelevant. I am assuming that mbstring can automatically 
detect and convert the user's input.
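mbstring can detect an encoding, but only among the candidates it is given, and detection is a guess. The sketch below assumes the mbstring extension. It reuses the candidate order the test script reports for mb_detect_order(), and that list is an assumption you would tune for your own users. detect_and_convert() is a hypothetical name. Passing true as the third ($strict) argument of mb_detect_encoding() restricts the result to encodings in which the bytes are actually valid. When nothing matches, the function returns null instead of converting blindly.

```php
<?php
// Minimal sketch, assuming mbstring: detect among explicit candidates
// (strict mode), then convert to EUC-JP; null if no candidate fits.
function detect_and_convert(
    string $input,
    array $candidates = ['ASCII', 'JIS', 'UTF-8', 'EUC-JP', 'SJIS']
): ?string {
    $from = mb_detect_encoding($input, $candidates, true);
    if ($from === false) {
        return null;                        // no candidate is valid for these bytes
    }
    return mb_convert_encoding($input, 'EUC-JP', $from);
}

// 0x82 0xA0 is Hiragana "a" in Shift_JIS and invalid in every other candidate
echo bin2hex(detect_and_convert("\x82\xA0")), "\n";   // a4a2
```

Short inputs can be genuinely ambiguous. For example, the bytes A4 A2 are "a" in EUC-JP but are also two valid half-width katakana in Shift_JIS, so no detection order can be right for every input.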

 
 *Database configuration* if possible, turn your Postgres configuration 
 into one that will accept all three charsets


Impossible, though it would be nice. Postgres can only accept one 
charset for its input, not multiple.

 *charset forcing* have your input page always delivered in standard 
 format,


My page is always in the same charset, the problem is that the user 
input might not be ...
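Alberto's "charset forcing" idea amounts to serving the form page with an explicit charset. Browsers normally submit form fields in the encoding of the page that contains the form. The HTML accept-charset attribute on the form element states the expected submission encoding, but it is only a hint, and browsers may not all honour it the same way. The sketch below is illustrative. render_form() is a hypothetical helper, EUC-JP is assumed as the target charset, and the action and field name are taken from the test script above.

```php
<?php
// Hypothetical helper: build a form page that declares its charset both in a
// meta tag and in the form's accept-charset attribute.
function render_form(string $charset = 'EUC-JP'): string
{
    $cs = htmlspecialchars($charset, ENT_QUOTES);
    return "<html><head>"
         . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset={$cs}\">"
         . "</head><body>"
         . "<form method=\"post\" action=\"test.php\" accept-charset=\"{$cs}\">"
         . "<input type=\"text\" name=\"textfield\"><input type=\"submit\">"
         . "</form></body></html>";
}

// the real HTTP header matters more than the meta tag; only send it under a web SAPI
if (PHP_SAPI !== 'cli') {
    header('Content-Type: text/html; charset=EUC-JP');
}
echo render_form(), "\n";
```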

Jc


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php




Re: [PHP] mbstring: Japanese conversion not working for me

2002-07-08 Thread Alberto Serra

Hi!

Jean-Christian Imbeault wrote:
 Impossible, though it would be nice. Postgres can only accept one 
 charset for its input, not multiple.

I hope you mean one charset per language. Otherwise I can just cross 
Postgres off my list of usable engines. But surely it can't be just one.

 *charset forcing* have your input page always delivered in standard 
 format,
 My page is always in the same charset, the problem is that the user 
 input might not be ...

You mean that browsers will mix charsets in Japanese? You explicitly 
declare charset=mycharset in the page headers and the damned thing 
returns input in charset=hischarset??? Now that's an awful surprise to 
me. What browser does that?

Bye,
Alberto
Kiev







Re: [PHP] mbstring: Japanese conversion not working for me

2002-07-08 Thread Alberto Serra

Hi!
 Jean-Christian Imbeault wrote:
 My page is always in the same charset, the problem is that the user 
 input might not be ...

Okay, I went through a bit of documentation on the Japanese multibyte 
problem and got a surface understanding of it. Yes, since characters 
are encoded with different byte widths, I can see why a browser could 
end up mixing up the input. If you find any interesting site explaining 
how to handle this, please share. I'll be extending an existing content 
repository to add Chinese text management in the winter, so I'd better 
start worrying about it. Thanks in advance.

As for your problem, I am afraid you had better turn to a Japanese 
programmers' mailing list. That's if you speak Japanese yourself, but 
you seem to, so...

Bye,
Alberto
Kiev







Re: [PHP] mbstring: Japanese conversion not working for me

2002-07-08 Thread Jean-Christian Imbeault

Alberto Serra wrote:

 
 I hope you mean one charset per language. Otherwise I can just cross 
 Postgres off my list of usable engines. But surely it can't be just one.


I'm no pgsql expert, but I think that yes, it will only accept input in 
one charset. For charsets that use only 8 bits, though, I think you can 
insert data that is in more than one charset.

But for charsets that use more than 8-bits I think pgsql actually checks 
that the input is in the charset the DB expects it to be in.
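Jc's distinction can be illustrated with mbstring's mb_check_encoding(), assuming the extension is available. In a single-byte charset such as ISO-8859-1, every byte value is a character, so any byte string passes a validity check. A multibyte charset such as EUC-JP has lead-byte and trail-byte rules, so the Shift_JIS byte pair from the error message fails. This only demonstrates the principle. It is not a description of how PostgreSQL performs its check internally.

```php
<?php
// Same bytes, two verdicts: a single-byte charset accepts anything,
// EUC-JP rejects the Shift_JIS ideographic space.
$bytes = "\x81\x40";
var_dump(mb_check_encoding($bytes, 'ISO-8859-1'));   // bool(true)
var_dump(mb_check_encoding($bytes, 'EUC-JP'));       // bool(false)
```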

 
Jc






Re: [PHP] mbstring: Japanese conversion not working for me

2002-07-08 Thread Jean-Christian Imbeault

Alberto Serra wrote:



 I'll be extending an existing content repository 
 to add Chinese text management in the winter, so I'd better start 
 worrying about it.


Worry and worry a lot ...


 As for your problem, I am afraid you had better turn to a Japanese 
 programmers' mailing list. That's if you speak Japanese yourself, but 
 you seem to, so...


Nope, I don't speak Japanese ... so this ML is all I can turn to :(

Jc

