On Tue, Aug 20, 2002 at 06:05:32PM +0200, Merijn van den Kroonenberg wrote:
> > In general the quote() method should be as aware of utf8 as the
> > database is. If the database supports utf8 then the quote() method
> > should do-the-right-thing or else it's broken and needs fixing.
>
> Well, when I quote it manually:
>
> ############################################################
> # utf8_quote(string)
> sub utf8_quote($) {
>     my $astring = shift;
>     $astring =~ s/(['"\\\0])/\\$1/g;
>     return "'" . $astring . "'";
> } # utf8_quote
> ############################################################
>
> Then I can store and retrieve it just fine. So I guess it supports utf8 ;-)
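A small sketch of what the manual quoting does with non-ASCII text, assuming Perl 5.8+ and the core Encode module (the sample string, "O'" followed by two CJK characters, is invented for illustration). The escaping regex only touches ASCII quotes, backslashes and NULs, so multi-byte UTF-8 sequences pass through unchanged. Whether the string was decoded first only shows up in its length, 4 characters versus 8 bytes before quoting, which is the kind of difference an SQL LENGTH() check can expose on the database side.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

# Merijn's manual quoting routine, unchanged apart from layout:
# backslash-escape quotes, backslashes and NULs, then wrap in ''.
sub utf8_quote($) {
    my $astring = shift;
    $astring =~ s/(['"\\\0])/\\$1/g;
    return "'" . $astring . "'";
}

# The same text held two ways (sample value is hypothetical):
my $chars  = "O'\x{99f1}\x{99dd}";     # 4 characters
my $octets = encode('UTF-8', $chars);   # 8 bytes: 2 ASCII + 2 x 3

my $quoted_chars  = utf8_quote($chars);
my $quoted_octets = utf8_quote($octets);

# Only ASCII is escaped, so the multi-byte sequences are untouched;
# the two results differ only in how many units they contain.
printf "chars:  length %d\n", length $quoted_chars;
printf "octets: length %d\n", length $quoted_octets;

# Decoding the quoted octets yields exactly the quoted characters.
my $same = decode('UTF-8', $quoted_octets) eq $quoted_chars;
print 'round trip: ', ($same ? 'same text' : 'differs'), "\n";
```

In other words, this kind of quoting is encoding-agnostic. It neither helps nor hurts, and what ends up in the column depends on whether the bytes or the characters are handed to the driver.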
It may just be storing a sequence of bytes. (You can check by using SQL
functions like LENGTH() and SUBSTRING() on it.)

Tim.

> > > Oh yeah, one other thing: since Encode::_utf8_on is an internal function,
> > > wouldn't it be better to use Encode::decode("utf8", $somevar) instead? As
> > > far as I can see, it should do exactly the same, but if I am mistaken, let
> > > me know :)
> >
> > Encode::_utf8_on *just* sets the internal utf8 flag bit on the value,
> > which *must* already be valid utf8 (or else you'll get problems later).
> >
> > I believe Encode::decode is different (but I've never used either and
> > could easily not know what I'm talking about :)
>
> From perldoc Encode:
>
>     CAVEAT: When you run "$string = decode("utf8", $octets)", then
>     $string may not be equal to $octets. Though they both contain the
>     same data, the utf8 flag for $string is on unless $octets entirely
>     consists of ASCII data (or EBCDIC on EBCDIC machines). See "The
>     UTF-8 flag" below.
>
> That's why I got that idea: decode also seems to set the utf8 flag and
> leave the data alone. Not sure, though.
>
> > Tim.
>
> Thank you for the swift reply,
>
> Merijn van den Kroonenberg
>
> > > Thank you,
> > > Merijn van den Kroonenberg
> > >
> > > ----- Original Message -----
> > > From: "SADAHIRO Tomoyuki" <[EMAIL PROTECTED]>
> > > To: "Merijn van den Kroonenberg" <[EMAIL PROTECTED]>
> > > Cc: <[EMAIL PROTECTED]>
> > > Sent: Thursday, August 15, 2002 3:12 PM
> > > Subject: Re: perl, unicode and databases (mysql)
> > >
> > > > On Tue, 13 Aug 2002 14:09:37 +0200
> > > > "Merijn van den Kroonenberg" <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I have a Perl application (Perl 5.8.0) which puts utf8 data in a
> > > > > MySQL database. This seems to work pretty well, and retrieving the
> > > > > data with Perl also works.
> > > > > Using something like this:
> > > > >
> > > > > my $sth = $db_handle->prepare("SELECT some query");
> > > > > $sth->execute;
> > > > > my @row = $sth->fetchrow_array;
> > > > > print $row[0] . "\n";    #### print before
> > > > > if ($] > 5.007) {
> > > > >     require Encode;
> > > > >     Encode::_utf8_on($row[0]);
> > > > > }
> > > > > print $row[0] . "\n";    #### print after
> > > > > $sth->finish;
> > > > >
> > > > > Encode::_utf8_on gives me back good data. As far as I understood,
> > > > > the _utf8_on method doesn't do any real conversion, but only switches
> > > > > the utf8 flag of a Perl string?
> > > > >
> > > > > If you compare the two prints in the above example, it seems that
> > > > > after the utf8 flag is set the string is utf8-decoded. This results
> > > > > in the correct string, so it seems the original string is
> > > > > utf8-encoded (double encoded, since it already was UTF-8).
> > > > >
> > > > > When I select the same string manually (mysql prompt) or with PHP,
> > > > > I get back the double-encoded string. So it seems to me that the
> > > > > double-encoded format is how Perl stores it internally (and also in
> > > > > the database)? But this doesn't sound right to me... it would mean
> > > > > that every time a utf8-flagged string is used it would need to be
> > > > > utf8-decoded. That doesn't sound very efficient, so I doubt it's
> > > > > done that way. But meanwhile I have no idea how it's done... or how
> > > > > it's stored in the database.
> > > > >
> > > > > As you might have guessed, I want to access the data I put in the
> > > > > database with PHP, but I get back double utf8-encoded data there.
> > > > > The problem could be in a lot of different places, for example my
> > > > > fetching in PHP, my storing in Perl, and maybe somewhere else where
> > > > > I have some faulty conversion. To check whether the data in the
> > > > > database is correct, I tried to figure out how Perl works with the
> > > > > data.
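The "double encoded" symptom described above matches a common pattern: UTF-8 bytes that some layer treats as Latin-1 characters and encodes to UTF-8 a second time. The thread does not establish that this is what happened in this particular setup; the sketch below, assuming Perl 5.8+ and the core Encode module, only shows what the pattern looks like, using the same two characters as the Devel::Peek example. A second UTF-8 layer doubles the byte count for these characters, and peeling one layer off recovers the original bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text = "\x{99f1}\x{99dd}";            # two CJK characters
my $once = encode('UTF-8', $text);        # 6 bytes: \351\247\261\351\247\235

# Double encoding: each of those 6 bytes is mistaken for a Latin-1
# character and encoded to UTF-8 again. Every byte is >= 0x80, so
# each one becomes 2 bytes, giving 12 in total.
my $twice = encode('UTF-8', decode('ISO-8859-1', $once));

printf "once:  %d bytes\n", length $once;
printf "twice: %d bytes\n", length $twice;

# Undo one layer: decode the outer UTF-8, then map the resulting
# characters (all below 256) straight back to bytes.
my $undone = encode('ISO-8859-1', decode('UTF-8', $twice));
print 'second layer removed: ', ($undone eq $once ? 'recovered' : 'mismatch'), "\n";
```

The same 12-byte form is what a client that does no decoding of its own, such as a plain mysql prompt, would display if the column really held doubly encoded data.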
> > > > >
> > > > > Maybe someone could put me on the right track, because this got me
> > > > > mighty confused ;-)
> > > >
> > > > To look at what Perl's scalar holds, use Devel/Peek.pm.
> > > >
> > > > #!perl
> > > > use Devel::Peek;
> > > > use Encode;
> > > >
> > > > our $camel_utf8 = "\351\247\261\351\247\235";
> > > >
> > > > print STDERR "* _utf8_on\n\n";
> > > > Encode::_utf8_on($camel_utf8);
> > > > Dump($camel_utf8);
> > > >
> > > > print STDERR "\n";
> > > >
> > > > print STDERR "* _utf8_off\n\n";
> > > > Encode::_utf8_off($camel_utf8);
> > > > Dump($camel_utf8);
> > > >
> > > > __END__
> > > >
> > > > The output is like this.
> > > > The difference between _on and _off is found in FLAGS.
> > > >
> > > > * _utf8_on
> > > >
> > > > SV = PV(0x1661c60) at 0x166cccc
> > > >   REFCNT = 1
> > > >   FLAGS = (POK,pPOK,UTF8)
> > > >   PV = 0x16db4e0 "\351\247\261\351\247\235"\0 [UTF8 "\x{99f1}\x{99dd}"]
> > > >   CUR = 6
> > > >   LEN = 7
> > > >
> > > > * _utf8_off
> > > >
> > > > SV = PV(0x1661c60) at 0x166cccc
> > > >   REFCNT = 1
> > > >   FLAGS = (POK,pPOK)
> > > >   PV = 0x16db4e0 "\351\247\261\351\247\235"\0
> > > >   CUR = 6
> > > >   LEN = 7
> > > >
> > > > SADAHIRO Tomoyuki
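On the question of Encode::decode versus the internal Encode::_utf8_on, here is a minimal comparison, assuming Perl 5.8+ and the core Encode module, run on the same octets as the Devel::Peek example. For valid UTF-8 both routes produce equal two-character strings. The differences are that _utf8_on changes its argument in place and performs no validation, while decode returns a new string, leaves the input alone when no check flag is given, and can be asked (via Encode::FB_CROAK) to die on malformed input. The sketch uses the strict 'UTF-8' encoding name rather than the looser 'utf8' from the thread; the truncated sample is invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

my $octets = "\351\247\261\351\247\235";   # valid UTF-8 for U+99F1 U+99DD

# _utf8_on flips the flag on the scalar itself without checking the
# bytes, so work on a copy to keep $octets as plain bytes.
my $flagged = $octets;
Encode::_utf8_on($flagged);

# decode returns a new character string; with no CHECK argument the
# source scalar is not modified.
my $decoded = Encode::decode('UTF-8', $octets);

# For valid input both routes give the same two characters.
printf "flagged: %d chars, decoded: %d chars, equal: %s\n",
    length $flagged, length $decoded, ($flagged eq $decoded ? 'yes' : 'no');

# For malformed input they differ: decode can be told to die, whereas
# _utf8_on would silently produce a broken string.
my $bad = "\351\247";                       # truncated 3-byte sequence
my $ok  = eval { Encode::decode('UTF-8', $bad, Encode::FB_CROAK()); 1 };
print 'strict decode: ', ($ok ? 'accepted' : 'rejected the truncated input'), "\n";
```

So for data that is known to be valid UTF-8 the two are interchangeable in effect, and decode is the public, checked route when that is not guaranteed.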