----- Original Message -----
From: "Tim Bunce" <[EMAIL PROTECTED]>
To: "Merijn van den Kroonenberg" <[EMAIL PROTECTED]>
Sent: Tuesday, August 20, 2002 6:35 PM
Subject: Re: perl, unicode and databases (mysql)
> On Tue, Aug 20, 2002 at 06:05:32PM +0200, Merijn van den Kroonenberg wrote:
> >
> > > In general the quote() method should be as aware of utf8 as the
> > > database is. If the database supports utf8 then the quote() method
> > > should do-the-right-thing or else it's broken and needs fixing.
> >
> > Well, when I quote it manually:
> >
> > ############################################################
> > # utf8_quote(string)
> > sub utf8_quote($){
> >   my $astring = shift;
> >   $astring =~ s/(['"\\\0])/\\$1/g;
> >   return "'".$astring."'";
> > }# utf8_quote
> > ############################################################
> >
> > Then I can store and retrieve it just fine. So I guess it supports utf8 ;-)
>
> It may just be storing a sequence of bytes. (You can check by using
> SQL functions like LENGTH() and SUBSTRING() on it.)

Probably yes, but as long as I don't do any manipulation in the database, like
selecting on strings or sorting, it shouldn't matter, right? As long as the app
that retrieves it from the database can work with utf8.

>
> Tim.
>
> > > > Oh yeah, one other thing, since Encode::_utf8_on is an internal function,
> > > > wouldn't it be better to use Encode::decode("utf8",$somevar) instead? As far
> > > > as I can see, it should do exactly the same, but if I am mistaken, let me
> > > > know :)
> > >
> > > Encode::_utf8_on *just* sets the internal utf8 flag bit on the value
> > > which *must* be already valid utf8 (or else you'll get problems later).
> > >
> > > I believe Encode::decode is different (but I've never used either and
> > > could easily not know what I'm talking about :)
> >
> > from perldoc Encode
> >   CAVEAT: When you run "$string = decode("utf8",
> >   $octets)", then $string may not be equal to $octets.
> >   Though they both contain the same data, the utf8 flag
> >   for $string is on unless $octets entirely consists of
> >   ASCII data (or EBCDIC on EBCDIC machines). See "The
> >   UTF-8 flag" below.
> >
> > That's why I got that idea, so I wondered, cause it also seems to set the
> > utf8 flag, and leave the data alone. Not sure tho.
> >
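
To make the question above concrete, here is a minimal sketch (an illustration added to this message, not code from either poster) comparing Encode::decode with Encode::_utf8_on on the same octets. It assumes Perl 5.8 or later with the core Encode module; the variable names and the sample data are hypothetical.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical sample: the two octets that encode U+00E9 in UTF-8,
# as they might come back from a database that only stores bytes.
my $octets = "\xC3\xA9";

# decode() checks the octets and returns a new character string.
my $decoded = Encode::decode('utf8', $octets);

# _utf8_on() only flips the internal flag (done on a copy here); it does
# not validate anything, so the octets must already be valid UTF-8.
my $flagged = $octets;
Encode::_utf8_on($flagged);

for my $pair ([octets => $octets], [decoded => $decoded], [flagged => $flagged]) {
    my ($name, $value) = @$pair;
    printf "%-8s length=%d flag=%s\n",
        $name, length($value), Encode::is_utf8($value) ? 'on' : 'off';
}

# The perldoc caveat in practice: same data, but not equal as strings.
print "decoded eq octets?  ", ($decoded eq $octets  ? "yes" : "no"), "\n";
print "decoded eq flagged? ", ($decoded eq $flagged ? "yes" : "no"), "\n";

# With a strict check, decode() refuses a truncated sequence, whereas
# _utf8_on() would silently mark the same bytes as characters.
my $bad = "\xC3";
my $ok  = eval { Encode::decode('UTF-8', $bad, Encode::FB_CROAK()); 1 };
print "truncated input accepted by decode? ", ($ok ? "yes" : "no"), "\n";
```

For valid input the two calls end up with the same string, which fits the observation that decode "seems to set the utf8 flag and leave the data alone". The difference only shows on invalid input, where decode can report the problem and _utf8_on cannot.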