On Tue, Aug 20, 2002 at 06:05:32PM +0200, Merijn van den Kroonenberg wrote:
> 
> > In general the quote() method should be as aware of utf8 as the
> > database is.  If the database supports utf8 then the quote() method
> > should do-the-right-thing or else it's broken and needs fixing.
> 
> Well, when I quote it manually:
> 
> ############################################################
> # utf8_quote(string)
> sub utf8_quote($){
>   my $astring = shift;
>   # backslash-escape single/double quotes, backslashes and NUL bytes
>   $astring =~ s/(['"\\\0])/\\$1/g;
>   return "'".$astring."'";
> }# utf8_quote
> ############################################################
> 
> Then I can store and retrieve it just fine. So I guess it supports utf8 ;-)
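The hand-written routine above only backslash-escapes four characters and never looks at encodings, so it behaves the same on raw UTF-8 octets and on decoded character strings. What reaches the database is decided by what Perl hands over, not by the quoting. A minimal sketch illustrating this, reusing the escaping regex from `utf8_quote()` (the sample string is made up for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Same escaping as the hand-written utf8_quote() above: backslash-escape
# single quotes, double quotes, backslashes and NUL, then wrap in '...'.
sub utf8_quote {
    my ($s) = @_;
    $s =~ s/(['"\\\0])/\\$1/g;
    return "'" . $s . "'";
}

my $octets = "It's \351\247\261\351\247\235";   # raw UTF-8 bytes
my $chars  = decode('UTF-8', $octets);           # same text as characters

my $q_octets = utf8_quote($octets);
my $q_chars  = utf8_quote($chars);

# The escaping is encoding-agnostic: once the character version is
# encoded back to UTF-8, both literals are byte-for-byte identical.
print encode('UTF-8', $q_chars) eq $q_octets ? "identical\n" : "different\n";

# Only the unit of length differs: bytes vs characters.
print length($q_octets), " vs ", length($q_chars), "\n";   # prints "14 vs 10"
```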

It may just be storing a sequence of bytes. (You can check by using
SQL functions like LENGTH() and SUBSTRING() on it.)

Tim.
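Tim's check works because a byte-oriented store and a character-aware one disagree about lengths and substrings (in current MySQL versions, for example, LENGTH() counts bytes while CHAR_LENGTH() counts characters). A minimal Perl sketch of the same distinction, using the two-character string from SADAHIRO's Devel::Peek example further down: as octets it is 6 units long, as characters 2, and a one-unit substring of the octets is only a fragment of a character.

```perl
use strict;
use warnings;
use Encode qw(decode);

# U+99F1 U+99DD encoded as UTF-8: 3 bytes per character.
my $octets = "\351\247\261\351\247\235";
my $chars  = decode('UTF-8', $octets);

# Byte view vs character view of the same text.
printf "bytes: %d, characters: %d\n", length($octets), length($chars);

# A 1-unit substring is a whole character only in the character view.
my $byte_slice = substr($octets, 0, 1);   # "\351", a lone lead byte
my $char_slice = substr($chars, 0, 1);    # "\x{99f1}"
printf "first byte: 0x%02X, first character: U+%04X\n",
    ord($byte_slice), ord($char_slice);
```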

> > > Oh yeah, one other thing: since Encode::_utf8_on is an internal function,
> > > wouldn't it be better to use Encode::decode("utf8", $somevar) instead? As
> > > far as I can see, it should do exactly the same, but if I am mistaken,
> > > let me know :)
> >
> > Encode::_utf8_on *just* sets the internal utf8 flag bit on the value,
> > which *must* already be valid utf8 (or else you'll get problems later).
> >
> > I believe Encode::decode is different (but I've never used either and
> > could easily not know what I'm talking about :)
> 
> From perldoc Encode:
>  CAVEAT: When you run "$string = decode("utf8",
>          $octets)", then $string may not be equal to $octets.
>          Though they both contain the same data, the utf8 flag
>          for $string is on unless $octets entirely consists of
>          ASCII data (or EBCDIC on EBCDIC machines).  See "The
>          UTF-8 flag" below.
> 
> That's why I got that idea: it also seems to set the utf8 flag and
> leave the data alone. Not sure, though.
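The difference Tim hints at can be checked directly: Encode::decode() validates its input and builds a new character string, while Encode::_utf8_on() only flips the flag. On valid UTF-8 the results compare equal; on invalid input they diverge. A minimal sketch, assuming a Latin-1 test string as the invalid input and using the strict 'UTF-8' encoding name rather than the lax 'utf8' from the thread:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Two characters (U+99F1 U+99DD) as raw UTF-8 octets.
my $octets = "\351\247\261\351\247\235";

# decode() validates the bytes and returns a new character string.
my $decoded = decode('UTF-8', $octets);

# _utf8_on() flips the flag on a copy without checking anything.
my $flagged = $octets;
Encode::_utf8_on($flagged);

printf "decoded: %d chars, flagged: %d chars, same: %s\n",
    length($decoded), length($flagged), ($decoded eq $flagged ? 'yes' : 'no');

# On malformed input they diverge: decode() substitutes U+FFFD,
# while _utf8_on() leaves an inconsistent (malformed) string behind.
my $latin1 = "caf\xE9";                 # Latin-1 bytes, not valid UTF-8
my $fixed  = decode('UTF-8', $latin1);
my $broken = $latin1;
Encode::_utf8_on($broken);

printf "decode: %s, _utf8_on result valid: %s\n",
    ($fixed =~ /\x{FFFD}/ ? 'replaced' : 'kept'),
    (utf8::valid($broken) ? 'yes' : 'no');
```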
> 
> 
> >
> > Tim.
> 
> Thank you for the swift reply,
> 
> Merijn van den Kroonenberg
> 
> >
> > > Thank you,
> > > Merijn van den Kroonenberg
> > >
> > >
> > > ----- Original Message -----
> > > From: "SADAHIRO Tomoyuki" <[EMAIL PROTECTED]>
> > > To: "Merijn van den Kroonenberg" <[EMAIL PROTECTED]>
> > > Cc: <[EMAIL PROTECTED]>
> > > Sent: Thursday, August 15, 2002 3:12 PM
> > > Subject: Re: perl, unicode and databases (mysql)
> > >
> > >
> > > >
> > > > On Tue, 13 Aug 2002 14:09:37 +0200
> > > > "Merijn van den Kroonenberg" <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I have a perl application (perl 5.8.0) which puts utf8 data in a
> > > > > mysql database. This seems to work pretty well, and retrieving the
> > > > > data with perl also works, using something like this:
> > > > >
> > > > > my $sth = $db_handle->prepare("SELECT some query");
> > > > > $sth->execute;
> > > > > my @row=$sth->fetchrow_array;
> > > > > print $row[0]."\n"; #### print before
> > > > > if ($]>5.007){
> > > > >   require Encode;
> > > > >   Encode::_utf8_on($row[0]);}
> > > > > print $row[0]."\n"; #### print after
> > > > > $sth->finish;
> > > > >
> > > > > Encode::_utf8_on gives me back good data. As far as I understand,
> > > > > the _utf8_on method doesn't do any real conversion, but only
> > > > > switches the utf8 flag of a perl string?
> > > > >
> > > > > If you compare the two prints in the above example, it seems that
> > > > > after the utf8 flag is set the string is utf8-decoded. This results
> > > > > in the correct string, so it seems the original string is utf8
> > > > > encoded (double encoded, since it already was UTF-8).
> > > > >
> > > > > When I select the same string manually (mysql prompt) or with PHP,
> > > > > I get back the double encoded string. So it seems to me that the
> > > > > double encoded format is how perl stores it internally (and also in
> > > > > the database)? But this doesn't sound right to me: it would mean
> > > > > that every time a utf8-flagged string is used it would need to be
> > > > > utf8-decoded. That doesn't sound very efficient to me, so I doubt
> > > > > it's done that way. But meanwhile I have no idea how it's done, or
> > > > > how it's stored in the database.
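One common way to end up with double-encoded UTF-8 (an assumption here; the thread never pins down the actual cause) is to treat bytes that are already UTF-8 as Latin-1 characters and encode them a second time. A minimal sketch of that effect, and of how a single decode peels off one layer:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars = "\x{99f1}\x{99dd}";          # two characters
my $once  = encode('UTF-8', $chars);     # 6 bytes: correct UTF-8

# Encoding the byte string again treats each byte (0xE9, 0xA7, ...) as a
# Latin-1 character, and every byte >= 0x80 becomes two bytes.
my $twice = encode('UTF-8', $once);

printf "once: %d bytes, twice: %d bytes\n", length($once), length($twice);

# One decode removes one layer of encoding.
print decode('UTF-8', $twice) eq $once ? "one decode undoes it\n" : "mismatch\n";
```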
> > > > >
> > > > > As you might have guessed, I want to access the data I put in the
> > > > > database with PHP, but I get back double utf8-encoded data there.
> > > > > The problem could be in a lot of different places, for example my
> > > > > fetching in PHP, storing in perl, and maybe somewhere else where I
> > > > > have some faulty conversion. To check whether the data in the
> > > > > database is correct, I tried to figure out how perl works with the
> > > > > data.
> > > > >
> > > > > Maybe someone could put me on the right track, because this got me
> > > > > mighty confused ;-)
> > > >
> > > > To see what a Perl scalar holds, use Devel::Peek.
> > > >
> > > > #!perl
> > > > use Devel::Peek;
> > > > use Encode;
> > > >
> > > > our $camel_utf8 = "\351\247\261\351\247\235";
> > > >
> > > > print STDERR "* _utf8_on\n\n";
> > > > Encode::_utf8_on($camel_utf8);
> > > > Dump($camel_utf8);
> > > >
> > > > print STDERR "\n";
> > > >
> > > > print STDERR "* _utf8_off\n\n";
> > > > Encode::_utf8_off($camel_utf8);
> > > > Dump($camel_utf8);
> > > >
> > > > __END__
> > > >
> > > > The output looks like this. The difference between _on and _off
> > > > shows up in FLAGS (UTF8) and in the decoded view Dump appends to PV.
> > > >
> > > > * _utf8_on
> > > >
> > > > SV = PV(0x1661c60) at 0x166cccc
> > > >   REFCNT = 1
> > > >   FLAGS = (POK,pPOK,UTF8)
> > > >   PV = 0x16db4e0 "\351\247\261\351\247\235"\0 [UTF8 "\x{99f1}\x{99dd}"]
> > > >   CUR = 6
> > > >   LEN = 7
> > > >
> > > > * _utf8_off
> > > >
> > > > SV = PV(0x1661c60) at 0x166cccc
> > > >   REFCNT = 1
> > > >   FLAGS = (POK,pPOK)
> > > >   PV = 0x16db4e0 "\351\247\261\351\247\235"\0
> > > >   CUR = 6
> > > >   LEN = 7
> > > >
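The same flag can be inspected without Devel::Peek via Encode::is_utf8(). A minimal sketch on the same string: Dump's CUR stays 6 (bytes) in both states, while Perl's length() counts characters, so it changes with the flag.

```perl
use strict;
use warnings;
use Encode ();

my $camel_utf8 = "\351\247\261\351\247\235";

# With the flag on, length() sees two characters.
Encode::_utf8_on($camel_utf8);
my ($on_flag, $on_len) = (Encode::is_utf8($camel_utf8) ? 1 : 0, length $camel_utf8);

# With the flag off, length() sees the six underlying bytes.
Encode::_utf8_off($camel_utf8);
my ($off_flag, $off_len) = (Encode::is_utf8($camel_utf8) ? 1 : 0, length $camel_utf8);

print "_utf8_on:  flag=$on_flag length=$on_len\n";    # flag=1 length=2
print "_utf8_off: flag=$off_flag length=$off_len\n";  # flag=0 length=6
```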
> > > >
> > > >
> > > > SADAHIRO Tomoyuki
> > > >
> > >
> > >
> >
> 
