(Intro: The problem is how to retrieve a Unicode character from a Jet database using DBI and DBD-ADO.)
On Sat, Dec 06, 2003 at 11:47:51PM +0800, Autrijus Tang wrote:
> Hence, you'd need to explicitly convert bytestrings returned by
> DBI into ustrings, using either utf8::decode, or Encode::decode_utf8.

Thank you, I have tried this in several ways (listing below).

It always seems as if the Unicode character

025A 602 LATIN SMALL LETTER SCHWA WITH HOOK

contained in the Jet database cannot be converted into anything anymore, because the information has already been lost, possibly due to some driver "converting" it to ISO-8859-1.

Now, I am CC-ing this e-mail to Steffen Goeldner; possibly he knows something about how DBD-ADO handles Unicode characters that are not part of the ISO-8859-1 character set.

My test script and its output follow. It now shows the effects of the various conversions as suggested by Autrijus.

use utf8;
use strict;
use warnings;
use DBI;
use DBD::ADO;
use Encode;

print "\$DBI::VERSION = " . $DBI::VERSION . "\n";
print "\$DBD::ADO::VERSION = " . $DBD::ADO::VERSION . "\n";
print "\$Encode::VERSION = " . $Encode::VERSION . "\n";

# Print the length, the text, and the character code of each character.
sub show {
    my( $text ) = @_;
    print "\n\nlength = (" . length( $text ) . ")\n";
    print "text = [" . $text . "]\ncharcodes = ";
    for my $i ( 1 .. length( $text )) {
        # substr offsets are zero-based, so character $i is at offset $i - 1
        print "text at $i =[ " . ord( substr( $text, $i - 1, 1 )) . " ]\n";
    }
}

my $dbh = DBI->connect( "dbi:ADO:Provider=Microsoft."
    . "Jet.OLEDB.4.0;Data Source=c:\\tmp.mdb;" );
my $sth = $dbh->prepare( "SELECT tmp FROM tmp" );
$sth->execute();
my $row = $sth->fetchrow_hashref;
my $text;

$text = decode( "utf16", $row->{'tmp'} );
show( $text );

$text = decode( "ucs2", $row->{'tmp'} );
show( $text );

$text = decode( "utf8", $row->{'tmp'} );
show( $text );

$text = Encode::decode_utf8( $row->{'tmp'} );
show( $text );

# utf8::decode converts its argument in place and returns a success flag,
# so $text receives that flag here, not the decoded string.
$text = utf8::decode( $row->{'tmp'} );
show( $text );

$sth->finish();
$dbh->disconnect();

# this script is started using
# perl, v5.8.0 built for MSWin32-x86-multi-thread
# Binary build 804 provided by ActiveState Corp.
# outputs the following text:

$DBI::VERSION = 1.30
$DBD::ADO::VERSION = 2.81
$Encode::VERSION = 1.83
UTF-16:Partial character at C:/Perl58/lib/Encode.pm line 154.

length = (0)
text = []
UCS-2BE:Partial character at C:/Perl58/lib/Encode.pm line 154.
charcodes =

length = (0)
text = []
charcodes =

length = (1)
text = [?]
charcodes = text at 1 =[ 63 ]

length = (1)
text = [?]
charcodes = text at 1 =[ 63 ]

length = (1)
text = [1]
charcodes = text at 1 =[ 49 ]
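
To make the difference between the decoding calls easier to see, here is a minimal sketch that does not touch DBI or DBD-ADO at all. The helper name codepoints and the hard-coded byte strings are assumptions made for illustration; they are not part of the test script above. It assumes that a correctly delivered value would arrive as the UTF-8 bytes 0xC9 0x9A, which is the UTF-8 encoding of U+025A, and it compares that case with a value that has already been replaced by "?".

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: list the code points of a string as "U+XXXX" labels.
sub codepoints {
    my ($text) = @_;
    return map { sprintf( 'U+%04X', ord($_) ) } split //, $text;
}

# Assumption: U+025A delivered as its two UTF-8 bytes.
my $utf8_bytes = "\xC9\x9A";

# Encode::decode returns a new character string.
my $decoded = Encode::decode( 'UTF-8', $utf8_bytes );

# utf8::decode works in place; its return value is only a success flag.
my $in_place = $utf8_bytes;
my $ok       = utf8::decode($in_place);

# If the driver has already substituted "?", only byte 63 is left to decode.
my $question = '?';
my $lossy    = Encode::decode( 'UTF-8', $question );

print join( ' ', codepoints($decoded) ),  "\n";
print join( ' ', codepoints($in_place) ), "\n";
print 'flag: ', ( $ok ? 'true' : 'false' ), "\n";
print join( ' ', codepoints($lossy) ),    "\n";
```

Run as written, this should print U+025A for both decoded strings, a true flag, and U+003F for the last case. That matches the pattern in the output above: the value 63 suggests the schwa was replaced before Perl ever saw the data, so no choice of decoder on the Perl side can bring it back.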