On Sat, Dec 06, 2003 at 10:30:40AM -0500, David Graff wrote:
> It would be worthwhile to use some other mode of access to confirm this.
> It's possible that non-utf8 text data are being stored into tables in
> some way that you don't expect or don't directly control.

As Etienne has written, it is stored in UCS-2/UTF-16. I could use UTF-16
in perl as well - after all, I can recode anything as long as no
information is lost. But losing information is exactly what seems to
happen here.

I have now created a database "c:\tmp.mdb" with a table "tmp" and a
text field "tmp" containing a single character, namely the schwa (an
IPA phonetic character not contained in ISO-8859-1).

The following perl script behaves as if DBI or the ADO driver tried to
convert the text to ISO-8859-1. Because it cannot do so here, it seems
to convert the character to a question mark.

  # this is started using
  # perl, v5.8.0 built for MSWin32-x86-multi-thread
  # Binary build 804 provided by ActiveState Corp.

  use strict;
  use warnings;
  use DBI;

  # error handling and clean-up actions are
  # intentionally missing from this ad-hoc script

  my $dbh = DBI->connect( "dbi:ADO:Provider=Microsoft." .
    "Jet.OLEDB.4.0;Data Source=c:\\tmp.mdb;" );
  my $sth = $dbh->prepare( "SELECT tmp FROM tmp" );
  $sth->execute();
  my $row = $sth->fetchrow_hashref;
  my $text = $row->{'tmp'};
  print "[" . $text . "]"; # prints "[?]"
  print "[" . ord( substr( $text, 0, 1 )). "]"; # prints "[63]"
  $sth->finish();
  $dbh->disconnect();

Output is:

  [?][63]

> Data going to or from a database is supposed to pass through DBI without
> modification of any sort.

Then maybe the ADO driver or the Jet engine does some conversion? I have
already been looking for a switch to turn this off.

> If you have a utf8-encoded string and put this into a table via an
> insert or update operation, that specific byte sequence should be
> retrievable from the table later on, via a normal query.

I have inserted a schwa using Microsoft® Access. It appears on the
screen as a visible schwa character. Something like:

  [ASCII drawing of a schwa]

According to Etienne this should be stored as UCS-2/UTF-16.

> when you query for that string, you should contact the author of the
> dbi:ADO driver module.

Ok, I will try that, too. Thank you and Etienne.

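P.S. The "[?][63]" symptom can be reproduced without any database, which
is why I suspect a conversion to ISO-8859-1 somewhere along the way. This
is only a minimal sketch: it assumes the Encode module that ships with
perl 5.8 and that the stored character is U+0259 (the schwa). It says
nothing about what the ADO driver or the Jet engine really do internally.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+0259 LATIN SMALL LETTER SCHWA (assumed to be the character in the table)
my $schwa = "\x{0259}";

# Lossless: UTF-16LE can represent the schwa
my $utf16 = encode( "UTF-16LE", $schwa );
print unpack( "H*", $utf16 ), "\n";    # prints "5902"

# Lossy: ISO-8859-1 has no schwa, so Encode falls back
# to its substitution character
my $latin1 = encode( "iso-8859-1", $schwa );
print "[" . $latin1 . "][" . ord( $latin1 ) . "]\n";    # prints "[?][63]"
```

The second line of output matches what my DBI script prints, so a lossy
recoding of this kind would at least be consistent with what I see.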
(I am reading the mailing list using the web archive. This might mean that I read some messages after a delay.)