RE: [dbi] Re: Adding utf8 support to DBD::mysql
On 30-Apr-2006 Patrick Galbraith wrote:
> Martin J. Evans wrote:
>
> Martin,
>
> Sure, I'll be glad to put in the code. If you can come up with tests,
> that's one thing we need. Whatever you like, and whatever works for you.
> I'll always give you due credit.
>
> I also would like to know how to have a test end its while loop if a
> version doesn't support the server it's running against - if you know
> that, that would be fabulous. I came up with a way by querying the
> database, then checking if a string is /^4/ or /^3/ (stored procs), if
> it matches that, set $state = 1. I think there may be a better way!
> (well, that'd be Test::More I think)

As far as I can see there is no supported way of doing this _during_ the
loop, but there is a way to do it before the loop starts. That way, if you
put your mysql 3 tests in one test file you can do as 60leak.t does:

  eval { require Proc::ProcessTable; };
  if ($@ || !$ENV{SLOW_TESTS}) {
      print "1..0 # Skip \$ENV{SLOW_TESTS} is not set or Proc::ProcessTable not installed\n";
      exit 0;
  }
  .
  .
  .
  #
  #   Main loop; leave this untouched, put tests after creating
  #   the new table.
  #
  while (Testing()) {

You can skip individual tests by doing:

  if (!$state && some_condition) {
      Skip("Reason");
  } else {
      Test($state or command);
  }

e.g.

  [EMAIL PROTECTED]:~/dbd_mysql/svn/trunk$ perl t/texecute.t
  1..8
  ok 1
  ok 2
  ok 3
  ok 4
  ok 5
  ok 6
  ok 7 # Skip reason
  ok 8
  ok 9

If you apply the attached patch to subversion trunk you can write:

  if (!$state && some_condition) {
      SkipAll("Reason") if ($state == 0);
  }
  Test($state or statement);
  .
  .

  [EMAIL PROTECTED]:~/dbd_mysql/svn/trunk$ perl t/texecute.t
  1..9
  ok 1
  ok 2
  ok 3
  ok 4 # Skip reason
  ok 5 # Skip reason
  ok 6 # Skip reason
  ok 7 # Skip reason
  ok 8 # Skip reason
  ok 9 # Skip reason

but that often skips the tidying up as well, so something like a table you
created is left lying around. So there is also a SkipN:

  Test($state or statement);
  Test($state or statement);
  Test($state or statement);
  if (!$state && some_condition) {
      SkipN("Reason", 3,4,5,7);
  }
  Test($state or statement);
  Test($state or statement);
  Test($state or statement);
  Test($state or statement);
  Test($state or statement);
  Test($state or statement);

  [EMAIL PROTECTED]:~/dbd_mysql/svn/trunk$ perl t/texecute.t
  1..9
  ok 1
  ok 2
  ok 3
  ok 4 # Skip skipn
  ok 5 # Skip skipn
  ok 6 # Skip skipn
  ok 7 # Skip skipn
  ok 8
  ok 9

[snipped rest of context not about tests]

Martin
--
Martin J. Evans
Easysoft Ltd, UK
http://www.easysoft.com

Attachment: lib.patch
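Patrick's aside about Test::More can be made a bit more concrete. What follows is only a rough sketch of how the same two ideas (skip the whole file before the loop starts, or skip selected tests while still running the tidy-up) might look with Test::More's `plan skip_all` and `SKIP:` blocks. It is not how the DBD::mysql test suite is written. The version string, the `$skip_whole_file` flag and the `supports_stored_procs` helper are all made up for illustration; a real test would get the version from the server.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical stand-in for a value a real test would fetch from the
# server (for example with "SELECT VERSION()").
my $server_version = '4.1.18';

# Hypothetical flag: set it when the whole file cannot run at all,
# which corresponds to the "1..0 # Skip ..." exit in 60leak.t.
my $skip_whole_file = 0;

# Hypothetical helper: true when the major version is 5 or later,
# i.e. not a /^3/ or /^4/ server.
sub supports_stored_procs {
    my ($version) = @_;
    my ($major) = $version =~ /^(\d+)\./;
    return defined $major && $major >= 5;
}

# Whole-file skip, decided before any test has run.
plan skip_all => 'prerequisites for this file are missing'
    if $skip_whole_file;

plan tests => 4;

ok(1, 'connect');
ok(1, 'create table');

SKIP: {
    # Skips exactly one test, and only on servers without stored procs.
    skip 'server has no stored procedures', 1
        unless supports_stored_procs($server_version);
    ok(1, 'call stored procedure');
}

# The tidy-up sits outside the SKIP block, so it still runs after a skip.
ok(1, 'drop table');
```

The point of keeping the tidy-up outside the `SKIP:` block is the same as the point of SkipN above: skipped tests are still counted in the plan, and the table created earlier is not left behind.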
Re: Adding utf8 support to DBD::mysql
  [...]::length($str), "\n";
  print "utf8::is_utf8 = ", utf8::is_utf8($str) ? 1 : 0, "\n";
  print "data_string_desc: ", data_string_desc($str), "\n";
  open OUT, ">uni.out";
  binmode(OUT, ":utf8");
  print OUT "$str\n";
  # data written to uni.out is UTF8

  my $dbh = DBI->connect("dbi:mysql:test", "xxx", "xxx");
  # there are posts on dbi-user as to whether both or either of
  # the following should be set
  $dbh->do("set character set utf8");
  $dbh->do("set names utf8");
  $dbh->do("drop table if exists utf");
  $dbh->do("create table utf (a char(100)) default charset utf8");
  my $sth = $dbh->prepare("insert into utf values (?)");
  $sth->execute($str);
  $sth = $dbh->prepare("select * from utf");
  $sth->execute;
  my @row = $sth->fetchrow_array;
  print "data_string_desc (after fetch): ", data_string_desc($row[0]), "\n";
  # the following shows we've got the right data back
  # but perl does not know it is utf8
  print join("", unpack("H*", $row[0])), "\n";
  # turning on utf8 causes the right utf8 sequence to be output
  # and hence sv_utf8_upgrade(sv) will probably work
  Encode::_utf8_on($row[0]);
  print "data_string_desc (after fetch): ", data_string_desc($row[0]), "\n";
  open OUT, ">utf.out";
  binmode(OUT, ":utf8");
  print OUT $row[0];
  close OUT;
  # data written to utf.out is not UTF8 unless it is marked utf8

produces:

  Is utf8::is_utf8 defined: 1
  Is utf8::valid defined: 1
  e298ba787878d790d8a7
  length(str) = 6
  bytes::length(str) = 10
  utf8::is_utf8 = 1
  data_string_desc: UTF8 on, non-ASCII, 6 characters 10 bytes
  data_string_desc (after fetch): UTF8 off, non-ASCII, 10 characters 10 bytes
  e298ba787878d790d8a7

with utf.out containing:

  C3A2C298C2BAxxxC397C290C398C2A7

without that Encode::_utf8_on($row[0]);

Michael Kröll (apr-10-2006) posted a change for DBD::DB2 which seemed to
sort this out for DB2, so a similar change could be added to mysql.

Hope this helps.

Martin
--
Martin J. Evans
Easysoft Ltd, UK
http://www.easysoft.com

On 24-Apr-2006 Tim Bunce wrote:
> [I'm at the mysql conference and Patrick asked me about adding utf8
> support to DBD::mysql. I said I'd look at the libmysql docs and give my
> thoughts. I'm posting to dbi-dev since it may be of interest to others
> interested in enhancing DBD::mysql and to other driver developers.
> These are just random thoughts from a quick look at the docs.]
>
> The key mysql docs seem to be
> http://dev.mysql.com/doc/refman/4.1/en/charset-connection.html
>
> The mysql api and client-server protocol don't support passing
> character set info to the server on a per-statement / per-bind value basis.
> (http://dev.mysql.com/doc/refman/4.1/en/c-api-prepared-statement-datatypes.html)
>
> So the sane way to send utf8 to the server is by setting the 'connection
> character set' to utf8 and then only sending utf8 (or its ASCII subset)
> to the server on that connection.
>
> *** Fetching data:
>
> MySQL 4.1.0 added "unsigned int charsetnr" to the MYSQL_FIELD structure.
> It's the character set number for the field. So set the UTF8 flag
> based on that value. Something like:
>
>     (field->charsetnr == ???) ? SvUTF8_on(sv) : SvUTF8_off(sv);
>
> I couldn't see any docs for the values of the charsetnr field.
>
> Also, it would be good to enable perl code to access the charsetnr values:
>
>     $sth->{mysql_charsetnr}->[$i]
>
> *** Fetching Metadata:
>
> The above is a minimum. It doesn't address metadata like field names
> ($sth->{NAME}) that might also be in utf8. For that the driver needs to
> know if the 'connection character set' is currently utf8.
>
> (The docs mention mysql->charset but it's not clear if that's part of
> the public API.)
>
> However it's detected, the code needs to end up doing:
>
>     (...connection charset is utf8...) ? SvUTF8_on(sv) : SvUTF8_off(sv);
>
> on the metadata.
>
> *** SET NAMES '...'
>
> Intercept SET NAMES and call the mysql_set_character_set() API instead.
> See http://dev.mysql.com/doc/refman/4.1/en/mysql-set-character-set.html
>
> *** Detecting Inconsistencies
>
> If the connection character set is _not_ utf8 but the application calls
> the driver with data (or SQL statement) that has the UTF8 flag set, then
> it could issue a warning. In practice that may be too noisy for people
> who have done their own workarounds for utf8 support. If so, the warnings
> could be changed to level 1 trace messages.
>
> If the connection character set _is_ utf8, and the application calls the
> driver with data (or SQL statement) that does _not_ have the UTF8 flag
> set but _does_ have bytes with the high bit set, then the driver should
> issue a warning. The checking for high bit set is an extra cost so this
> should only be enabled if tracing and/or an attribute is set (perhaps
> called $dbh->{mysql_charset_checks} = 1)
>
> Tim.
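For anyone who wants to experiment with the two checks described under "Detecting Inconsistencies" before anything lands in the driver, here is a rough pure-Perl sketch. It is an illustration only: the real check would live in the driver's C code, and the function name `charset_inconsistency` and its two-argument interface are invented here, not part of DBD::mysql. It uses only `utf8::is_utf8` and a byte-range regex.

```perl
use strict;
use warnings;

# Hypothetical helper (not a DBD::mysql API): returns a warning message
# when the value's UTF8 flag disagrees with the connection character set,
# or undef when nothing looks wrong.
sub charset_inconsistency {
    my ($value, $connection_is_utf8) = @_;
    return unless defined $value;

    my $flagged = utf8::is_utf8($value);

    if (!$connection_is_utf8) {
        # First case: flagged data sent over a non-utf8 connection.
        return $flagged
            ? 'UTF8-flagged data on a non-utf8 connection'
            : undef;
    }

    # Second case: utf8 connection, unflagged data. Only scan for
    # high-bit bytes when the flag is off, since the scan is the
    # expensive part.
    return if $flagged;
    return $value =~ /[\x80-\xFF]/
        ? 'unflagged data with high-bit bytes on a utf8 connection'
        : undef;
}

my $bytes = "\xe2\x98\xba";   # the three UTF-8 bytes of U+263A, flag off
my $chars = "\x{263A}";       # the same character as a flagged string

for my $case ([$bytes, 1], [$chars, 1], [$chars, 0], ['abc', 1]) {
    my $msg = charset_inconsistency(@$case);
    print defined $msg ? "warn: $msg\n" : "ok\n";
}
```

In a driver this would presumably only run when the suggested attribute (or tracing) is switched on, for the cost reason given above; the sketch leaves that gating out.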