Re: UTF-8 flags (again)
On Sep 8, 2004, at 4:45 AM, Tim Bunce wrote:

> On Tue, Sep 07, 2004 at 04:03:21PM -0700, David Wheeler wrote:
> > On Sun, 08 Aug 2004 12:33:22 -0700, Tim Bunce wrote:
> > > I'm thinking in terms of something like
> > >
> > >   $sth->{SetUTF8}->[$index] = $mode
> > >
> > > 0: Force SvUTF8_off regardless
> > > undef: Do nothing (leave it up to the driver)
> > > 1: (value is well-formed utf8) ? SvUTF8_on : SvUTF8_off
> > > 2: Force SvUTF8_on regardless
> > >
> > > (with a way to set it via bind_col as well)
> > >
> > > And perhaps a $dbh->{SetUTF8} = $mode; to provide a default.
> > >
> > > Umm, it's just dawned on me that the persistence of the utf8 flag
> > > across sv_set functions means I could implement all but "1" in DBI v1.
> > > (Option "1" requires looking at the value that's just been set and
> > > that's not simple/efficient for DBI v1.)
> >
> > Hey, I just ran into a situation where I could really use this.
> >
> > http://bugs.bricolage.cc/show_bug.cgi?id=709#c14
> >
> > Tim, do you think this might make it into the next release of DBI v1?
>
> I was thinking of doing at least "0" on that list for DBI 1.44.
>
> > I'd especially like to do
> >
> >   $dbh->{SetUTF8} = 2;
> >
> > And be done with it.
>
> I'll take a look. Patches welcome, of course!

I just started looking at this myself, but I'm not making much progress.
C and XS are still somewhat of a black art to me. Tim, could you perhaps
give me some pointers on where to start? I'd like to get a database
handle-level SetUTF8 working with 0 and 2, so that it controls whether
the utf8 flag is on or off for all data fetched from the database.

I started looking at how to do it in PurePerl. It looks like it'd be
fairly straightforward to do it in _set_fbav(), yes? If so, how would you
like to handle backwards compatibility? I thought I'd load Encode and use
its functions to turn the utf8 flag on and off on values, but it's only
available in Perl 5.8.0 and later...

Thanks!

David
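The PurePerl approach David describes (flipping the flag on each value of the fetched row) might look roughly like the sketch below. It is not DBI code: `set_row_utf8` is a hypothetical helper, and the backwards-compatibility answer assumed here is simply to load Encode only when running on Perl 5.8.0 or later and otherwise leave values untouched. Only the modes 0 and 2 that David asks for are handled.

```perl
use strict;
use warnings;

# Encode ships with Perl 5.8.0 and later. Assumption for this sketch:
# where it is unavailable, the helper leaves values untouched.
my $HAVE_ENCODE = $] >= 5.008 && eval { require Encode; 1 };

# Hypothetical helper (not a DBI API): apply a handle-level SetUTF8
# default to every defined value of a fetched row, in place.
#   0     => force the utf8 flag off
#   2     => force the utf8 flag on
#   undef => leave the values alone (other modes are ignored here)
sub set_row_utf8 {
    my ($mode, $row) = @_;
    return $row unless $HAVE_ENCODE && defined $mode;
    for my $value (@$row) {        # $value aliases each row element
        next unless defined $value;
        if    ($mode == 0) { Encode::_utf8_off($value) }
        elsif ($mode == 2) { Encode::_utf8_on($value) }
    }
    return $row;
}

my @row = ("caf\xc3\xa9", undef, "plain");
set_row_utf8(2, \@row);
print length($row[0]), "\n";       # prints 4: the five octets now read as "café"
```

Because the loop variable aliases the row elements, no copying is needed; the octets stay the same and only the flag changes.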
Re: UTF-8 flags (again)
On Sep 8, 2004, at 12:58 PM, Tim Bunce wrote:

> That's the trivial bit :) The fiddly bit is handling the SetUTF8
> attribute (and corresponding bit flags to make it fast enough).
>
> But thanks anyway :)

Ah well, sorry I can't be more help...

Regards,

David
Re: UTF-8 flags (again)
On Wed, Sep 08, 2004 at 09:15:36AM -0700, David Wheeler wrote:
> On Sep 8, 2004, at 4:45 AM, Tim Bunce wrote:
>
> >I was thinking of doing at least "0" on that list for DBI 1.44.
> >
> >>I'd especially like to do
> >>
> >> $dbh->{SetUTF8} = 2;
> >>
> >>And be done with it.
> >
> >I'll take a look. Patches welcome, of course!
>
> Hey, if I knew any C...I can paste these from Encode.xs, at least:

That's the trivial bit :) The fiddly bit is handling the SetUTF8
attribute (and corresponding bit flags to make it fast enough).

But thanks anyway :)

Tim.
Re: UTF-8 flags (again)
On Sep 8, 2004, at 4:45 AM, Tim Bunce wrote:

> I was thinking of doing at least "0" on that list for DBI 1.44.
>
> > I'd especially like to do
> >
> >   $dbh->{SetUTF8} = 2;
> >
> > And be done with it.
>
> I'll take a look. Patches welcome, of course!

Hey, if I knew any C... I can paste these from Encode.xs, at least:

    SV *
    _utf8_on(sv)
        SV *sv
    CODE:
    {
        if (SvPOK(sv)) {
            /* return the previous state of the flag, then set it */
            SV *rsv = newSViv(SvUTF8(sv));
            RETVAL = rsv;
            SvUTF8_on(sv);
        } else {
            /* not a string: leave it alone and return undef */
            RETVAL = &PL_sv_undef;
        }
    }
    OUTPUT:
        RETVAL

    SV *
    _utf8_off(sv)
        SV *sv
    CODE:
    {
        if (SvPOK(sv)) {
            /* return the previous state of the flag, then clear it */
            SV *rsv = newSViv(SvUTF8(sv));
            RETVAL = rsv;
            SvUTF8_off(sv);
        } else {
            RETVAL = &PL_sv_undef;
        }
    }
    OUTPUT:
        RETVAL

> > PS: I assume that if I do:
> >
> >   my $data = $utf8_data;
> >
> > where $utf8_data has SvUTF8_on that $data will also have SvUTF8_on.
> > Is that correct?
>
> Yes.

Great, I figured as much.

Thanks!

David
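At the Perl level, the two Encode.xs functions quoted above can be exercised directly. This small sketch (variable names are illustrative) shows that `_utf8_on` returns a true or false value reflecting the flag's previous state, returns undef for a value that is not a string, and that a plain copy carries the flag along, which is the behaviour Tim confirms.

```perl
use strict;
use warnings;
use Encode ();

my $bytes = "caf\xc3\xa9";              # five octets, utf8 flag off

# _utf8_on returns a value reflecting the flag's *previous* state
# (false here), then switches the flag on without touching the octets.
my $was_on  = Encode::_utf8_on($bytes);
my $chars   = length $bytes;            # now read as "café"

# A plain assignment copies the flag along with the string.
my $copy    = $bytes;
my $copy_on = utf8::is_utf8($copy);

# A value that is not a string is left alone, and undef is returned.
my $number  = 42;
my $num_ret = Encode::_utf8_on($number);

# _utf8_off reverses the first call; the octets are unchanged.
Encode::_utf8_off($bytes);
my $octets  = length $bytes;

printf "was_on=%d chars=%d copy_on=%d num_ret=%s octets=%d\n",
    ($was_on ? 1 : 0), $chars, ($copy_on ? 1 : 0),
    (defined $num_ret ? 'defined' : 'undef'), $octets;
```

The length changing from 5 to 4 and back is the whole effect: neither call re-encodes anything, they only change how Perl interprets the same octets.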
Re: UTF-8 flags (again)
On Tue, Sep 07, 2004 at 04:03:21PM -0700, David Wheeler wrote:
> On Sun, 08 Aug 2004 12:33:22 -0700, Tim Bunce wrote:
>
> >I'm thinking in terms of something like $sth->{SetUTF8}->[$index] =
> >$mode
> >
> >0: Force SvUTF8_off regardless
> >undef: Do nothing (leave it up to the driver)
> >1: (value is well-formed utf8) ? SvUTF8_on : SvUTF8_off
> >2: Force SvUTF8_on regardless
> >
> >(with a way to set it via bind_col as well)
> >
> >And perhaps a $dbh->{SetUTF8} = $mode; to provide a default.
> >
> >Umm, it's just dawned on me that the persistence of the utf8 flag
> >across sv_set functions means I could implement all but "1" in DBI v1.
> >(Option "1" requires looking at the value that's just been set and
> >that's not simple/efficient for DBI v1.)
>
> Hey, I just ran into a situation where I could really use this.
>
> http://bugs.bricolage.cc/show_bug.cgi?id=709#c14
>
> Tim, do you think this might make it into the next release of DBI v1?

I was thinking of doing at least "0" on that list for DBI 1.44.

> I'd especially like to do
>
> $dbh->{SetUTF8} = 2;
>
> And be done with it.

I'll take a look. Patches welcome, of course!

> Cheers,
>
> David
>
> PS: I assume that if I do:
>
> my $data = $utf8_data;
>
> where $utf8_data has SvUTF8_on that $data will also have SvUTF8_on. Is
> that correct?

Yes.

Tim.
Re: UTF-8 flags (again)
On Sun, 08 Aug 2004 12:33:22 -0700, Tim Bunce wrote:

> I'm thinking in terms of something like
>
>   $sth->{SetUTF8}->[$index] = $mode
>
> 0: Force SvUTF8_off regardless
> undef: Do nothing (leave it up to the driver)
> 1: (value is well-formed utf8) ? SvUTF8_on : SvUTF8_off
> 2: Force SvUTF8_on regardless
>
> (with a way to set it via bind_col as well)
>
> And perhaps a $dbh->{SetUTF8} = $mode; to provide a default.
>
> Umm, it's just dawned on me that the persistence of the utf8 flag
> across sv_set functions means I could implement all but "1" in DBI v1.
> (Option "1" requires looking at the value that's just been set and
> that's not simple/efficient for DBI v1.)

Hey, I just ran into a situation where I could really use this.

http://bugs.bricolage.cc/show_bug.cgi?id=709#c14

Tim, do you think this might make it into the next release of DBI v1?
I'd especially like to do

  $dbh->{SetUTF8} = 2;

And be done with it.

Cheers,

David

PS: I assume that if I do:

  my $data = $utf8_data;

where $utf8_data has SvUTF8_on that $data will also have SvUTF8_on. Is
that correct?
Re: UTF-8 flags (again)
On Sun, Aug 08, 2004 at 06:15:39PM +0100, Matt Sergeant wrote:
> On 8 Aug 2004, at 17:35, David Wheeler wrote:
>
> >On Aug 8, 2004, at 9:14 AM, Matt Sergeant wrote:
> >
> >>i.e. for every fetch call, you need to do:
> >>
> >> SvUTF8_off(AvARRAY(av)[i]);
> >>
> >>Now, people using your DBD can decide to upgrade the variable if they
> >>wish to, but most people who don't need to will be unaffected.

Or, more generally, explicitly call either SvUTF8_off or SvUTF8_on as
appropriate, but be sure to call one of them for each field.

Meanwhile I think it would be wise for the DBI to explicitly do
SvUTF8_off on the elements of the internal row buffer before each row
is fetched. That would avoid the utf8 flag 'leaking' from one row to
the next. I'll do that for DBI 1.44.

> >I think that this is fine as long as there's an easy way to upgrade
> >the variable. I could use Encode::_utf8_on(), but that seems like more
> >overhead than is necessary unless I've loaded Encode for some other
> >use already. Perhaps there could be a module or even a DBI method that
> >does the equivalent?
> >
> >  # Pseudocode;
> >  sub utf8_on { SvUTF8_on($_[0]) }
>
> Certainly fairly easy to export that from the DBI.

I'll do that (and utf8_off) for DBI 1.44.

> Tim and I talked about long term plans for this, where the user might
> specify in advance which columns he'd like UTF-8 turned on for, or some
> (I thought horrible) heuristic method where the DBD automagically
> decides to turn on the flag if it detects data that it can turn into
> UTF-8 - but that sounds like a world of pain to me.

Sure, but some apps/drivers may need the choice.

I'm thinking in terms of something like

    $sth->{SetUTF8}->[$index] = $mode

  0: Force SvUTF8_off regardless
  undef: Do nothing (leave it up to the driver)
  1: (value is well-formed utf8) ? SvUTF8_on : SvUTF8_off
  2: Force SvUTF8_on regardless

(with a way to set it via bind_col as well)

And perhaps a $dbh->{SetUTF8} = $mode; to provide a default.

Umm, it's just dawned on me that the persistence of the utf8 flag
across sv_set functions means I could implement all but "1" in DBI v1.
(Option "1" requires looking at the value that's just been set and
that's not simple/efficient for DBI v1.)

> Better IMHO would be an extension to bind_col - it should be trivial to
> add an attribute in there. The downside being that not many people use
> bind_col.

Those that need to control utf8 settings need to make code changes anyway.

Tim.
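To make the four proposed modes concrete, they can be modelled in pure Perl. The sketch below is not DBI code: `apply_set_utf8` is a hypothetical helper taking an array of per-column modes (in the spirit of `$sth->{SetUTF8}->[$index]`) plus a fetched row. Mode 1's "well-formed utf8" test is approximated with `utf8::decode` on a flag-off copy, which accepts Perl's lax internal UTF-8 rather than performing strict validation.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not a DBI API). $modes->[$i] is the SetUTF8 mode
# for column $i; $row is the fetched row and is modified in place.
#   undef => do nothing (leave it up to the driver)
#   0     => force the utf8 flag off
#   1     => flag on if the octets are well-formed UTF-8, else off
#   2     => force the utf8 flag on
sub apply_set_utf8 {
    my ($modes, $row) = @_;
    for my $i (0 .. $#$row) {
        my $mode = $modes->[$i];
        next unless defined $mode && defined $row->[$i];
        if ($mode == 0) {
            Encode::_utf8_off($row->[$i]);
        }
        elsif ($mode == 2) {
            Encode::_utf8_on($row->[$i]);
        }
        elsif ($mode == 1) {
            # Probe a flag-off copy of the raw octets; utf8::decode
            # returns false if they are not well-formed (Perl's lax) UTF-8.
            my $probe = $row->[$i];
            Encode::_utf8_off($probe);
            if (utf8::decode($probe)) { Encode::_utf8_on($row->[$i]) }
            else                      { Encode::_utf8_off($row->[$i]) }
        }
    }
    return $row;
}

# Columns: UTF-8 octets, Latin-1 octets, UTF-8 octets (no mode), ASCII.
my @row = ("caf\xc3\xa9", "caf\xe9", "caf\xc3\xa9", "abc");
apply_set_utf8([1, 1, undef, 2], \@row);
```

After the call, `$row[0]` reads as the four-character "café", `$row[1]` keeps its four octets with the flag off, `$row[2]` is untouched and `$row[3]` is flagged. The sketch also illustrates Tim's point about option 1: it has to scan every value, whereas 0 and 2 only flip a flag.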
Re: UTF-8 flags (again)
On Aug 8, 2004, at 10:15 AM, Matt Sergeant wrote:

> Better IMHO would be an extension to bind_col - it should be trivial to
> add an attribute in there. The downside being that not many people use
> bind_col.

No, but if it could be integrated with bind_columns(), so that several
could be specified at once, it might do the trick.

Regards,

David
Re: UTF-8 flags (again)
On 8 Aug 2004, at 17:35, David Wheeler wrote:

> On Aug 8, 2004, at 9:14 AM, Matt Sergeant wrote:
>
> > i.e. for every fetch call, you need to do:
> >
> >   SvUTF8_off(AvARRAY(av)[i]);
> >
> > Now, people using your DBD can decide to upgrade the variable if they
> > wish to, but most people who don't need to will be unaffected.
>
> I think that this is fine as long as there's an easy way to upgrade
> the variable. I could use Encode::_utf8_on(), but that seems like more
> overhead than is necessary unless I've loaded Encode for some other
> use already. Perhaps there could be a module or even a DBI method that
> does the equivalent?
>
>   # Pseudocode;
>   sub utf8_on { SvUTF8_on($_[0]) }

Certainly fairly easy to export that from the DBI.

Tim and I talked about long term plans for this, where the user might
specify in advance which columns he'd like UTF-8 turned on for, or some
(I thought horrible) heuristic method where the DBD automagically
decides to turn on the flag if it detects data that it can turn into
UTF-8 - but that sounds like a world of pain to me.

Better IMHO would be an extension to bind_col - it should be trivial to
add an attribute in there. The downside being that not many people use
bind_col.

Matt.
Re: UTF-8 flags (again)
On Aug 8, 2004, at 9:14 AM, Matt Sergeant wrote:

> i.e. for every fetch call, you need to do:
>
>   SvUTF8_off(AvARRAY(av)[i]);
>
> Now, people using your DBD can decide to upgrade the variable if they
> wish to, but most people who don't need to will be unaffected.

I think that this is fine as long as there's an easy way to upgrade
the variable. I could use Encode::_utf8_on(), but that seems like more
overhead than is necessary unless I've loaded Encode for some other
use already. Perhaps there could be a module or even a DBI method that
does the equivalent?

  # Pseudocode;
  sub utf8_on { SvUTF8_on($_[0]) }

Regards,

David
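The pseudocode helper above can be approximated in Perl today by delegating to Encode's `_utf8_on` and `_utf8_off`. The pair below is hypothetical: the names follow the thread, and they are not presented as an API that Encode or DBI provides in this form. Because `@_` aliases the caller's variables, the flags change in place, so a whole row can be passed at once. It still depends on Encode (core since Perl 5.8.0), so it does not remove the load cost David mentions; it only keeps call sites short.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helpers modelled on the pseudocode in the thread.
# @_ aliases the caller's arguments, so each defined value is
# flagged (or unflagged) in place; undef values are skipped.
sub utf8_on {
    for (@_) { Encode::_utf8_on($_) if defined $_ }
    return;
}

sub utf8_off {
    for (@_) { Encode::_utf8_off($_) if defined $_ }
    return;
}

my @row = ("caf\xc3\xa9", undef, "ok");
utf8_on(@row);      # $row[0] now reads as four characters
utf8_off($row[0]);  # and back to five octets
```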