Re: UTF-8 flags (again)

2005-04-22 Thread David Wheeler
On Sep 8, 2004, at 4:45 AM, Tim Bunce wrote:
> On Tue, Sep 07, 2004 at 04:03:21PM -0700, David Wheeler wrote:
>> On Sun, 08 Aug 2004 12:33:22 -0700, Tim Bunce wrote:
>>> I'm thinking in terms of something like $sth->{SetUTF8}->[$index] = $mode
>>>
>>>    0: Force SvUTF8_off regardless
>>>    undef: Do nothing (leave it up to the driver)
>>>    1: (value is well-formed utf8) ? SvUTF8_on : SvUTF8_off
>>>    2: Force SvUTF8_on regardless
>>>
>>> (with a way to set it via bind_col as well)
>>>
>>> And perhaps a $dbh->{SetUTF8} = $mode; to provide a default.
>>>
>>> Umm, it's just dawned on me that the persistence of the utf8 flag
>>> across sv_set functions means I could implement all but "1" in DBI v1.
>>> (Option "1" requires looking at the value that's just been set and
>>> that's not simple/efficient for DBI v1.)
>>
>> Hey, I just ran into a situation where I could really use this.
>>
>>   http://bugs.bricolage.cc/show_bug.cgi?id=709#c14
>>
>> Tim, do you think this might make it into the next release of DBI v1?
>
> I was thinking of doing at least "0" on that list for DBI 1.44.
>
>> I'd especially like to do
>>
>>   $dbh->{SetUTF8} = 2;
>>
>> And be done with it.
>
> I'll take a look. Patches welcome, of course!

I just started looking at this myself, but I'm not making much 
progress. C and XS are still somewhat of a black art to me. Tim, could 
you perhaps give me some pointers where to start on this? I'd like to 
get a database handle-level SetUTF8 working with 0 and 2 to affect 
whether the utf8 flag is on or off for all data fetched from the 
database.

I started looking at how to do it in PurePerl. It looks like it'd be 
fairly straightforward to do it in _set_fbav(), yes? If so, how would
you like to handle backwards compatibility? I thought I'd load Encode 
and use its functions to turn the utf8 flag on and off on values, but 
it's only available in Perl 5.8.0 and later...

Thanks!
David


Re: UTF-8 flags (again)

2004-09-08 Thread David Wheeler
On Sep 8, 2004, at 12:58 PM, Tim Bunce wrote:
> That's the trivial bit :) The fiddly bit is handling the SetUTF8
> attribute (and corresponding bit flags to make it fast enough).
>
> But thanks anyway :)

Ah well, sorry I can't be more help...

Regards,
David


Re: UTF-8 flags (again)

2004-09-08 Thread Tim Bunce
On Wed, Sep 08, 2004 at 09:15:36AM -0700, David Wheeler wrote:
> On Sep 8, 2004, at 4:45 AM, Tim Bunce wrote:
> 
> >I was thinking of doing at least "0" on that list for DBI 1.44.
> >
> >>I'd especially like to do
> >>
> >>  $dbh->{SetUTF8} = 2;
> >>
> >>And be done with it.
> >
> >I'll take a look. Patches welcome, of course!
> 
> Hey, if I knew any C...I can paste these from Encode.xs, at least:

That's the trivial bit :) The fiddly bit is handling the SetUTF8 attribute
(and corresponding bit flags to make it fast enough).

But thanks anyway :)

Tim.


Re: UTF-8 flags (again)

2004-09-08 Thread David Wheeler
On Sep 8, 2004, at 4:45 AM, Tim Bunce wrote:
> I was thinking of doing at least "0" on that list for DBI 1.44.
>
>> I'd especially like to do
>>
>>   $dbh->{SetUTF8} = 2;
>>
>> And be done with it.
>
> I'll take a look. Patches welcome, of course!

Hey, if I knew any C...I can paste these from Encode.xs, at least:

SV *
_utf8_on(sv)
    SV *sv
CODE:
{
    if (SvPOK(sv)) {
        SV *rsv = newSViv(SvUTF8(sv));
        RETVAL = rsv;
        SvUTF8_on(sv);
    } else {
        RETVAL = &PL_sv_undef;
    }
}
OUTPUT:
    RETVAL

SV *
_utf8_off(sv)
    SV *sv
CODE:
{
    if (SvPOK(sv)) {
        SV *rsv = newSViv(SvUTF8(sv));
        RETVAL = rsv;
        SvUTF8_off(sv);
    } else {
        RETVAL = &PL_sv_undef;
    }
}
OUTPUT:
    RETVAL

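For readers who, like David, don't read XS fluently, the two Encode.xs functions above can be modelled in plain C. This is a sketch under loud assumptions: `mock_sv`, `prev_state`, `mock_utf8_on` and `mock_utf8_off` are hypothetical stand-ins, not perl's SV API. It shows the two behaviours worth noticing: only string values (SvPOK) are touched, and the return value is the previous flag state (or undef), not a success indicator. The string bytes are never rewritten; only the flag changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Perl SV: only the parts this model needs. */
typedef struct {
    bool is_string;   /* models SvPOK(sv) */
    bool utf8;        /* models the SVf_UTF8 flag */
    char buf[64];     /* the string bytes; never modified below */
} mock_sv;

/* Mirrors the XS return value: undef for non-strings, else the OLD flag. */
typedef enum { PREV_UNDEF = -1, PREV_OFF = 0, PREV_ON = 1 } prev_state;

static prev_state mock_utf8_set(mock_sv *sv, bool on) {
    assert(sv != NULL);
    if (!sv->is_string)
        return PREV_UNDEF;                 /* &PL_sv_undef in Encode.xs */
    prev_state prev = sv->utf8 ? PREV_ON : PREV_OFF;
    sv->utf8 = on;                         /* SvUTF8_on / SvUTF8_off */
    return prev;
}

static prev_state mock_utf8_on(mock_sv *sv)  { return mock_utf8_set(sv, true); }
static prev_state mock_utf8_off(mock_sv *sv) { return mock_utf8_set(sv, false); }
```

Like the XS, the model turns the flag on without checking that the bytes are actually valid UTF-8; that check is a separate (and costlier) step.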
>> PS: I assume that if I do:
>>
>>   my $data = $utf8_data;
>>
>> where $utf8_data has SvUTF8_on that $data will also have SvUTF8_on. Is
>> that correct?
>
> Yes.

Great, I figured as much. Thanks!
David


Re: UTF-8 flags (again)

2004-09-08 Thread Tim Bunce
On Tue, Sep 07, 2004 at 04:03:21PM -0700, David Wheeler wrote:
> On Sun, 08 Aug 2004 12:33:22 -0700, Tim Bunce wrote:
> 
> >I'm thinking in terms of something like $sth->{SetUTF8}->[$index] = 
> >$mode
> >
> >0: Force SvUTF8_off regardless
> >undef: Do nothing (leave it up to the driver)
> >1: (value is well-formed utf8) ? SvUTF8_on : SvUTF8_off
> >2: Force SvUTF8_on regardless
> >
> >(with a way to set it via bind_col as well)
> >
> >And perhaps a $dbh->{SetUTF8} = $mode; to provide a default.
> >
> >Umm, it's just dawned on me that the persistence of the utf8 flag
> >across sv_set functions means I could implement all but "1" in DBI v1.
> >(Option "1" requires looking at the value that's just been set and
> >that's not simple/efficient for DBI v1.)
> 
> Hey, I just ran into a situation where I could really use this.
> 
>   http://bugs.bricolage.cc/show_bug.cgi?id=709#c14
> 
> Tim, do you think this might make it into the next release of DBI v1? 

I was thinking of doing at least "0" on that list for DBI 1.44.

> I'd especially like to do
> 
>   $dbh->{SetUTF8} = 2;
> 
> And be done with it.

I'll take a look. Patches welcome, of course!

> Cheers,
> 
> David
> 
> PS: I assume that if I do:
> 
>   my $data = $utf8_data;
> 
> where $utf8_data has SvUTF8_on that $data will also have SvUTF8_on. Is 
> that correct?

Yes.

Tim.


Re: UTF-8 flags (again)

2004-09-07 Thread David Wheeler
On Sun, 08 Aug 2004 12:33:22 -0700, Tim Bunce wrote:
> I'm thinking in terms of something like $sth->{SetUTF8}->[$index] = $mode
>
> 0: Force SvUTF8_off regardless
> undef: Do nothing (leave it up to the driver)
> 1: (value is well-formed utf8) ? SvUTF8_on : SvUTF8_off
> 2: Force SvUTF8_on regardless
>
> (with a way to set it via bind_col as well)
>
> And perhaps a $dbh->{SetUTF8} = $mode; to provide a default.
>
> Umm, it's just dawned on me that the persistence of the utf8 flag
> across sv_set functions means I could implement all but "1" in DBI v1.
> (Option "1" requires looking at the value that's just been set and
> that's not simple/efficient for DBI v1.)

Hey, I just ran into a situation where I could really use this.

  http://bugs.bricolage.cc/show_bug.cgi?id=709#c14

Tim, do you think this might make it into the next release of DBI v1?
I'd especially like to do

  $dbh->{SetUTF8} = 2;

And be done with it.

Cheers,

David

PS: I assume that if I do:

  my $data = $utf8_data;

where $utf8_data has SvUTF8_on that $data will also have SvUTF8_on. Is
that correct?



Re: UTF-8 flags (again)

2004-08-08 Thread Tim Bunce
On Sun, Aug 08, 2004 at 06:15:39PM +0100, Matt Sergeant wrote:
> On 8 Aug 2004, at 17:35, David Wheeler wrote:
> 
> >On Aug 8, 2004, at 9:14 AM, Matt Sergeant wrote:
> >
> >>i.e. for every fetch call, you need to do:
> >>
> >>  SvUTF8_off(AvARRAY(av)[i]);
> >>
> >>Now, people using your DBD can decide to upgrade the variable if they 
> >>wish to, but most people who don't need to will be unaffected.

Or, more generally, explicitly call either SvUTF8_off or SvUTF8_on as
appropriate, but be sure to call one of them for each field.

Meanwhile I think it would be wise for the DBI to explicitly do SvUTF8_off
on the elements of the internal row buffer before each row is fetched.
That would avoid the utf8 flag 'leaking' from one row to the next.
I'll do that for DBI 1.44.
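A rough model of the "leak" described above, assuming (as Tim notes elsewhere in this thread) that the utf8 flag persists across sv_set-style calls. `field`, `driver_set`, `reset_row_flags` and `fetch_row` are hypothetical names for illustration, not DBI internals: a driver that only ever turns the flag on leaves it on for the next row unless something clears it first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NCOLS 2
#define MAXLEN 32

/* Hypothetical model of one element of the row buffer. */
typedef struct {
    char bytes[MAXLEN];
    bool utf8;            /* models SVf_UTF8 */
} field;

/* Models an sv_setpvn-style store: the bytes are replaced, but the
   existing UTF-8 flag is left as it was (the persistence Tim mentions). */
static void driver_set(field *f, const char *value) {
    assert(f != NULL && value != NULL);
    strncpy(f->bytes, value, MAXLEN - 1);
    f->bytes[MAXLEN - 1] = '\0';
}

/* The reset proposed for DBI: clear the flag on every element before a fetch. */
static void reset_row_flags(field row[], size_t n) {
    for (size_t i = 0; i < n; i++)
        row[i].utf8 = false;
}

/* A driver that only ever turns the flag on, for the columns it knows
   hold UTF-8. With reset_first == false the flag can leak between rows. */
static void fetch_row(field row[], const char *values[],
                      const bool utf8_cols[], bool reset_first) {
    if (reset_first)
        reset_row_flags(row, NCOLS);
    for (size_t i = 0; i < NCOLS; i++) {
        driver_set(&row[i], values[i]);
        if (utf8_cols[i])
            row[i].utf8 = true;
    }
}
```

Without the reset, a plain ASCII value in the second row still carries the flag set for the first row. The alternative suggested for drivers, calling either SvUTF8_on or SvUTF8_off for every field, would make the reset unnecessary.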

> >I think that this is fine as long as there's an easy way to upgrade 
> >the variable. I could use Encode::_utf8_on(), but that seems like more 
> >overhead than is necessary unless I've loaded Encode for some other 
> >use already. Perhaps there could be a module or even a DBI method that 
> >does the equivalent?
> >
> >  # Pseudocode;
> >  sub utf8_on { SvUTF8_on($_[0]) }
> 
> Certainly fairly easy to export that from the DBI.

I'll do that (and utf8_off) for DBI 1.44.

> Tim and I talked about long term plans for this, where the user might 
> specify in advance which columns he'd like UTF-8 turned on for, or some 
> (I thought horrible) heuristic method where the DBD automagically 
> decides to turn on the flag if it detects data that it can turn into 
> UTF-8 - but that sounds like a world of pain to me.

Sure, but some apps/drivers may need the choice.

I'm thinking in terms of something like $sth->{SetUTF8}->[$index] = $mode

0: Force SvUTF8_off regardless
undef: Do nothing (leave it up to the driver)
1: (value is well-formed utf8) ? SvUTF8_on : SvUTF8_off
2: Force SvUTF8_on regardless

(with a way to set it via bind_col as well)

And perhaps a $dbh->{SetUTF8} = $mode; to provide a default.
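One possible reading of the proposed modes, sketched in C with hypothetical names (`setutf8_mode`, `resolve_mode`, `apply_mode`). The `SETUTF8_INHERIT` value is an assumption of this sketch only: it separates "this column was never configured, use the $dbh default" from an explicit undef, which the proposal defines as "leave it up to the driver". Mode 1 simply takes the result of a well-formedness check as an input, since that check is the part described as hard to do efficiently in DBI v1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C encoding of the proposed SetUTF8 values. */
typedef enum {
    SETUTF8_INHERIT  = -2, /* assumption: column not configured, use $dbh default */
    SETUTF8_UNDEF    = -1, /* undef: do nothing, leave it up to the driver */
    SETUTF8_OFF      = 0,  /* force SvUTF8_off regardless */
    SETUTF8_IF_VALID = 1,  /* on if the value is well-formed utf8, else off */
    SETUTF8_ON       = 2   /* force SvUTF8_on regardless */
} setutf8_mode;

/* A per-column setting wins; otherwise fall back to the handle default. */
static setutf8_mode resolve_mode(const setutf8_mode *col_modes, size_t ncols,
                                 size_t index, setutf8_mode dbh_default) {
    if (col_modes != NULL && index < ncols && col_modes[index] != SETUTF8_INHERIT)
        return col_modes[index];
    return dbh_default;
}

/* Returns the new flag value. well_formed is only consulted for mode 1. */
static bool apply_mode(setutf8_mode mode, bool current_flag, bool well_formed) {
    switch (mode) {
    case SETUTF8_OFF:      return false;
    case SETUTF8_ON:       return true;
    case SETUTF8_IF_VALID: return well_formed;
    default:               return current_flag; /* undef: leave the flag alone */
    }
}
```

Modes 0 and 2 never look at the value, which is consistent with the remark that everything except "1" could be done cheaply in DBI v1.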

Umm, it's just dawned on me that the persistence of the utf8 flag
across sv_set functions means I could implement all but "1" in DBI v1.
(Option "1" requires looking at the value that's just been set and
that's not simple/efficient for DBI v1.)
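Mode 1 is the only one that has to inspect the fetched bytes. A minimal sketch of that check, assuming "well-formed" means UTF-8 as defined by RFC 3629 (no overlong encodings, no UTF-16 surrogates, nothing above U+10FFFF); Perl's own internal utf8 is looser, so a real implementation might choose differently. `is_well_formed_utf8` and `mode1_flag` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if buf[0..len) is well-formed UTF-8 per RFC 3629. */
static bool is_well_formed_utf8(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t need;                         /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range for 2nd byte */
        if (c < 0x80) { i++; continue; }     /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) need = 1;
        else if (c == 0xE0) { need = 2; lo = 0xA0; }  /* reject overlongs */
        else if (c == 0xED) { need = 2; hi = 0x9F; }  /* reject surrogates */
        else if (c >= 0xE1 && c <= 0xEF) need = 2;
        else if (c == 0xF0) { need = 3; lo = 0x90; }  /* reject overlongs */
        else if (c >= 0xF1 && c <= 0xF3) need = 3;
        else if (c == 0xF4) { need = 3; hi = 0x8F; }  /* cap at U+10FFFF */
        else return false;   /* 0x80..0xC1, 0xF5..0xFF never start a char */
        if (len - i <= need) return false;           /* truncated sequence */
        if (buf[i + 1] < lo || buf[i + 1] > hi) return false;
        for (size_t k = 2; k <= need; k++)
            if (buf[i + k] < 0x80 || buf[i + k] > 0xBF) return false;
        i += need + 1;
    }
    return true;
}

/* Mode 1 from the proposal: flag on only when the fetched bytes validate. */
static bool mode1_flag(const char *value, size_t len) {
    return is_well_formed_utf8((const unsigned char *)value, len);
}
```

The scan is linear in the length of every fetched value, which is at least part of why doing it on each fetch in DBI v1 is not cheap.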

> Better IMHO would be an extension to bind_col - it should be trivial to 
> add an attribute in there. The downside being that not many people use 
> bind_col.

Those that need to control utf8 settings need to make code changes anyway.

Tim.


Re: UTF-8 flags (again)

2004-08-08 Thread David Wheeler
On Aug 8, 2004, at 10:15 AM, Matt Sergeant wrote:
> Better IMHO would be an extension to bind_col - it should be trivial
> to add an attribute in there. The downside being that not many people
> use bind_col.

No, but if it could be integrated with bind_columns(), so that several
could be specified at once, it might do the trick.

Regards,
David




Re: UTF-8 flags (again)

2004-08-08 Thread Matt Sergeant
On 8 Aug 2004, at 17:35, David Wheeler wrote:
> On Aug 8, 2004, at 9:14 AM, Matt Sergeant wrote:
>> i.e. for every fetch call, you need to do:
>>
>>   SvUTF8_off(AvARRAY(av)[i]);
>>
>> Now, people using your DBD can decide to upgrade the variable if they
>> wish to, but most people who don't need to will be unaffected.
>
> I think that this is fine as long as there's an easy way to upgrade
> the variable. I could use Encode::_utf8_on(), but that seems like more
> overhead than is necessary unless I've loaded Encode for some other
> use already. Perhaps there could be a module or even a DBI method that
> does the equivalent?
>
>   # Pseudocode;
>   sub utf8_on { SvUTF8_on($_[0]) }

Certainly fairly easy to export that from the DBI.

Tim and I talked about long term plans for this, where the user might 
specify in advance which columns he'd like UTF-8 turned on for, or some 
(I thought horrible) heuristic method where the DBD automagically 
decides to turn on the flag if it detects data that it can turn into 
UTF-8 - but that sounds like a world of pain to me.

Better IMHO would be an extension to bind_col - it should be trivial to 
add an attribute in there. The downside being that not many people use 
bind_col.

Matt.


Re: UTF-8 flags (again)

2004-08-08 Thread David Wheeler
On Aug 8, 2004, at 9:14 AM, Matt Sergeant wrote:
> i.e. for every fetch call, you need to do:
>
>   SvUTF8_off(AvARRAY(av)[i]);
>
> Now, people using your DBD can decide to upgrade the variable if they
> wish to, but most people who don't need to will be unaffected.

I think that this is fine as long as there's an easy way to upgrade the
variable. I could use Encode::_utf8_on(), but that seems like more
overhead than is necessary unless I've loaded Encode for some other use
already. Perhaps there could be a module or even a DBI method that does
the equivalent?

  # Pseudocode;
  sub utf8_on { SvUTF8_on($_[0]) }
Regards,
David

