2005/12/18, John Siracusa wrote:
[..snip..]
> > An alternative approach is to define a new layer of classes on top of,
> > e.g., R:D:O::Metadata::Column::Varchar/Char/Text that perform
> > the UTF decoding, and then to change the type-to-class mapping in
> > R:D:O::Metadata. All my derived classes will use the new metadata
> > object and they will have Unicode support.
>
> That approach will have the best performance, but it's also the most work.

You are right about the performance penalty of the trigger approach, but I
saw a benefit of triggers after playing with the code: the inflate trigger
is only executed when I access the property for the very first time.
This helps when I work with many object instances and don't access all
the UTF-8 properties, because the decoding happens on demand and only
where it is needed. Another nice thing is that R:D:O doesn't execute the
trigger again on the next access.
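
To illustrate what I mean by on-demand decoding, here is a minimal sketch in
plain Perl. It is only a model of the behavior I observed, not RDBO code: the
LazyText class and its decode_calls counter are made up for the illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in for an object with one lazily inflated text column.
package LazyText;

sub new {
    my ($class, $octets) = @_;
    return bless { raw => $octets, decode_calls => 0 }, $class;
}

# Decode on first access only; later calls return the cached result.
sub value {
    my ($self) = @_;
    unless (exists $self->{inflated}) {
        $self->{decode_calls}++;
        $self->{inflated} = Encode::decode_utf8($self->{raw});
    }
    return $self->{inflated};
}

sub decode_calls { $_[0]{decode_calls} }

package main;

my $obj = LazyText->new("\xD0\x96");   # UTF-8 octets for U+0416
$obj->value for 1 .. 3;                # three reads of the same property
print $obj->decode_calls, "\n";        # prints 1: decoded once, then cached
```

An object whose value() is never called pays no decoding cost at all, which
is the benefit when only a few of many loaded rows are actually read.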

> > So, my questions are:
> > 1. Am I doing the Right Thing by adding a trigger to a column to perform
> > UTF8 conversion/inflation
>
> It's a valid approach, yes.  One way to automate it would be to simply
> define your own custom metadata class and then override
> make_column_methods() or initialize() to apply your UTF-8 trigger to all
> your columns automatically.  Example (untested code):
>
> sub make_column_methods
> {
>   my($self) = shift;
>
>   $self->SUPER::make_column_methods(@_);
>
>   foreach my $column ($self->columns)
>   {
>     next  unless($column->type =~ /^(?:text|varchar|character)$/);
>
>     $column->add_trigger(inflate => sub
>     {
>       my $self = shift;
>       my $value = shift;
>       if(!Encode::is_utf8($value))
>       {
>         $value = Encode::decode_utf8($value);
>       }
>       return $value;
>     });
>   }
>
>   return; # return value is not significant
> }

Actually, I did something similar to your example: I redefined
'add_columns' in my own meta class to add the trigger after defining
each column. I checked your example and it also works.

However, it is a mystery to me why neither automation works when I use
auto_initialize in my R:D:O object. Both work fine when I manually
specify the columns/keys/relations (or copy and paste the output of
perl_class_definition into the module). Any ideas?

> > 3. Can you think of some other approach to convert octets coming from
> > the database into Unicode scalars
>
> If you just want to convert data as it comes from the database, you might
> want to use an on_load trigger instead of an inflate trigger.  The trigger
> code would look slightly different:
>
> sub make_column_methods
> {
>   my($self) = shift;
>
>   $self->SUPER::make_column_methods(@_);
>
>   foreach my $column ($self->columns)
>   {
>     next  unless($column->type =~ /^(?:text|varchar|character)$/);
>
>     my $get_method = $column->accessor_method_name;
>     my $set_method = $column->mutator_method_name;
>
>     $column->add_trigger(on_load => sub
>     {
>       my $self = shift;
>
>       # Triggers disabled within a trigger, so no infinite recursion here
>       my $value = $self->$get_method();
>
>       if(!Encode::is_utf8($value))
>       {
>         $self->$set_method(Encode::decode_utf8($value));
>       }
>       return; # return value is not significant
>     });
>   }
>
>   return; # return value is not significant
> }

Sure, this works fine too (checked). I haven't decided yet which
trigger I'll use (on_load or inflate), because inflate can also fix
octets coming from other subsystems that don't understand UTF-8 and
just pass octets through.
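
For reference, the guard both triggers rely on can be tried on its own. This
is a small sketch using only Encode; ensure_characters is a name I made up
for it. Note that Encode::is_utf8 only inspects Perl's internal UTF8 flag, so
the guard assumes that unflagged values really are UTF-8 octets.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode only if the scalar is not already flagged
# as a character string, so calling it twice is harmless.
sub ensure_characters {
    my ($value) = @_;
    return $value if !defined $value || Encode::is_utf8($value);
    return Encode::decode_utf8($value);
}

my $octets = "\xC3\xA9";                   # UTF-8 octets for U+00E9
my $chars  = ensure_characters($octets);   # decoded to one character
my $again  = ensure_characters($chars);    # already decoded, returned as-is

printf "%d %d %d\n", length($octets), length($chars), length($again);
# prints "2 1 1"
```

Because the second call is a no-op, it does not matter whether the octets
came from the database or from another subsystem that set the value later.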

> > 2. Could someone show example code for extending the
> > R:D:O::Metadata::Column::Varchar column type so it can inflate the
> > values loaded from the DB.
>
> I'd actually like this functionality to be in the core distribution.
>
> Were you to do it on your own, the best approach would be to make your own
> trivial subclasses of the column classes, then point those column classes at
> your own custom method maker classes.  Finally, make your own trivial
> metadata object subclass and map the appropriate column types to your new
> column classes.

I'm not sure I get this right. I understand that I need a custom meta
class in order to add my own type->class mapping. I'll create
subclasses of the text/varchar classes. The mapping will reuse the
existing non-text classes in RDBO and use my own text classes. What I
don't understand is where the method maker enters the game. Sample code
would be appreciated (or just ignore me :)

>
> There are "shorter" ways to do this, but the approach described above
> ensures that the default behavior remains unchanged for any RDBO-based
> classes that do not want your modified behavior.
>
> Anyway, like I said, I'd like to make this stuff built-in since it's a
> reasonably common task.  I'm thinking of perhaps making some column classes
> like this:
>
>     Rose::DB::Object::Metadata::Column::Varchar::WithEncoding
>                                         Character
>                                         Text
>
> and then adding attributes for "check encoding" and "set encoding."  Example
> usage:
>
>     __PACKAGE__->meta->columns
>     (
>       name =>
>       {
>         type   => 'varchar',
>         length => 255,
>         check_encoding => \&Encode::is_utf8,
>         set_encoding   => \&Encode::decode_utf8,
>       },
>       ...
>     );
>
> where the set_encoding function is called on the column value if the
> check_encoding function returns false when passed the current value.  Then
> it'd be trivial to add ::UTF8 column variants that simply hard-code the
> check/set_encoding functions.
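
Just to check that I read the proposed semantics correctly, here is a sketch
in plain Perl. Nothing in it is RDBO API: apply_encoding is a made-up helper,
and only the check_encoding/set_encoding names come from the proposal above.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper mirroring the proposed semantics: call set_encoding
# only when check_encoding returns false for the current value.
sub apply_encoding {
    my ($value, %opt) = @_;
    return $value unless defined $value;
    return $opt{check_encoding}->($value)
        ? $value
        : $opt{set_encoding}->($value);
}

# The UTF-8 pair from the example usage above.
my %utf8 = (
    check_encoding => \&Encode::is_utf8,
    set_encoding   => \&Encode::decode_utf8,
);

my $name = apply_encoding("\xD0\x96", %utf8);   # UTF-8 octets for U+0416
print length($name), "\n";                      # prints 1
```

A ::UTF8 column variant would then amount to hard-coding the %utf8 pair.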
>
> Perhaps I could even "intelligently" substitute these classes if a
> text-based field has its "utf8" attribute set.  Hmmm.
>
>     __PACKAGE__->meta->columns
>     (
>       name => { type   => 'varchar', length => 255, utf8 => 1 },
>       ...
>     );

The new features described above fit my needs perfectly! However,
I'm not sure they belong in the core RDBO distro, since the whole UTF-8
adventure is rooted in the lack of UTF-8 support in DBD::mysql. I think
they should live in a separate CPAN distro that deals with UTF-8
incompatibilities.

BTW, does anyone have similar problems with PostgreSQL/SQLite?

>
> Anyone have any suggestions for better approaches to this problem?  Is there
> an even more generic way to handle encoding/decoding?  Should these checks
> and operations be done on load only or on set as well?

I lean toward the trigger approach done on inflate because (as stated above):
(a) it is executed on demand
(b) it converts any stray octets, which aren't supposed to be in a
full-blown UTF-8 app.

>
> -John

Thank you for the comprehensive and thorough answer! Keep up the good work!

- Svilen

