Re: [RDBO] Doubt regarding proper module usage

John Siracusa Tue, 31 Jan 2006 08:19:25 -0800

On 1/31/06, Nilson Santos Figueiredo Junior <[EMAIL PROTECTED]> wrote:
> I was trying to perform some fairly simple benchmarks comparing
> Rose::DB::Object, Class::DBI and an in-house solution and came up with
> some disappointing results from Rose::DB::Object.
>
> I'd like to know if there's something I did wrong or that could be
> done in a better way since RoseDB was only marginally faster (about
> 40%) than Class::DBI (vanilla, without prefetch, nothing).


Depending on what you're testing, 40% isn't necessarily
"disappointing."  Let's look at what you're doing.

First, the class definitions look okay.  Just a quick tip:

> # Create a private registry for this class
> __PACKAGE__->registry(Rose::DB::Registry->new);

That's better written as:

    __PACKAGE__->use_private_registry;

Now the actual benchmarks:

> sub RDB {
>         for my $cidade (@{CidadeRDB::Manager->get_cidade(query=>[nome=> {
> like => '%a%' },])}) {
>                 my $x = $cidade->id;
>                 my $y = $cidade->nome;
>                 my $z = $cidade->estado_obj;
>
>                 my $var = $cidade->estado_obj->pais_obj->nome;
>         }
> }
>
> sub CDBI {
>         for my $cidade (CidadeDBI->search_like({nome =>'%a%'})) {
>                 my $x = $cidade->id;
>                 my $y = $cidade->nome;
>                 my $z = $cidade->estado;
>
>                 my $var = $cidade->estado->pais->nome;
>
>         }
> }

For these two simple tasks, let's think about what's being tested. 
The database is the same for each module, obviously.  But the
generated SQL is slightly different.  RDBO will try to fetch all
columns at once:

    SELECT id, nome, estado FROM cidade WHERE nome LIKE '%a%'

whereas your CDBI class will only fetch the primary key:

    SELECT id FROM cidade WHERE nome LIKE '%a%'

Later, when you do this:

    my $x = $cidade->id;
    my $y = $cidade->nome;

CDBI will go back to the database to fetch the non-key column values
it skipped in the first query.  To change this, make all of the
columns "essential" in your CDBI class:

    __PACKAGE__->columns(Essential => qw(id nome estado));

Alternatively, you could make all of the columns "lazy" in your RDBO
class so that it mimics the CDBI behavior.  The point is to make an
apples-to-apples comparison.  Another option is to make each module
go as fast as possible.  As written, your benchmark does neither.
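
The difference between the two loading strategies is easiest to see
as a count of round trips.  Below is a toy sketch in plain Perl, not
RDBO or CDBI code: the in-memory @cidade table, run_query(),
load_eager() and load_lazy() are all hypothetical stand-ins.  The
eager loader mimics RDBO fetching every column at once; the lazy
loader mimics a CDBI class whose only Essential column is the primary
key.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the cidade table.
my @cidade = (
    { id => 1, nome => 'Natal',    estado => 10 },
    { id => 2, nome => 'Recife',   estado => 11 },
    { id => 3, nome => 'Salvador', estado => 12 },
);

my $queries = 0;    # counts simulated round trips to the database

# Simulate one SQL statement: count it, then return copies of matching rows.
sub run_query {
    my ($where) = @_;
    $queries++;
    return map { +{%$_} } grep { $where->($_) } @cidade;
}

# Eager (like RDBO by default): one query fetches every column of every row.
sub load_eager {
    return run_query(sub { $_[0]{nome} =~ /a/ });
}

# Lazy (like CDBI with only the key Essential): the first query yields keys
# only, and reading nome/estado then costs one extra query per row.
sub load_lazy {
    my @rows = map { +{ id => $_->{id} } }
               run_query(sub { $_[0]{nome} =~ /a/ });
    for my $row (@rows) {
        my $id = $row->{id};
        my ($full) = run_query(sub { $_[0]{id} == $id });
        @{$row}{qw(nome estado)} = @{$full}{qw(nome estado)};
    }
    return @rows;
}

for my $case ([ eager => \&load_eager ], [ lazy => \&load_lazy ]) {
    my ($name, $loader) = @$case;
    $queries = 0;
    my @rows = $loader->();
    printf "%-5s rows=%d queries=%d\n", $name, scalar @rows, $queries;
}
```

Both loaders return the same two rows, but the eager one needs 1 query
and the lazy one needs 3 (one for the keys, then one per row).  With
thousands of rows, that per-row cost is what a benchmark ends up
measuring.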

As for the RDBO example, there are a few things you can do to make it
go faster.  RDBO Manager methods accept a Rose::DB object parameter
through which they access the database.  If you do not supply one, a
new one will be created during each call, and then discarded at the
end.  By not passing a db object, you're forcing get_cidade() to
reconnect to the database each time.  Since the database connection is
class data in CDBI, it persists between calls in the CDBI example.

In any "real" application using many repeated RDBO Manager calls,
you'd create a single db object and then pass it to each call.  In
terms of the benchmark, it'd look like this:

  my $db = RDBSetup->new;  # create one db object up front...

  sub RDB {
    # ...and pass that same db object to every Manager call
    for my $cidade (@{CidadeRDB::Manager->get_cidade(db => $db,
                        query => [ nome => { like => '%a%' } ])}) {
      ...
    }
  }
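
To make the cost visible, here is a toy sketch.  FakeDB and
FakeManager are hypothetical stand-ins, not Rose::DB or
Rose::DB::Object::Manager: FakeDB->new plays the role of opening a
connection, and get_rows() follows the rule described above, using
the db object it is given or else creating a throwaway one for that
call.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Rose::DB subclass: each new() "connects".
package FakeDB;
my $connects = 0;
sub new      { $connects++; return bless {}, shift }
sub connects { return $connects }

# Hypothetical stand-in for a Manager method: use the db object passed in,
# or create a throwaway one that lives only for this call.
package FakeManager;
sub get_rows {
    my ($class, %args) = @_;
    my $db = $args{db} || FakeDB->new;
    return [];    # a real Manager method would run its query through $db
}

package main;

FakeManager->get_rows for 1 .. 100;               # no db passed
printf "without db: %d connects\n", FakeDB->connects;

my $before = FakeDB->connects;
my $db     = FakeDB->new;                          # one shared db object
FakeManager->get_rows(db => $db) for 1 .. 100;
printf "shared db:  %d connect(s)\n", FakeDB->connects - $before;
```

A hundred calls without a db object mean a hundred "connections";
with a shared one, there is exactly one.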

Incorporating both of these changes, the SQL is now the same and both
modules reuse a single database connection.  So what's actually being
tested?  Basically, it boils down to:

1. SQL generation - How fast can the SQL query be generated based on
the "name like ..." abstract parameters?

2. Object instantiation - How fast can the objects be created from the row data?

The first item can be sped up a bit in RDBO by passing the
"query_is_sql" parameter with a true value.

    CidadeRDB::Manager->get_cidade(db => $db,
                                   query_is_sql => 1,
                                   query => [ ... ]);

This parameter tells the Manager that there's no need to try to parse
and reformat the query arguments.  They are already formatted
correctly for the database.  This is true of the lone query parameter,
'%a%', so it's safe to include that flag.

(An example of where it wouldn't be safe is a query that includes a
"casually formatted" date string or a DateTime object or something
else that has to be parsed and then reformatted for the current
database.)

As for the second thing being tested, object instantiation, that
depends on how many rows are actually being returned.  If no rows are
returned, then it isn't a factor at all and you're just testing SQL
generation speed.  If thousands of rows are returned, then the SQL
generation speed will fade away into the noise and you're just testing
object instantiation speed.
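
A deliberately simple cost model shows why the row count decides which
cost dominates.  It is an assumption for illustration, not something
measured from RDBO or CDBI: each call pays a fixed SQL-generation cost
once, plus an instantiation cost per row returned.  The helper name
sql_share() is hypothetical.

```perl
use strict;
use warnings;

# Fraction of one Manager-style call spent generating SQL, assuming a
# fixed cost $t_sql per call and a cost $t_obj per instantiated row.
sub sql_share {
    my ($t_sql, $t_obj, $rows) = @_;
    my $total = $t_sql + $rows * $t_obj;
    return $t_sql / $total;
}

# Hypothetical equal unit costs, varying only the number of rows.
printf "%5d rows: SQL generation is %5.1f%% of the time\n",
    $_, 100 * sql_share(1, 1, $_) for 0, 10, 1000;
```

With no rows, SQL generation is the whole cost; with a thousand rows,
it shrinks to about a tenth of a percent under these assumed costs.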

Finally, we come to this part:

    # RDBO
    my $var = $cidade->estado_obj->pais_obj->nome;

    # CDBI
    my $var = $cidade->estado->pais->nome;

That is, fetching a related object through a foreign key.  As written,
your benchmark hits the database once for each row in order to fetch
the related object.  This is an apples-to-apples comparison (or
rather, will be once you make all the columns "Essential" in that CDBI
class, or lazy in that RDBO class; it's the same issue as before).

But any real code written using an RDBO Manager would almost certainly
opt to fetch that related object in a single query rather than making
one extra query per row.  To do this, use the require_objects
parameter:

    CidadeRDB::Manager->get_cidade(db => $db,
                                   query_is_sql => 1,
                                   require_objects => [ 'estado_obj' ],
                                   query => [ ... ]);

The resulting SQL would be something like:

    SELECT
      t1.id,
      t1.nome,
      t1.estado,
      t2.id,
      t2.nome,
      t2.pais
    FROM
      cidade t1,
      estado t2
    WHERE
      t1.estado = t2.id AND
      t1.nome LIKE '%a%'
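
The effect on round trips can be sketched in plain Perl.  Everything
below is a hypothetical in-memory stand-in (the tables, $queries,
per_row() and joined()), not RDBO code, and the LIKE filter is omitted
for brevity.  per_row() mirrors the original benchmark, which fetches
each related estado separately; joined() mirrors the single query
above, where each result row already carries its estado columns.

```perl
use strict;
use warnings;

# Hypothetical in-memory tables standing in for cidade and estado.
my @cidade = (
    { id => 1, nome => 'Natal',    estado => 10 },
    { id => 2, nome => 'Salvador', estado => 20 },
    { id => 3, nome => 'Aracaju',  estado => 30 },
);
my %estado = (
    10 => { id => 10, nome => 'RN', pais => 1 },
    20 => { id => 20, nome => 'BA', pais => 1 },
    30 => { id => 30, nome => 'SE', pais => 1 },
);

my $queries = 0;    # simulated round trips

# One query for the cities, then one more per city for its estado.
sub per_row {
    $queries++;                        # SELECT ... FROM cidade
    my @rows = map { +{%$_} } @cidade;
    for my $row (@rows) {
        $queries++;                    # SELECT ... FROM estado WHERE id = ?
        $row->{estado_obj} = $estado{ $row->{estado} };
    }
    return @rows;
}

# A single joined query: cidade and estado columns arrive together.
sub joined {
    $queries++;                        # SELECT ... FROM cidade t1, estado t2 ...
    return map { +{ %$_, estado_obj => $estado{ $_->{estado} } } } @cidade;
}

for my $case ([ 'one query per row' => \&per_row ],
              [ 'single join'       => \&joined  ]) {
    my ($label, $code) = @$case;
    $queries = 0;
    my @rows = $code->();
    printf "%-17s: rows=%d queries=%d\n", $label, scalar @rows, $queries;
}
```

Both strategies return three rows with the same data; the per-row
version needs four queries, and the joined version needs one.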

Since this feature is not present in Class::DBI, the comparison is now
"unfair" from one perspective.  OTOH, artificially tying RDBO's hands
behind its back will not lead to very representative benchmark
results.  But it all depends on what you're trying to test.

(There are further speed improvements to be had in the RDBO example by
making a custom Manager method with a pre-generated SQL query, but
that's only worthwhile for queries that you plan to run a lot.)
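
The idea behind pre-generating the query can be sketched generically.
This is not RDBO's API: build_sql() and make_prepared_query() are
hypothetical helpers that generate the SQL text once and then return
a closure that only supplies new bind values on each call.

```perl
use strict;
use warnings;

my $builds = 0;    # how many times SQL text has been generated

# Hypothetical SQL builder standing in for a Manager's query generation.
sub build_sql {
    my ($table, @cols) = @_;
    $builds++;
    return sprintf 'SELECT %s FROM %s WHERE nome LIKE ?',
        join(', ', @cols), $table;
}

# Generate the SQL once; the returned closure just pairs it with new values.
sub make_prepared_query {
    my $sql = build_sql(@_);
    return sub { my ($pattern) = @_; return ($sql, $pattern) };
}

my $find_cidades = make_prepared_query('cidade', qw(id nome estado));
my @calls = map { [ $find_cidades->('%a%') ] } 1 .. 1000;

print "$calls[0][0]\n";
printf "SQL built %d time(s) for %d calls\n", $builds, scalar @calls;
```

The generation cost is paid once no matter how many times the query
runs, which is why it only pays off for queries you run a lot.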

To sum up, it's important to decide what you really want to test.
Making good benchmarks is hard.  I did my best to isolate the
performance of several common tasks in the benchmark suite that is
bundled with Rose::DB::Object (the script is t/benchmarks/bench.pl).
I tried to make each module go as fast as possible while accomplishing
the same task, since that's what people will do in the real world.

If you want to pursue your own benchmarks, consider the changes I
recommended above.  But also please take a long look at the source
code and classes for the bench.pl script before you decide that you
need to test something that isn't represented there.

-John


_______________________________________________
Rose-db-object mailing list
Rose-db-object@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rose-db-object
