Re: [RDBO] insert speed questions with RDBO

2008-01-23 Thread Ted Zlatanov
On Tue, 22 Jan 2008 20:09:55 -0500 John Siracusa [EMAIL PROTECTED] wrote: 

JS On Jan 22, 2008 11:42 AM, Ted Zlatanov [EMAIL PROTECTED] wrote:
 I'd like to try without autocommit; I tried the built-in RDBO
 transactions but those didn't work well for me.  begin_work was fine,
 but if an error happened anywhere in the transaction, rollback() didn't
 do anything and the DB rejected any further transactions.  I could post
 my code (I tried about 6 different ways), but I'd really like to know if
 there is an example in the docs anywhere of using transactions.

JS You can find some examples in the Rose::DB test suite:

JS http://search.cpan.org/src/JSIRACUSA/Rose-DB-0.737/t/trx.t

JS If you post one of your 6 ways and explain where it goes wrong, it'd
JS make it easier to figure out where the problem is.

I found one of the problems is that you forcibly set RaiseError to 1 in
a lot of places, in Rose::DB and Rose::DB::Object.  Why override the
connect_options?

My code is working with do_transaction for now, but begin_work/commit
should really be shown with an eval in the Rose::DB SYNOPSIS.  I got the
impression they would not die from the Rose::DB docs, so their
exceptions were killing an outer eval, and rollback() was not called...
I'd show you the code but it's not interesting or pretty.

I ran with DBI_TRACE set high and checked that the DBD commit was only
called as many times as I specified.  So I'm using transactions to
insert the rows, and RDBO is just not able to handle the load for my
particular task.  With COPY FROM, I can insert about 10-20 times faster
than RDBO, so I'll have to stick with that.  I'll mimic the RDBO SQL
statements and just run them I guess...

Ted

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Rose-db-object mailing list
Rose-db-object@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rose-db-object


Re: [RDBO] insert speed questions with RDBO

2008-01-23 Thread John Siracusa
On Jan 23, 2008 2:07 PM, Ted Zlatanov [EMAIL PROTECTED] wrote:
 I found one of the problems is that you forcibly set RaiseError to 1 in
 a lot of places, in Rose::DB and Rose::DB::Object.  Why override the
 connect_options?

 My code is working with do_transaction for now, but begin_work/commit
 should really be shown with an eval in the Rose::DB SYNOPSIS.  I got the
 impression they would not die from the Rose::DB docs, so their
 exceptions were killing an outer eval, and rollback() was not called...
 I'd show you the code but it's not interesting or pretty.

The only instances I can think of off the top of my head are
local()ized changes so that I can use eval blocks internally for error
handling.  For example:

  sub whatever
  {
...
eval
{
  local $dbh-{'RaiseError'} = 1;
  # do stuff
};

if($@) { ... } # handler error
else   { ... } # or success
  }

If there's an instance where RaiseError is altered outside and eval
block and/or not changed back before a sub returns, it's probably a
bug, so send me any cases you've found that demonstrate this.

 I ran with DBI_TRACE set high and checked that the DBD commit was only
 called as many times as I specified.  So I'm using transactions to
 insert the rows, and RDBO is just not able to handle the load for my
 particular task.  With COPY FROM, I can insert about 10-20 times faster
 than RDBO, so I'll have to stick with that.  I'll mimic the RDBO SQL
 statements and just run them I guess...

Did you try plain DBI?  I suspect even that will be 10-15 times slower
than COPY FROM.  Native bulk loading is always going to win, by a lot.

-John

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Rose-db-object mailing list
Rose-db-object@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rose-db-object


Re: [RDBO] insert speed questions with RDBO

2008-01-22 Thread Ted Zlatanov
On Mon, 07 Jan 2008 12:34:16 -0600 Ted Zlatanov [EMAIL PROTECTED] wrote: 

TZ On Sat, 5 Jan 2008 15:27:34 -0500 John Siracusa [EMAIL PROTECTED] 
wrote: 
JS On Jan 5, 2008 1:25 PM, Ted Zlatanov [EMAIL PROTECTED] wrote:
 I realize much of this question belongs to database-specific domains,
 but here goes: I want RDBO insert operations to be faster.  I need to
 load thousands of records per second into a database.

JS The first thing to investigate is using plain DBI.  If that's not fast
JS enough, then RDBO will never be any faster so you'll have to look
JS elsewhere.

JS The second thing to consider is bulk loading rather than INSERT
JS statements.  You can perhaps use RDBO to produce the (usually
JS db-specific) bulk-load files themselves, then point  the database at
JS the files using DBI and whatever the load from file syntax is for
JS your db.

TZ I'll try this.  The RDBO code will be read-only to let me know,
TZ essentially, if a row already exists (using RDBO::Cached).  Each run
TZ will generate a tab-separated file per table affected.  At the end of a
TZ run of new objects, I'll just exit the agent process and run COPY FROM
TZ rather than trying to flush all the caches; this is also necessary
TZ because of Perl's habit of releasing memory whenever it feels like doing
TZ it.  So I'll have to repopulate the cache occasionally, but that's not
TZ as bad as the churn I was seeing before with a 100% RDBO-driven insert
TZ process.

I can definitely keep up with the data flow when using COPY FROM.  As
soon as RDBO comes into the picture, though, I'm 3-4 times slower than
the incoming data.  I can write a customized loader that will extract
the unique foreign keys from the raw data and insert them before
the actual rows that use those foreign keys, but that's dancing around
the problem.

I'd like to try without autocommit; I tried the built-in RDBO
transactions but those didn't work well for me.  begin_work was fine,
but if an error happened anywhere in the transaction, rollback() didn't
do anything and the DB rejected any further transactions.  I could post
my code (I tried about 6 different ways), but I'd really like to know if
there is an example in the docs anywhere of using transactions.  My loop
is like this:

# @items comes from the outside
foreach my $item (@items)
{
 db_process($item);
}

Ideally I'd do, in pseudo-Perl:

begin_work
db_process(@items[0..200]);
commit

if (error)
{
 # maybe retry here
}
else
{
 delete @items[0..200];
}

Can anyonw show me how to do this properly, including the Rose::DB
initialization parameters please?

Thanks
Ted

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Rose-db-object mailing list
Rose-db-object@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rose-db-object


Re: [RDBO] insert speed questions with RDBO

2008-01-22 Thread John Siracusa
On Jan 22, 2008 11:42 AM, Ted Zlatanov [EMAIL PROTECTED] wrote:
 I'd like to try without autocommit; I tried the built-in RDBO
 transactions but those didn't work well for me.  begin_work was fine,
 but if an error happened anywhere in the transaction, rollback() didn't
 do anything and the DB rejected any further transactions.  I could post
 my code (I tried about 6 different ways), but I'd really like to know if
 there is an example in the docs anywhere of using transactions.

You can find some examples in the Rose::DB test suite:

http://search.cpan.org/src/JSIRACUSA/Rose-DB-0.737/t/trx.t

If you post one of your 6 ways and explain where it goes wrong, it'd
make it easier to figure out where the problem is.

-John

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Rose-db-object mailing list
Rose-db-object@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rose-db-object


Re: [RDBO] insert speed questions with RDBO

2008-01-07 Thread Ted Zlatanov
On Sat, 05 Jan 2008 14:07:05 -0500 Adam Prime [EMAIL PROTECTED] wrote: 

AP Ted Zlatanov wrote:
 I realize much of this question belongs to database-specific domains,
 but here goes: I want RDBO insert operations to be faster.  I need to
 load thousands of records per second into a database.  I'm using
 PostgreSQL, and everything works correctly.  The extra DB operations per
 insert are killing performance, however.  I would rather optimize this
 through RDBO than hand-code it in SQL, hence the questions below.

AP I don't know the internals, but the first thing i'd ask here is are you 
AP actually using the objects associated with these inserts, are you doing 
AP a thousand inserts, then never actually using the objects?  It seems to 
AP me that if you don't need the objects, it might make more sense to batch 
AP insert them into the DB outside of rose using whatever method is 
AP quickest on your DB.

Yes, I'm considering COPY FROM in PostgreSQL.  It's not an easy problem
because so many of my tables have foreign keys, so the agent that writes
the batch files has to be smart.

AP I assume you're using $obj-insert, not $obj-save right?

Because I have to use load_or_insert from RDBO::Helpers (10% of the data
fills in extra fields on existing rows) I can't avoid the SELECT+UPDATE
cycle.  Referential integrity is very expensive too, and using
RDBO::Cached as John suggested helped a bit, but not enough, I'm
definitely using COPY FROM.

Thanks
Ted

-
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
___
Rose-db-object mailing list
Rose-db-object@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rose-db-object


Re: [RDBO] insert speed questions with RDBO

2008-01-07 Thread Ted Zlatanov
On Sat, 5 Jan 2008 15:27:34 -0500 John Siracusa [EMAIL PROTECTED] wrote: 

JS On Jan 5, 2008 1:25 PM, Ted Zlatanov [EMAIL PROTECTED] wrote:
 I realize much of this question belongs to database-specific domains,
 but here goes: I want RDBO insert operations to be faster.  I need to
 load thousands of records per second into a database.

JS The first thing to investigate is using plain DBI.  If that's not fast
JS enough, then RDBO will never be any faster so you'll have to look
JS elsewhere.

JS The second thing to consider is bulk loading rather than INSERT
JS statements.  You can perhaps use RDBO to produce the (usually
JS db-specific) bulk-load files themselves, then point  the database at
JS the files using DBI and whatever the load from file syntax is for
JS your db.

I'll try this.  The RDBO code will be read-only to let me know,
essentially, if a row already exists (using RDBO::Cached).  Each run
will generate a tab-separated file per table affected.  At the end of a
run of new objects, I'll just exit the agent process and run COPY FROM
rather than trying to flush all the caches; this is also necessary
because of Perl's habit of releasing memory whenever it feels like doing
it.  So I'll have to repopulate the cache occasionally, but that's not
as bad as the churn I was seeing before with a 100% RDBO-driven insert
process.

JS You can always make your product class inherit from
JS Rose::DB::Object::Cached, so each product will only be SELECTed once.
JS (This assumes you're actually load()ing the products, as opposed to
JS getting them with a JOIN via with = ... etc.)

RDBO::Cached made a difference for sure.  I still have to use COPY FROM,
though, to keep up with the data flow.  Damn referential integrity.

Ted

-
Check out the new SourceForge.net Marketplace.
It's the best place to buy or sell services for
just about anything Open Source.
http://ad.doubleclick.net/clk;164216239;13503038;w?http://sf.net/marketplace
___
Rose-db-object mailing list
Rose-db-object@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rose-db-object


Re: [RDBO] insert speed questions with RDBO

2008-01-05 Thread Adam Prime
Ted Zlatanov wrote:
 I realize much of this question belongs to database-specific domains,
 but here goes: I want RDBO insert operations to be faster.  I need to
 load thousands of records per second into a database.  I'm using
 PostgreSQL, and everything works correctly.  The extra DB operations per
 insert are killing performance, however.  I would rather optimize this
 through RDBO than hand-code it in SQL, hence the questions below.

I don't know the internals, but the first thing i'd ask here is are you 
actually using the objects associated with these inserts, are you doing 
a thousand inserts, then never actually using the objects?  It seems to 
me that if you don't need the objects, it might make more sense to batch 
insert them into the DB outside of rose using whatever method is 
quickest on your DB.

I assume you're using $obj-insert, not $obj-save right?

 So, first question: my tables are set up with minimal optimizations.
 Are there any easy PostgreSQL insert optimizations I can do?  I have
 referential integrity constraints, and no indexing outside of primary
 keys.  Pointers to outside references are welcome, but note that I've
 looked around and most information is about optimizing SELECTs and not
 INSERTs.
 
 Second question: RDBO loves to be thorough.  Every time I use a foreign
 key in a one-to-many relationship (potentially storing it) it will check
 if the key exists, then either select or insert it, then use it.  For
 certain tables, I know that's not necessary: if you have seen the key
 before, I'm sure it won't be deleted, so using it from memory is OK.
 All the work is inserts, in other words: the database is not touched by
 other processes so I know caching of simple foreign keys is OK.
 
 For example, say I have an order O100 with a single product P100 named
 car.  If I say (in pseudocode) load(O100).product(car), and we've
 already seen that P100 is car, the extra SELECT RDBO will normally do
 here should not happen.  If we don't know P100 is car yet, RDBO can do
 the normal SELECT+INSERT cycle, of course.  I have not seen a way to do
 this automatically in RDBO, and I think it is a common case.

You can avoid this extra select by pulling it from the DB immediately 
using 'with', but I'm sure you're already aware of that.

 I think this optimization would save me at least 10 SELECTs per record
 because the products repeat very often.  In addition, I'm curious if
 there's any other optimizations people can suggest, both in RDBO and
 outside it.

Not sure if any of this will be any help or not.

Adam

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Rose-db-object mailing list
Rose-db-object@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rose-db-object


Re: [RDBO] insert speed questions with RDBO

2008-01-05 Thread John Siracusa
On Jan 5, 2008 1:25 PM, Ted Zlatanov [EMAIL PROTECTED] wrote:
 I realize much of this question belongs to database-specific domains,
 but here goes: I want RDBO insert operations to be faster.  I need to
 load thousands of records per second into a database.

The first thing to investigate is using plain DBI.  If that's not fast
enough, then RDBO will never be any faster so you'll have to look
elsewhere.

The second thing to consider is bulk loading rather than INSERT
statements.  You can perhaps use RDBO to produce the (usually
db-specific) bulk-load files themselves, then point  the database at
the files using DBI and whatever the load from file syntax is for
your db.

 For example, say I have an order O100 with a single product P100 named
 car.  If I say (in pseudocode) load(O100).product(car), and we've
 already seen that P100 is car, the extra SELECT RDBO will normally do
 here should not happen.  If we don't know P100 is car yet, RDBO can do
 the normal SELECT+INSERT cycle, of course.  I have not seen a way to do
 this automatically in RDBO, and I think it is a common case.

I don't think I follow your example.  Give me a non-psuedocode example
and I can probably tell you how to avoid any unnecessary SELECt
statements.

 I think this optimization would save me at least 10 SELECTs per record
 because the products repeat very often.

You can always make your product class inherit from
Rose::DB::Object::Cached, so each product will only be SELECTed once.
(This assumes you're actually load()ing the products, as opposed to
getting them with a JOIN via with = ... etc.)

-John

-
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse012070mrt/direct/01/
___
Rose-db-object mailing list
Rose-db-object@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rose-db-object