On Tue, Nov 27, 2012 at 6:26 PM, Steve Atkins <st...@blighty.com> wrote:

>
> On Nov 27, 2012, at 2:04 PM, Mike Blackwell <mike.blackw...@rrd.com>
> wrote:
>
> > I need to delete about 1.5 million records from a table and reload it in
> one transaction.  The usual advice when loading with inserts seems to be to
> group them into transactions of around 1k records.  Committing at that
> point would leave the table in an inconsistent state.
>
> I'd probably just do the whole thing in one transaction.
>
> Do you have specific reasons for wanting to avoid a long transaction, or are
> you just relying on rules of thumb? PostgreSQL isn't going to run out of
> resources doing a big transaction in the way some other databases will.
>
> Long running transactions will interfere with vacuuming, but inserting a
> couple of million rows shouldn't take that long.
>
> >  Would issuing a savepoint every 1k or so records negate whatever
> downside there is to keeping a transaction open for all 1.5 million
> records, or just add more overhead?
>
>
> Savepoints are going to increase overhead and have no effect on the length
> of the transaction. If you want to catch errors and not have to redo the
> entire transaction, they're great, but that's about it.
>
> > The data to reload the table is coming from a Perl DBI connection to a
> different database (not PostgreSQL) so I'm not sure the COPY alternative
> applies here.
>
> COPY works nicely from perl:
>
> $dbh->do("COPY foo FROM STDIN");
> $dbh->pg_putcopydata("foo\tbar\tbaz\n");
> $dbh->pg_putcopyend();
>
> The details are in DBD::Pg. I use this a lot for doing big-ish (tens of
> millions of rows) bulk inserts. It's not as fast as you can get, but it's
> probably as fast as you can get with perl.
>
> Cheers,
>   Steve


I do this as well - insert a few million rows into a table using the
DBD::Pg COPY interface.  It works well.

I ended up batching the copies so that each COPY statement only handles a few
hundred thousand rows at a time, but it's all one transaction - roughly like
the sketch below.
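
In case a concrete shape helps, here's a rough sketch of that batching.  The
names are made up: $src_dbh stands in for whatever DBI handle the data comes
from, foo / a, b, c for the target table and columns, and real data would also
need proper COPY-format escaping.

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical handles: $src_dbh is the non-PostgreSQL source database,
# $pg_dbh the PostgreSQL target.  AutoCommit => 0 makes the whole reload
# one transaction.
my $src_dbh = DBI->connect('dbi:SomeDriver:source_db', 'user', 'pass',
                           { RaiseError => 1 });
my $pg_dbh  = DBI->connect('dbi:Pg:dbname=target_db', 'user', 'pass',
                           { RaiseError => 1, AutoCommit => 0 });

my $batch_size = 100_000;            # rows per COPY statement

$pg_dbh->do('DELETE FROM foo');      # clear the table, inside the transaction

my $sth = $src_dbh->prepare('SELECT a, b, c FROM foo_source');
$sth->execute;

my $rows_in_batch = 0;
$pg_dbh->do('COPY foo (a, b, c) FROM STDIN');

while (my @row = $sth->fetchrow_array) {
    # COPY text format: tab-separated columns, newline-terminated.
    # Real data also needs escaping of tabs/newlines/backslashes and
    # \N for NULLs.
    $pg_dbh->pg_putcopydata(join("\t", @row) . "\n");

    if (++$rows_in_batch >= $batch_size) {
        $pg_dbh->pg_putcopyend();                      # end this COPY
        $pg_dbh->do('COPY foo (a, b, c) FROM STDIN');  # start the next one
        $rows_in_batch = 0;
    }
}

$pg_dbh->pg_putcopyend();   # end the final (possibly partial) batch
$pg_dbh->commit;            # DELETE plus all COPYs commit together

Ending each COPY after a batch keeps the amount buffered for any single COPY
bounded, while AutoCommit => 0 keeps the DELETE and all the COPYs inside one
transaction.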

The batching was necessary because of an idiosyncrasy of COPY in Pg 8.1: each
COPY statement's contents were buffered in malloc'd space, and with several
million rows buffered up, the allocated virtual memory could get quite large -
several GB.  That, plus the buffer pool, sometimes exceeded the amount of RAM
I had available at the time (several years ago), with bad effects on
performance.

This may have been fixed since then, or maybe RAM's gotten big enough that
it's not a problem.

Dan Franklin
