On Tue, Nov 27, 2012 at 6:26 PM, Steve Atkins <st...@blighty.com> wrote:
> On Nov 27, 2012, at 2:04 PM, Mike Blackwell <mike.blackw...@rrd.com> wrote:
>
>> I need to delete about 1.5 million records from a table and reload it in
>> one transaction. The usual advice when loading with inserts seems to be
>> group them into transactions of around 1k records. Committing at that
>> point would leave the table in an inconsistent state.
>
> I'd probably just do the whole thing in one transaction.
>
> Do you have specific reasons you want to avoid a long transaction, or just
> relying on rules of thumb? Postgresql isn't going to run out of resources
> doing a big transaction, in the way some other databases will.
>
> Long running transactions will interfere with vacuuming, but inserting a
> couple of million rows shouldn't take that long.
>
>> Would issuing a savepoint every 1k or so records negate whatever
>> downside there is to keeping a transaction open for all 1.5 million
>> records, or just add more overhead?
>
> Savepoints are going to increase overhead and have no effect on the length
> of the transaction. If you want to catch errors and not have to redo the
> entire transaction, they're great, but that's about it.
>
>> The data to reload the table is coming from a Perl DBI connection to a
>> different database (not PostgreSQL) so I'm not sure the COPY alternative
>> applies here.
>
> COPY works nicely from perl:
>
>   $dbh->do("COPY foo FROM STDIN");
>   $dbh->pg_putcopydata("foo\tbar\tbaz\n");
>   $dbh->pg_putcopyend();
>
> The details are in DBD::Pg. I use this a lot for doing big-ish (tens of
> millions of rows) bulk inserts. It's not as fast as you can get, but it's
> probably as fast as you can get with perl.
>
> Cheers,
>   Steve

I do this as well - insert a few million rows into a table using the
DBD::Pg COPY interface. It works well.

I ended up batching the copies so that each COPY statement only does a few
hundred thousand rows at a time, but it's all one transaction.

The batching was necessary because of an idiosyncrasy of COPY in Pg 8.1:
each COPY statement's contents were buffered in malloc'd space, and if
there were several million rows buffered up, the allocated virtual memory
could get quite large - as in several GB. That, plus the buffer pool,
sometimes exceeded the amount of RAM I had available at the time (several
years ago), with bad effects on performance.

This may have been fixed since then, or maybe RAM has gotten big enough
that it's not a problem.

Dan Franklin
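
For anyone who wants the shape of it, here's a rough sketch of that
batched-COPY-in-one-transaction pattern, built on the same DBD::Pg calls
Steve showed. The DSNs, credentials, table and column names, batch size,
and source query are placeholders for illustration, not from the posts
above:

    use strict;
    use warnings;
    use DBI;

    # Placeholder DSNs, credentials, and table/column names.
    my $src = DBI->connect('dbi:SomeOtherDB:source', 'user', 'pass',
                           { RaiseError => 1 });
    my $dbh = DBI->connect('dbi:Pg:dbname=mydb', 'user', 'pass',
                           { AutoCommit => 0, RaiseError => 1 });  # one transaction

    my $batch_size = 200_000;    # a few hundred thousand rows per COPY
    my $in_copy    = 0;
    my $batch_rows = 0;

    $dbh->do('DELETE FROM foo');    # delete and reload inside the same transaction

    my $sth = $src->prepare('SELECT a, b, c FROM foo_source');
    $sth->execute;

    while (my $row = $sth->fetchrow_arrayref) {
        unless ($in_copy) {
            $dbh->do('COPY foo (a, b, c) FROM STDIN');
            $in_copy = 1;
        }
        # Real data would also need NULLs and embedded tabs, backslashes,
        # and newlines handled per the COPY text format.
        $dbh->pg_putcopydata(join("\t", @$row) . "\n");
        if (++$batch_rows >= $batch_size) {
            $dbh->pg_putcopyend();    # close this COPY; the next row starts a new one
            $in_copy    = 0;
            $batch_rows = 0;
        }
    }
    $dbh->pg_putcopyend() if $in_copy;

    $dbh->commit;    # one commit covers the delete and all the COPY batches

Ending each COPY with pg_putcopyend() is what bounds how much gets buffered
per statement, while holding off commit until the end keeps the delete and
reload atomic.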