On Fri, Aug 12, 2011 at 3:16 PM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote: > COPY is slow. Let's make it faster. One obvious optimization is to insert > heap tuples in bigger chunks, instead of calling heap_insert() separately > for every tuple. That saves the overhead of pinning and locking the buffer > for every tuple, and you only need to write one WAL record for all the > tuples written to the same page, instead of one for each tuple. > > Attached is a WIP patch to do that. It adds a new function, > heap_multi_insert, which does the same thing as heap_insert, but works in > bulk. It takes an array of tuples as argument, and tries to cram as many of > them into the chosen targe page as it can, and only writes a single WAL > record of the operation. > > This gives a significant speedup to COPY, particularly for narrow tables, > with small tuples. Grouping multiple tuples into one WAL record reduces the > WAL volume significantly, and the time spent in writing that WAL. The > reduced overhead of repeatedly locking the buffer is also most noticeable on > narrow tables. On wider tables, the effects are smaller. See > copytest-results.txt, containing test results with three tables of different > widths. The scripts used to get those numbers are also attached. > > Triggers complicate this. I believe it is only safe to group tuples together > like this if the table has no triggers. A BEFORE ROW trigger might run a > SELECT on the table being copied to, and check if some of the tuples we're > about to insert exist. If we run BEFORE ROW triggers for a bunch of tuples > first, and only then insert them, none of the trigger invocations will see > the other rows as inserted yet. Similarly, if we run AFTER ROW triggers > after inserting a bunch of tuples, the trigger for each of the insertions > would see all the inserted rows. So at least for now, the patch simply falls > back to inserting one row at a time if there are any triggers on the table. > > The patch is WIP, mainly because I didn't write the WAL replay routines yet, > but please let me know if you see any issues.
I thought about trying to do this at one point in the past, but I couldn't figure out exactly how to make it work. I think the approach you've taken here is good. Aside from the point already raised about needing to worry only about BEFORE ROW triggers, I don't see any showstoppers. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers