Re: [HACKERS] Inserting heap tuples in bulk in COPY

Robert Haas Fri, 12 Aug 2011 13:58:03 -0700

On Fri, Aug 12, 2011 at 3:16 PM, Heikki Linnakangas
<[email protected]> wrote:
> COPY is slow. Let's make it faster. One obvious optimization is to insert
> heap tuples in bigger chunks, instead of calling heap_insert() separately
> for every tuple. That saves the overhead of pinning and locking the buffer
> for every tuple, and you only need to write one WAL record for all the
> tuples written to the same page, instead of one for each tuple.
>
> Attached is a WIP patch to do that. It adds a new function,
> heap_multi_insert, which does the same thing as heap_insert, but works in
> bulk. It takes an array of tuples as argument, and tries to cram as many of
> them into the chosen targe page as it can, and only writes a single WAL
> record of the operation.
>
> This gives a significant speedup to COPY, particularly for narrow tables,
> with small tuples. Grouping multiple tuples into one WAL record reduces the
> WAL volume significantly, and the time spent in writing that WAL. The
> reduced overhead of repeatedly locking the buffer is also most noticeable on
> narrow tables. On wider tables, the effects are smaller. See
> copytest-results.txt, containing test results with three tables of different
> widths. The scripts used to get those numbers are also attached.
>
> Triggers complicate this. I believe it is only safe to group tuples together
> like this if the table has no triggers. A BEFORE ROW trigger might run a
> SELECT on the table being copied to, and check if some of the tuples we're
> about to insert exist. If we run BEFORE ROW triggers for a bunch of tuples
> first, and only then insert them, none of the trigger invocations will see
> the other rows as inserted yet. Similarly, if we run AFTER ROW triggers
> after inserting a bunch of tuples, the trigger for each of the insertions
> would see all the inserted rows. So at least for now, the patch simply falls
> back to inserting one row at a time if there are any triggers on the table.
>
> The patch is WIP, mainly because I didn't write the WAL replay routines yet,
> but please let me know if you see any issues.


I thought about trying to do this at one point in the past, but I
couldn't figure out exactly how to make it work.  I think the approach
you've taken here is good.

Aside from the point already raised about needing to worry only about
BEFORE ROW triggers, I don't see any showstoppers.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Inserting heap tuples in bulk in COPY

Reply via email to