On Thu, Aug 9, 2012 at 2:59 AM, Jesper Krogh <jes...@krogh.cc> wrote:
> Whether it is an implementation artifact or a result of this
> approach I don't know. But currently, when the GIN fastupdate
> code finally decides to "flush" the buffer, it is going to stall all
> other processes doing updates while doing it. If you only have
> one update process then this doesn't matter. But if you're trying to get
> user-interactive updates to flow in alongside batch updates from
> background processes, then you'd better kill off this feature,
> since you're guaranteed that the user-interactive process is
> either going to flush the buffer or wait on someone else doing
> it.
>
> I haven't done the benchmarking, but I'm actually fairly sure that
> fastupdate isn't faster overall if you bump concurrency slightly and run on
> memory- or SSD-based backends, due to this cross-backend contention
> on the buffer.

Yeah, I've noticed that there are some things that are a little wonky
about GIN fastupdate.  On the other hand, I believe that MySQL has
something along these lines called secondary index buffering which
apparently does very good things for random I/O.  I am not sure of the
details or the implementation, though.
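
Just to make the shape of the problem concrete, here's a toy model of
what Jesper is describing, with threads standing in for backends.  This
is purely illustrative, not the actual ginfast.c code, and every name
in it is invented:

#include <pthread.h>

#define PENDING_MAX 1024        /* arbitrary */

static pthread_mutex_t pending_lock = PTHREAD_MUTEX_INITIALIZER;
static int pending_items[PENDING_MAX];
static int pending_count = 0;

/* Stand-in for the expensive part: a full descent of the main tree,
 * with all the buffer pin/lock traffic that implies. */
static void
main_tree_insert(int item)
{
    (void) item;
}

void
pending_insert(int item)
{
    pthread_mutex_lock(&pending_lock);
    if (pending_count >= PENDING_MAX)
    {
        /*
         * Whoever fills the buffer pays to merge everyone's buffered
         * work into the main tree, and every concurrent inserter
         * stalls behind the lock until the merge finishes.
         */
        for (int i = 0; i < pending_count; i++)
            main_tree_insert(pending_items[i]);
        pending_count = 0;
    }
    pending_items[pending_count++] = item;
    pthread_mutex_unlock(&pending_lock);
}

Average throughput can look fine in a single-writer benchmark while the
worst-case stall grows with the buffer size, which I take to be the
gist of Jesper's objection.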

> A buffer that is backend-local, so you could use transactions to
> batch up changes, would get around this, but that may have another
> huge set of consequences I don't know of.
>
> ... based on my own real-world experience with this feature.

Well, the main thing to worry about is transactional consistency.  If
a backend which has postponed doing the index-inserts does an index
scan after the command counter has been bumped, it'll see
inconsistent results.  We could avoid that by only using the
optimization when some set of sanity checks passes and doing the
deferred inserts at the end of the statement, or something like that.
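
To make that a bit more concrete, something like the following, where
the sanity checks and the flush point are the interesting parts and
all the helper names are invented:

#include "postgres.h"
#include "access/itup.h"
#include "utils/rel.h"

#define DEFER_MAX 1024          /* arbitrary */

/* Both of these are hypothetical: the ordinary insert path, and
 * whatever test proves deferral is safe for this statement. */
extern void index_do_insert(Relation index, IndexTuple itup);
extern bool deferral_is_safe(Relation index);

typedef struct DeferredInserts
{
    Relation    index;
    int         nqueued;
    IndexTuple  queue[DEFER_MAX];
} DeferredInserts;

static void
flush_deferred(DeferredInserts *di)
{
    for (int i = 0; i < di->nqueued; i++)
        index_do_insert(di->index, di->queue[i]);
    di->nqueued = 0;
}

void
index_insert_maybe_deferred(DeferredInserts *di, IndexTuple itup)
{
    /*
     * Bail out to the normal path unless we can prove nothing will
     * scan this index before the end-of-statement flush: no index
     * scans from triggers, no unique-constraint lookups, and so on.
     */
    if (!deferral_is_safe(di->index) || di->nqueued >= DEFER_MAX)
    {
        flush_deferred(di);
        index_do_insert(di->index, itup);
        return;
    }
    di->queue[di->nqueued++] = itup;
}

void
at_statement_end(DeferredInserts *di)
{
    /*
     * Flush before the command counter is bumped, so that no later
     * command in the same transaction can scan the index and miss
     * tuples its snapshot says it should see.
     */
    flush_deferred(di);
}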

The other tricky part is figuring out how to actually get a
performance improvement out of it.  I think Simon's probably right
that a lot of the cost is in repeatedly walking the btree, looking up
and pinning/unpinning/locking/unlocking buffers along the way.  Maybe
we could sort the data in index order, walk down to the first
insertion point, and then insert as many tuples in a row as precede the
next key in the index.  Then lather, rinse, repeat.  If you're
actually just adding everything at the tail of the index, this ought
to work pretty well.  But if the inserts are all over the place it
seems like it might not be any better, or actually a little worse.
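
In pseudo-C, with every helper invented and page splits, WAL, and
concurrency all glossed over, the loop I have in mind is roughly:

#include "postgres.h"
#include "access/itup.h"
#include "storage/bufmgr.h"
#include "utils/rel.h"

/* All hypothetical: a sort on the index's key ordering, a single
 * descent to the leaf holding the insertion point, a test against
 * the next key already present there, and the actual leaf insert. */
extern void sort_in_index_order(Relation index, IndexTuple *tuples, int ntuples);
extern Buffer descend_to_insertion_point(Relation index, IndexTuple itup);
extern bool precedes_next_key(Relation index, Buffer leaf, IndexTuple itup);
extern void insert_on_leaf(Buffer leaf, IndexTuple itup);

void
batch_insert_sorted(Relation index, IndexTuple *tuples, int ntuples)
{
    int i = 0;

    sort_in_index_order(index, tuples, ntuples);

    while (i < ntuples)
    {
        /* One descent, and one round of pin/lock traffic, per run of
         * consecutive new tuples rather than per tuple. */
        Buffer  leaf = descend_to_insertion_point(index, tuples[i]);

        while (i < ntuples && precedes_next_key(index, leaf, tuples[i]))
            insert_on_leaf(leaf, tuples[i++]);

        UnlockReleaseBuffer(leaf);
    }
}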

Of course it's probably premature to speculate too much until someone
actually codes something up and tests it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
