Re: [PATCHES] Bulk Insert tuning

Simon Riggs Wed, 26 Mar 2008 06:27:41 -0700

On Tue, 2008-02-26 at 15:12 -0500, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > Following patch implements a simple mechanism to keep a buffer pinned
> > while we are bulk loading.
> 
> This will fail to clean up nicely after a subtransaction abort, no?
> (For that matter I don't think it's right even for a top-level abort.)
> And I'm pretty sure it will trash your table entirely if someone
> inserts into another relation while a bulk insert is happening.
> (Not at all impossible, think of triggers for instance.)
> 
> >From a code structural point of view, we are already well past the
> number of distinct options that heap_insert ought to have.  I was
> thinking the other day that bulk inserts ought to use a ring-buffer
> strategy to avoid having COPY IN trash the whole buffer arena, just
> as we've taught COPY OUT not to.


Agree with that. That was mentioned here again
http://archives.postgresql.org/pgsql-hackers/2008-02/msg01080.php

What do you think of the "Full Block List" idea? 

> So maybe a better idea is to
> generalize BufferAccessStrategy to be able to handle write as well
> as read concerns; or have two versions of it, one for writing and one
> for reading.  In any case the point being to encapsulate all these
> random little options in a struct, which could also carry along
> state that needs to be saved across a series of inserts, such as
> the last pinned buffer.

I'm trying to implement this, but it begins to look quite ugly.

First, if we allow multiple BulkInsertBuffers then we have to remember
them all in a list so we can unpin them all in case of abort. The change
isn't needed for correctness, as explained before.

Second, we need multiple BufferIOStrategy objects for various purposes.
There is no single default strategy, since normal inserts, normal
updates and toast all have different default behaviour. That makes it
ugly because we either need to differentiate between update/insert and
toast/main inserts - which leads to just as many options, or we need to
have lots of statically defined BufferIOStrategy objects for various
purposes.

I did agree that it was sensible to try to refactor the code, but now it
just looks like a big pile of ugly changes for no benefit. I'll save my
WIP so we can judge.

Coding it using flags that can be ORd together makes more sense than a
list of bools and will be easier to read. I'm going to try that approach
instead.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com 

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk


-- 
Sent via pgsql-patches mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches

Re: [PATCHES] Bulk Insert tuning

Reply via email to