Greg,

Thanks for the mental prod!  Yes, the original data is more closely sorted by 
the timestamptz column, since they represent events coming into the collection 
system in real time.  As for the distribution of data values, it goes without 
saying the timestamptz value is monotonically increasing, with roughly 1300 
entries having the same timestamptz value.  The other three columns' values are 
essentially reference data, with 400 values for the varchar, 680 for the first 
text column, and 60 for the second text column.  The distribution is fairly 
even, with some small spikes but nothing significant.

The "duh" moment came for me when you pointed out the implicit sort order of 
the data.  After resorting the data into the new index column order the insert 
performance was largely restored.  I didn't monitor the process with vmstat, 
however - the end result is good enough for me.  I believe that the index 
maintenance of page splitting, etc., that you describe below was exactly the 
culprit, and that presorting the data solved that problem.  

I call it my "duh" moment since I've presorted data for Sybase and Oracle for 
exactly the same reason, but forgot to apply the lesson to PostgreSQL.

BTW, this is PG 8.2.1 and 8.3.7 running on SLES 10.3, although I don't think it 
matters.

Thanks for the help, Greg and Tom!

--- On Sat, 6/27/09, Greg Smith <gsm...@gregsmith.com> wrote:

> From: Greg Smith <gsm...@gregsmith.com>
> Subject: Re: [PERFORM] Insert performance and multi-column index order
> To: bob_lun...@yahoo.com
> Cc: pgsql-performance@postgresql.org
> Date: Saturday, June 27, 2009, 1:08 AM
> On Fri, 26 Jun 2009, bob_lun...@yahoo.com
> wrote:
> 
> > The original unique index was in the order
> (timestamptz, varchar, text, text) and most queries against
> it were slow.  I changed the index order to (varchar, text,
> timestamptz, text) and queries now fly, but loading data
> (via copy from stdin) in the table is 2-4 times slower.
> 
> Is the input data closer to being sorted by the timestamptz
> field than the varchar field?  What you might be seeing
> is that the working set of index pages needed to keep
> building the varchar index are bigger or have more of a
> random access component to them as they spill in and out of
> the buffer cache.  Usually you can get a better idea
> what the difference is by comparing the output from vmstat
> while the two are loading.  More random read/write
> requests in the mix will increase the waiting for I/O
> percentage while not increasing the total amount
> read/written per second.
> 
> --
> * Greg Smith gsm...@gregsmith.com
> http://www.gregsmith.com Baltimore, MD




-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to