Thanks for the URL. I went through postgresql.conf and made some modifications to the 
config based on the information there.  I will have to wait and see how it affects 
things, as I won't know for a week or so.
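
For reference, the knobs that guide concentrates on are roughly the ones below; the 
values are only illustrative placeholders for a 2GB box, not necessarily what I ended 
up setting:

    # postgresql.conf (7.3) -- illustrative values only, not my final settings
    shared_buffers = 8192            # 8KB pages (~64MB); replaces -B512 on the command line
    sort_mem = 8192                  # KB available per sort
    vacuum_mem = 32768               # KB, speeds up plain VACUUM
    effective_cache_size = 131072    # 8KB pages (~1GB), tells the planner about OS cache
    max_fsm_pages = 100000           # free-space map size, relevant to avoiding VACUUM FULL
    max_fsm_relations = 1000
    checkpoint_segments = 8          # fewer checkpoint stalls during the nightly bulk load
    wal_buffers = 16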

Select time has never been a problem; the DB has always been very fast.  It's the 
insert time that has been the issue.  I'm not sure how much of this is a function of 
the drive array sucking, the OS not doing a good job, or the DB getting caught up in 
transaction logs.

What does seem odd is that performance degrades as time goes on, and the space that 
the DB files take up increases as well.

The VACUUM FULL is performed once at the end of the whole job.  We could probably get 
away with doing this once per week, but in the past I have noticed that if I don't run 
it regularly, it seems to take much longer when I finally do.  That has led me to 
run it more regularly than not.
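
If we do go the weekly route, the sketch I have in mind is roughly this (plain VACUUM 
keeps the free-space map current without locking readers out, so the weekly VACUUM 
FULL should stay short):

    -- after the nightly load finishes; plain VACUUM doesn't block readers
    VACUUM ANALYZE;

    -- once a week, in the quietest window we have
    VACUUM FULL ANALYZE;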

As for 7.3.3, the project in question suffered a 10x performance degradation on 7.3.3, 
which went away when we rolled back to 7.3.2.  Almost all the inserts had triggers 
which updated stats tables, and the database in question was very, very write heavy; 
it was pretty much a data warehouse for X10 sensor information which was then mined 
for analysis.

I had certainly considered building the script to do binary-separation-style inserts: 
split the job in half, insert, and if it fails, split in half again until you get 
everything in.  This would probably work well, considering only about two dozen out of 
30,000 rows fail.  The only reason not to do it is the time and effort required, 
particularly as we are looking at a substantial overhaul of the whole system in the 
next 6 months.
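
For the archives, the kind of thing I had in mind is sketched below.  It's only an 
illustration (Python with the psycopg2 driver; the real loader, file name and DSN 
would differ), and it assumes the dump has been parsed into one INSERT statement per 
line:

    import psycopg2

    def insert_batch(conn, stmts):
        # Run the whole batch in one transaction; roll back if any row fails.
        cur = conn.cursor()
        try:
            for stmt in stmts:
                cur.execute(stmt)
            conn.commit()
            return True
        except psycopg2.DatabaseError:
            conn.rollback()
            return False

    def insert_with_split(conn, stmts, bad_rows):
        # Binary separation: halve any failing batch until the bad rows are isolated.
        if not stmts:
            return
        if insert_batch(conn, stmts):
            return
        if len(stmts) == 1:
            bad_rows.append(stmts[0])   # record the offending row and keep going
            return
        mid = len(stmts) // 2
        insert_with_split(conn, stmts[:mid], bad_rows)
        insert_with_split(conn, stmts[mid:], bad_rows)

    if __name__ == "__main__":
        conn = psycopg2.connect("dbname=properties")   # placeholder DSN
        stmts = [line.strip() for line in open("inserts.sql") if line.strip()]
        bad = []
        insert_with_split(conn, stmts, bad)
        conn.close()
        print("%d of %d rows rejected" % (len(bad), len(stmts)))

With only a couple dozen failures per 30,000 rows, most batches commit on the first 
try, so the total number of transactions stays far below one per row.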

Alex Turner


On Wed, Sep 10, 2003 at 07:31:53PM +0100, Richard Huxton wrote:
> On Wednesday 10 September 2003 18:53, [EMAIL PROTECTED] wrote:
> > Hi,
> >
> > My name is Alex Turner and I work for a small Tech company in Pottstown PA.
> >  We run Postgresql on a number of systems for a variety of different
> > applications, and it has been a joy to deal with all around, working fast
> > and reliably for over 2 years.
> >
> > We recently upgraded from RedHat 7.2 to RedHat 9.0, and we are running
> > Postgres 7.3.2 on our Proliant ML370 (Raid 1 2x18 10k, and Raid 5 3x36 10k,
> > 2x866 PIII, 2GB RAM).
> [snip]
> > I have noticed that whilst inserts seem to be slower than before, the
> > vacuum full doesn't seem to take as long overall.
> >
> > postgresql.conf is pretty virgin, and we run postmaster with -B512 -N256
> > -i.  /var/lib/pgsql/data is a symlink to /eda/data, /eda being the mount
> > point for the Raid 5 array.
> 
> First things first then, go to:
> http://www.varlena.com/varlena/GeneralBits/Tidbits/index.php
> and read the item on Performance Tuning and the commented postgresql.conf
> 
> > the database isn't huge, storing about 30000 properties, and the largest
> > table is 2.1 Million rows for property features.  The dump file is only
> > 221MB. Alas, I did not design the schema, but I have made several 'tweaks'
> > to it to greatly improve read performance allowing us to be the fastest
> > provider in the Tristate area.  Unfortunately the job starts at 01:05
> > (that's the earliest the dump is available) and runs until completion
> > finishing with a vacuum full.  The vacuum full locks areas of the database
> > long enough that our service is temporarily down.  At the worst point, the
> > vacuum full was starting after 09:00, which our customers didn't
> > appreciate.
> 
> You might be able to avoid a vacuum full by tweaking the *fsm* settings to be 
> able to cope with activity.
> 
> > I'm wondering if there is anything I can do with postgres to alleviate
> > this problem.  Either upgrading to 7.3.4 (although I tried 7.3.3 for
> > another app, and we had to roll back to 7.3.2 because of performance
> > problems), 
> 
> Hmm - can't think what would have changed radically between 7.3.2 and 7.3.3; 
> upgrading to .4 is probably sensible.
> 
> [snip]
> > Any help/suggestions would be greatly appreciated,
> 
> You say that each insert/update is a separate transaction. I don't know how 
> much "bad" data you get in the dump, but you might be able to do something 
> like:
> 
> 1. Set batch size to 128 items
> 2. Read batch-size rows from the dump
> 3. Try to insert/update the batch. If it works, move along by the size of the 
> batch and back to #1
> 4. If batch-size=1, record error, move along one row and back to #1
> 5. If batch-size>1, halve batch-size and go back to #3
> 
> Your initial batch-size will depend on how many errors there are (but 
> obviously use a power of 2).
> 
> You could also run an ordinary vacuum every 1000 rows or so (number depends on 
> your *fsm* settings as mentioned above).
> 
> You might also want to try a REINDEX once a night/week too.
> -- 
>   Richard Huxton
>   Archonet Ltd
> 
