[PERFORM] Deteriorating performance when loading large objects

2008-11-25 Thread Vegard Bønes
Hi,

I have a problem with large objects in postgresql 8.1: The performance of
loading large objects into a database goes way down after a few days of
operation.

I have a cron job kicking in twice a day, which generates and loads around
6000 large objects of 3.7MB each. Each night, old data is deleted, so
there are never more than 24000 large objects in the database.
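
For context, a load of this kind can be done roughly like this (a minimal
sketch only; the table, column and file names are placeholders, and it assumes
the server-side lo_import() function is used to pull each generated file in
and record its OID in an ordinary table):

    -- import one generated file as a large object and remember its OID
    INSERT INTO blob_registry (name, loid, loaded_at)
    VALUES ('forecast_20081125_00', lo_import('/tmp/forecast_20081125_00.dat'), now());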

If I start loading on a freshly installed database, the load takes around
13 minutes, including generating the data to be stored. If I let the database
run for a few days, it takes much longer. After one or two days, the load
takes almost an hour, with logs indicating that the extra time is spent
solely on transferring the large objects from file to database.

Turning autovacuum on or off seems to have no effect on this.

I have only made the following changes to the default postgresql.conf file:
max_fsm_pages = 2500
vacuum_cost_delay = 10
checkpoint_segments = 256

So, my question for you is: Why does this happen, and what can I do about it?

Regards,

Vegard Bønes





Re: [PERFORM] Deteriorating performance when loading large objects

2008-11-25 Thread Ivan Voras
Vegard Bønes wrote:
> Hi,
>
> I have a problem with large objects in postgresql 8.1: The performance of
> loading large objects into a database goes way down after a few days of
> operation.
>
> I have a cron job kicking in twice a day, which generates and loads around
> 6000 large objects of 3.7MB each. Each night, old data is deleted, so
> there is never more than 24000 large object in the database.
>
> So, my question for you is: Why does this happen, and what can I do about it?

Try putting a vacuumdb -zf command as a cron job after the data is
deleted.
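
For example, roughly the SQL equivalent (a sketch; vacuumdb -zf covers the
whole database, while this targets just pg_largeobject, which is where the
large object data actually lives):

    -- reclaim dead space and refresh statistics right after the nightly delete
    VACUUM FULL ANALYZE pg_largeobject;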





Re: [PERFORM] Deteriorating performance when loading large objects

2008-11-25 Thread Tom Lane
Vegard Bønes [EMAIL PROTECTED] writes:
> I have a problem with large objects in postgresql 8.1: The performance of
> loading large objects into a database goes way down after a few days of
> operation.
>
> I have a cron job kicking in twice a day, which generates and loads around
> 6000 large objects of 3.7MB each. Each night, old data is deleted, so
> there is never more than 24000 large object in the database.

Are you sure you're deleting the large objects themselves (ie,
lo_unlink), and not just deleting some references to them?

A manual vacuum verbose on pg_largeobject might be informative.
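
To illustrate the difference (a sketch with a hypothetical reference table;
only lo_unlink() actually frees the object data stored in pg_largeobject):

    -- frees the large objects themselves
    SELECT lo_unlink(loid) FROM blob_registry WHERE loaded_at < now() - interval '2 days';

    -- only removes the references; without lo_unlink the objects stay behind
    DELETE FROM blob_registry WHERE loaded_at < now() - interval '2 days';

    -- shows how large pg_largeobject has grown and how much of it is reclaimable
    VACUUM VERBOSE pg_largeobject;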

regards, tom lane



Re: [PERFORM] Monitoring buffercache...

2008-11-25 Thread Mark Kirkwood

Scott Marlowe wrote:

On Mon, Nov 24, 2008 at 12:52 PM, Brad Nicholson
[EMAIL PROTECTED] wrote:
  

I just ran it in a loop over and over on my 8 core opteron server and
it ran the load factor up by almost exactly 1.0.  Under our normal
daily load, it sits at 1.9 to 2.5, and it climbed to 2.9 under the new
load of running that query over and over.  So, it doesn't seem to be
blocking or anything.
  

The internal docs for pg_buffercache_pages.c state:

To get a consistent picture of the buffer state, we must lock all
partitions of the buffer map.  Needless to say, this is horrible
for concurrency.  Must grab locks in increasing order to avoid
possible deadlocks.



Well, the pg hackers tend to take a paranoid view (it's a good thing
TM) on things like this.  My guess is that the period of time for
which pg_buffercache takes locks on the buffer map is short enough
that it isn't a real big deal on a fast enough server.  On mine, it
certainly had no real negative effects for the 5 minutes or so it was
running in a loop.  None I could see, and we run hundreds of queries
per second on our system.

Of course, for certain other types of loads it could be a much bigger
issue.  But for our load, on our machine, it was virtually
unnoticeable.

  
Yeah, I wouldn't worry about accessing it every 15 minutes! I put the 
comment there to make it clear that (like pg_locks) selecting from it 
*very frequently* could affect performance.
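
(For reference, the kind of query being discussed is the usual buffer-usage
summary from the pg_buffercache documentation, e.g.:

    -- top 10 relations by number of shared buffers currently cached
    SELECT c.relname, count(*) AS buffers
    FROM pg_buffercache b
         JOIN pg_class c ON b.relfilenode = c.relfilenode
              AND b.reldatabase IN (0, (SELECT oid FROM pg_database
                                        WHERE datname = current_database()))
    GROUP BY c.relname
    ORDER BY buffers DESC
    LIMIT 10;

Running something like this every 15 minutes should be harmless; running it
in a tight loop is where the buffer-map locking could start to matter.)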


Cheers

Mark



[PERFORM] Partition table query performance

2008-11-25 Thread Greg Jaman
I have a problem with partitioning and I'm wondering if anyone can provide
some insight.   I'm trying to find the max value of a column across multiple
partitions.  A query against the partition set is quite slow, while queries
against individual child partitions are very fast!


I set up a basic range-partitioned table definition (sketched below):
   A parent table:  Data { dataID, sensorID, value, ts }
   Child tables:  Data__WEEKNO { dataID, sensorID, value, ts }, inherited from Data.
   Each child table has a primary key index on dataID and a
composite index on (sensorID, ts).
   Each child has a check constraint for the week range identified in
the table name (non-overlapping).
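
A sketch of this layout (illustrative only; the actual names and week
boundaries differ):

    CREATE TABLE Data (
        dataID   bigint,
        sensorID integer,
        value    double precision,
        valid    boolean,
        ts       timestamp
    );

    -- one child table per week, e.g. week 48 of 2008
    CREATE TABLE Data_2008_48 (
        CHECK (ts >= '2008-11-24' AND ts < '2008-12-01')
    ) INHERITS (Data);

    ALTER TABLE Data_2008_48 ADD PRIMARY KEY (dataID);
    CREATE INDEX Data_2008_48_sensorID_ts_index ON Data_2008_48 (sensorID, ts);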

I want to perform a simple operation: select the max ts (timestamp) given
a sensorID.  Given my indexes on the tables, this should be a simple and fast
operation.


DB=# EXPLAIN ANALYZE select max(ts) from Data where valid=true and sensorID=8293;
------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=334862.92..334862.93 rows=1 width=8) (actual time=85183.381..85183.383 rows=1 loops=1)
   ->  Append  (cost=2.30..329397.68 rows=2186096 width=8) (actual time=1.263..76592.755 rows=2205408 loops=1)
         ->  Bitmap Heap Scan on Data  (cost=2.30..8.84 rows=3 width=8) (actual time=0.027..0.027 rows=0 loops=1)
               Recheck Cond: (sensorID = 8293)
               Filter: valid
               ->  Bitmap Index Scan on def_data_sensorID_ts  (cost=0.00..2.30 rows=6 width=0) (actual time=0.021..0.021 rows=0 loops=1)
                     Index Cond: (sensorID = 8293)
         ->  *Index Scan using Data_2008_01_sensorID_ts_index on Data_2008_01 Data*  (cost=0.00..4.27 rows=1 width=8) (actual time=0.014..0.014 rows=0 loops=1)
               Index Cond: (sensorID = 8293)
               Filter: valid
         ->  *Bitmap Heap Scan on Data_2008_02 Data*  (cost=3.01..121.08 rows=98 width=8) (actual time=0.017..0.017 rows=0 loops=1)
               Recheck Cond: (sensorID = 8293)
               Filter: valid
               ->  Bitmap Index Scan on Data_2008_02_sensorID_ts_index  (cost=0.00..2.99 rows=98 width=0) (actual time=0.011..0.011 rows=0 loops=1)
                     Index Cond: (sensorID = 8293)
 .
 .
 . (omitted: the same pattern repeated for each remaining partition)
 .
 Total runtime: 85188.694 ms




When I query against a specific partition:


DB=# EXPLAIN ANALYZE select max(ts) from Data_2008_48 where valid=true and sensorID=8293;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.10..0.11 rows=1 width=0) (actual time=3.830..3.832 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..0.10 rows=1 width=8) (actual time=3.817..3.819 rows=1 loops=1)
           ->  Index Scan Backward using Data_2008_48_sensorID_ts_index on Data_2008_48  (cost=0.00..15304.55 rows=148959 width=8) (actual time=3.813..3.813 rows=1 loops=1)
                 Index Cond: (sensorID = 8293)
                 Filter: ((ts IS NOT NULL) AND valid)
 Total runtime: 0.225 ms


The query plan against the child partition makes sense: it uses the index to
find the max value.  The query plan against the parent (the full partition set)
uses a combination of bitmap heap scans and index scans.
Why would the planner choose a bitmap heap scan after a bitmap index scan,
and is that the best choice? (What is it doing?) And what can I do to
speed up this query?

As a sanity check I did a union query over all partitions to find the max(ts).
My manual union query executed in 13 ms, versus 85,188 ms for the query against
the parent table!
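
The manual union was along these lines (a sketch showing only two of the
weekly partitions):

    SELECT max(ts) FROM (
        SELECT max(ts) AS ts FROM Data_2008_47 WHERE valid = true AND sensorID = 8293
        UNION ALL
        SELECT max(ts) AS ts FROM Data_2008_48 WHERE valid = true AND sensorID = 8293
        -- ... one branch per remaining partition ...
    ) AS per_partition;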



Greg Jaman