[PERFORM] Deteriorating performance when loading large objects
Hi,

I have a problem with large objects in PostgreSQL 8.1: the performance of loading large objects into a database goes way down after a few days of operation.

I have a cron job kicking in twice a day, which generates and loads around 6000 large objects of 3.7 MB each. Each night, old data is deleted, so there are never more than 24,000 large objects in the database.

If I start loading on a freshly installed database, load time is around 13 minutes, including generating the data to be stored. If I let the database run for a few days, this takes much longer. After one or two days, it goes up to almost an hour, with logs indicating that the extra time is spent solely on transferring the large objects from file to database. Turning autovacuum on or off seems to have no effect on this.

I have only made the following changes to the default postgresql.conf file:

    max_fsm_pages = 2500
    vacuum_cost_delay = 10
    checkpoint_segments = 256

So, my question for you is: why does this happen, and what can I do about it?

Regards,
Vegard Bønes

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Deteriorating performance when loading large objects
Vegard Bønes wrote:
> Hi, I have a problem with large objects in postgresql 8.1: The performance
> of loading large objects into a database goes way down after a few days of
> operation.
>
> I have a cron job kicking in twice a day, which generates and loads around
> 6000 large objects of 3.7MB each. Each night, old data is deleted, so there
> is never more than 24000 large objects in the database.
>
> So, my question for you is: Why does this happen, and what can I do about it?

Try putting a vacuumdb -zf command in a cron job after the data is deleted.
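In SQL terms, a sketch of what that suggestion amounts to (scoped here to pg_largeobject, the system table where large object data actually accumulates; running it against the whole database via vacuumdb -zf is the equivalent):

```sql
-- Roughly what "vacuumdb -zf" does, restricted to the table that bloats
-- in this workload. VACUUM FULL takes an exclusive lock and rewrites the
-- table, so schedule it right after the nightly delete, not during loads.
VACUUM FULL ANALYZE pg_largeobject;
```

On 8.1, plain VACUUM can only reclaim space tracked by the free space map, so with max_fsm_pages set as low as 2500 a periodic VACUUM FULL may be the only thing that actually shrinks the table.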
Re: [PERFORM] Deteriorating performance when loading large objects
Vegard Bønes <[EMAIL PROTECTED]> writes:
> I have a problem with large objects in postgresql 8.1: The performance of
> loading large objects into a database goes way down after a few days of
> operation. I have a cron job kicking in twice a day, which generates and
> loads around 6000 large objects of 3.7MB each. Each night, old data is
> deleted, so there is never more than 24000 large objects in the database.

Are you sure you're deleting the large objects themselves (ie, lo_unlink), and not just deleting some references to them?

A manual VACUUM VERBOSE on pg_largeobject might be informative.

			regards, tom lane
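To illustrate the distinction Tom is drawing (table name "blobs" and OID column "loid" are made up for the example):

```sql
-- This only removes the reference; the object's pages stay
-- in pg_largeobject forever:
DELETE FROM blobs WHERE id = 42;

-- The object itself must be unlinked via its OID, then the
-- referencing row can go:
SELECT lo_unlink(loid) FROM blobs WHERE id = 42;
DELETE FROM blobs WHERE id = 42;

-- Shows how many dead vs. live rows pg_largeobject is carrying:
VACUUM VERBOSE pg_largeobject;
```

If the objects were never unlinked, pg_largeobject grows without bound and every load has to wade through an ever-larger table, which would match the symptoms described.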
Re: [PERFORM] Monitoring buffercache...
Scott Marlowe wrote:
> On Mon, Nov 24, 2008 at 12:52 PM, Brad Nicholson <[EMAIL PROTECTED]> wrote:
>>> I just ran it in a loop over and over on my 8 core opteron server and it
>>> ran the load factor up by almost exactly 1.0. Under our normal daily
>>> load, it sits at 1.9 to 2.5, and it climbed to 2.9 under the new load of
>>> running that query over and over. So, it doesn't seem to be blocking or
>>> anything.
>> The internal docs for pg_buffercache_pages.c state:
>> "To get a consistent picture of the buffer state, we must lock all
>> partitions of the buffer map. Needless to say, this is horrible for
>> concurrency. Must grab locks in increasing order to avoid possible
>> deadlocks."
> Well, the pg hackers tend to take a paranoid view (it's a good thing TM) on
> things like this. My guess is that the period of time for which
> pg_buffercache takes locks on the buffer map is short enough that it isn't
> a real big deal on a fast enough server. On mine, it certainly had no real
> negative effects for the 5 minutes or so it was running in a loop. None I
> could see, and we run hundreds of queries per second on our system. Of
> course, for certain other types of loads it could be a much bigger issue.
> But for our load, on our machine, it was virtually unnoticeable.

Yeah, I wouldn't worry about accessing it every 15 minutes! I put the comment there to make it clear that (like pg_locks) selecting from it *very frequently* could affect performance.

Cheers

Mark
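For context, the kind of monitoring query being discussed is along these lines (a common pattern with the pg_buffercache contrib module; the join is a sketch and column details vary by version):

```sql
-- Top 10 relations in the current database by number of
-- shared buffers currently cached. Each run briefly locks
-- all buffer-map partitions, as the internal docs warn.
SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = c.relfilenode
 AND b.reldatabase = (SELECT oid FROM pg_database
                      WHERE datname = current_database())
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```

Run every few minutes this is harmless, as Mark says; polling it many times per second is where the lock acquisition could start to show.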
[PERFORM] Partition table query performance
I have a problem with partitioning and I'm wondering if anyone can provide some insight. I'm trying to find the max value of a column across multiple partitions. The query against the partition set is quite slow, while queries against individual child partitions are very fast!

I set up a basic range-partitioned table definition. A parent table:

    Data { dataID, sensorID, value, ts }

and child tables inherited from Data:

    Data__WEEKNO { dataID, sensorID, value, ts }

Each child table has a primary key index on dataID and a composite index on (sensorID, ts). Each child has check constraints for the week range identified in the table name (non-overlapping).

I want to perform a simple operation: select the max ts (timestamp) for a given sensorID. Given my indexes on the tables, this should be a simple and fast operation.

DB=# EXPLAIN ANALYZE select max(ts) from Data where valid=true and sensorID=8293;

    Aggregate  (cost=334862.92..334862.93 rows=1 width=8) (actual time=85183.381..85183.383 rows=1 loops=1)
      ->  Append  (cost=2.30..329397.68 rows=2186096 width=8) (actual time=1.263..76592.755 rows=2205408 loops=1)
            ->  Bitmap Heap Scan on Data  (cost=2.30..8.84 rows=3 width=8) (actual time=0.027..0.027 rows=0 loops=1)
                  Recheck Cond: (sensorID = 8293)
                  Filter: valid
                  ->  Bitmap Index Scan on def_data_sensorID_ts  (cost=0.00..2.30 rows=6 width=0) (actual time=0.021..0.021 rows=0 loops=1)
                        Index Cond: (sensorID = 8293)
            ->  Index Scan using Data_2008_01_sensorID_ts_index on Data_2008_01 Data  (cost=0.00..4.27 rows=1 width=8) (actual time=0.014..0.014 rows=0 loops=1)
                  Index Cond: (sensorID = 8293)
                  Filter: valid
            ->  Bitmap Heap Scan on Data_2008_02 Data  (cost=3.01..121.08 rows=98 width=8) (actual time=0.017..0.017 rows=0 loops=1)
                  Recheck Cond: (sensorID = 8293)
                  Filter: valid
                  ->  Bitmap Index Scan on Data_2008_02_sensorID_ts_index  (cost=0.00..2.99 rows=98 width=0) (actual time=0.011..0.011 rows=0 loops=1)
                        Index Cond: (sensorID = 8293)
            . . . (omitted: the same pattern repeated for the remaining partitions) . . .
    Total runtime: 85188.694 ms

When I query against a specific partition:

DB=# EXPLAIN ANALYZE select max(ts) from Data_2008_48 where valid=true and sensorID=8293;

    Result  (cost=0.10..0.11 rows=1 width=0) (actual time=3.830..3.832 rows=1 loops=1)
      InitPlan
        ->  Limit  (cost=0.00..0.10 rows=1 width=8) (actual time=3.817..3.819 rows=1 loops=1)
              ->  Index Scan Backward using Data_2008_48_sensorID_ts_index on Data_2008_48  (cost=0.00..15304.55 rows=148959 width=8) (actual time=3.813..3.813 rows=1 loops=1)
                    Index Cond: (sensorID = 8293)
                    Filter: ((ts IS NOT NULL) AND valid)
    Total runtime: 0.225 ms

The query plan against the child partition makes sense: it uses the index backward to find the max value directly. The query plan over the partition set instead uses a combination of bitmap heap scans and index scans that visit every row for the sensor. Why would the planner choose a bitmap heap scan after a bitmap index scan, and is that the best choice? (What is it doing?) And what can I do to speed up this query?

As a sanity check I did a union query of all partitions to find the max(ts). My manual union query executed in 13ms vs. 85,188ms for the query against the parent table!!!

Greg Jaman
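For reference, the manual union workaround mentioned above looks roughly like this (partition names beyond the two shown are assumed; older planners cannot push max() down through an Append, so spelling the aggregate out per child lets each partition use its (sensorID, ts) index backward):

```sql
-- One max() branch per partition; each resolves via a single
-- backward index scan instead of scanning every matching row.
SELECT max(ts) FROM (
    SELECT max(ts) AS ts FROM Data_2008_47
     WHERE valid AND sensorID = 8293
    UNION ALL
    SELECT max(ts) FROM Data_2008_48
     WHERE valid AND sensorID = 8293
    -- ... one branch per remaining partition ...
) AS per_partition;
```

The outer max() then only has to compare one value per partition, which is why the 13ms result is possible.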