Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum
* Jeremy Haile:

> Good advice on the partitioning idea. I may have to restructure some of
> my queries, since some of them query across the whole range - but it
> may be a much more performant solution. How is the performance when
> querying across a set of partitioned tables vs. querying on a single
> table with all rows?

Locality of access decreases, of course, and depending on your data size, you hit something like 2 to 4 additional disk seeks per partition for index-based accesses. Sequential scans are not impacted.

> Does my current approach of disabling autovacuum and manually vacuuming
> once-an-hour sound like a good idea, or would I likely have better
> results by auto-vacuuming and turning row-level stats back on?

Sorry, I haven't got much experience with autovacuum, since most of our databases are INSERT-only (or get VACUUMed automatically after major updates).

-- 
Florian Weimer            [EMAIL PROTECTED]
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100          tel: +49-721-96201-1
D-76133 Karlsruhe         fax: +49-721-96201-99
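A minimal sketch of the partitioned setup under discussion (table and column names are made up for illustration). On 8.1/8.2 the planner only skips non-matching partitions when constraint_exclusion is enabled and each child carries a CHECK constraint on the range:

    -- illustrative only: one parent, one monthly child
    CREATE TABLE measurements (ts timestamp NOT NULL, value numeric);
    CREATE TABLE measurements_2007_01 (
        CHECK (ts >= DATE '2007-01-01' AND ts < DATE '2007-02-01')
    ) INHERITS (measurements);

    SET constraint_exclusion = on;
    -- with the CHECK constraints in place, this should touch only the
    -- January child rather than every partition
    EXPLAIN SELECT * FROM measurements
     WHERE ts >= '2007-01-09' AND ts < '2007-01-10';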
Re: [PERFORM] Horribly slow query/ sequential scan
I wrote:

> ... What seems to be happening is that Informix is willing to flatten
> the sub-SELECT into an IN join even though the sub-SELECT is correlated
> to the outer query (that is, it contains outer references).

I did some googling this morning and found confirmation that recent versions of Informix have pretty extensive support for optimizing correlated subqueries:

http://www.iiug.org/waiug/archive/iugnew83/FeaturesIDS73.htm

This is something we've not really spent much time on for Postgres, but it might be interesting to look at someday. Given that the problem with your query was really a mistake anyway, I'm not sure that your example is compelling evidence for making it a high priority.

			regards, tom lane
[PERFORM] Slow inner join, but left join is fast
I have a query made by joining two subqueries where the outer query performing the join takes significantly longer to run than the two subqueries. The first subquery runs in 600ms. The second subquery runs in 700ms. But the outer query takes 240 seconds to run! Both of the two subqueries only return 8728 rows.

Changing the inner join to a left join makes the outer query run in about 1000ms (which is great), but I don't understand why the inner join is so slow!

I'm using PostgreSQL 8.2.1. Any ideas?

QUERY PLAN (Inner Join) - takes 240 seconds

    Nested Loop  (cost=17.46..17.56 rows=1 width=120)
      Join Filter: ((a.merchant_dim_id = b.merchant_dim_id) AND (a.dcms_dim_id = b.dcms_dim_id))
      ->  HashAggregate  (cost=8.71..8.74 rows=1 width=16)
            ->  Index Scan using transaction_facts_transaction_date_idx on transaction_facts  (cost=0.00..8.69 rows=1 width=16)
                  Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
      ->  HashAggregate  (cost=8.75..8.78 rows=1 width=16)
            ->  HashAggregate  (cost=8.71..8.72 rows=1 width=55)
                  ->  Index Scan using transaction_facts_transaction_date_idx on transaction_facts  (cost=0.00..8.69 rows=1 width=55)
                        Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))

QUERY PLAN (Left Join) - takes one second

    Merge Left Join  (cost=304037.63..304064.11 rows=2509 width=120)
      Merge Cond: ((a.dcms_dim_id = b.dcms_dim_id) AND (a.merchant_dim_id = b.merchant_dim_id))
      ->  Sort  (cost=152019.45..152025.72 rows=2509 width=64)
            Sort Key: a.dcms_dim_id, a.merchant_dim_id
            ->  HashAggregate  (cost=151771.15..151852.69 rows=2509 width=16)
                  ->  Bitmap Heap Scan on transaction_facts  (cost=5015.12..150419.90 rows=77214 width=16)
                        Recheck Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
                        ->  Bitmap Index Scan on transaction_facts_transaction_date_idx  (cost=0.00..4995.81 rows=77214 width=0)
                              Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
      ->  Sort  (cost=152018.18..152020.54 rows=943 width=64)
            Sort Key: b.dcms_dim_id, b.merchant_dim_id
            ->  Subquery Scan b  (cost=151931.51..151971.59 rows=943 width=64)
                  ->  HashAggregate  (cost=151931.51..151962.16 rows=943 width=16)
                        ->  HashAggregate  (cost=151578.11..151672.35 rows=9424 width=55)
                              ->  Bitmap Heap Scan on transaction_facts  (cost=5015.12..150419.90 rows=77214 width=55)
                                    Recheck Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
                                    ->  Bitmap Index Scan on transaction_facts_transaction_date_idx  (cost=0.00..4995.81 rows=77214 width=0)
                                          Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))

QUERY

    select a.merchant_dim_id, a.dcms_dim_id,
           a.num_success, a.num_failed, a.total_transactions, a.success_rate,
           b.distinct_num_success, b.distinct_num_failed,
           b.distinct_total_transactions, b.distinct_success_rate
    from (
        -- SUBQUERY 1
        select merchant_dim_id, dcms_dim_id,
               sum(success) as num_success, sum(failed) as num_failed,
               count(*) as total_transactions,
               (sum(success) * 1.0 / count(*)) as success_rate
        from transaction_facts
        where transaction_date >= '2007-1-9'
          and transaction_date < '2007-1-9 9:30'
        group by merchant_dim_id, dcms_dim_id
    ) as a
    inner join (
        -- SUBQUERY 2
        select merchant_dim_id,
Re: [PERFORM] group by will not use an index?
> For the reasons indicated (that is, MVCC), PG can not do a DISTINCT or
> the equivalent GROUP BY from index values alone.

Ok, that makes sense. Thanks for the help everybody!

> If this table is large, perhaps you could denormalize and maintain a
> summary table with date (using truncation) and count, updated with
> triggers on the original table. This table will presumably have a small
> number of rows at the cost of doubling the times for updates, inserts,
> and deletes.

Well, the inserted time, at least, is never updated, and deletions are very rare (never, so far), so I'll have a look at doing things that way. Thanks!
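A rough sketch of the suggested summary-table approach (the events table, its inserted_at column, and the daily grain are hypothetical; only the insert path is shown since updates and deletes are rare here):

    CREATE TABLE events_daily (day date PRIMARY KEY, cnt bigint NOT NULL);

    CREATE OR REPLACE FUNCTION events_daily_trig() RETURNS trigger AS $$
    BEGIN
        -- bump the counter for the row's day, creating it on first sight
        UPDATE events_daily SET cnt = cnt + 1
         WHERE day = date_trunc('day', NEW.inserted_at)::date;
        IF NOT FOUND THEN
            INSERT INTO events_daily (day, cnt)
            VALUES (date_trunc('day', NEW.inserted_at)::date, 1);
        END IF;
        RETURN NULL;  -- AFTER trigger; return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER events_daily_count AFTER INSERT ON events
        FOR EACH ROW EXECUTE PROCEDURE events_daily_trig();

Note the update-then-insert dance has a race under concurrent inserts; a production version needs to retry on duplicate-key errors.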
Re: [PERFORM] Slow inner join, but left join is fast
Jeremy Haile [EMAIL PROTECTED] writes:

> I have a query made by joining two subqueries where the outer query
> performing the join takes significantly longer to run than the two
> subqueries.

Please show EXPLAIN ANALYZE results, not just EXPLAIN. Also, have you analyzed your tables recently?

			regards, tom lane
Re: [PERFORM] Slow inner join, but left join is fast
The table should have been analyzed, but to make sure I ran analyze on the table before executing the explain analyze queries. Well - problem solved. This time the inner join query runs quickly.

I still don't understand why the inner join would be so different from the left join prior to the analyze. It looks like the number of rows expected in the original query plan for the inner join was 1 (not correct, since it was really 8728). The left join query had the exact same subqueries but expected 77214 rows to be returned from them, which was still not correct but resulted in a better query plan.

After the recent analyze, here's the new inner join query plan. I won't bother pasting the left join plan, since it is almost identical now (including row counts). FYI - the result of the queries is (and always was) identical for inner join and left join.

QUERY PLAN (inner join)

    Merge Join  (cost=279457.86..279479.83 rows=43 width=120) (actual time=626.771..670.275 rows=8728 loops=1)
      Merge Cond: ((a.dcms_dim_id = b.dcms_dim_id) AND (a.merchant_dim_id = b.merchant_dim_id))
      ->  Sort  (cost=139717.30..139722.38 rows=2029 width=64) (actual time=265.669..269.878 rows=8728 loops=1)
            Sort Key: a.dcms_dim_id, a.merchant_dim_id
            ->  HashAggregate  (cost=139519.61..139585.56 rows=2029 width=16) (actual time=211.368..247.429 rows=8728 loops=1)
                  ->  Bitmap Heap Scan on transaction_facts  (cost=4427.62..138316.05 rows=68775 width=16) (actual time=21.858..100.998 rows=65789 loops=1)
                        Recheck Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
                        ->  Bitmap Index Scan on transaction_facts_transaction_date_idx  (cost=0.00..4410.42 rows=68775 width=0) (actual time=21.430..21.430 rows=65789 loops=1)
                              Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
      ->  Sort  (cost=139740.56..139742.67 rows=843 width=64) (actual time=361.083..365.418 rows=8728 loops=1)
            Sort Key: b.dcms_dim_id, b.merchant_dim_id
            ->  Subquery Scan b  (cost=139663.76..139699.59 rows=843 width=64) (actual time=308.567..346.135 rows=8728 loops=1)
                  ->  HashAggregate  (cost=139663.76..139691.16 rows=843 width=16) (actual time=308.563..337.677 rows=8728 loops=1)
                        ->  HashAggregate  (cost=139347.68..139431.97 rows=8429 width=55) (actual time=198.093..246.591 rows=48942 loops=1)
                              ->  Bitmap Heap Scan on transaction_facts  (cost=4427.62..138316.05 rows=68775 width=55) (actual time=24.080..83.988 rows=65789 loops=1)
                                    Recheck Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
                                    ->  Bitmap Index Scan on transaction_facts_transaction_date_idx  (cost=0.00..4410.42 rows=68775 width=0) (actual time=23.596..23.596 rows=65789 loops=1)
                                          Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
    Total runtime: 675.638 ms

On Wed, 10 Jan 2007 12:15:44 -0500, Tom Lane [EMAIL PROTECTED] said:

> Jeremy Haile [EMAIL PROTECTED] writes:
> > I have a query made by joining two subqueries where the outer query
> > performing the join takes significantly longer to run than the two
> > subqueries.
>
> Please show EXPLAIN ANALYZE results, not just EXPLAIN. Also, have you
> analyzed your tables recently?
>
> 			regards, tom lane
Re: [PERFORM] Slow inner join, but left join is fast
Jeremy Haile [EMAIL PROTECTED] writes:

> I still don't understand why the inner join would be so different from
> the left join prior to the analyze.

Are you sure you hadn't analyzed in between? Or maybe autovac did it for you? The reason for the plan change is the change from estimating 1 row matching the transaction_date range constraint, to estimating lots of them, and the join type away up at the top would surely not have affected that.

			regards, tom lane
Re: [PERFORM] Slow inner join, but left join is fast
I'm pretty sure it didn't analyze in between - autovac is turned off and I ran the test multiple times before posting. But since I can't reproduce it anymore, I can't be 100% sure. And it certainly doesn't make sense that the estimate for the index scan would change based on an unrelated join condition.

If I ever get it to happen again, I'll be more careful and repost if it is a real issue. Thanks for pointing me in the right direction!

On Wed, 10 Jan 2007 13:38:15 -0500, Tom Lane [EMAIL PROTECTED] said:

> Jeremy Haile [EMAIL PROTECTED] writes:
> > I still don't understand why the inner join would be so different
> > from the left join prior to the analyze.
>
> Are you sure you hadn't analyzed in between? Or maybe autovac did it
> for you? The reason for the plan change is the change from estimating
> 1 row matching the transaction_date range constraint, to estimating
> lots of them, and the join type away up at the top would surely not
> have affected that.
>
> 			regards, tom lane
Re: [PERFORM] Slow inner join, but left join is fast
Another random idea - does PostgreSQL do any caching of query plans? Even on the session level? I ran these queries from the same query window, so my idea is that maybe the inner join plan was cached prior to an automatic analyze being run. But I'm doubting PostgreSQL would do something like that. And of course, if PostgreSQL doesn't cache query plans - this idea is bogus =)

On Wed, 10 Jan 2007 13:38:24 -0500, Jeremy Haile [EMAIL PROTECTED] said:

> I'm pretty sure it didn't analyze in between - autovac is turned off
> and I ran the test multiple times before posting. But since I can't
> reproduce it anymore, I can't be 100% sure. And it certainly doesn't
> make sense that the estimate for the index scan would change based on
> an unrelated join condition.
>
> If I ever get it to happen again, I'll be more careful and repost if
> it is a real issue. Thanks for pointing me in the right direction!
>
> On Wed, 10 Jan 2007 13:38:15 -0500, Tom Lane [EMAIL PROTECTED] said:
> > Are you sure you hadn't analyzed in between? Or maybe autovac did it
> > for you? The reason for the plan change is the change from
> > estimating 1 row matching the transaction_date range constraint, to
> > estimating lots of them, and the join type away up at the top would
> > surely not have affected that.
> >
> > 			regards, tom lane
Re: [PERFORM] Slow inner join, but left join is fast
Jeremy Haile [EMAIL PROTECTED] writes:

> Another random idea - does PostgreSQL do any caching of query plans?

Only if the client specifies it, either by PREPARE or the equivalent protocol-level message. I dunno what client software you were using, but I think few if any would PREPARE behind your back. Might be worth checking into though, if you've eliminated autovacuum.

Actually there's another possibility --- did you create any indexes on the table in between? CREATE INDEX doesn't do a full stats update, but it does count the rows and update pg_class.reltuples. But it's hard to believe that'd have caused as big a rowcount shift as we see here ...

			regards, tom lane
Re: [PERFORM] Slow inner join, but left join is fast
I did create and drop an index at some point while looking at this issue. But I definitely reran both of the queries (and explains) after the index was dropped, so I don't understand why there would be a difference between the inner and left query plans (which were run back-to-back more than once). Anyways - I'll let you know if something similar happens again.

Thanks,
Jeremy Haile

On Wed, 10 Jan 2007 14:22:35 -0500, Tom Lane [EMAIL PROTECTED] said:

> Jeremy Haile [EMAIL PROTECTED] writes:
> > Another random idea - does PostgreSQL do any caching of query plans?
>
> Only if the client specifies it, either by PREPARE or the equivalent
> protocol-level message. I dunno what client software you were using,
> but I think few if any would PREPARE behind your back. Might be worth
> checking into though, if you've eliminated autovacuum.
>
> Actually there's another possibility --- did you create any indexes on
> the table in between? CREATE INDEX doesn't do a full stats update, but
> it does count the rows and update pg_class.reltuples. But it's hard to
> believe that'd have caused as big a rowcount shift as we see here ...
>
> 			regards, tom lane
Re: [PERFORM] performance implications of binary placement
Are you 100% certain that both builds are using all the same libraries? And to be an apples-to-apples comparison, you really need to ensure that the datadir is on the same filesystem in both cases (that's the first thing I'd check). Also, that pg_index... error sounds like the second build has been corrupted.

On Tue, Dec 26, 2006 at 03:37:47PM -0500, Bob Dusek wrote:

> Hello all,
>
> I've been running performance tests on various incantations of Postgres
> on/off for a month or so. And, I've just come across some unexpected
> results.
>
> When I start my Postgres build as such:
>
>     # (Scenario 1)
>     ./configure --prefix=/usr --libdir=/usr/lib --bindir=/usr/bin \
>         --includedir=/usr/include/pgsql --datadir=/usr/share/postgresql \
>         --mandir=/usr/share/man --with-docdir=/usr/share/doc/packages \
>         --disable-rpath --enable-thread-safety --enable-integer-datetimes \
>         --without-python --without-perl --without-tcl --without-tk
>
> It performs significantly worse than when I start my build like this:
>
>     # (Scenario 2)
>     ./configure --disable-rpath --enable-thread-safety \
>         --enable-integer-datetimes --without-python --without-perl \
>         --without-tcl --without-tk
>
> Note: the only differences are that Scenario 1 includes these options:
>
>     --prefix=/usr --libdir=/usr/lib --bindir=/usr/bin
>     --includedir=/usr/include/pgsql --datadir=/usr/share/postgresql
>     --mandir=/usr/share/man --with-docdir=/usr/share/doc/packages
>
> And, to be clear, Scenario 1 performs worse than Scenario 2. Simple
> insert statements are taking significantly longer.
>
> I did not expect to see a performance hit with these options,
> especially since /usr/ on the test machine is mounted as its own
> partition, and in both cases, all of the binaries, include files, etc.
> are in that partition.
>
> Has anyone seen this before? Are hard drive mechanics the only thing
> in play here?
>
> The only difference I'm seeing in logging between the two versions is
> that Scenario 2 has several of this message littered throughout the
> logfile:
>
>     ERROR: could not open relation pg_index_indexrelid_index: No such
>     file or directory
>
> But, that doesn't seem to be affecting functionality or performance
> (especially considering the fact that the logfile that contains that
> message is part of the test that is performing better).
>
> We're using Postgres 7.4.8, building from the SLES9 Postgres 7.4.8
> source rpm.
>
> Thanks for any help you can provide. I can provide more detail if
> needed.
>
> Thanks again,
> Bob

-- 
Jim Nasby                [EMAIL PROTECTED]
EnterpriseDB             http://enterprisedb.com
512.569.9461 (cell)
Re: [PERFORM] More 8.2 client issues (Was: Slow dump?)
On Wed, Jan 03, 2007 at 11:56:20AM -0500, Tom Lane wrote:

> Erik Jones [EMAIL PROTECTED] writes:
> > Tom Lane wrote:
> > > I could see this taking an unreasonable amount of time if you had a
> > > huge number of pg_class rows or a very long search_path --- is your
> > > database at all out of the ordinary in those ways?
> >
> > Well, running select count(*) from pg_class; returns 524699 rows
>
> Ouch.
>
> > our search path is the default. I'd also like to reiterate that \d
> > pg_class returns instantly when run from the 8.1.4 psql client
> > connected to the 8.2 db.
>
> I think I know where the problem is: would you compare timing of
>
>     select * from pg_class where relname ~ '^(pg_class)$';
>     select * from pg_class where relname ~ '^pg_class$';
>
> Recent versions of psql put parentheses into the regex pattern for
> safety in case it's got |, but I just realized that that probably
> confuses the optimizer's check for an indexable regex :-(
>
> However, this only explains slowdown in psql's \d commands, which
> wasn't your original complaint ...

On the other hand, with 500k relations pg_dump is presumably going to be doing a lot of querying of the catalog tables, so if it uses similar queries...

-- 
Jim Nasby                [EMAIL PROTECTED]
EnterpriseDB             http://enterprisedb.com
512.569.9461 (cell)
Re: [PERFORM] Performance of PostgreSQL on Windows vs Linux
On Wed, Jan 03, 2007 at 12:24:24PM -0500, Jeremy Haile wrote:

> I am sure that this has been discussed before, but I can't seem to find
> any recent posts. (I am running PostgreSQL 8.2.) I have always run
> PostgreSQL on Linux in the past, but the company I am currently working
> for uses Windows on all of their servers. I don't have the luxury right
> now of running my own benchmarks on the two OSes, but wanted to know if
> anyone else has done a performance comparison. Are there any
> significant differences?

One thing to consider... I've seen a case or two where pgbench running on Windows with HyperThreading enabled was actually faster than with it turned off. (General experience has been that HT hurts PostgreSQL.) I suspect that the Windows kernel may have features that allow it to better utilize HT than Linux. Of course if you don't have HT... it doesn't matter. :)

-- 
Jim Nasby                [EMAIL PROTECTED]
EnterpriseDB             http://enterprisedb.com
512.569.9461 (cell)
Re: [PERFORM] PostgreSQL to host e-mail?
On Fri, Jan 05, 2007 at 01:15:44PM -0500, Reid Thompson wrote:

> On Fri, 2007-01-05 at 04:10 +0100, Grega Bremec wrote:
> > The main reason I'm writing this mail though, is to suggest you take
> > a look at Oryx, http://www.oryx.com/; They used to have this product
> > called Mailstore, which was designed to be a mail store using
> > PostgreSQL as a backend, and has since evolved to a bit more than
> > just that, it seems. Perhaps it could be of help to you while
> > building your system, and I'm sure the people at Oryx will be glad to
> > hear from you while, and after you've built your system.
> >
> > Kind regards,
> > --
> > ~Grega Bremec
>
> re above... http://www.archiveopteryx.org/1.10.html

You should also look at http://dbmail.org/ , which runs on several databases (PostgreSQL included).

-- 
Jim Nasby                [EMAIL PROTECTED]
EnterpriseDB             http://enterprisedb.com
512.569.9461 (cell)
Re: [PERFORM] Partitioning
BTW, someone coming up with a set of functions to handle partitioning for the general 'partition by time' case would make a GREAT project on pgFoundry.

On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote:

> Take a look at the set of partitioning functions I wrote shortly after
> the 8.1 release:
>
> http://www.studenter.hb.se/~arch/files/part_functions.sql
>
> You could probably work something out using those functions (as-is, or
> as inspiration) together with pgAgent
> (http://www.pgadmin.org/docs/1.4/pgagent.html)
>
> /Mikael
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:pgsql-performance-[EMAIL PROTECTED]] On Behalf Of Arnau
> Sent: den 5 januari 2007 12:02
> To: pgsql-performance@postgresql.org
> Subject: [PERFORM] Partitioning
>
> Hi all,
>
> I'm not sure if this question fits in the topic of this list. I'm
> interested in partitioning and it's the first time I'd use it. There is
> an issue I don't know how you handle it. Let's say I'm interested in
> storing monthly based statistical data like the example at
> http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html. What I
> don't like about this approach is that the monthly tables, rules...
> must be created manually, or at least I haven't found any other option.
>
> My question is how do you manage this? Do you have a cron task that
> automatically creates these monthly elements (tables, rules, ...), or
> is there another approach that doesn't require external things like
> cron, only PostgreSQL?
>
> --
> Arnau

-- 
Jim Nasby                [EMAIL PROTECTED]
EnterpriseDB             http://enterprisedb.com
512.569.9461 (cell)
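Along those lines, a sketch of what the cron-driven piece might look like, so the job itself reduces to a single SELECT. The names, the ts partitioning column, and the child naming scheme are all illustrative; a real version would also create indexes and the routing rule or trigger:

    CREATE OR REPLACE FUNCTION create_month_partition(parent text, month date)
    RETURNS void AS $$
    DECLARE
        part       text := parent || '_' || to_char(month, 'YYYY_MM');
        next_month date := (month + interval '1 month')::date;
    BEGIN
        -- build the child with a CHECK constraint covering exactly one month
        EXECUTE 'CREATE TABLE ' || part
             || ' (CHECK (ts >= ' || quote_literal(month::text)
             || ' AND ts < ' || quote_literal(next_month::text)
             || ')) INHERITS (' || parent || ')';
    END;
    $$ LANGUAGE plpgsql;

    -- cron or pgAgent then runs something like:
    -- SELECT create_month_partition('measurements',
    --            (date_trunc('month', now() + interval '1 month'))::date);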
Re: [PERFORM] table partioning performance
On Mon, Jan 08, 2007 at 03:02:24PM -0500, Steven Flatt wrote:

> We use partitioned tables extensively and we have observed linear
> performance degradation on inserts as the number of rules on the master
> table grows (i.e. number of rules = number of partitions). We had to
> come up with a solution that didn't have a rule per partition on the
> master table. Just wondering if you are observing the same thing.

Except for the simplest partitioning cases, you'll be much better off using a trigger on the parent table to direct inserts/updates/deletes to the children. As a bonus, using a trigger makes it a lot more realistic to deal with an update moving data between partitions.

-- 
Jim Nasby                [EMAIL PROTECTED]
EnterpriseDB             http://enterprisedb.com
512.569.9461 (cell)
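A bare-bones sketch of such a routing trigger (the measurements table, its ts key, the monthly child naming, and the hard-coded column list are all illustrative; NULL handling is glossed over):

    CREATE OR REPLACE FUNCTION measurements_ins_trig() RETURNS trigger AS $$
    BEGIN
        -- route the row to the child matching its month
        EXECUTE 'INSERT INTO measurements_' || to_char(NEW.ts, 'YYYY_MM')
             || ' VALUES (' || quote_literal(NEW.ts::text)
             || ', ' || quote_literal(NEW.value::text) || ')';
        RETURN NULL;  -- suppress the insert into the parent itself
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER measurements_ins BEFORE INSERT ON measurements
        FOR EACH ROW EXECUTE PROCEDURE measurements_ins_trig();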
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
On Tue, Jan 09, 2007 at 09:10:51AM -0800, Jeff Frost wrote:

> On Tue, 9 Jan 2007, Jim C. Nasby wrote:
> > On Thu, Dec 28, 2006 at 02:15:31PM -0800, Jeff Frost wrote:
> > > When benchmarking various options for a new PG server at one of my
> > > clients, I tried ext2 and ext3 (data=writeback) for the WAL and it
> > > appeared to be fastest to have ext2 for the WAL. The winning time
> > > was 157m46.713s for ext2, 159m47.098s for combined ext3 data/xlog
> > > and 158m25.822s for ext3 data=writeback. This was on an 8x150GB
> > > Raptor RAID10 on an Areca 1130 w/ 1GB BBU cache. This config
> > > benched out faster than a 6disk RAID10 + 2 disk RAID1 for those of
> > > you who have been wondering if the BBU write back cache mitigates
> > > the need for separate WAL (at least on this workload). Those are
> > > the fastest times for each config, but ext2 WAL was always faster
> > > than the other two options. I didn't test any other filesystems in
> > > this go around.
> >
> > Uh, if I'm reading this correctly, you're saying that WAL on a
> > separate ext2 vs. one big ext3 with data=writeback saved ~39 seconds
> > out of ~158.5 minutes, or 0.4%? Is that even above the noise for your
> > measurements? I suspect the phase of the moon might play a bigger
> > role ;P
>
> That's what I thought too...cept I ran it 20 times and ext2 won by that
> margin every time, so it was quite repeatable. :-/

Even so, you've got to really be hunting for performance to go through the hassle of different filesystems just to gain 0.4%... :)

-- 
Jim Nasby                [EMAIL PROTECTED]
EnterpriseDB             http://enterprisedb.com
512.569.9461 (cell)
Re: [PERFORM] Partitioning
On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote:

> Take a look at the set of partitioning functions I wrote shortly after
> the 8.1 release:
>
> http://www.studenter.hb.se/~arch/files/part_functions.sql
>
> You could probably work something out using those functions (as-is, or
> as inspiration) together with pgAgent
> (http://www.pgadmin.org/docs/1.4/pgagent.html)
>
> /Mikael

Those are some great functions.

-- 
erik jones [EMAIL PROTECTED]
software development
emma(r)
Re: [PERFORM] Partitioning
I really wish that PostgreSQL supported a nice partitioning syntax like MySQL has. Here is an example:

    CREATE TABLE tr (id INT, name VARCHAR(50), purchased DATE)
        PARTITION BY RANGE( YEAR(purchased) ) (
            PARTITION p0 VALUES LESS THAN (1990),
            PARTITION p1 VALUES LESS THAN (1995),
            PARTITION p2 VALUES LESS THAN (2000),
            PARTITION p3 VALUES LESS THAN (2005)
        );

And to drop a partition:

    ALTER TABLE tr DROP PARTITION p2;

This seems so much more intuitive and simpler than what is required to set it up in PostgreSQL. Does PostgreSQL's approach to table partitioning have any advantage over MySQL? Is a nicer syntax planned for Postgres?

On Wed, 10 Jan 2007 14:20:06 -0600, Jim C. Nasby [EMAIL PROTECTED] said:

> BTW, someone coming up with a set of functions to handle partitioning
> for the general 'partition by time' case would make a GREAT project on
> pgFoundry.
>
> On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote:
> > Take a look at the set of partitioning functions I wrote shortly
> > after the 8.1 release:
> >
> > http://www.studenter.hb.se/~arch/files/part_functions.sql
> >
> > You could probably work something out using those functions (as-is,
> > or as inspiration) together with pgAgent
> > (http://www.pgadmin.org/docs/1.4/pgagent.html)
> >
> > /Mikael
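For comparison, a sketch of the closest 8.2-era PostgreSQL equivalent of the MySQL example above, using inheritance and CHECK constraints (and this is before adding the routing rule or trigger that inserts still require):

    CREATE TABLE tr (id INT, name VARCHAR(50), purchased DATE);

    CREATE TABLE tr_p0 (CHECK (purchased <  DATE '1990-01-01')) INHERITS (tr);
    CREATE TABLE tr_p1 (CHECK (purchased >= DATE '1990-01-01'
                           AND purchased <  DATE '1995-01-01')) INHERITS (tr);
    CREATE TABLE tr_p2 (CHECK (purchased >= DATE '1995-01-01'
                           AND purchased <  DATE '2000-01-01')) INHERITS (tr);
    CREATE TABLE tr_p3 (CHECK (purchased >= DATE '2000-01-01'
                           AND purchased <  DATE '2005-01-01')) INHERITS (tr);

    -- dropping a partition, at least, is equally simple:
    DROP TABLE tr_p2;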
Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS
On Wed, 10 Jan 2007, Jim C. Nasby wrote:

> > > > RAID1 for those of you who have been wondering if the BBU write
> > > > back cache mitigates the need for separate WAL (at least on this
> > > > workload). Those are the fastest times for each config, but ext2
> > > > WAL was always faster than the other two options. I didn't test
> > > > any other filesystems in this go around.
> > >
> > > Uh, if I'm reading this correctly, you're saying that WAL on a
> > > separate ext2 vs. one big ext3 with data=writeback saved ~39
> > > seconds out of ~158.5 minutes, or 0.4%? Is that even above the
> > > noise for your measurements? I suspect the phase of the moon might
> > > play a bigger role ;P
> >
> > That's what I thought too...cept I ran it 20 times and ext2 won by
> > that margin every time, so it was quite repeatable. :-/
>
> Even so, you've got to really be hunting for performance to go through
> the hassle of different filesystems just to gain 0.4%... :)

Indeed, but actually, I did the math again and it appears that it saves close to 2 minutes versus one big ext3. I guess the moral of the story is that having a separate pg_xlog even on the same physical volume tends to be slightly faster for write-oriented workloads. Ext2 is slightly faster than ext3, but of course you could likely go with another filesystem yet and be even slightly faster as well. :-)

I guess the real moral of the story is that you can probably use one big ext3 with the default config and it won't matter much more than 1-2% if you have a BBU.

-- 
Jeff Frost, Owner       [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908     FAX: 650-649-1954
Re: [PERFORM] Performance of PostgreSQL on Windows vs Linux
Hey Jim -

Thanks for the feedback. The server has dual Xeons with HyperThreading enabled - so perhaps I should try disabling it. How much performance boost have you seen by disabling it? Of course, the bottleneck in my case is more on the I/O or RAM side, not the CPU side.

Jeremy Haile

On Wed, 10 Jan 2007 14:15:26 -0600, Jim C. Nasby [EMAIL PROTECTED] said:

> One thing to consider... I've seen a case or two where pgbench running
> on Windows with HyperThreading enabled was actually faster than with
> it turned off. (General experience has been that HT hurts PostgreSQL.)
> I suspect that the Windows kernel may have features that allow it to
> better utilize HT than Linux. Of course if you don't have HT... it
> doesn't matter. :)
>
> -- 
> Jim Nasby                [EMAIL PROTECTED]
> EnterpriseDB             http://enterprisedb.com
> 512.569.9461 (cell)
Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum
On Tue, Jan 09, 2007 at 12:26:41PM -0500, Jeremy Haile wrote:

> I am developing an application that has very predictable database
> operations:
>
> - inserts several thousand rows into 3 tables every 5 minutes (tables
>   contain around 10 million rows each)
> - truncates and rebuilds aggregate tables of this data every 5 minutes
>   (several thousand rows each)
> - regular reads of the aggregate tables and sometimes the large tables
>   by user interaction
> - every night, hundreds of thousands of rows are deleted from these 3
>   tables (old data)
> - 20-30 other tables get inserted/updated slowly throughout the day
>
> In order to optimize performance of the inserts, I disabled
> autovacuum/row-level stats and instead run vacuum analyze on the whole
> DB every hour. However this operation takes around 20 minutes of each
> hour. This means that the database is involved in vacuum/analyzing
> tables 33% of the time.
>
> I'd like any performance advice, but my main concern is the amount of
> time vacuum/analyze runs and its possible impact on the overall DB
> performance. Thanks!

If much of the data in the database isn't changing that often, then why continually re-vacuum the whole thing?

I'd suggest trying autovacuum and see how it does (though you might want to tune it to be more or less aggressive, and you'll probably want to enable the cost delay).

The only cases where manual vacuum makes sense to me are when you've got a defined slow period and vacuuming during that slow period is still frequent enough to keep up with demand, or when you've got tables that have a very high churn rate and need to be kept small. In the latter case, I'll usually set up a cronjob to vacuum those tables once a minute with no cost delay. I'm sure there might be some other cases where not using autovac might make sense, but generally I'd much rather let autovac worry about this so I don't have to.

> I am running 8.2 (will be 8.2.1 soon). The box is Windows with 2GB RAM
> connected to a SAN over fiber. The data and pg_xlog are on separate
> partitions.
>
> Modified configuration:
>
>     effective_cache_size = 1000MB
>     random_page_cost = 3
>     default_statistics_target = 50
>     maintenance_work_mem = 256MB
>     shared_buffers = 400MB
>     temp_buffers = 10MB
>     work_mem = 10MB
>     max_fsm_pages = 150

One other useful manual vacuum to consider is running vacuumdb -av periodically (say, once a month) and looking at the last few lines of output. That will give you a good idea on how large you should set max_fsm_pages. Running the output of vacuumdb -av through pgFouine will give you other useful data.

>     checkpoint_segments = 30
>     stats_row_level = off
>     stats_start_collector = off

Unless you're really trying to get the last ounce of performance out, it's probably not worth turning those stats settings off.

-- 
Jim Nasby                [EMAIL PROTECTED]
EnterpriseDB             http://enterprisedb.com
512.569.9461 (cell)
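For reference, the knobs discussed above translate to something like the following in postgresql.conf on 8.2. The values are illustrative, not a recommendation for this particular box; max_fsm_pages in particular should be sized from actual vacuumdb -av output:

    autovacuum = on
    stats_start_collector = on          # both stats settings are
    stats_row_level = on                #   required for autovacuum
    autovacuum_vacuum_cost_delay = 20   # throttle autovacuum I/O (ms)
    max_fsm_pages = 1500000             # size this from 'vacuumdb -av'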
Re: [PERFORM] Performance of PostgreSQL on Windows vs Linux
On Wed, 2007-01-10 at 14:15, Jim C. Nasby wrote:

> One thing to consider... I've seen a case or two where pgbench running
> on Windows with HyperThreading enabled was actually faster than with
> it turned off. (General experience has been that HT hurts PostgreSQL.)
> I suspect that the Windows kernel may have features that allow it to
> better utilize HT than Linux.

I've also seen a few comments in -perform (and elsewhere) in the past that newer Linux kernels seem to handle HT better than older ones, and also might give better numbers for certain situations.

Note that you should really test with a wide variety of loads (i.e. a lot of parallel clients, a few, etc...) to see what the curve looks like. If HT gets you a 10% gain with 4 or fewer clients, but is 20% slower with 8 clients, then hyperthreading might be a not-so-good choice.
Re: [PERFORM] table partioning performance
On 1/9/07, Simon Riggs [EMAIL PROTECTED] wrote:

> If you are doing date range partitioning it should be fairly simple to
> load data into the latest table directly. That was the way I
> originally intended for it to be used. The rules approach isn't
> something I'd recommend as a bulk loading option and it's a lot more
> complex anyway.

The problem we have with blindly loading all data into the latest table is that some data (< 5%, possibly even much less) is actually delivered late and belongs in earlier partitions. So we still needed the ability to send data to an arbitrary partition.

Steve
Re: [PERFORM] Partitioning
On Wed, Jan 10, 2007 at 03:28:00PM -0500, Jeremy Haile wrote:

> This seems so much more intuitive and simpler than what is required to
> set it up in PostgreSQL. Does PostgreSQL's approach to table
> partitioning have any advantage over MySQL? Is a nicer syntax planned
> for Postgres?

The focus was to get the base functionality working, and working correctly. Another consideration is that there are multiple ways to accomplish the partitioning; exposing the basic functionality without enforcing a given interface provides more flexibility (ie: it appears that you can't do list partitioning with MySQL, while you can with PostgreSQL).

-- 
Jim Nasby                [EMAIL PROTECTED]
EnterpriseDB             http://enterprisedb.com
512.569.9461 (cell)
Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum
Please cc the list so others can help and learn.

On Wed, Jan 10, 2007 at 03:43:00PM -0500, Jeremy Haile wrote:

> > I'd suggest trying autovacuum and see how it does (though you might
> > want to tune it to be more or less aggressive, and you'll probably
> > want to enable the cost delay).
>
> What are some decent default values for the cost delay vacuum
> settings? I haven't used these before.

I find that simply setting vacuum_cost_delay to 20 is generally a good starting point. I'll usually do that and then run a vacuum while watching disk activity; I try and tune it so that the disk is ~90% utilized with vacuum running. That allows a safety margin without stretching vacuums out forever.

> Also - do the default autovacuum settings make sense for tables on the
> scale of 10 million rows? For example, using the defaults it would
> require about a million rows (250 + 0.1 * 10 million) to be
> inserted/updated/deleted before analyzing - which seems high. (about 2
> million for vacuum) Or am I overestimating how often I would need to
> vacuum/analyze these tables?

Depends on your application... the way I look at it is that a setting of 0.1 means 10% dead space in the table. While 5% or 1% would be better, you hit a point of diminishing returns since you have to read the entire table and its indexes to vacuum it. BTW, those are the default values for analyze... the defaults for vacuum are 2x that.

> Do most people use the default autovacuum settings successfully, or
> are they usually modified?

I generally use the 8.2 defaults (which are much better than the 8.1 defaults) unless I'm really trying to tune things. What's more important is to make sure critical tables (such as queue tables) are getting vacuumed frequently so that they stay small. (Of course you also need to ensure there are no long-running transactions.)

-- 
Jim C. Nasby, Database Architect   [EMAIL PROTECTED]
512.569.9461 (cell)                http://jim.nasby.net
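In 8.2 the per-table overrides go in the pg_autovacuum system catalog (there is no ALTER TABLE syntax for this yet). A sketch of making autovacuum very aggressive for one high-churn table; the table name is hypothetical, the column list should be checked against the docs for your exact minor version, and -1 means "use the global default":

    -- columns: relation, enabled, vacuum base threshold / scale factor,
    -- analyze base threshold / scale factor, cost delay, cost limit,
    -- freeze min / max age
    INSERT INTO pg_autovacuum
    VALUES ('my_queue_table'::regclass, true,
            100, 0.0,    -- vacuum after ~100 dead rows, no scaling
            100, 0.0,    -- analyze likewise
            0, -1,       -- no cost delay for this table
            -1, -1);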
Re: [PERFORM] Partitioning
You can do list partitioning in MySQL:

http://dev.mysql.com/doc/refman/5.1/en/partitioning-list.html

My comment was not meant as a criticism of PostgreSQL's current state - I'm glad that it has partitioning. I'm simply wondering if there are any plans to adopt a more user-friendly syntax in the future, similar to MySQL's partitioning support. Having first-class-citizen support for partitions would also allow some nice administrative GUIs and views to be built for managing them.

Jeremy Haile

On Wed, 10 Jan 2007 15:09:31 -0600, Jim C. Nasby [EMAIL PROTECTED] said:

> The focus was to get the base functionality working, and working
> correctly. Another consideration is that there are multiple ways to
> accomplish the partitioning; exposing the basic functionality without
> enforcing a given interface provides more flexibility (ie: it appears
> that you can't do list partitioning with MySQL, while you can with
> PostgreSQL).
Re: [PERFORM] table partioning performance
On 1/10/07, Jim C. Nasby [EMAIL PROTECTED] wrote:

> Except for the simplest partitioning cases, you'll be much better off
> using a trigger on the parent table to direct inserts/updates/deletes
> to the children. As a bonus, using a trigger makes it a lot more
> realistic to deal with an update moving data between partitions.

In our application, data is never moved between partitions. The problem I found with triggers is the non-robustness of the PL/pgSQL record data type. For example, in an on-insert trigger, I can't determine the fields of the NEW record unless I hard code the column names into the trigger. This makes it hard to write a generic trigger which I can use for all our partitioned tables. It would have been somewhat of a pain to write a separate trigger for each of our partitioned tables. For that and other reasons, we moved some of the insert logic up to the application level in our product.

Steve
Re: [PERFORM] Partitioning
On Wed, 2007-01-10 at 15:15, Jeremy Haile wrote:

> You can do list partitioning in MySQL:
> http://dev.mysql.com/doc/refman/5.1/en/partitioning-list.html
>
> My comment was not meant as a criticism of PostgreSQL's current state -
> I'm glad that it has partitioning. I'm simply wondering if there are
> any plans to adopt a more user-friendly syntax in the future, similar
> to MySQL's partitioning support. Having first-class-citizen support
> for partitions would also allow some nice administrative GUIs and
> views to be built for managing them.

I don't think anyone took it as a negative criticism. Jim and I were both more pointing out that the development processes of the two projects are somewhat different.

In MySQL, a small group that doesn't necessarily interact with a large user community sets out to implement a feature in a given time line with a given set of requirements, and they tend to ignore what they see as esoteric requirements. In PostgreSQL, a large development community that communicates fairly well with its large user community puts somewhat of the onus of proving the need, and doing the initial proof of concept, on those who say they need a feature, often working in a method where the chief hackers lend a hand to someone who wants the feature so they can get a proof of concept up and running. An example would be the auditing / time travel in the contrib/spi project. After several iterations, and given the chance to learn from the mistakes of the previous incarnations, something often rises out of that to produce the feature needed.

Generally speaking, the PostgreSQL method takes longer, making life harder today, but produces cleaner, more easily maintained solutions, making life easier in the future. Meanwhile the MySQL method works faster, making life easier today, but makes compromises that might make life harder in the future.

Something that embodies that difference is the table-handler philosophy of the two databases. PostgreSQL has the abstraction to have more than one table handler, but in practice has exactly one table handler. MySQL has the ability to have many table handlers, and in fact uses many of them. With PostgreSQL this means that things like the query parsing / execution and the table handler are tightly coupled. This results in things like transactional DDL. Sometimes this results in suggestions being dismissed out of hand because they would have unintended consequences. In MySQL, because of the multiple table handlers, many compromises on the query parsing have to be made, the most common one being that you can define constraints / foreign keys in a column item, and they will simply be ignored with no error or notice. The FK constraints have to go at the end of the column list to be parsed and executed.

So partitioning, being something that will touch a lot of parts of the database, isn't gonna just show up one afternoon in pgsql. It will likely take a few people making proof-of-concept versions before a consensus is reached and someone who has the ability codes it up.
Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum
> BTW, those are the default values for analyze... the defaults for
> vacuum are 2x that.

Yeah - I was actually more concerned that tables would need to be analyzed more often than I was about vacuuming too often, so I used analyze as the example. Since my app is inserting constantly throughout the day and querying for recent data, I want to make sure the query planner realizes that there are lots of rows with new timestamps on them. In other words, if I run a query like select * from mytable where timestamp > '9:00am', I want to make sure it hasn't been a day since the table was analyzed, leaving the planner thinking there are zero rows newer than 9:00am today.

> What's more important is to make sure critical tables (such as queue
> tables) are getting vacuumed frequently so that they stay small.

Is the best way to do that usually to lower the scale factors? Is it ever a good approach to lower the scale factor to zero and just set the thresholds to a pure number of rows? (when setting it for a specific table)

Thanks,
Jeremy Haile
Re: [PERFORM] table partioning performance
On Wed, Jan 10, 2007 at 04:39:06PM -0500, Steven Flatt wrote:

> In our application, data is never moved between partitions. The
> problem I found with triggers is the non-robustness of the PL/pgSQL
> record data type. For example, in an on-insert trigger, I can't
> determine the fields of the NEW record unless I hard code the column
> names into the trigger. This makes it hard to write a generic trigger
> which I can use for all our partitioned tables. It would have been
> somewhat of a pain to write a separate trigger for each of our
> partitioned tables. For that and other reasons, we moved some of the
> insert logic up to the application level in our product.

Yeah, I think the key there would be to produce a function that wrote the function for you.

-- 
Jim Nasby                [EMAIL PROTECTED]
EnterpriseDB             http://enterprisedb.com
512.569.9461 (cell)
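An illustrative sketch of that idea: read the column list out of the catalogs and generate a trigger function with everything hard-coded, so the generic-record problem never arises. It assumes a partition key named ts and children named parent_YYYY_MM; NULL columns (quote_literal of a NULL is NULL) and non-default schemas are glossed over:

    CREATE OR REPLACE FUNCTION build_insert_trigger(parent text)
    RETURNS void AS $$
    DECLARE
        vals text;
        body text;
    BEGIN
        -- builds "quote_literal(NEW.col1::text) || ', ' || ..." per column
        SELECT array_to_string(ARRAY(
                   SELECT 'quote_literal(NEW.' || quote_ident(attname) || '::text)'
                     FROM pg_attribute
                    WHERE attrelid = parent::regclass
                      AND attnum > 0 AND NOT attisdropped
                    ORDER BY attnum), ' || '', '' || ')
          INTO vals;

        -- generated body routes each row to the child for its month
        body := 'BEGIN EXECUTE ''INSERT INTO ' || parent
             || '_'' || to_char(NEW.ts, ''YYYY_MM'') || '' VALUES ('' || '
             || vals || ' || '')''; RETURN NULL; END;';

        EXECUTE 'CREATE OR REPLACE FUNCTION ' || parent
             || '_ins() RETURNS trigger AS ' || quote_literal(body)
             || ' LANGUAGE plpgsql';
    END;
    $$ LANGUAGE plpgsql;

    -- regenerate after adding a column, then (re)attach the trigger:
    -- SELECT build_insert_trigger('measurements');
    -- CREATE TRIGGER measurements_route BEFORE INSERT ON measurements
    --     FOR EACH ROW EXECUTE PROCEDURE measurements_ins();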
Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum
On Wed, Jan 10, 2007 at 04:48:42PM -0500, Jeremy Haile wrote:

> Yeah - I was actually more concerned that tables would need to be
> analyzed more often than I was about vacuuming too often, so I used
> analyze as the example. Since my app is inserting constantly
> throughout the day and querying for recent data, I want to make sure
> the query planner realizes that there are lots of rows with new
> timestamps on them. In other words, if I run a query like select *
> from mytable where timestamp > '9:00am', I want to make sure it hasn't
> been a day since the table was analyzed, leaving the planner thinking
> there are zero rows newer than 9:00am today.

Well, analyze is pretty cheap. At most it'll read only 30,000 pages, which shouldn't take terribly long on a decent system. So you can be a lot more aggressive with it.

> > What's more important is to make sure critical tables (such as queue
> > tables) are getting vacuumed frequently so that they stay small.
>
> Is the best way to do that usually to lower the scale factors? Is it
> ever a good approach to lower the scale factor to zero and just set
> the thresholds to a pure number of rows? (when setting it for a
> specific table)

The problem is what happens if autovac goes off and starts vacuuming some large table? While that's going on your queue table is sitting there bloating. If you have a separate cronjob to handle the queue table, it'll stay small, especially in 8.2.

-- 
Jim Nasby                [EMAIL PROTECTED]
EnterpriseDB             http://enterprisedb.com
512.569.9461 (cell)
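The cronjob in question can be as small as a single line (database and table names here are illustrative):

    # crontab: keep a high-churn queue table small, once a minute
    * * * * *  vacuumdb --quiet --dbname=mydb --table=my_queue_table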
Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum
Jim C. Nasby wrote:

> > Is the best way to do that usually to lower the scale factors? Is it
> > ever a good approach to lower the scale factor to zero and just set
> > the thresholds to a pure number of rows? (when setting it for a
> > specific table)
>
> The problem is what happens if autovac goes off and starts vacuuming
> some large table? While that's going on your queue table is sitting
> there bloating. If you have a separate cronjob to handle the queue
> table, it'll stay small, especially in 8.2.

You mean at least in 8.2. In previous releases, you could vacuum that queue table until you were blue in the face, but it would achieve nothing because it would consider that the dead tuples were visible to a running transaction: the one running the vacuum on the large table. This is an annoyance that was fixed in 8.2.

-- 
Alvaro Herrera                       http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum
On Thu, Jan 11, 2007 at 12:10:34AM -0300, Alvaro Herrera wrote:

> You mean at least in 8.2. In previous releases, you could vacuum that
> queue table until you were blue in the face, but it would achieve
> nothing because it would consider that the dead tuples were visible to
> a running transaction: the one running the vacuum on the large table.
> This is an annoyance that was fixed in 8.2.

True, but in many environments there are other transactions that run long enough that additional vacuums while a long vacuum was running would still help.

-- 
Jim Nasby                [EMAIL PROTECTED]
EnterpriseDB             http://enterprisedb.com
512.569.9461 (cell)