Re: [PERFORM] dissimilar drives in Raid10 , does it make difference ?
Matthew Wakeling wrote:
> In fact, it is recommended that two different types of drives are used.
> That way, if there's a mass failure of a whole batch of drives from one
> particular vendor, you don't lose all your data.

Don't think this is just paranoia, either. I've had it happen to me SEVERAL times - either a second drive fails before I can replace the first, or the second drive in a pair fails during rebuild onto the replacement. I use regular RAID scrubbing, so this isn't just a case of undetected media errors.

Nothing beats good backups.

--
Craig Ringer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] I/O increase after upgrading to 8.3.5
Alexander Staubo wrote:
> On Fri, Feb 13, 2009 at 12:53 PM, Alexander Staubo <a...@bengler.no> wrote:
>> The upgrade was done with dump/restore using pg_dump -Fc. The old database
>> lived on a SAN volume, whereas the new database lives on a local disk volume.
>
> I need to correct myself: the Munin graphs were never set to track the SAN
> volume where the old database lived. So when the graph goes from near-zero
> to lots, it's actually correct. When I compare the correct graphs, however,
> it's apparent that I/O writes have, on average, doubled.

Is there any chance you had pg_xlog stored separately on your old database, so that I/O for it wasn't being recorded?

--
Craig Ringer
Re: [PERFORM] scheduling autovacuum at lean hours only.
mallah.raj...@gmail.com (Rajesh Kumar Mallah) writes:
> Why is it not a good idea to give end users control over when they want to
> run it?

It's not a particularly good idea to give end users things that they are likely then to *immediately* use to shoot themselves in the foot. Turning off vacuuming all day is the sort of thing that is indeed pretty certain to hurt you when you imagined it was going to help you.

In particular, if you shut off autovac all day, heavily updated tables with certain sorts of (pretty common!) update patterns are certain to bloat up, to the point that you'll need to do CLUSTER/VACUUM FULL on them. In effect, the practical result of "autovacuum at lean hours only" is more accurately described as "cancel autovacuum and revert to the elder policy of requiring users to do manual vacuuming."

It's worth looking at how autovacuum has been evolving over time...

- When it was introduced, 8.0-ish (maybe it was 8.1 when it became official), it was pretty primitive. Autovac was a single process, and you had three controls over its behaviour:
  - You could run autovac, or not;
  - You could exclude specific tables from being processed by autovac;
  - You could vacuum less aggressively, using delays to reduce autovac I/O usage.

- In 8.3, it was enhanced to add the notion of having multiple vacuum workers.

There was discussion about having one of those workers restrict itself to small tables, so that you'd never have the scenario where the workers were all busy and a small table that needed vacuuming was left unvacuumed for a long time. It appears that didn't happen, which seems unfortunate, but that's life...

You should look at all the knobs that *are* offered before deciding on a policy that may be harmful to performance. As things stand now, there are a couple of ways I could see tuning this:

- You might check the GUC variables autovacuum_vacuum_cost_delay and autovacuum_vacuum_cost_limit, which would allow you to restrict the I/O cost.
  This might allow you to run autovacuum all the time without adversely affecting performance.

- You might come up with a list of the *LARGE* tables that you don't want vacuumed during the day, and set up a cron job that adds/drops them from the pg_autovacuum table at the appropriate times. This is definitely going to be more finicky, and requires a great deal more awareness of the tables being updated by your applications. It makes autovacuum a whole lot less automatic.

There are known enhancements coming up:

- In 8.4, there is a capability for VACUUM to only process the portions of the table known to have been altered. That ought to be a better answer than *any* of the fiddling suggested to date. Right now, a VACUUM on public.my_huge_table, a table 18GB in size, will walk through the entire table, even though there were only a handful of pages where tuples were invalidated. This is almost certainly the single best improvement possible to resolve your issue; it seems likely to *directly* address the problem, and has the considerable merit of not requiring much if any configuration/reconfiguration/scheduling.

--
(reverse (concatenate 'string "gro.mca" "@" "enworbbc"))
http://linuxdatabases.info/info/
"What would we do without C? We would have PASAL, BASI, OBOL, and Ommon Lisp."  -- #Erik
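To make the two suggestions above concrete, here is a sketch of what each might look like on 8.3. The settings values are illustrative only, and the pg_autovacuum column list is from memory of the 8.3 catalog (check `\d pg_autovacuum` on your server before relying on it); negative values mean "use the default":

```sql
-- Option 1: throttle autovacuum's I/O globally (postgresql.conf):
--   autovacuum_vacuum_cost_delay = 20ms
--   autovacuum_vacuum_cost_limit = 200

-- Option 2: per-table control via the 8.3 pg_autovacuum catalog.
-- Disable autovac on one big table; a cron job could flip 'enabled'
-- at the day/night boundary.
INSERT INTO pg_autovacuum
  (vacrelid, enabled,
   vac_base_thresh, vac_scale_factor, anl_base_thresh, anl_scale_factor,
   vac_cost_delay, vac_cost_limit, freeze_min_age, freeze_max_age)
VALUES
  ('public.my_huge_table'::regclass, false,
   -1, -1, -1, -1,
   -1, -1, -1, -1);

-- Re-enable at night:
UPDATE pg_autovacuum
   SET enabled = true
 WHERE vacrelid = 'public.my_huge_table'::regclass;
```

Note that cost-based throttling (option 1) is usually the safer first step, since it keeps dead-tuple cleanup continuous instead of deferring it.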
Re: [PERFORM] scheduling autovacuum at lean hours only.
It's nice to know the evolution of autovacuum, and I understand that the suggestion/requirement of "autovacuum at lean hours only" was defeating the whole idea.

regds
--rajesh kumar mallah.

On Fri, Feb 13, 2009 at 11:07 PM, Chris Browne <cbbro...@acm.org> wrote:
> [full quote of the previous message snipped]
[PERFORM] Slow queries from information schema
I'm aware you already know that information_schema is slow [1] [2], so I just want to expose/document another case and the tests I did.

I'm using the following view to check which tables depend on which other tables:

CREATE VIEW raw_relation_tree AS
SELECT
    tc_p.table_catalog AS parent_catalog,
    tc_p.table_schema  AS parent_schema,
    tc_p.table_name    AS parent_table,
    tc_c.table_catalog AS child_catalog,
    tc_c.table_schema  AS child_schema,
    tc_c.table_name    AS child_table
FROM information_schema.referential_constraints AS rc
NATURAL JOIN information_schema.table_constraints AS tc_c
LEFT JOIN information_schema.table_constraints AS tc_p
    ON rc.unique_constraint_catalog = tc_p.constraint_catalog
   AND rc.unique_constraint_schema  = tc_p.constraint_schema
   AND rc.unique_constraint_name    = tc_p.constraint_name;

test=# select count(*) from raw_relation_tree;
 count
-------
    11
(1 row)

An EXPLAIN ANALYZE for a simple SELECT on each of the FROM tables gives: referential_constraints: ~9 ms; table_constraints: ~24 ms. The result, on the above view: ~80 ms. Fair enough. But if I apply a condition:

SELECT * FROM ___pgnui_relation_tree.raw_relation_tree
WHERE parent_schema <> child_schema;

it takes ~2 seconds (!) to complete.

I tried an alternate table_constraints definition by creating my own view and changing UNION to UNION ALL (as per [2]). The results were:

- table_constraints using UNION ALL has the same number of rows as the UNION version.
- table_constraints alone now takes about 4 ms (as expected).
- VIEW raw_relation_tree is now 110 ms.
- VIEW raw_relation_tree WHERE parent_schema <> child_schema: 3.3 sec.

EXPLAIN results are way too long to post here. If it is ok, I'll gladly post them.

Using 8.3.6.

[1] http://archives.postgresql.org/pgsql-bugs/2008-12/msg00144.php
[2] http://archives.postgresql.org/pgsql-performance/2008-05/msg00062.php
Re: [PERFORM] I/O increase after upgrading to 8.3.5
On Fri, Feb 13, 2009 at 6:35 PM, Kevin Grittner <kevin.gritt...@wicourts.gov> wrote:
> You should definitely set effective_cache_size.

Wow -- I set this to 10GB (5GB for shared buffers + another 5GB for cache), and today's average write frequency went from 20MB/sec to just 1MB/sec. The documentation suggests that effective_cache_size is only used for query planning in conjunction with indexes. So how come it affects write access?

> If you still see the problem after that, I suggest testing different
> settings for:
>
>   bgwriter_lru_maxpages
>   bgwriter_lru_multiplier
>   checkpoint_segments
>   checkpoint_timeout
>   checkpoint_completion_target
>
> Both the checkpoint process and the background writer changed quite a bit,
> and these are the main knobs for tuning the new version.

We are hoping to set up a duplicate instance and play back the SQL log against it so we can experiment with different settings. Until we have such a setup, I'm not sure what to do with the knobs other than frob them wildly. :-) Are there any statistics, either in PostgreSQL proper or in the OS, that I can use as metrics to guide the tuning? For example, is there anything in pg_stat_bgwriter that can help me tune the bgwriter_lru_* settings?

Do transactions that only contain query statements end up writing entries to the WAL when they commit? If so, can we avoid those writes by wrapping our queries in read-only transactions, and would it be worth the effort? Our web application handles 30 requests per second at peak time; each request performs dozens of queries in autocommit mode, i.e. one transaction per query. Only a minority of those requests actually end up modifying the database. PostgreSQL is committing and fsyncing 600-800 transactions per second, so that's probably a good chunk of disk/CPU usage wasted, right?

Alexander.
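As a starting point for the pg_stat_bgwriter question above, here is a sketch of the counters that are usually watched (sample the view periodically and diff the values; the interpretation in the comments is the conventional rule of thumb, not from this thread):

```sql
SELECT checkpoints_timed,      -- checkpoints triggered by checkpoint_timeout
       checkpoints_req,        -- checkpoints forced by filling checkpoint_segments
       buffers_checkpoint,     -- pages written during checkpoints
       buffers_clean,          -- pages written by the background writer
       maxwritten_clean,       -- times bgwriter stopped at bgwriter_lru_maxpages
       buffers_backend,        -- pages ordinary backends had to write themselves
       buffers_alloc           -- buffers allocated
FROM pg_stat_bgwriter;
```

Roughly: if buffers_backend grows fast relative to buffers_clean, or maxwritten_clean climbs steadily, the background writer is falling behind and bgwriter_lru_maxpages / bgwriter_lru_multiplier may be too low; many checkpoints_req relative to checkpoints_timed suggests raising checkpoint_segments.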
Re: [PERFORM] I/O increase after upgrading to 8.3.5
On Sat, Feb 14, 2009 at 9:49 AM, Craig Ringer <cr...@postnewspapers.com.au> wrote:
> Is there any chance you had pg_xlog stored separately on your old database,
> so that I/O for it wasn't being recorded?

No, the database files have always been on a single volume.

Alexander.
Re: [PERFORM] I/O increase after upgrading to 8.3.5
Alexander Staubo <a...@bengler.no> writes:
> kevin.gritt...@wicourts.gov wrote:
>> You should definitely set effective_cache_size.
>
> Wow -- I set this to 10GB (5GB for shared buffers + another 5GB for cache),
> and today's average write frequency went from 20MB/sec to just 1MB/sec. The
> documentation suggests that effective_cache_size is only used for query
> planning in conjunction with indexes. So how come it affects write access?

It *is* only used for query planning. A plausible theory is that you caused some join queries to change from a hash or merge join, involving a temporary hash or sort file, to an index nestloop that doesn't use any temporary storage. If the required hash or sort storage exceeded work_mem, which you have set to just 10MB, that would have created some write traffic.

Did you happen to notice whether your queries got faster or slower when you did this? Watching only aggregate write traffic is a pretty limited view of what is happening in your database.

			regards, tom lane
Re: [PERFORM] Slow queries from information schema
Octavio Alvarez <alvar...@alvarezp.ods.org> writes:
> The result, on the above view: ~80ms. Fair enough. But if I apply a
> condition:
>
>   SELECT * FROM ___pgnui_relation_tree.raw_relation_tree
>   WHERE parent_schema <> child_schema;
>
> it takes ~2 seconds (!) to complete.

I'm not sure I'm seeing the exact same case as you, but what I see here is that 8.3 puts the join condition involving _pg_keysequal() at the top of the tree, where it will be executed quite a lot of times (way more than the planner expects, because of bad rowcount estimates below) ... and _pg_keysequal() is implemented in a depressingly inefficient way. CVS HEAD seems to avoid this trap in the same case, but I'm not entirely convinced whether it's getting better rowcount estimates or just got lucky.

Anyway, it seems to help a great deal if you use a less sucky definition of the function, such as:

create or replace function information_schema._pg_keysequal(smallint[], smallint[])
RETURNS boolean LANGUAGE sql STRICT IMMUTABLE AS
'select $1 <@ $2 and $2 <@ $1';

			regards, tom lane
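The replacement body works because mutual containment with the array `<@` ("is contained by") operator amounts to set-wise equality of the two key arrays, and those operators are much cheaper than the original implementation. A quick sanity check (example values, not from the thread):

```sql
-- Order and duplicates are ignored by containment, so this is true:
SELECT ARRAY[1,2]::smallint[] <@ ARRAY[2,1]::smallint[]
   AND ARRAY[2,1]::smallint[] <@ ARRAY[1,2]::smallint[];
```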
Re: [PERFORM] I/O increase after upgrading to 8.3.5
On Sat, Feb 14, 2009 at 8:23 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Alexander Staubo <a...@bengler.no> writes:
>> Wow -- I set this to 10GB (5GB for shared buffers + another 5GB for cache),
>> and today's average write frequency went from 20MB/sec to just 1MB/sec. The
>> documentation suggests that effective_cache_size is only used for query
>> planning in conjunction with indexes. So how come it affects write access?
>
> It *is* only used for query planning. A plausible theory is that you caused
> some join queries to change from a hash or merge join, involving a temporary
> hash or sort file, to an index nestloop that doesn't use any temporary
> storage. If the required hash or sort storage exceeded work_mem, which you
> have set to just 10MB, that would have created some write traffic.

Interesting. Is there any statistic available that can tell me whether work_mem is being exceeded? The tools to monitor exactly what PostgreSQL is doing -- especially on a production box -- are fairly limited, especially since Linux does not yet have anything close to DTrace in functionality.

> Did you happen to notice whether your queries got faster or slower when you
> did this? Watching only aggregate write traffic is a pretty limited view of
> what is happening in your database.

Unfortunately we don't log SQL queries at the moment. We do log application response times, but they look roughly the same as before the change. I could revert the effective_cache_size setting, turn on SQL logging for a while, then reapply the change and compare.

Alexander.
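For the "is work_mem being exceeded?" question, 8.3 does offer a direct hook: the log_temp_files setting logs every sort/hash temp file at creation, with its size. A sketch (superuser required to change it; the EXPLAIN query is a placeholder for whatever statement is under suspicion):

```sql
-- postgresql.conf (or per-session as below): log all temp files >= 0 kB,
-- i.e. every operation that spilled past work_mem.
SET log_temp_files = 0;

-- Run the suspect query; any spill appears in the server log with its size.
-- EXPLAIN ANALYZE also reports it inline in 8.3, e.g.
--   "Sort Method: external merge  Disk: 10240kB"
-- versus "Sort Method: quicksort  Memory: 25kB" for an in-memory sort.
EXPLAIN ANALYZE SELECT ...;
```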
Re: [PERFORM] Slow queries from information schema
On Sat, 2009-02-14 at 15:02 -0500, Tom Lane wrote:
> Anyway, it seems to help a great deal if you use a less sucky definition of
> the function, such as:
>
>   create or replace function information_schema._pg_keysequal(smallint[], smallint[])
>   RETURNS boolean LANGUAGE sql STRICT IMMUTABLE AS
>   'select $1 <@ $2 and $2 <@ $1';

Wow! I just tried it with the UNION (the original) version of information_schema.table_constraints, and it drastically reduced the total runtime to 309 ms!

I also tested it with UNION ALL, and that took 1.6 sec. (still 50% of the previous time for the UNION ALL version).
[PERFORM] Bad plan for nested loop + limit
This dinky little query takes about 4 seconds to run:

select event_occurrences.*
from event_occurrences
join section_items
  on section_items.subject_id = event_occurrences.event_id
 and section_items.subject_type = 'Event'
 and section_items.sandbox_id = 16399
where event_occurrences.start_time > '2009-02-14 18:15:14.739411 +0100'
order by event_occurrences.start_time
limit 4;

Output from explain analyze:

 Limit  (cost=0.00..973.63 rows=4 width=48) (actual time=61.554..4039.704 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual time=61.552..4039.700 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52) (actual time=61.535..4039.682 rows=1 loops=1)
               ->  Index Scan using index_event_occurrences_on_start_time on event_occurrences  (cost=0.00..13975.01 rows=159718 width=48) (actual time=0.024..398.152 rows=155197 loops=1)
                     Index Cond: (start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone)
               ->  Index Scan using index_section_items_on_subject_type_and_subject_id on section_items  (cost=0.00..0.33 rows=1 width=4) (actual time=0.023..0.023 rows=0 loops=155197)
                     Index Cond: (((section_items.subject_type)::text = 'Event'::text) AND (section_items.subject_id = event_occurrences.event_id))
                     Filter: (section_items.sandbox_id = 16399)
         ->  Index Scan using event_instances_pkey on events  (cost=0.00..6.23 rows=1 width=4) (actual time=0.014..0.015 rows=1 loops=1)
               Index Cond: (events.id = event_occurrences.event_id)
               Filter: (events.read_permissions = (-1))
 Total runtime: 4039.788 ms

Now, if I use limit 50 it uses a plan that is several orders of magnitude more efficient:

 Limit  (cost=6202.38..6202.51 rows=50 width=48) (actual time=0.170..0.171 rows=1 loops=1)
   ->  Sort  (cost=6202.38..6203.20 rows=326 width=48) (actual time=0.170..0.170 rows=1 loops=1)
         Sort Key: event_occurrences.start_time
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=5.09..6191.55 rows=326 width=48) (actual time=0.160..0.161 rows=1 loops=1)
               ->  Bitmap Heap Scan on section_items  (cost=5.09..328.94 rows=96 width=4) (actual time=0.024..0.087 rows=7 loops=1)
                     Recheck Cond: (sandbox_id = 16399)
                     Filter: ((subject_type)::text = 'Event'::text)
                     ->  Bitmap Index Scan on index_section_items_on_sandbox_id  (cost=0.00..5.06 rows=107 width=0) (actual time=0.018..0.018 rows=7 loops=1)
                           Index Cond: (sandbox_id = 16399)
               ->  Index Scan using index_event_occurrences_on_event_id on event_occurrences  (cost=0.00..60.14 rows=74 width=48) (actual time=0.010..0.010 rows=0 loops=7)
                     Index Cond: (event_occurrences.event_id = section_items.subject_id)
                     Filter: (event_occurrences.start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone)
 Total runtime: 0.210 ms

Similarly if I disable nested loop joins with set enable_nestloop = off:

 Limit  (cost=10900.13..10900.14 rows=4 width=48) (actual time=191.476..191.478 rows=1 loops=1)
   ->  Sort  (cost=10900.13..10900.95 rows=326 width=48) (actual time=191.474..191.475 rows=1 loops=1)
         Sort Key: event_occurrences.start_time
         Sort Method: quicksort  Memory: 25kB
         ->  Hash Join  (cost=8944.52..10895.24 rows=326 width=48) (actual time=162.104..191.463 rows=1 loops=1)
               Hash Cond: (section_items.subject_id = event_occurrences.event_id)
               ->  Bitmap Heap Scan on section_items  (cost=5.09..328.94 rows=96 width=4) (actual time=0.026..0.050 rows=7 loops=1)
                     Recheck Cond: (sandbox_id = 16399)
                     Filter: ((subject_type)::text = 'Event'::text)
                     ->  Bitmap Index Scan on index_section_items_on_sandbox_id  (cost=0.00..5.06 rows=107 width=0) (actual time=0.019..0.019 rows=7 loops=1)
                           Index Cond: (sandbox_id = 16399)
               ->  Hash  (cost=5580.54..5580.54 rows=157752 width=48) (actual time=161.832..161.832 rows=155197 loops=1)
                     ->  Seq Scan on event_occurrences  (cost=0.00..5580.54 rows=157752 width=48) (actual time=0.030..75.406 rows=155197 loops=1)
                           Filter: (start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone)
 Total runtime: 192.496 ms

Some statistics:

# select attname, n_distinct from pg_stats where tablename = 'event_occurrences';
  attname   | n_distinct
------------+------------
 id         |         -1
 created_at |  -0.291615
 updated_at |  -0.294081
 created_by |        715
 updated_by |        715
 event_id   |       2146
 start_time |   -0.10047
 end_time   |       5602

# select attname, n_distinct from pg_stats where tablename = 'section_items';
  attname   | n_distinct
------------+------------
 id
Re: [PERFORM] Bad plan for nested loop + limit
On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo <a...@bengler.no> wrote:
> Output from explain analyze:
>
>  Limit  (cost=0.00..973.63 rows=4 width=48) (actual time=61.554..4039.704 rows=1 loops=1)
>    ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual time=61.552..4039.700 rows=1 loops=1)
>          ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52) (actual time=61.535..4039.682 rows=1 loops=1)

Those estimates are pretty far off. Did you try increasing the statistics target? Also, is the first query repeatable (that is, is it already in cache when you do the test, or alternately, are all queries *out* of cache when you test)?

--
- David T. Wilson
david.t.wil...@gmail.com
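Raising the statistics target is done per column. A sketch, using the columns the mis-estimated plans above join and filter on (the target value 1000 is just an example; 8.3's default is 10):

```sql
ALTER TABLE section_items     ALTER COLUMN sandbox_id SET STATISTICS 1000;
ALTER TABLE section_items     ALTER COLUMN subject_id SET STATISTICS 1000;
ALTER TABLE event_occurrences ALTER COLUMN event_id   SET STATISTICS 1000;

-- New targets take effect at the next ANALYZE:
ANALYZE section_items;
ANALYZE event_occurrences;
```

Alternatively, default_statistics_target can be raised globally in postgresql.conf, at the cost of larger pg_statistic entries and slower planning for every table.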