[PERFORM] hashed subplan 5000x slower than two sequential operations
Can you help me understand how to optimize the following. There's a subplan which in this case returns 3 rows, but it is really expensive: = explain analyze SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) WHERE (contexts.parent_key = 392210 OR contexts.context_key IN (SELECT collection_data.context_key FROM collection_data WHERE collection_data.collection_context_key = 392210) ) AND articles.indexed ; QUERY PLAN Hash Join (cost=83054.41..443755.45 rows=261077 width=4) (actual time=4362.143..6002.808 rows=28 loops=1) Hash Cond: (articles.context_key = contexts.context_key) -> Seq Scan on articles (cost=0.00..345661.91 rows=522136 width=4) (actual time=0.558..3953.002 rows=517356 loops=1) Filter: indexed -> Hash (cost=69921.25..69921.25 rows=800493 width=4) (actual time=829.501..829.501 rows=31 loops=1) -> Seq Scan on contexts (cost=14.31..69921.25 rows=800493 width=4) (actual time=1.641..829.339 rows=31 loops=1) Filter: ((parent_key = 392210) OR (hashed subplan)) SubPlan -> Index Scan using collection_data_context_key_index on collection_data (cost=0.00..14.30 rows=6 width=4) (actual time=0.018..0.023 rows=3 loops=1) Index Cond: (collection_context_key = 392210) Total runtime: 6002.976 ms (11 rows) = explain analyze SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN (392210,392210,395073,1304250)) AND articles.indexed ; QUERY PLAN --- Nested Loop (cost=14.35..1863.85 rows=94 width=4) (actual time=0.098..1.038 rows=57 loops=1) -> Bitmap Heap Scan on contexts (cost=14.35..572.57 rows=288 width=4) (actual time=0.079..0.274 rows=59 loops=1) Recheck Cond: ((parent_key = 392210) OR (parent_key = ANY ('{392210,392210,395073,1304250}'::integer[]))) -> BitmapOr (cost=14.35..14.35 rows=288 width=0) (actual time=0.066..0.066 rows=0 loops=1) -> Bitmap Index Scan on parent_key_idx (cost=0.00..3.07 rows=58 width=0) (actual time=0.028..0.028 rows=28 loops=1) Index Cond: (parent_key = 392210) -> Bitmap Index Scan on parent_key_idx (cost=0.00..11.13 rows=231 width=0) (actual time=0.035..0.035 rows=87 loops=1) Index Cond: (parent_key = ANY ('{392210,392210,395073,1304250}'::integer[])) -> Index Scan using article_key_idx on articles (cost=0.00..4.47 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=59) Index Cond: (articles.context_key = contexts.context_key) Filter: articles.indexed Total runtime: 1.166 ms (12 rows) production=> explain analyze SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) WHERE (contexts.parent_key = 392210 OR contexts.context_key IN (SELECT collection_data.context_key FROM collection_data WHERE collection_data.collection_context_key = 392210) ) AND articles.indexed ; = # select version(); PostgreSQL 8.3.4 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] hashed subplan 5000x slower than two sequential operations
Shrirang Chitnis wrote: Bryce, The two queries are different: Ah, due to a mistake. The first version with the hashed subplan is from production. The second version should have read: production=> SELECT collection_data.context_key FROM collection_data WHERE collection_data.collection_context_key = 392210; 392210 395073 1304250 production=> explain analyze SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) WHERE (contexts.parent_key = 392210 OR contexts.context_key IN (392210,395073,1304250)) AND articles.indexed ; QUERY PLAN - Nested Loop (cost=12.32..414.41 rows=20 width=4) (actual time=0.112..0.533 rows=28 loops=1) -> Bitmap Heap Scan on contexts (cost=12.32..135.13 rows=62 width=4) (actual time=0.079..0.152 rows=31 loops=1) Recheck Cond: ((parent_key = 392210) OR (context_key = ANY ('{392210,392210,395073,1304250}'::integer[]))) -> BitmapOr (cost=12.32..12.32 rows=62 width=0) (actual time=0.070..0.070 rows=0 loops=1) -> Bitmap Index Scan on parent_key_idx (cost=0.00..3.07 rows=58 width=0) (actual time=0.029..0.029 rows=28 loops=1) Index Cond: (parent_key = 392210) -> Bitmap Index Scan on contexts_pkey (cost=0.00..9.22 rows=4 width=0) (actual time=0.037..0.037 rows=4 loops=1) Index Cond: (context_key = ANY ('{392210,392210,395073,1304250}'::integer[])) -> Index Scan using article_key_idx on articles (cost=0.00..4.49 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=31) Index Cond: (articles.context_key = contexts.context_key) Filter: articles.indexed Total runtime: 0.614 ms (12 rows) production=> explain analyze SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) WHERE (contexts.parent_key = 392210 OR contexts.context_key IN (SELECT collection_data.context_key FROM collection_data WHERE collection_data.collection_context_key = 392210) ) AND articles.indexed ; QUERY PLAN Hash Join (cost=83054.41..443755.45 rows=261077 width=4) (actual time=3415.609..6737.863 rows=28 loops=1) Hash Cond: (articles.context_key = contexts.context_key) -> Seq Scan on articles (cost=0.00..345661.91 rows=522136 width=4) (actual time=0.038..4587.914 rows=517416 loops=1) Filter: indexed -> Hash (cost=69921.25..69921.25 rows=800493 width=4) (actual time=926.965..926.965 rows=31 loops=1) -> Seq Scan on contexts (cost=14.31..69921.25 rows=800493 width=4) (actual time=2.113..926.794 rows=31 loops=1) Filter: ((parent_key = 392210) OR (hashed subplan)) SubPlan -> Index Scan using collection_data_context_key_index on collection_data (cost=0.00..14.30 rows=6 width=4) (actual time=0.084..0.088 rows=3 loops=1) Index Cond: (collection_context_key = 392210) Total runtime: 6738.042 ms (11 rows) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] hashed subplan 5000x slower than two sequential operations
Title: AW: [PERFORM] hashed subplan 5000x slower than two sequential operations Marc Mamin wrote: Hello, are the table freshly analyzed, with a sufficient default_statistics_target ? autovacuum = on # Enable autovacuum subprocess? 'on' autovacuum_naptime = 5min # time between autovacuum runs default_statistics_target = 150 # range 1-1000 You may try to get a better plan while rewriting the query as an UNION to get rid of the OR clause. Something like (not tested): It is way better EXPLAIN ANALYZE SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) WHERE (contexts.parent_key = 392210) AND articles.indexed UNION SELECT collection_data.context_key FROM collection_data JOIN articles ON (articles.context_key=collection_data.context_key) WHERE collection_data.collection_context_key = 392210 AND articles.indexed; QUERY PLAN Unique (cost=418.50..418.61 rows=22 width=4) (actual time=0.582..0.671 rows=28 loops=1) -> Sort (cost=418.50..418.55 rows=22 width=4) (actual time=0.579..0.608 rows=28 loops=1) Sort Key: contexts.context_key Sort Method: quicksort Memory: 26kB -> Append (cost=0.00..418.01 rows=22 width=4) (actual time=0.042..0.524 rows=28 loops=1) -> Nested Loop (cost=0.00..376.46 rows=19 width=4) (actual time=0.040..0.423 rows=28 loops=1) -> Index Scan using parent_key_idx on contexts (cost=0.00..115.20 rows=58 width=4) (actual time=0.021..0.082 rows=28 loops=1) Index Cond: (parent_key = 392210) -> Index Scan using article_key_idx on articles (cost=0.00..4.49 rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=28) Index Cond: (public.articles.context_key = contexts.context_key) Filter: public.articles.indexed -> Nested Loop (cost=0.00..41.32 rows=3 width=4) (actual time=0.043..0.043 rows=0 loops=1) -> Index Scan using collection_data_context_key_index on collection_data (cost=0.00..14.30 rows=6 width=4) (actual time=0.012..0.015 rows=3 loops=1) Index Cond: (collection_context_key = 392210) -> Index Scan using article_key_idx on articles (cost=0.00..4.49 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=3) Index Cond: (public.articles.context_key = collection_data.context_key) Filter: public.articles.indexed Total runtime: 0.812 ms
Re: [PERFORM] hashed subplan 5000x slower than two sequential operations
Marc Mamin wrote: Another point: would a conditionl index help ? on articles (context_key) where indexed no. production=> select count(*),indexed from articles group by indexed; count | indexed +- 517433 | t 695814 | f -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
I've got a very slow query, which I can make faster by doing something seemingly trivial. The query has been trouble for years (always slow, sometimes taking hours): 512,600ms Original, filter on articles.indexed (622 results) 7,500ms Remove "AND articles.indexed" (726 results, undesirable). 7,675ms Extra join for "AND articles.indexed" (622 results, same as original). Hardware is Postgres 8.3 on a Sunfire X4240 under Debian Lenny, with a fresh ANALYZE. What I don't understand is why the improvement? Is the second way of doing things actually superior, or is this just a query planner edge case? Original (512,600ms) - EXPLAIN ANALYZE SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key) WHERE contexts.context_key IN (SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'insider' INTERSECT SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'trading') AND contexts.context_key IN (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON (a.ancestor_key = bp_categories.context_key) WHERE lower(bp_categories.category) = 'law') AND articles.indexed; Extra join (7,675ms) - EXPLAIN ANALYZE SELECT contexts.context_key FROM contexts JOIN articles using (context_key) WHERE contexts.context_key IN ( SELECT contexts.context_key FROM contexts JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key) WHERE contexts.context_key IN (SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'insider' INTERSECT SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'trading') AND contexts.context_key IN (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON (a.ancestor_key = bp_categories.context_key) WHERE lower(bp_categories.category) = 'law') ) AND articles.indexed; # select indexed,count(*) from articles group by indexed; indexed | count -+ t | 354605 f | 513552 QUERY PLAN Hash Join (cost=131663.39..140005.40 rows=4769 width=4) (actual time=511893.241..512599.348 rows=622 loops=1) Hash Cond: (matview_82034.context_key = articles.context_key) -> Seq Scan on matview_82034 (cost=0.00..6596.00 rows=465600 width=4) (actual time=0.019..463.278 rows=438220 loops=1) -> Hash (cost=131663.38..131663.38 rows=1 width=16) (actual time=511659.671..511659.671 rows=622 loops=1) -> Nested Loop IN Join (cost=46757.70..131663.38 rows=1 width=16) (actual time=1142.789..511656.211 rows=622 loops=1) Join Filter: (a.context_key = articles.context_key) -> Nested Loop (cost=46757.70..46789.06 rows=2 width=12) (actual time=688.057..839.297 rows=1472 loops=1) -> Nested Loop (cost=46757.70..46780.26 rows=2 width=8) (actual time=688.022..799.945 rows=1472 loops=1) -> Subquery Scan "IN_subquery" (cost=46757.70..46757.97 rows=5 width=4) (actual time=687.963..743.587 rows=1652 loops=1) -> SetOp Intersect (cost=46757.70..46757.93 rows=5 width=4) (actual time=687.961..738.955 rows=1652 loops=1) -> Sort (cost=46757.70..46757.81 rows=46 width=4) (actual time=687.943..709.972 rows=19527 loops=1) Sort Key: "*SELECT* 1".context_key Sort Method: quicksort Memory: 1684kB -> Append (cost=0.00..46756.43 rows=46 width=4) (actual time=8.385..657.839 rows=19527 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..23378.21 rows=23 width=4) (actual time=8.383..215.613 rows=4002 loops=1) -> Nested Loop (cost=0.00..23377.98 rows=23 width=4) (actual time=8.380..207.499 rows=4002 loops=1) -> Index Scan using words_word on words (cost=0.00..5.47 rows=1 width=4) (actual time=0.102..0.105 rows=1 loops=1) Index Cond: ((word)::text = 'insider'::text) -> Index Scan using article_words_wc on article_words (cost=0.00..23219.17 rows=12268 width=8) (actual time=8.272..199.224 rows=4002 loops=1) Index Cond: (public.article_words.word_key
[PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
Or, if you want to actually read that query plan, try: http://explain.depesz.com/s/qYq -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
If you guys succeed in making this class of query perform, you'll have beat out the professional consulting firm we hired, which was all but useless! The query is usually slow, but particular combinations of words seem to make it obscenely slow. The query plans are now attached (sorry I did not start there: many lists reject attachments). Or you can click on "text" at the query planner analysis site http://explain.depesz.com/s/qYq _Here's typical server load:_ Tasks: 166 total, 1 running, 165 sleeping, 0 stopped, 0 zombie Cpu(s): 1.2%us, 0.9%sy, 0.0%ni, 86.5%id, 11.2%wa, 0.0%hi, 0.1%si, 0.0%st Mem: 32966936k total, 32873860k used,93076k free, 2080k buffers Swap: 33554424k total, 472k used, 33553952k free, 30572904k cached _ Configurations modified from Postgres 8.3 default are:_ listen_addresses = '10.100.2.11, 10.101.2.11' # what IP address(es) to listen on; port = 5432 # (change requires restart) max_connections = 400 # (change requires restart) shared_buffers = 4096MB # min 128kB or max_connections*16kB work_mem = 16MB # min 64kB max_fsm_pages = 50 # default:2 min:max_fsm_relations*16,6 bytes see:MAIN-5740 max_fsm_relations = 2700# min 100, ~70 bytes each checkpoint_segments = 20# in logfile segments, min 1, 16MB each random_page_cost = 2.0 # same scale as above effective_cache_size = 28672MB default_statistics_target = 150 # range 1-1000 log_destination = 'syslog' # Valid values are combinations of log_min_error_statement = error # values in order of decreasing detail: log_min_duration_statement = 5000 # -1 is disabled, 0 logs all statements log_checkpoints = on# default off autovacuum_naptime = 5min # time between autovacuum runs escape_string_warning = off # default:on (See bepress MAIN-4857) standard_conforming_strings = off # deafult:off (See bepress MAIN-4857) production=# EXPLAIN ANALYZE SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'insider'; - Nested Loop (cost=0.00..23393.15 rows=23 width=4) (actual time=0.077..15.637 rows=4003 loops=1) -> Index Scan using words_word on words (cost=0.00..5.47 rows=1 width=4) (actual time=0.049..0.051 rows=1 loops=1) Index Cond: ((word)::text = 'insider'::text) -> Index Scan using article_words_wc on article_words (cost=0.00..23234.38 rows=12264 width=8) (actual time=0.020..7.237 rows=4003 loops=1) Index Cond: (article_words.word_key = words.word_key) Total runtime: 19.776 ms production=# EXPLAIN ANALYZE SELECT context_key FROM article_words WHERE word_key = 3675; --- Index Scan using article_words_wc on article_words (cost=0.00..21433.53 rows=11309 width=4) (actual time=0.025..7.579 rows=4003 loops=1) Index Cond: (word_key = 3675) Total runtime: 11.704 ms production=# explain analyze select count(*) from article_words; Aggregate (cost=263831.63..263831.64 rows=1 width=0) (actual time=35851.654..35851.655 rows=1 loops=1) -> Seq Scan on words (cost=0.00..229311.30 rows=13808130 width=0) (actual time=0.043..21281.124 rows=13808184 loops=1) Total runtime: 35851.723 ms production=# select count(*) from words; 13,808,184 production=# explain analyze select count(*) from article_words; Aggregate (cost=5453242.40..5453242.41 rows=1 width=0) (actual time=776504.017..776504.018 rows=1 loops=1) -> Seq Scan on article_words (cost=0.00..4653453.52 rows=319915552 width=0) (actual time=0.034..438969.347 rows=319956663 loops=1) Total runtime: 776504.177 ms production=# select count(*) from article_words; 319,956,720 Aggregate (cost=139279.85..139279.86 rows=1 width=4) (actual time=502493.000..502493.001 rows=1 loops=1) -> Hash Join (cost=131314.31..139279.84 rows=2 width=4) (actual time=501787.397..502492.316 rows=622 loops=1) Hash Cond: (matview_82034.context_key = articles.context_key) -> Seq Scan on matview_82034 (cost=0.00..6322.20 rows=438220 width=4) (actual time=0.014..462.312 rows=438220 loops=1) -> Hash (cost=131314.30..131314.30 rows=1 width=16) (actual time=501553.755..501553.755 rows=622 loops=1) -> Nested Loop IN Join (cost=46291.79..131314.30 rows=1 width=16) (actual time=467.546..501550.735 rows=622 loops=1) Join Filter: (a.context_key = articles.context_key) -> Nested Loop (cost=46291.79..46323.15 rows=2 width=12) (actual time=179.760..303.474 rows=1473 loops=1) -> Nested Loop (cost=46291.79..
Re: [PERFORM] 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
Kevin Grittner wrote: Bryce Nesbitt wrote: I've got a very slow query, which I can make faster by doing something seemingly trivial. Out of curiosity, what kind of performance do you get with?: EXPLAIN ANALYZE SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key = contexts.context_key) JOIN matview_82034 ON (matview_82034.context_key = contexts.context_key) WHERE EXISTS ( SELECT * FROM article_words JOIN words using (word_key) WHERE context_key = contexts.context_key AND word = 'insider' ) AND EXISTS ( SELECT * FROM article_words JOIN words using (word_key) WHERE context_key = contexts.context_key AND word = 'trading' ) AND EXISTS ( SELECT * FROM virtual_ancestors a JOIN bp_categories ON (bp_categories.context_key = a.ancestor_key) WHERE a.context_key = contexts.context_key AND lower(bp_categories.category) = 'law' ) AND articles.indexed ; 512,600ms query becomes 225,976ms. Twice as fast on pos Definitely not beating the 7500ms version. PostgreSQL 8.3.4 EXPLAIN ANALYZE SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key = contexts.context_key) JOIN matview_82034 ON (matview_82034.context_key = contexts.context_key) WHERE EXISTS ( SELECT * FROM article_words JOIN words using (word_key) WHERE context_key = contexts.context_key AND word = 'insider' ) AND EXISTS ( SELECT * FROM article_words JOIN words using (word_key) WHERE context_key = contexts.context_key AND word = 'trading' ) AND EXISTS ( SELECT * FROM virtual_ancestors a JOIN bp_categories ON (bp_categories.context_key = a.ancestor_key) WHERE a.context_key = contexts.context_key AND lower(bp_categories.category) = 'law' ) AND articles.indexed ; Nested Loop (cost=13511.95..13662023.05 rows=19282 width=4) (actual time=3070.834..225973.159 rows=622 loops=1) -> Hash Join (cost=13511.95..356938.38 rows=177937 width=8) (actual time=1106.242..7520.756 rows=351337 loops=1) Hash Cond: (articles.context_key = matview_82034.context_key) -> Seq Scan on articles (cost=0.00..334502.90 rows=386266 width=4) (actual time=0.030..4031.203 rows=355626 loops=1) Filter: indexed -> Hash (cost=6322.20..6322.20 rows=438220 width=4) (actual time=1105.663..1105.663 rows=438220 loops=1) -> Seq Scan on matview_82034 (cost=0.00..6322.20 rows=438220 width=4) (actual time=7.105..544.072 rows=438220 loops=1) -> Index Scan using contexts_pkey on contexts (cost=0.00..74.76 rows=1 width=4) (actual time=0.619..0.619 rows=0 loops=351337) Index Cond: (contexts.context_key = articles.context_key) Filter: ((subplan) AND (subplan) AND (subplan)) SubPlan -> Nested Loop (cost=0.00..30.54 rows=1 width=17) (actual time=6.119..6.119 rows=1 loops=983) -> Index Scan using words_word on words (cost=0.00..5.50 rows=1 width=13) (actual time=0.028..0.029 rows=1 loops=983) Index Cond: ((word)::text = 'trading'::text) -> Index Scan using article_words_cw on article_words (cost=0.00..25.02 rows=1 width=8) (actual time=6.082..6.082 rows=1 loops=983) Index Cond: ((public.article_words.context_key = $0) AND (public.article_words.word_key = public.words.word_key)) -> Nested Loop (cost=0.00..30.54 rows=1 width=17) (actual time=6.196..6.196 rows=0 loops=26494) -> Index Scan using words_word on words (cost=0.00..5.50 rows=1 width=13) (actual time=0.022..0.024 rows=1 loops=26494) Index Cond: ((word)::text = 'insider'::text) -> Index Scan using article_words_cw on article_words (cost=0.00..25.02 rows=1 width=8) (actual time=6.165..6.165 rows=0 loops=26494) Index Cond: ((public.article_words.context_key = $0) AND (public.article_words.word_key = public.words.word_key)) -> Nested Loop (cost=0.00..38.38 rows=3 width=29) (actual time=0.122..0.122 rows=0 loops=351337) -> Index Scan usin
Re: [PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
Tom Lane wrote: > Given that it estimated 1 row out of "words" (quite correctly) and 12264 > rows out of each scan on article_words, you'd think that the join size > estimate would be 12264, which would be off by "only" a factor of 3 from > the true result. Instead it's 23, off by a factor of 200 :-(. > Has anyone every proposed a "learning" query planner? One that eventually clues in to estimate mismatches like this? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
So as the op, back to the original posting In the real world, what should I do? Does it make sense to pull the "AND articles.indexed" clause into an outer query? Will that query simply perform poorly on other arbitrary combinations of words? I'm happy to test any given query against the same set of servers. If it involves a persistent change it has to run on a test server). For example, the Robert Haas method: # ... Total runtime: 254207.857 ms # ALTER TABLE article_words ALTER COLUMN word_key SET STATISTICS 1000; # ANALYZE VERBOSE article_words INFO: analyzing "public.article_words" INFO: "article_words": scanned 30 of 1342374 pages, containing 64534899 live rows and 3264839 dead rows; 30 rows in sample, 288766568 estimated total rows ANALYZE # ... Total runtime: 200591.751 ms # ALTER TABLE article_words ALTER COLUMN word_key SET STATISTICS 50; # ANALYZE VERBOSE article_words # ... Total runtime: 201204.972 ms Sadly, it made essentially zero difference. Attached. preproduction-20091214=# EXPLAIN ANALYZE SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key) WHERE contexts.context_key IN (SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'insider' INTERSECT SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'trading') AND contexts.context_key IN (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON (a.ancestor_key = bp_categories.context_key) WHERE lower(bp_categories.category) = 'law') AND articles.indexed; QUERY PLAN --- Nested Loop (cost=12861.91..62150.21 rows=2 width=4) (actual time=136.789..200744.455 rows=546 loops=1) -> Nested Loop IN Join (cost=12861.91..62148.32 rows=1 width=16) (actual time=136.777..200737.004 rows=546 loops=1) Join Filter: (a.context_key = articles.context_key) -> Nested Loop (cost=12861.91..12875.48 rows=2 width=12) (actual time=55.674..116.443 rows=1306 loops=1) -> Nested Loop (cost=12861.91..12871.68 rows=2 width=8) (actual time=55.662..97.863 rows=1306 loops=1) -> Subquery Scan "IN_subquery" (cost=12861.91..12862.18 rows=5 width=4) (actual time=55.639..75.777 rows=1473 loops=1) -> SetOp Intersect (cost=12861.91..12862.14 rows=5 width=4) (actual time=55.638..73.724 rows=1473 loops=1) -> Sort (cost=12861.91..12862.02 rows=46 width=4) (actual time=55.631..62.140 rows=17892 loops=1) Sort Key: "*SELECT* 1".context_key Sort Method: quicksort Memory: 1607kB -> Append (cost=0.00..12860.63 rows=46 width=4) (actual time=0.040..42.026 rows=17892 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..6430.32 rows=23 width=4) (actual time=0.039..6.909 rows=3583 loops=1) -> Nested Loop (cost=0.00..6430.09 rows=23 width=4) (actual time=0.038..5.110 rows=3583 loops=1) -> Index Scan using words_word on words (cost=0.00..2.22 rows=1 width=4) (actual time=0.019..0.020 rows=1 loops=1) Index Cond: ((word)::text = 'insider'::text) -> Index Scan using article_words_wc on article_words (cost=0.00..6323.81 rows=8325 width=8) (actual time=0.015..3.243 rows=3583 loops=1) Index Cond: (public.article_words.word_key = public.words.word_key) -> Subquery Scan "*SELECT* 2" (cost=0.00..6430.32 rows=23 width=4) (actual time=0.036..27.166 rows=14309 loops=1) -> Nested Loop (cost=0.00..6430.09 rows=23 width=4) (actual time=0.035..20.037 rows=14309 loops=1) -> Index Scan using words_word on words (cost=0.00..2.22 rows=1 width=4) (actual time=0.014..0.015 rows=1 loops=1) Index Cond: ((word)::text = 'trading'::text) -> Index Scan using article_words_wc on article_words (cost=0.00..6323.81 rows=8325 width=8) (actual time=0.017.
Re: [PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
So how about the removal of the "AND" clause? On a test server, this drops the query from 201204 to 438 ms. Is this just random, or is it a real solution that might apply to any arbitrary combination of words? Attached are three test runs: Total runtime: 201204.972 ms Total runtime: 437.766 ms Total runtime: 341.727 ms preproduction-20091214=# ALTER TABLE article_words ALTER COLUMN word_key SET STATISTICS 50; ALTER TABLE preproduction-20091214=# ANALYZE VERBOSE article_words; INFO: analyzing "public.article_words" INFO: "article_words": scanned 75000 of 1342374 pages, containing 16124750 live rows and 825250 dead rows; 75000 rows in sample, 288605935 estimated total rows ANALYZE preproduction-20091214=# EXPLAIN ANALYZE SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key) WHERE contexts.context_key IN (SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'insider' INTERSECT SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'trading') AND contexts.context_key IN (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON (a.ancestor_key = bp_categories.context_key) WHERE lower(bp_categories.category) = 'law') AND articles.indexed; QUERY PLAN --- Nested Loop (cost=12665.41..61953.71 rows=2 width=4) (actual time=140.799..201204.025 rows=546 loops=1) -> Nested Loop IN Join (cost=12665.41..61951.82 rows=1 width=16) (actual time=140.786..201196.526 rows=546 loops=1) Join Filter: (a.context_key = articles.context_key) -> Nested Loop (cost=12665.41..12678.98 rows=2 width=12) (actual time=55.453..116.790 rows=1306 loops=1) -> Nested Loop (cost=12665.41..12675.18 rows=2 width=8) (actual time=55.439..98.132 rows=1306 loops=1) -> Subquery Scan "IN_subquery" (cost=12665.41..12665.68 rows=5 width=4) (actual time=55.415..76.025 rows=1473 loops=1) -> SetOp Intersect (cost=12665.41..12665.64 rows=5 width=4) (actual time=55.413..73.885 rows=1473 loops=1) -> Sort (cost=12665.41..12665.52 rows=46 width=4) (actual time=55.406..62.222 rows=17892 loops=1) Sort Key: "*SELECT* 1".context_key Sort Method: quicksort Memory: 1607kB -> Append (cost=0.00..12664.14 rows=46 width=4) (actual time=0.060..42.065 rows=17892 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..6332.07 rows=23 width=4) (actual time=0.059..6.962 rows=3583 loops=1) -> Nested Loop (cost=0.00..6331.84 rows=23 width=4) (actual time=0.058..5.148 rows=3583 loops=1) -> Index Scan using words_word on words (cost=0.00..2.22 rows=1 width=4) (actual time=0.018..0.019 rows=1 loops=1) Index Cond: ((word)::text = 'insider'::text) -> Index Scan using article_words_wc on article_words (cost=0.00..6227.18 rows=8195 width=8) (actual time=0.036..3.275 rows=3583 loops=1) Index Cond: (public.article_words.word_key = public.words.word_key) -> Subquery Scan "*SELECT* 2" (cost=0.00..6332.07 rows=23 width=4) (actual time=0.037..27.136 rows=14309 loops=1) -> Nested Loop (cost=0.00..6331.84 rows=23 width=4) (actual time=0.035..19.912 rows=14309 loops=1) -> Index Scan using words_word on words (cost=0.00..2.22 rows=1 width=4) (actual time=0.015..0.017 rows=1 loops=1) Index Cond: ((word)::text = 'trading'::text) -> Index Scan using article_words_wc on article_words (cost=0.00..6227.18 rows=8195 width=8) (actual time=0.018..12.464 rows=14309 loops=1) Index Cond: (public.article_words.word_key = public.words.word_key) -> Index Scan using article_key_idx on articles (cost=0.00..1.89 rows=1 width=4) (actual time=0.011..0.012 rows=1 loops=1473)