[PERFORM] hashed subplan 5000x slower than two sequential operations

2010-12-08 Thread Bryce Nesbitt
Can you help me understand how to optimize the following.  There's a 
subplan which in this case returns 3 rows,

but it is really expensive:



=
explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
 OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
 WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;
   
QUERY PLAN

 Hash Join  (cost=83054.41..443755.45 rows=261077 width=4) (actual 
time=4362.143..6002.808 rows=28 loops=1)

   Hash Cond: (articles.context_key = contexts.context_key)
   ->  Seq Scan on articles  (cost=0.00..345661.91 rows=522136 width=4) 
(actual time=0.558..3953.002 rows=517356 loops=1)

 Filter: indexed
   ->  Hash  (cost=69921.25..69921.25 rows=800493 width=4) (actual 
time=829.501..829.501 rows=31 loops=1)
 ->  Seq Scan on contexts  (cost=14.31..69921.25 rows=800493 
width=4) (actual time=1.641..829.339 rows=31 loops=1)

   Filter: ((parent_key = 392210) OR (hashed subplan))
   SubPlan
 ->  Index Scan using collection_data_context_key_index 
on collection_data  (cost=0.00..14.30 rows=6 width=4) (actual 
time=0.018..0.023 rows=3 loops=1)

   Index Cond: (collection_context_key = 392210)
 Total runtime: 6002.976 ms
(11 rows)


=
explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN 
(392210,392210,395073,1304250))

AND articles.indexed
;
  QUERY PLAN
---
 Nested Loop  (cost=14.35..1863.85 rows=94 width=4) (actual 
time=0.098..1.038 rows=57 loops=1)
   ->  Bitmap Heap Scan on contexts  (cost=14.35..572.57 rows=288 
width=4) (actual time=0.079..0.274 rows=59 loops=1)
 Recheck Cond: ((parent_key = 392210) OR (parent_key = ANY 
('{392210,392210,395073,1304250}'::integer[])))
 ->  BitmapOr  (cost=14.35..14.35 rows=288 width=0) (actual 
time=0.066..0.066 rows=0 loops=1)
   ->  Bitmap Index Scan on parent_key_idx  
(cost=0.00..3.07 rows=58 width=0) (actual time=0.028..0.028 rows=28 loops=1)

 Index Cond: (parent_key = 392210)
   ->  Bitmap Index Scan on parent_key_idx  
(cost=0.00..11.13 rows=231 width=0) (actual time=0.035..0.035 rows=87 
loops=1)
 Index Cond: (parent_key = ANY 
('{392210,392210,395073,1304250}'::integer[]))
   ->  Index Scan using article_key_idx on articles  (cost=0.00..4.47 
rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=59)

 Index Cond: (articles.context_key = contexts.context_key)
 Filter: articles.indexed
 Total runtime: 1.166 ms
(12 rows)

production=> explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
 OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
 WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;


=
# select version();
PostgreSQL 8.3.4 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2 
20061115 (prerelease) (Debian 4.1.1-21)



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] hashed subplan 5000x slower than two sequential operations

2010-12-08 Thread Bryce Nesbitt

Shrirang Chitnis wrote:

Bryce,
The two queries are different:
   
Ah, due to a mistake.  The first version with the hashed subplan is from 
production.

The second version should have read:


production=> SELECT collection_data.context_key FROM collection_data 
WHERE collection_data.collection_context_key = 392210;

  392210
  395073
 1304250
production=> explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210 OR contexts.context_key IN 
(392210,395073,1304250))

AND articles.indexed
;
 QUERY PLAN
-
 Nested Loop  (cost=12.32..414.41 rows=20 width=4) (actual 
time=0.112..0.533 rows=28 loops=1)
   ->  Bitmap Heap Scan on contexts  (cost=12.32..135.13 rows=62 
width=4) (actual time=0.079..0.152 rows=31 loops=1)
 Recheck Cond: ((parent_key = 392210) OR (context_key = ANY 
('{392210,392210,395073,1304250}'::integer[])))
 ->  BitmapOr  (cost=12.32..12.32 rows=62 width=0) (actual 
time=0.070..0.070 rows=0 loops=1)
   ->  Bitmap Index Scan on parent_key_idx  
(cost=0.00..3.07 rows=58 width=0) (actual time=0.029..0.029 rows=28 loops=1)

 Index Cond: (parent_key = 392210)
   ->  Bitmap Index Scan on contexts_pkey  (cost=0.00..9.22 
rows=4 width=0) (actual time=0.037..0.037 rows=4 loops=1)
 Index Cond: (context_key = ANY 
('{392210,392210,395073,1304250}'::integer[]))
   ->  Index Scan using article_key_idx on articles  (cost=0.00..4.49 
rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=31)

 Index Cond: (articles.context_key = contexts.context_key)
 Filter: articles.indexed
 Total runtime: 0.614 ms
(12 rows)







production=> explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
 OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
 WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;
   
QUERY PLAN

 Hash Join  (cost=83054.41..443755.45 rows=261077 width=4) (actual 
time=3415.609..6737.863 rows=28 loops=1)

   Hash Cond: (articles.context_key = contexts.context_key)
   ->  Seq Scan on articles  (cost=0.00..345661.91 rows=522136 width=4) 
(actual time=0.038..4587.914 rows=517416 loops=1)

 Filter: indexed
   ->  Hash  (cost=69921.25..69921.25 rows=800493 width=4) (actual 
time=926.965..926.965 rows=31 loops=1)
 ->  Seq Scan on contexts  (cost=14.31..69921.25 rows=800493 
width=4) (actual time=2.113..926.794 rows=31 loops=1)

   Filter: ((parent_key = 392210) OR (hashed subplan))
   SubPlan
 ->  Index Scan using collection_data_context_key_index 
on collection_data  (cost=0.00..14.30 rows=6 width=4) (actual 
time=0.084..0.088 rows=3 loops=1)

   Index Cond: (collection_context_key = 392210)
 Total runtime: 6738.042 ms
(11 rows)

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] hashed subplan 5000x slower than two sequential operations

2010-12-08 Thread Bryce Nesbitt
Title: AW: [PERFORM] hashed subplan 5000x slower than two sequential
operations




Marc Mamin wrote:

  
  
  
  
  
  Hello,
are the table freshly analyzed, with a sufficient
default_statistics_target ?
  


autovacuum = on        # Enable autovacuum
subprocess?  'on' 
autovacuum_naptime = 5min # time between autovacuum runs
default_statistics_target = 150   # range 1-1000




  
You may try to get a better plan while rewriting the query as an UNION
to get rid of the OR clause.
Something like (not tested):
  

It is way better


EXPLAIN ANALYZE SELECT contexts.context_key
FROM contexts
    JOIN articles
    ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key =
392210)  
AND articles.indexed

UNION
SELECT collection_data.context_key
FROM collection_data
JOIN articles ON (articles.context_key=collection_data.context_key)
WHERE collection_data.collection_context_key = 392210
AND articles.indexed;


QUERY
PLAN


 Unique  (cost=418.50..418.61 rows=22 width=4) (actual
time=0.582..0.671 rows=28 loops=1)
   ->  Sort  (cost=418.50..418.55 rows=22 width=4) (actual
time=0.579..0.608 rows=28 loops=1)
 Sort Key: contexts.context_key
 Sort Method:  quicksort  Memory: 26kB
 ->  Append  (cost=0.00..418.01 rows=22 width=4) (actual
time=0.042..0.524 rows=28 loops=1)
   ->  Nested Loop  (cost=0.00..376.46 rows=19 width=4)
(actual time=0.040..0.423 rows=28 loops=1)
 ->  Index Scan using parent_key_idx on
contexts  (cost=0.00..115.20 rows=58 width=4) (actual time=0.021..0.082
rows=28 loops=1)
   Index Cond: (parent_key = 392210)
 ->  Index Scan using article_key_idx on
articles  (cost=0.00..4.49 rows=1 width=4) (actual time=0.007..0.008
rows=1 loops=28)
   Index Cond: (public.articles.context_key =
contexts.context_key)
   Filter: public.articles.indexed
   ->  Nested Loop  (cost=0.00..41.32 rows=3 width=4)
(actual time=0.043..0.043 rows=0 loops=1)
 ->  Index Scan using
collection_data_context_key_index on collection_data  (cost=0.00..14.30
rows=6 width=4) (actual time=0.012..0.015 rows=3 loops=1)
   Index Cond: (collection_context_key = 392210)
 ->  Index Scan using article_key_idx on
articles  (cost=0.00..4.49 rows=1 width=4) (actual time=0.006..0.006
rows=0 loops=3)
   Index Cond: (public.articles.context_key =
collection_data.context_key)
   Filter: public.articles.indexed
 Total runtime: 0.812 ms








Re: [PERFORM] hashed subplan 5000x slower than two sequential operations

2010-12-08 Thread Bryce Nesbitt

Marc Mamin wrote:


Another point: would a conditionl index help ?
on articles (context_key) where indexed


no.

production=> select count(*),indexed from articles group by indexed;
 count  | indexed
+-
 517433 | t
 695814 | f

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

2010-02-09 Thread Bryce Nesbitt
I've got a very slow query, which I can make faster by doing something
seemingly trivial. 
The query has been trouble for years (always slow, sometimes taking hours):

 512,600ms Original, filter on articles.indexed (622 results)
   7,500ms Remove "AND articles.indexed" (726 results, undesirable).
   7,675ms Extra join for "AND articles.indexed" (622 results, same as
original).

Hardware is Postgres 8.3 on a Sunfire X4240 under Debian Lenny, with a
fresh ANALYZE. What I don't understand is why the improvement? Is the
second way of doing things actually superior, or is this just a query
planner edge case?


Original (512,600ms)
-
EXPLAIN ANALYZE
SELECT contexts.context_key FROM contexts
JOIN articles ON (articles.context_key=contexts.context_key)
JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key)
WHERE contexts.context_key IN
  (SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'insider'
   INTERSECT
   SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'trading')
AND contexts.context_key IN
  (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON
(a.ancestor_key = bp_categories.context_key)
   WHERE lower(bp_categories.category) = 'law')
AND articles.indexed;

Extra join (7,675ms)
-
EXPLAIN ANALYZE
SELECT contexts.context_key FROM contexts JOIN articles using (context_key)
WHERE contexts.context_key IN
(
SELECT contexts.context_key FROM contexts
JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key)
WHERE contexts.context_key IN
(SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'insider'
 INTERSECT
 SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'trading')
AND contexts.context_key IN
(SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON
(a.ancestor_key = bp_categories.context_key)
 WHERE lower(bp_categories.category) = 'law')
)
AND articles.indexed;



# select indexed,count(*) from articles group by indexed;
 indexed | count
-+
 t   | 354605
 f   | 513552


QUERY PLAN   

 Hash Join  (cost=131663.39..140005.40 rows=4769 width=4) (actual
time=511893.241..512599.348 rows=622 loops=1)
   Hash Cond: (matview_82034.context_key = articles.context_key)
   ->  Seq Scan on matview_82034  (cost=0.00..6596.00 rows=465600
width=4) (actual time=0.019..463.278 rows=438220 loops=1)
   ->  Hash  (cost=131663.38..131663.38 rows=1 width=16) (actual
time=511659.671..511659.671 rows=622 loops=1)
 ->  Nested Loop IN Join  (cost=46757.70..131663.38 rows=1
width=16) (actual time=1142.789..511656.211 rows=622 loops=1)
   Join Filter: (a.context_key = articles.context_key)
   ->  Nested Loop  (cost=46757.70..46789.06 rows=2
width=12) (actual time=688.057..839.297 rows=1472 loops=1)
 ->  Nested Loop  (cost=46757.70..46780.26 rows=2
width=8) (actual time=688.022..799.945 rows=1472 loops=1)
   ->  Subquery Scan "IN_subquery" 
(cost=46757.70..46757.97 rows=5 width=4) (actual time=687.963..743.587
rows=1652 loops=1)
 ->  SetOp Intersect 
(cost=46757.70..46757.93 rows=5 width=4) (actual time=687.961..738.955
rows=1652 loops=1)
   ->  Sort 
(cost=46757.70..46757.81 rows=46 width=4) (actual time=687.943..709.972
rows=19527 loops=1)
 Sort Key: "*SELECT*
1".context_key
 Sort Method:  quicksort 
Memory: 1684kB
 ->  Append 
(cost=0.00..46756.43 rows=46 width=4) (actual time=8.385..657.839
rows=19527 loops=1)
   ->  Subquery Scan
"*SELECT* 1"  (cost=0.00..23378.21 rows=23 width=4) (actual
time=8.383..215.613 rows=4002 loops=1)
 ->  Nested
Loop  (cost=0.00..23377.98 rows=23 width=4) (actual time=8.380..207.499
rows=4002 loops=1)
   ->  Index
Scan using words_word on words  (cost=0.00..5.47 rows=1 width=4) (actual
time=0.102..0.105 rows=1 loops=1)

Index Cond: ((word)::text = 'insider'::text)
   ->  Index
Scan using article_words_wc on article_words  (cost=0.00..23219.17
rows=12268 width=8) (actual time=8.272..199.224 rows=4002 loops=1)

Index Cond: (public.article_words.word_key

[PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

2010-02-10 Thread Bryce Nesbitt
Or, if you want to actually read that query plan, try:
http://explain.depesz.com/s/qYq


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

2010-02-10 Thread Bryce Nesbitt
If you guys succeed in making this class of query perform, you'll have
beat out the professional consulting firm we hired, which was all but
useless!  The query is usually slow, but particular combinations of
words seem to make it obscenely slow.

The query plans are now attached (sorry I did not start there: many
lists reject attachments). Or you can click on "text" at the query
planner analysis site http://explain.depesz.com/s/qYq


_Here's typical server load:_
Tasks: 166 total,   1 running, 165 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.2%us,  0.9%sy,  0.0%ni, 86.5%id, 11.2%wa,  0.0%hi,  0.1%si, 
0.0%st
Mem:  32966936k total, 32873860k used,93076k free, 2080k buffers
Swap: 33554424k total,  472k used, 33553952k free, 30572904k cached

_
Configurations modified from Postgres 8.3 default are:_
listen_addresses = '10.100.2.11, 10.101.2.11'   # what IP address(es) to
listen on;
port = 5432 # (change requires restart)
max_connections = 400   # (change requires restart)
shared_buffers = 4096MB # min 128kB or max_connections*16kB
work_mem = 16MB # min 64kB
max_fsm_pages =  50 # default:2
min:max_fsm_relations*16,6 bytes see:MAIN-5740
max_fsm_relations = 2700# min 100, ~70 bytes each
checkpoint_segments = 20# in logfile segments, min 1,
16MB each
random_page_cost = 2.0  # same scale as above
effective_cache_size = 28672MB
default_statistics_target = 150 # range 1-1000
log_destination = 'syslog'  # Valid values are combinations of
log_min_error_statement = error # values in order of decreasing detail:
log_min_duration_statement = 5000   # -1 is disabled, 0 logs all
statements
log_checkpoints = on# default off
autovacuum_naptime = 5min   # time between autovacuum runs
escape_string_warning = off # default:on  (See bepress
MAIN-4857)
standard_conforming_strings = off   # deafult:off (See bepress
MAIN-4857)



production=# EXPLAIN ANALYZE SELECT context_key FROM article_words
JOIN words using (word_key) WHERE word = 'insider';
-
 Nested Loop  (cost=0.00..23393.15 rows=23 width=4) (actual
time=0.077..15.637 rows=4003 loops=1)
   ->  Index Scan using words_word on words  (cost=0.00..5.47 rows=1
width=4) (actual time=0.049..0.051 rows=1 loops=1)
 Index Cond: ((word)::text = 'insider'::text)
   ->  Index Scan using article_words_wc on article_words 
(cost=0.00..23234.38 rows=12264 width=8) (actual time=0.020..7.237
rows=4003 loops=1)
 Index Cond: (article_words.word_key = words.word_key)
 Total runtime: 19.776 ms

production=# EXPLAIN ANALYZE SELECT context_key FROM article_words
WHERE word_key = 3675;
---
 Index Scan using article_words_wc on article_words 
(cost=0.00..21433.53 rows=11309 width=4) (actual time=0.025..7.579
rows=4003 loops=1)
   Index Cond: (word_key = 3675)
 Total runtime: 11.704 ms



production=# explain analyze select count(*) from article_words;
Aggregate  (cost=263831.63..263831.64 rows=1 width=0) (actual
time=35851.654..35851.655 rows=1 loops=1)
   ->  Seq Scan on words  (cost=0.00..229311.30 rows=13808130 width=0)
(actual time=0.043..21281.124 rows=13808184 loops=1)
 Total runtime: 35851.723 ms

production=# select count(*) from words;
13,808,184


production=# explain analyze select count(*) from article_words;
Aggregate  (cost=5453242.40..5453242.41 rows=1 width=0) (actual
time=776504.017..776504.018 rows=1 loops=1)
   ->  Seq Scan on article_words  (cost=0.00..4653453.52 rows=319915552
width=0) (actual time=0.034..438969.347 rows=319956663 loops=1)
 Total runtime: 776504.177 ms

production=# select count(*) from article_words;
319,956,720

Aggregate  (cost=139279.85..139279.86 rows=1 width=4) (actual 
time=502493.000..502493.001 rows=1 loops=1)
   ->  Hash Join  (cost=131314.31..139279.84 rows=2 width=4) (actual 
time=501787.397..502492.316 rows=622 loops=1)
 Hash Cond: (matview_82034.context_key = articles.context_key)
 ->  Seq Scan on matview_82034  (cost=0.00..6322.20 rows=438220 
width=4) (actual time=0.014..462.312 rows=438220 loops=1)
 ->  Hash  (cost=131314.30..131314.30 rows=1 width=16) (actual 
time=501553.755..501553.755 rows=622 loops=1)
   ->  Nested Loop IN Join  (cost=46291.79..131314.30 rows=1 
width=16) (actual time=467.546..501550.735 rows=622 loops=1)
 Join Filter: (a.context_key = articles.context_key)
 ->  Nested Loop  (cost=46291.79..46323.15 rows=2 width=12) 
(actual time=179.760..303.474 rows=1473 loops=1)
   ->  Nested Loop  (cost=46291.79..

Re: [PERFORM] 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

2010-02-12 Thread Bryce Nesbitt






Kevin Grittner wrote:

  Bryce Nesbitt  wrote:
 
  
  
I've got a very slow query, which I can make faster by doing
something seemingly trivial. 

  
   
Out of curiosity, what kind of performance do you get with?:
 
EXPLAIN ANALYZE
SELECT contexts.context_key
  FROM contexts
  JOIN articles ON (articles.context_key = contexts.context_key)
  JOIN matview_82034 ON (matview_82034.context_key =
 contexts.context_key)
  WHERE EXISTS
(
  SELECT *
FROM article_words
JOIN words using (word_key)
WHERE context_key = contexts.context_key
  AND word = 'insider'
)
AND EXISTS
(
  SELECT *
FROM article_words
JOIN words using (word_key)
WHERE context_key = contexts.context_key
  AND word = 'trading'
)
AND EXISTS
(
  SELECT *
FROM virtual_ancestors a
JOIN bp_categories ON (bp_categories.context_key =
   a.ancestor_key)
WHERE a.context_key = contexts.context_key
  AND lower(bp_categories.category) = 'law'
)
AND articles.indexed
;
  

512,600ms query becomes 225,976ms.  Twice as fast on pos
Definitely not beating the 7500ms version.
PostgreSQL 8.3.4






EXPLAIN ANALYZE
SELECT contexts.context_key
  FROM contexts
  JOIN articles ON (articles.context_key = contexts.context_key)
  JOIN matview_82034 ON (matview_82034.context_key =
 contexts.context_key)
  WHERE EXISTS
(
  SELECT *
FROM article_words
JOIN words using (word_key)
WHERE context_key = contexts.context_key
  AND word = 'insider'
)
AND EXISTS
(
  SELECT *
FROM article_words
JOIN words using (word_key)
WHERE context_key = contexts.context_key
  AND word = 'trading'
)
AND EXISTS
(
  SELECT *
FROM virtual_ancestors a
JOIN bp_categories ON (bp_categories.context_key =
   a.ancestor_key)
WHERE a.context_key = contexts.context_key
  AND lower(bp_categories.category) = 'law'
)
AND articles.indexed
;





 Nested Loop  (cost=13511.95..13662023.05 rows=19282 width=4) (actual 
time=3070.834..225973.159 rows=622 loops=1)
   ->  Hash Join  (cost=13511.95..356938.38 rows=177937 width=8) (actual 
time=1106.242..7520.756 rows=351337 loops=1)
 Hash Cond: (articles.context_key = matview_82034.context_key)
 ->  Seq Scan on articles  (cost=0.00..334502.90 rows=386266 width=4) 
(actual time=0.030..4031.203 rows=355626 loops=1)
   Filter: indexed
 ->  Hash  (cost=6322.20..6322.20 rows=438220 width=4) (actual 
time=1105.663..1105.663 rows=438220 loops=1)
   ->  Seq Scan on matview_82034  (cost=0.00..6322.20 rows=438220 
width=4) (actual time=7.105..544.072 rows=438220 loops=1)
   ->  Index Scan using contexts_pkey on contexts  (cost=0.00..74.76 rows=1 
width=4) (actual time=0.619..0.619 rows=0 loops=351337)
 Index Cond: (contexts.context_key = articles.context_key)
 Filter: ((subplan) AND (subplan) AND (subplan))
 SubPlan
   ->  Nested Loop  (cost=0.00..30.54 rows=1 width=17) (actual 
time=6.119..6.119 rows=1 loops=983)
 ->  Index Scan using words_word on words  (cost=0.00..5.50 
rows=1 width=13) (actual time=0.028..0.029 rows=1 loops=983)
   Index Cond: ((word)::text = 'trading'::text)
 ->  Index Scan using article_words_cw on article_words  
(cost=0.00..25.02 rows=1 width=8) (actual time=6.082..6.082 rows=1 loops=983)
   Index Cond: ((public.article_words.context_key = $0) AND 
(public.article_words.word_key = public.words.word_key))
   ->  Nested Loop  (cost=0.00..30.54 rows=1 width=17) (actual 
time=6.196..6.196 rows=0 loops=26494)
 ->  Index Scan using words_word on words  (cost=0.00..5.50 
rows=1 width=13) (actual time=0.022..0.024 rows=1 loops=26494)
   Index Cond: ((word)::text = 'insider'::text)
 ->  Index Scan using article_words_cw on article_words  
(cost=0.00..25.02 rows=1 width=8) (actual time=6.165..6.165 rows=0 loops=26494)
   Index Cond: ((public.article_words.context_key = $0) AND 
(public.article_words.word_key = public.words.word_key))
   ->  Nested Loop  (cost=0.00..38.38 rows=3 width=29) (actual 
time=0.122..0.122 rows=0 loops=351337)
 ->  Index Scan usin

Re: [PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

2010-02-12 Thread Bryce Nesbitt
Tom Lane wrote:
> Given that it estimated 1 row out of "words" (quite correctly) and 12264
> rows out of each scan on article_words, you'd think that the join size
> estimate would be 12264, which would be off by "only" a factor of 3 from
> the true result.  Instead it's 23, off by a factor of 200 :-(.
>   
Has anyone every proposed a "learning" query planner?  One that
eventually clues in to estimate mismatches like this?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

2010-02-12 Thread Bryce Nesbitt




So as the op, back to the original posting

In the real world, what should I do?  Does it make sense to pull the
"AND articles.indexed" clause into an outer query?  Will that query
simply perform poorly on other arbitrary combinations of words?


I'm happy to test any given query against the
same set of servers. If it involves a persistent change
it has to run on a test server).  For example, the Robert Haas method:
# ...
Total runtime: 254207.857 ms

# ALTER TABLE article_words ALTER COLUMN word_key SET STATISTICS 1000;
# ANALYZE VERBOSE article_words
INFO:  analyzing "public.article_words"
INFO:  "article_words": scanned 30 of 1342374 pages, containing 64534899 live rows and 3264839 dead rows; 30 rows in sample, 288766568 estimated total rows
ANALYZE
# ...
Total runtime: 200591.751 ms

# ALTER TABLE article_words ALTER COLUMN word_key SET STATISTICS 50;
# ANALYZE VERBOSE article_words
# ...
Total runtime: 201204.972 ms


Sadly, it made essentially zero difference.  Attached.






preproduction-20091214=# EXPLAIN ANALYZE
SELECT contexts.context_key FROM contexts
JOIN articles ON (articles.context_key=contexts.context_key)
JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key)
WHERE contexts.context_key IN
(SELECT context_key FROM article_words JOIN words using (word_key) WHERE 
word = 'insider'
 INTERSECT
 SELECT context_key FROM article_words JOIN words using (word_key) WHERE 
word = 'trading')
AND contexts.context_key IN
 (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON 
(a.ancestor_key = bp_categories.context_key)
 WHERE lower(bp_categories.category) = 'law') AND articles.indexed;

  QUERY PLAN
   
---
 Nested Loop  (cost=12861.91..62150.21 rows=2 width=4) (actual 
time=136.789..200744.455 rows=546 loops=1)
   ->  Nested Loop IN Join  (cost=12861.91..62148.32 rows=1 width=16) (actual 
time=136.777..200737.004 rows=546 loops=1)
 Join Filter: (a.context_key = articles.context_key)
 ->  Nested Loop  (cost=12861.91..12875.48 rows=2 width=12) (actual 
time=55.674..116.443 rows=1306 loops=1)
   ->  Nested Loop  (cost=12861.91..12871.68 rows=2 width=8) 
(actual time=55.662..97.863 rows=1306 loops=1)
 ->  Subquery Scan "IN_subquery"  (cost=12861.91..12862.18 
rows=5 width=4) (actual time=55.639..75.777 rows=1473 loops=1)
   ->  SetOp Intersect  (cost=12861.91..12862.14 rows=5 
width=4) (actual time=55.638..73.724 rows=1473 loops=1)
 ->  Sort  (cost=12861.91..12862.02 rows=46 
width=4) (actual time=55.631..62.140 rows=17892 loops=1)
   Sort Key: "*SELECT* 1".context_key
   Sort Method:  quicksort  Memory: 1607kB
   ->  Append  (cost=0.00..12860.63 rows=46 
width=4) (actual time=0.040..42.026 rows=17892 loops=1)
 ->  Subquery Scan "*SELECT* 1"  
(cost=0.00..6430.32 rows=23 width=4) (actual time=0.039..6.909 rows=3583 
loops=1)
   ->  Nested Loop  
(cost=0.00..6430.09 rows=23 width=4) (actual time=0.038..5.110 rows=3583 
loops=1)
 ->  Index Scan using 
words_word on words  (cost=0.00..2.22 rows=1 width=4) (actual time=0.019..0.020 
rows=1 loops=1)
   Index Cond: 
((word)::text = 'insider'::text)
 ->  Index Scan using 
article_words_wc on article_words  (cost=0.00..6323.81 rows=8325 width=8) 
(actual time=0.015..3.243 rows=3583 loops=1)
   Index Cond: 
(public.article_words.word_key = public.words.word_key)
 ->  Subquery Scan "*SELECT* 2"  
(cost=0.00..6430.32 rows=23 width=4) (actual time=0.036..27.166 rows=14309 
loops=1)
   ->  Nested Loop  
(cost=0.00..6430.09 rows=23 width=4) (actual time=0.035..20.037 rows=14309 
loops=1)
 ->  Index Scan using 
words_word on words  (cost=0.00..2.22 rows=1 width=4) (actual time=0.014..0.015 
rows=1 loops=1)
   Index Cond: 
((word)::text = 'trading'::text)
 ->  Index Scan using 
article_words_wc on article_words  (cost=0.00..6323.81 rows=8325 width=8) 
(actual time=0.017.

Re: [PERFORM] Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

2010-02-13 Thread Bryce Nesbitt
So how about the removal of the "AND" clause? On a test server, this
drops the query from 201204 to 438 ms.
Is this just random, or is it a real solution that might apply to any
arbitrary combination of words?

Attached are three test runs:
Total runtime: 201204.972 ms
Total runtime: 437.766 ms
Total runtime: 341.727 ms

preproduction-20091214=# ALTER TABLE article_words ALTER COLUMN word_key SET 
STATISTICS 50;
ALTER TABLE
preproduction-20091214=# ANALYZE VERBOSE article_words;
INFO:  analyzing "public.article_words"
INFO:  "article_words": scanned 75000 of 1342374 pages, containing 16124750 
live rows and 825250 dead rows; 75000 rows in sample, 288605935 estimated total 
rows
ANALYZE


preproduction-20091214=# EXPLAIN ANALYZE
SELECT contexts.context_key FROM contexts
JOIN articles ON (articles.context_key=contexts.context_key)
JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key)
WHERE contexts.context_key IN
(SELECT context_key FROM article_words JOIN words using (word_key) WHERE 
word = 'insider'
 INTERSECT
 SELECT context_key FROM article_words JOIN words using (word_key) WHERE 
word = 'trading')
AND contexts.context_key IN
 (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON 
(a.ancestor_key = bp_categories.context_key)
 WHERE lower(bp_categories.category) = 'law') AND articles.indexed;

  QUERY PLAN
   
---
 Nested Loop  (cost=12665.41..61953.71 rows=2 width=4) (actual 
time=140.799..201204.025 rows=546 loops=1)
   ->  Nested Loop IN Join  (cost=12665.41..61951.82 rows=1 width=16) (actual 
time=140.786..201196.526 rows=546 loops=1)
 Join Filter: (a.context_key = articles.context_key)
 ->  Nested Loop  (cost=12665.41..12678.98 rows=2 width=12) (actual 
time=55.453..116.790 rows=1306 loops=1)
   ->  Nested Loop  (cost=12665.41..12675.18 rows=2 width=8) 
(actual time=55.439..98.132 rows=1306 loops=1)
 ->  Subquery Scan "IN_subquery"  (cost=12665.41..12665.68 
rows=5 width=4) (actual time=55.415..76.025 rows=1473 loops=1)
   ->  SetOp Intersect  (cost=12665.41..12665.64 rows=5 
width=4) (actual time=55.413..73.885 rows=1473 loops=1)
 ->  Sort  (cost=12665.41..12665.52 rows=46 
width=4) (actual time=55.406..62.222 rows=17892 loops=1)
   Sort Key: "*SELECT* 1".context_key
   Sort Method:  quicksort  Memory: 1607kB
   ->  Append  (cost=0.00..12664.14 rows=46 
width=4) (actual time=0.060..42.065 rows=17892 loops=1)
 ->  Subquery Scan "*SELECT* 1"  
(cost=0.00..6332.07 rows=23 width=4) (actual time=0.059..6.962 rows=3583 
loops=1)
   ->  Nested Loop  
(cost=0.00..6331.84 rows=23 width=4) (actual time=0.058..5.148 rows=3583 
loops=1)
 ->  Index Scan using 
words_word on words  (cost=0.00..2.22 rows=1 width=4) (actual time=0.018..0.019 
rows=1 loops=1)
   Index Cond: 
((word)::text = 'insider'::text)
 ->  Index Scan using 
article_words_wc on article_words  (cost=0.00..6227.18 rows=8195 width=8) 
(actual time=0.036..3.275 rows=3583 loops=1)
   Index Cond: 
(public.article_words.word_key = public.words.word_key)
 ->  Subquery Scan "*SELECT* 2"  
(cost=0.00..6332.07 rows=23 width=4) (actual time=0.037..27.136 rows=14309 
loops=1)
   ->  Nested Loop  
(cost=0.00..6331.84 rows=23 width=4) (actual time=0.035..19.912 rows=14309 
loops=1)
 ->  Index Scan using 
words_word on words  (cost=0.00..2.22 rows=1 width=4) (actual time=0.015..0.017 
rows=1 loops=1)
   Index Cond: 
((word)::text = 'trading'::text)
 ->  Index Scan using 
article_words_wc on article_words  (cost=0.00..6227.18 rows=8195 width=8) 
(actual time=0.018..12.464 rows=14309 loops=1)
   Index Cond: 
(public.article_words.word_key = public.words.word_key)
 ->  Index Scan using article_key_idx on articles  
(cost=0.00..1.89 rows=1 width=4) (actual time=0.011..0.012 rows=1 loops=1473)