Hey Ray,

I'm by no means a guru but here is my simple analysis. In the first query, the 10 "documents" specified 'IN' the in are first selected from the "documents" table. Then, those 10 rows are joined with the "comments" table. In the second query, every row in the "documents" table is joined with the "comments" table, which took forever according to the Explain Analyse. Then, the results of the complete join of those two tables from are compared against the sub-select. I'm guessing (emphasis on guessing) that the query planner chose this approach because it does not know how many rows the sub-select will return so it does the join and then checks those results against that sub-select.

Another approach may be to do this.

Create Temp table limit_documents as select doc_num from documents limit 10;
select * from limit_documents left outer join comments on (limit_documents.doc_num = comments.doc_num);


Hope that helps,
Ted

Richard Ray wrote:
Allow me to demonstrate my pitiful SQL knowledge
I have tables documents and comments
If I run join and list doc_nums the query is quite fast
If I run join and use subselect the query is extremely slow
Can someone offer analysis

Thanks
Richard

dcc=# EXPLAIN ANALYZE select doc_num from documents limit 10;
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..1.01 rows=10 width=13) (actual time=0.013..0.061 rows=10 loops=1) -> Seq Scan on documents (cost=0.00..909333.85 rows=9014885 width=13) (actual time=0.008..0.027 rows=10 loops=1)
 Total runtime: 0.125 ms
(3 rows)

dcc=#



dcc=# EXPLAIN ANALYZE select * from documents left outer join comments on (documents.doc_num = comments.doc_num) where documents.doc_num in ('105364107','105513059','105513095','105513112','105585627','102933195','014650340','014650361','014650362','105419865');

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Hash Left Join (cost=21.23..61.54 rows=10 width=444) (actual time=0.507..0.574 rows=10 loops=1)
   Hash Cond: ("outer".doc_num = "inner".doc_num)
-> Bitmap Heap Scan on documents (cost=20.03..60.28 rows=10 width=361) (actual time=0.397..0.432 rows=10 loops=1) Recheck Cond: ((doc_num = '105364107'::bpchar) OR (doc_num = '105513059'::bpchar) OR (doc_num = '105513095'::bpchar) OR (doc_num = '105513112'::bpchar) OR (doc_num = '105585627'::bpchar) OR (doc_num = '102933195'::bpchar) OR (doc_num = '014650340'::bpchar) OR (doc_num = '014650361'::bpchar) OR (doc_num = '014650362'::bpchar) OR (doc_num = '105419865'::bpchar)) -> BitmapOr (cost=20.03..20.03 rows=10 width=0) (actual time=0.383..0.383 rows=0 loops=1) -> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.059..0.059 rows=1 loops=1)
                     Index Cond: (doc_num = '105364107'::bpchar)
-> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.039..0.039 rows=1 loops=1)
                     Index Cond: (doc_num = '105513059'::bpchar)
-> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.034..0.034 rows=1 loops=1)
                     Index Cond: (doc_num = '105513095'::bpchar)
-> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.031..0.031 rows=1 loops=1)
                     Index Cond: (doc_num = '105513112'::bpchar)
-> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.034..0.034 rows=1 loops=1)
                     Index Cond: (doc_num = '105585627'::bpchar)
-> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.036..0.036 rows=1 loops=1)
                     Index Cond: (doc_num = '102933195'::bpchar)
-> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.036..0.036 rows=1 loops=1)
                     Index Cond: (doc_num = '014650340'::bpchar)
-> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.031..0.031 rows=1 loops=1)
                     Index Cond: (doc_num = '014650361'::bpchar)
-> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.031..0.031 rows=1 loops=1)
                     Index Cond: (doc_num = '014650362'::bpchar)
-> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.035..0.035 rows=1 loops=1)
                     Index Cond: (doc_num = '105419865'::bpchar)
-> Hash (cost=1.16..1.16 rows=16 width=83) (actual time=0.080..0.080 rows=16 loops=1) -> Seq Scan on comments (cost=0.00..1.16 rows=16 width=83) (actual time=0.005..0.037 rows=16 loops=1)
 Total runtime: 0.775 ms
(28 rows)

dcc=#



dcc=# EXPLAIN ANALYZE select * from documents left outer join comments on (documents.doc_num = comments.doc_num) where documents.doc_num in (select doc_num from documents limit 10);

QUERY PLAN

-------------------------------------------------------------------------------- --------------------------------------------------------------------------------
--
Merge IN Join (cost=100000002.19..136154797.93 rows=10 width=654) (actual time
=23.534..2216180.550 rows=10 loops=1)
   Merge Cond: ("outer".doc_num = "inner".doc_num)
-> Merge Left Join (cost=0.00..36129585.92 rows=10083868 width=654) (actual
 time=23.239..2188733.430 rows=6696218 loops=1)
         Merge Cond: ("outer".doc_num = "inner".doc_num)
-> Index Scan using documents_pkey on documents (cost=0.00..35723277. 60 rows=10083868 width=569) (actual time=6.845..2107300.767 rows=6695853 loops=1
)
-> Index Scan using doc_num_idx on comments (cost=0.00..377203.50 row
s=311612 width=85) (actual time=16.368..6984.365 rows=243797 loops=1)
-> Sort (cost=100000002.19..100000002.22 rows=10 width=13) (actual time=0.2
52..0.293 rows=10 loops=1)
         Sort Key: "IN_subquery".doc_num
-> Limit (cost=100000000.00..100000001.92 rows=10 width=13) (actual t
ime=0.019..0.128 rows=10 loops=1)
-> Seq Scan on documents (cost=100000000.00..101940460.68 rows=
10083868 width=13) (actual time=0.010..0.055 rows=10 loops=1)
 Total runtime: 2216180.973 ms
(11 rows)

dcc=#

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match



--

*Edward Allen*
Software Engineer
Black Duck Software, Inc.

[EMAIL PROTECTED] <mailto:[EMAIL PROTECTED]>
T +1.781.891.5100 x133
F +1.781.891.5145
http://www.blackducksoftware.com


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Reply via email to