I'm having a strange problem with a query. The query is fairly simple, with a few constants and two joins. All relevant columns should be indexed, and I'm pretty sure there aren't any type conversion issues. But the query plan includes a fairly heavy seq scan. The only possible complication is that the tables involved are fairly large - hundreds of millions of rows each.

Can anyone explain this? There should only ever be a maximum of about 50 rows returned when the query is executed.

Query:

select fls.function_verified, fls.score, fls.go_category_group_ref, fs1.gene_ref, fs1.function_verified_exactly, fs2.gene_ref, fs2.function_verified_exactly from functional_linkage_scores fls, gene_prediction_view fs1, gene_prediction_view fs2 where fls.gene_ref1 = fs1.gene_ref and fls.gene_ref2 = fs2.gene_ref and fs1.go_term_ref = 2 and fs2.go_term_ref = 2

Explain on query:
Merge Join  (cost=1331863800.16..6629339921.15 rows=352770803726 width=22)
  Merge Cond: (fs2.gene_ref = fls.gene_ref2)
-> Index Scan using gene_prediction_view_gene_ref on gene_prediction_view fs2 (cost=0.00..6235287.98 rows=197899 width=5)
        Index Cond: (go_term_ref = 2)
-> Materialize (cost=1331794730.41..1416453931.72 rows=6772736105 width=21) -> Sort (cost=1331794730.41..1348726570.67 rows=6772736105 width=21)
              Sort Key: fls.gene_ref2
-> Merge Join (cost=38762951.04..146537410.33 rows=6772736105 width=21)
                    Merge Cond: (fs1.gene_ref = fls.gene_ref1)
-> Index Scan using gene_prediction_view_gene_ref on gene_prediction_view fs1 (cost=0.00..6235287.98 rows=197899 width=5)
                          Index Cond: (go_term_ref = 2)
-> Materialize (cost=38713921.60..41618494.20 rows=232365808 width=20) -> Sort (cost=38713921.60..39294836.12 rows=232365808 width=20)
                                Sort Key: fls.gene_ref1
-> Seq Scan on functional_linkage_scores fls (cost=0.00..3928457.08 rows=232365808 width=20)


\d on functional_linkage_scores (232241678 rows):
                                   Table "public.functional_linkage_scores"
Column | Type | Modifiers -----------------------+---------------+------------------------------------------------------------------------ id | integer | not null default nextval('functional_linkage_scores_id_seq'::regclass)
gene_ref1             | integer       | not null
gene_ref2             | integer       | not null
function_verified     | boolean       | not null
score                 | numeric(12,4) | not null
go_category_group_ref | integer       | not null
go_term_ref           | integer       |
Indexes:
   "functional_linkage_scores_pkey" PRIMARY KEY, btree (id)
"functional_linkage_scores_gene_ref1_key" UNIQUE, btree (gene_ref1, gene_ref2, go_category_group_ref, go_term_ref)
   "ix_functional_linkage_scores_gene_ref2" btree (gene_ref2)
Foreign-key constraints:
"functional_linkage_scores_gene_ref1_fkey" FOREIGN KEY (gene_ref1) REFERENCES genes(id) "functional_linkage_scores_gene_ref2_fkey" FOREIGN KEY (gene_ref2) REFERENCES genes(id) "functional_linkage_scores_go_category_group_ref_fkey" FOREIGN KEY (go_category_group_ref) REFERENCES go_category_groups(id)

\d on gene_prediction_view (568654245 rows):
Table "public.gene_prediction_view" Column | Type | Modifiers ----------------------------------+------------------------+------------------------------------------------------------------- id | integer | not null default nextval('gene_prediction_view_id_seq'::regclass)
gene_ref                         | integer                | not null
go_term_ref                      | integer                | not null
go_description                   | character varying(200) | not null
go_category                      | character varying(50)  | not null
function_verified_exactly        | boolean                | not null
function_verified_with_parent_go | boolean                | not null
score                            | numeric(12,4)          | not null
prediction_method_ref            | integer                |
functional_score_ref             | integer                |
Indexes:
   "gene_prediction_view_pkey" PRIMARY KEY, btree (id)
"gene_prediction_view_functional_score_ref_key" UNIQUE, btree (functional_score_ref) "gene_prediction_view_gene_ref" UNIQUE, btree (gene_ref, go_term_ref, prediction_method_ref)
Foreign-key constraints:
"gene_prediction_view_functional_score_ref_fkey" FOREIGN KEY (functional_score_ref) REFERENCES functional_scores(id) "gene_prediction_view_gene_ref_fkey" FOREIGN KEY (gene_ref) REFERENCES genes(id) "gene_prediction_view_go_term_ref_fkey" FOREIGN KEY (go_term_ref) REFERENCES go_terms(term)

...and just in case someone can give advice on more aggressive settings that might help out the planner for this particular comptuer... This computer: Mac Pro / 4 gigs ram / software Raid 0 across two hard drives.
Production computer: Xeon 3ghz / 32 gigs ram / Debian


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to