I am struggeling with the following query which fetches a random subset of 200 questions that matches certain tags within certain languages. However, the query takes forever to evaluate, even though I have a "limit 200" appended. Any ideas on how to optimize it?

QUERY: ================================================

SELECT distinct q.question_id
               FROM question_tags qt, questions q
               WHERE q.question_id = qt.question_id
                     AND q.STATUS = 1
                     AND not q.author_id = 105
AND ((qt.language_id = 5 and qt.tag_id in (1101,917,734,550,367,183)) or (qt.language_id = 4 and qt.tag_id in (856,428)) or (qt.language_id = 3 and qt.tag_id in (1156,1028,899,771,642,514,385,257,128)) or (qt.language_id = 2 and qt.tag_id in (1193,1101,1009,917,826,734,642,550,458,367,275,183,91)))
                      and q.question_id not in (413)
               LIMIT 200

EXPLAIN ANALYZE: =========================================

Limit (cost=1.50..1267.27 rows=200 width=4) (actual time=278.169..880.934 rows=200 loops=1) -> Unique (cost=1.50..317614.50 rows=50185 width=4) (actual time=278.165..880.843 rows=200 loops=1) -> Merge Join (cost=1.50..317489.04 rows=50185 width=4) (actual time=278.162..880.579 rows=441 loops=1)
                Merge Cond: (qt.question_id = q.question_id)
-> Index Scan using question_tags_question_id on question_tags qt (cost=0.00..301256.96 rows=82051 width=4) (actual time=24.171..146.811 rows=6067 loops=1) Filter: (((language_id = 5) AND (tag_id = ANY ('{1101,917,734,550,367,183}'::integer[]))) OR ((language_id = 4) AND (tag_id = ANY ('{856,428}'::integer[]))) OR ((language_id = 3) AND (tag_id = ANY ('{1156,1028,899,771,642,514,385,257,128}'::integer[]))) OR ((language_id = 2) AND (tag_id = ANY ('{1193,1101,1009,917,826,734,642,550,458,367,275,183,91}'::integer[])))) -> Index Scan using questions_pkey on questions q (cost=0.00..15464.12 rows=83488 width=4) (actual time=222.956..731.737 rows=1000 loops=1) Filter: ((q.author_id <> 105) AND (q.question_id <> 413) AND (q.status = 1))
  Total runtime: 881.152 ms
(9 rows)

        An index on (language_id,tag_id) should be the first thing to try.
Or perhaps even (status,language_id,tag_id) or (language_id, tad_id, status) (but that depends on the stats on "status" column).

An index on author_id will probably not be useful for this particular query because your condition is "author_id != constant".

        Also CLUSTER question_tags on (language_id, tad_id).

What is the database size versus RAM ? You must have a hell of a lot of questions to make this slow... (or bloat ?)



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to