First let me explain the situation:
I came into the #postgresql irc channel talking about this problem, and
someone advised me to use this mailing list (so i'm not just wasting your
time i hope). I'm not sure how to describe the problem fully, so I'll start
by explaining what my database does, a little about the table structure,
then an example of a problematic query, and some other information that
might be relevant.
My database is a Chinese-English/English-Chinese dictionary. It lets users
search Chinese words by any character in the word, or any sequence of
characters (starting anywhere the user wants). Characters are often
searched by their pinyin values (a romanization of the sounds). The
dictionary has an average word length of 2 (but it sucks), but there are
also many words of length 4 and 6 characters. So it wouldn't be uncommon to
search for something like "a1 la1 bo yu" (arabic). There are also some very
long words with 12 or more characters (where my problem becomes more
pronounced).
That being said, the most important table here is the words table:
Table "public.words"
Column| Type | Modifiers
--+--+---
wid | integer | not null
sequence | smallint | not null
variant | smallint | not null
char_count | smallint | not null
unicode | character varying(5) | not null
pinyin | character varying(8) | not null
simpvar | character varying(5) |
zvar | character varying(5) |
compatvar| character varying(5) |
def_exists | boolean | not null
num_variants | smallint |
page_order | integer |
pinyins | character varying|
unicodes | character varying|
Indexes:
"words2_pkey" primary key, btree (wid, variant, "sequence")
"page_index" btree (page_order)
"pinyin_index" btree (pinyin)
"unicode_index" btree (unicode)
The best example of the problem I have when using this table is this query:
SELECT
w8.wid,
w8.variant,
w8.num_variants,
sum_text(w8.unicode) as unicodes,
sum_text(w8.pinyin) as pinyins
FROM
words as w0, words as w1,
words as w2, words as w3,
words as w4, words as w5,
words as w6, words as w7,
words as w8
WHERE
w0.wid > 0 AND
w0.pinyin = 'zheng4' AND
w0.def_exists = 't' AND
w0.sequence = 0 AND
w1.wid = w0.wid AND
w1.pinyin LIKE 'fu_' AND
w1.variant = w0.variant AND
w1.sequence = (w0.sequence + 1) AND
w2.wid = w1.wid AND
w2.pinyin LIKE 'ji_' AND
w2.variant = w1.variant AND
w2.sequence = (w1.sequence + 1) AND
w3.wid = w2.wid AND
w3.pinyin LIKE 'guan_' AND
w3.variant = w2.variant AND
w3.sequence = (w2.sequence + 1) AND
w4.wid = w3.wid AND
w4.pinyin LIKE 'kai_' AND
w4.variant = w3.variant AND
w4.sequence = (w3.sequence + 1) AND
w5.wid = w4.wid AND
w5.pinyin LIKE 'fang_' AND
w5.variant = w4.variant AND
w5.sequence = (w4.sequence + 1) AND
w6.wid = w5.wid AND
w6.pinyin LIKE 'xi_' AND
w6.variant = w5.variant AND
w6.sequence = (w5.sequence + 1) AND
w7.wid = w6.wid AND
w7.pinyin LIKE 'tong_' AND
w7.variant = w6.variant AND
w7.sequence = (w6.sequence + 1) AND
w8.wid = w7.wid AND
w8.variant = w7.variant
GROUP BY
w8.wid,
w8.variant,
w8.num_variants,
w8.page_order ,
w0.sequence ,
w1.sequence ,
w2.sequence ,
w3.sequence ,
w4.sequence ,
w5.sequence ,
w6.sequence ,
w7.sequence
ORDER BY
w8.page_order;
(phew!)
with the default geqo_threshold of 11, this query takes 3155ms on my machine
(a 1ghz athlon with 384 megs of pc133 ram). This is very very long.
if i first do prepare blah as SELECT ., then run execute blah, the time
goes down to about 275ms (i had been running this query a lot, and did a
vacuum update before all this).
the ouput from EXPLAIN ANALYZE :
QUERY PLAN
--
Sort (cost=54.13..54.14 rows=1 width=43) (actual time=315.357..315.357
rows=1 loops=1)
Sort Key: w8.page_order
-> HashAggregate (cost=54.12..54.12 rows=1 width=43) (actual
time=315.328..315.330 rows=1 loops=1)
-> Nested Loop (cost=0.00..54.08 rows=1 width=43) (actual
time=6.229..314.566 rows=12 loops=1)
Join Filter: (("outer".wid = "inner".wid) AND
("outer".variant = "inner".variant) AND ("outer"."sequence" =
("inner"."sequence" + 1)) AND ("inner"."sequence" = ("outer"."sequence" +
1)))
-> Nested Loop (cost=0.00..48.07 rows=1 width=83) (actual
time=6.088..279.745 rows=12 loops=1)
Join Filter: (("inner"."sequence" = ("outer"."sequence"