Re: [PERFORM] severe performance issue with planner (fwd)

2004-03-18 Thread Eric Brown
I also tried this (printf1 in irc suggested "folding" the joins) :

SELECT
w8.wid,
w8.variant,
w8.num_variants,
sum_text(w8.unicode) as unicodes,
sum_text(w8.pinyin) as pinyins
FROM
(words as w8 JOIN
(words as w7 JOIN
(words as w6 JOIN
(words as w5 JOIN
(words as w4 JOIN
(words as w3 JOIN
(words as w2 JOIN
(words as w0 JOIN words as w1
ON(w1.wid = w0.wid AND w1.variant = w0.variant AND w1.sequence = w0.sequence 
+ 1 AND w1.pinyin LIKE 'fu_'))
ON(w2.wid = w1.wid AND w2.variant = w1.variant AND w2.sequence = w1.sequence 
+ 1 AND w2.pinyin LIKE 'ji_'))
ON(w3.wid = w2.wid AND w3.variant = w2.variant AND w3.sequence = w2.sequence 
+ 1 AND w3.pinyin LIKE 'guan_'))
ON(w4.wid = w3.wid AND w4.variant = w3.variant AND w4.sequence = w3.sequence 
+ 1 AND w4.pinyin LIKE 'kai_'))
ON(w5.wid = w4.wid AND w5.variant = w4.variant AND w5.sequence = w4.sequence 
+ 1 AND w5.pinyin LIKE 'fang_'))
ON(w6.wid = w5.wid AND w6.variant = w5.variant AND w6.sequence = w5.sequence 
+ 1 AND w6.pinyin LIKE 'xi_'))
ON(w7.wid = w6.wid AND w7.variant = w6.variant AND w7.sequence = w6.sequence 
+ 1 AND w7.pinyin LIKE 'tong_'))
ON(w8.wid = w7.wid AND w8.variant = w7.variant))

WHERE
w0.wid > 0 AND
w0.pinyin = 'zheng4' AND
w0.def_exists = 't' AND
w0.sequence = 0
GROUP BY
w8.wid,
w8.variant,
w8.num_variants,
w8.page_order,
w0.sequence ,
w1.sequence ,
w2.sequence ,
w3.sequence ,
w4.sequence ,
w5.sequence ,
w6.sequence ,
w7.sequence
ORDER BY
w8.page_order;
this gets teh time down to 800ms (not too shabby..).. and as a prepared 
statement, it only takes 15ms!!! i am hopeful there is a way to totally 
bypass most of this overhead.. but i need more help :\

_
MSN Toolbar provides one-click access to Hotmail from any Web page – FREE 
download! http://clk.atdmt.com/AVE/go/onm00200413ave/direct/01/

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


[PERFORM] severe performance issue with planner

2004-03-11 Thread Eric Brown
First let me explain the situation:

I came into the #postgresql irc channel talking about this problem, and 
someone advised me to use this mailing list (so i'm not just wasting your 
time i hope).  I'm not sure how to describe the problem fully, so I'll start 
by explaining what my database does, a little about the table structure, 
then an example of a problematic query, and some other information that 
might be relevant.

My database is a Chinese-English/English-Chinese dictionary.  It lets users 
search Chinese words by any character in the word, or any sequence of 
characters (starting anywhere the user wants).  Characters are often 
searched by their pinyin values (a romanization of the sounds).  The 
dictionary has an average word length of 2 (but it sucks), but there are 
also many words of length 4 and 6 characters.  So it wouldn't be uncommon to 
search for something like "a1 la1 bo yu" (arabic).  There are also some very 
long words with 12 or more characters (where my problem becomes more 
pronounced).

That being said, the most important table here is the words table:
 Table "public.words"
   Column| Type | Modifiers
--+--+---
wid  | integer  | not null
sequence | smallint | not null
variant  | smallint | not null
char_count   | smallint | not null
unicode  | character varying(5) | not null
pinyin   | character varying(8) | not null
simpvar  | character varying(5) |
zvar | character varying(5) |
compatvar| character varying(5) |
def_exists   | boolean  | not null
num_variants | smallint |
page_order   | integer  |
pinyins  | character varying|
unicodes | character varying|
Indexes:
   "words2_pkey" primary key, btree (wid, variant, "sequence")
   "page_index" btree (page_order)
   "pinyin_index" btree (pinyin)
   "unicode_index" btree (unicode)
The best example of the problem I have when using this table is this query:

SELECT
   w8.wid,
   w8.variant,
   w8.num_variants,
   sum_text(w8.unicode) as unicodes,
   sum_text(w8.pinyin) as pinyins
FROM
   words as w0, words as w1,
   words as w2, words as w3,
   words as w4, words as w5,
   words as w6, words as w7,
   words as w8
WHERE
   w0.wid > 0 AND
   w0.pinyin = 'zheng4' AND
   w0.def_exists = 't' AND
   w0.sequence = 0 AND
   w1.wid = w0.wid AND
   w1.pinyin LIKE 'fu_' AND
   w1.variant = w0.variant AND
   w1.sequence = (w0.sequence + 1) AND
   w2.wid = w1.wid AND
   w2.pinyin LIKE 'ji_' AND
   w2.variant = w1.variant AND
   w2.sequence = (w1.sequence + 1) AND
   w3.wid = w2.wid AND
   w3.pinyin LIKE 'guan_' AND
   w3.variant = w2.variant AND
   w3.sequence = (w2.sequence + 1) AND
   w4.wid = w3.wid AND
   w4.pinyin LIKE 'kai_' AND
   w4.variant = w3.variant AND
   w4.sequence = (w3.sequence + 1) AND
   w5.wid = w4.wid AND
   w5.pinyin LIKE 'fang_' AND
   w5.variant = w4.variant AND
   w5.sequence = (w4.sequence + 1) AND
   w6.wid = w5.wid AND
   w6.pinyin LIKE 'xi_' AND
   w6.variant = w5.variant AND
   w6.sequence = (w5.sequence + 1) AND
   w7.wid = w6.wid AND
   w7.pinyin LIKE 'tong_' AND
   w7.variant = w6.variant AND
   w7.sequence = (w6.sequence + 1) AND
   w8.wid = w7.wid AND
   w8.variant = w7.variant
GROUP BY
   w8.wid,
   w8.variant,
   w8.num_variants,
   w8.page_order ,
   w0.sequence ,
   w1.sequence ,
   w2.sequence ,
   w3.sequence ,
   w4.sequence ,
   w5.sequence ,
   w6.sequence ,
   w7.sequence
ORDER BY
   w8.page_order;
(phew!)

with the default geqo_threshold of 11, this query takes 3155ms on my machine 
(a 1ghz athlon with 384 megs of pc133 ram).  This is very very long.

if i first do prepare blah as SELECT ., then run execute blah, the time 
goes down to about 275ms  (i had been running this query a lot, and did a 
vacuum update before all this).

the ouput from EXPLAIN ANALYZE :


   QUERY PLAN
--
Sort  (cost=54.13..54.14 rows=1 width=43) (actual time=315.357..315.357 
rows=1 loops=1)
  Sort Key: w8.page_order
  ->  HashAggregate  (cost=54.12..54.12 rows=1 width=43) (actual 
time=315.328..315.330 rows=1 loops=1)
->  Nested Loop  (cost=0.00..54.08 rows=1 width=43) (actual 
time=6.229..314.566 rows=12 loops=1)
  Join Filter: (("outer".wid = "inner".wid) AND 
("outer".variant = "inner".variant) AND ("outer"."sequence" = 
("inner"."sequence" + 1)) AND ("inner"."sequence" = ("outer"."sequence" + 
1)))
  ->  Nested Loop  (cost=0.00..48.07 rows=1 width=83) (actual 
time=6.088..279.745 rows=12 loops=1)
Join Filter: (("inner"."sequence" = ("outer"."sequence"