[PERFORM] Performance issue using Tsearch2

2008-02-05 Thread Viviane Lestic
Hi,

I'm having a performance problem on a request using Tsearch2: the
request strangely takes several minutes.

I've tried to follow Tsearch tuning recommendations, I've searched
through the archives, but I can't seem to find a solution to solve my
problem.

The ts_vector field was created using dictionnary fr_ispell only on
types lword, lpart_hword and lhword. An index was created on this
field.

According to the stat() function, there are only 42,590 word stems indexed.
I also did a VACUUM FULL ANALYZE.

Here's the result of EXPLAIN ANALYZE on a filtered version of my
request (the non-filtered version takes so long I usually cancel it):
**
explain analyze SELECT idstruct, headline(zonetext, q),
rank(zoneindex_test, q) FROM tab_ocr, tab_chemin, to_tsquery('partir')
AS q WHERE tab_chemin.chemin like '%;2;%' AND tab_ocr.idstruct =
tab_chemin.label AND zoneindex_test @@ q ORDER BY rank(zoneindex_test,
q) DESC;
QUERY PLAN
-
Sort (cost=2345.54..2345.58 rows=16 width=308) (actual
time=270638.774..270643.142 rows=7106 loops=1)
Sort Key: rank(tab_ocr.zoneindex_test, q.q)
- Nested Loop (cost=80.04..2345.22 rows=16 width=308) (actual
time=40886.553..270619.730 rows=7106 loops=1)
- Nested Loop (cost=80.04..1465.76 rows=392 width=308) (actual
time=38209.193..173932.313 rows=272414 loops=1)
- Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual
time=0.006..0.007 rows=1 loops=1)
- Bitmap Heap Scan on tab_ocr (cost=80.04..1460.85 rows=392
width=276) (actual time=38209.180..173507.052 rows=272414 loops=1)
Filter: (tab_ocr.zoneindex_test @@ q.q)
- Bitmap Index Scan on zoneindex_test_idx (cost=0.00..79.94 rows=392
width=0) (actual time=38204.261..38204.261 rows=283606 loops=1)
Index Cond: (tab_ocr.zoneindex_test @@ q.q)
- Index Scan using tab_chemin_label_index on tab_chemin
(cost=0.00..2.23 rows=1 width=4) (actual time=0.036..0.036 rows=0
loops=272414)
Index Cond: (tab_ocr.idstruct = tab_chemin.label)
Filter: ((chemin)::text ~~ '%;2;%'::text)
Total runtime: 270647.946 ms
**

Could someone help me analyze this problem?
I don't manage to see if the problem comes from bad tsearch tuning,
postgresql configuration, or something else...

Thanks.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Performance issue using Tsearch2

2008-02-05 Thread Viviane Lestic
2008/2/5, Ansgar -59cobalt- Wiechers wrote:
 Your planner estimates are way off. Try increasing the statistics target
 for the columns used in this query and re-analyze the tables after doing
 so.

I first set STATISTICS to 1000 for column zoneindex_test and saw no
significant improvement (with a vacuum full analyze in between). Then
I set default_statistics_target to 1000: there is now an improvement,
but the overall time is still way too long... (and the estimated costs
didn't change...)
Here are the results with default_statistics_target set to 1000:

explain analyze SELECT idstruct, headline(zonetext, q),
rank(zoneindex_test, q) FROM tab_ocr, tab_chemin, to_tsquery('partir')
AS q WHERE tab_chemin.chemin like '%;2;%' AND tab_ocr.idstruct =
tab_chemin.label AND zoneindex_test @@ q ORDER BY rank(zoneindex_test,
q) DESC;

  QUERY PLAN
-
 Sort  (cost=2345.30..2345.32 rows=8 width=327) (actual
time=229913.715..229918.172 rows=7106 loops=1)
   Sort Key: rank(tab_ocr.zoneindex_test, q.q)
   -  Nested Loop  (cost=80.04..2345.18 rows=8 width=327) (actual
time=28159.626..229892.957 rows=7106 loops=1)
 -  Nested Loop  (cost=80.04..1465.76 rows=392 width=327)
(actual time=26084.558..130979.395 rows=272414 loops=1)
   -  Function Scan on q  (cost=0.00..0.01 rows=1
width=32) (actual time=0.006..0.007 rows=1 loops=1)
   -  Bitmap Heap Scan on tab_ocr  (cost=80.04..1460.85
rows=392 width=295) (actual time=26084.544..130562.220 rows=272414
loops=1)
 Filter: (tab_ocr.zoneindex_test @@ q.q)
 -  Bitmap Index Scan on zoneindex_test_idx
(cost=0.00..79.94 rows=392 width=0) (actual time=26073.315..26073.315
rows=283606 loops=1)
   Index Cond: (tab_ocr.zoneindex_test @@ q.q)
 -  Index Scan using tab_chemin_label_index on tab_chemin
(cost=0.00..2.23 rows=1 width=4) (actual time=0.040..0.040 rows=0
loops=272414)
   Index Cond: (tab_ocr.idstruct = tab_chemin.label)
   Filter: ((chemin)::text ~~ '%;2;%'::text)
 Total runtime: 229922.864 ms


2008/2/5, Guillaume Smet wrote:
 Can you try to replace zoneindex_test @@ q with zoneindex_test @@
 to_tsquery('partir')?

The improvement seems negligible (with default_statistics_target back
to 10, its default value):
explain analyze SELECT idstruct, headline(zonetext, q),
rank(zoneindex_test, q) FROM tab_ocr, tab_chemin, to_tsquery('partir')
AS q WHERE tab_chemin.chemin like '%;2;%' AND tab_ocr.idstruct =
tab_chemin.label AND zoneindex_test @@ to_tsquery('partir') ORDER BY
rank(zoneindex_test, q) DESC;

  QUERY PLAN
-
 Sort  (cost=4358.91..4358.95 rows=16 width=308) (actual
time=266489.667..266494.132 rows=7106 loops=1)
   Sort Key: rank(tab_ocr.zoneindex_test, q.q)
   -  Nested Loop  (cost=80.04..4358.59 rows=16 width=308) (actual
time=42245.881..266469.644 rows=7106 loops=1)
 -  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
(actual time=0.007..0.008 rows=1 loops=1)
 -  Nested Loop  (cost=80.04..4358.34 rows=16 width=276)
(actual time=42239.570..178496.761 rows=7106 loops=1)
   -  Bitmap Heap Scan on tab_ocr  (cost=80.04..1461.83
rows=392 width=276) (actual time=38317.423..174188.779 rows=272414
loops=1)
 Filter: (zoneindex_test @@ '''partir'''::tsquery)
 -  Bitmap Index Scan on zoneindex_test_idx
(cost=0.00..79.94 rows=392 width=0) (actual time=38289.289..38289.289
rows=283606 loops=1)
   Index Cond: (zoneindex_test @@ '''partir'''::tsquery)
   -  Index Scan using tab_chemin_label_index on
tab_chemin  (cost=0.00..7.38 rows=1 width=4) (actual time=0.014..0.014
rows=0 loops=272414)
 Index Cond: (tab_ocr.idstruct = tab_chemin.label)
 Filter: ((chemin)::text ~~ '%;2;%'::text)
 Total runtime: 266498.704 ms

 Increasing the statistics for zoneindex_test may
 be a good idea too (see ALTER TABLE ... ALTER COLUMN doc).

I posted the results above.

 I'm surprised you have the word partir in so many documents? Do you
 use real data?

I'm using real data. The indexed documents are extracted from
newspapers, and partir (and its derivates) is quite a common verb in
the French language, so I'm not that surprised to see it show up in
many documents.

---(end of broadcast)---
TIP 6: explain analyze is your friend