[PERFORM] Performance issue using Tsearch2
Hi, I'm having a performance problem on a request using Tsearch2: the request strangely takes several minutes. I've tried to follow Tsearch tuning recommendations, I've searched through the archives, but I can't seem to find a solution to solve my problem. The ts_vector field was created using dictionnary fr_ispell only on types lword, lpart_hword and lhword. An index was created on this field. According to the stat() function, there are only 42,590 word stems indexed. I also did a VACUUM FULL ANALYZE. Here's the result of EXPLAIN ANALYZE on a filtered version of my request (the non-filtered version takes so long I usually cancel it): ** explain analyze SELECT idstruct, headline(zonetext, q), rank(zoneindex_test, q) FROM tab_ocr, tab_chemin, to_tsquery('partir') AS q WHERE tab_chemin.chemin like '%;2;%' AND tab_ocr.idstruct = tab_chemin.label AND zoneindex_test @@ q ORDER BY rank(zoneindex_test, q) DESC; QUERY PLAN - Sort (cost=2345.54..2345.58 rows=16 width=308) (actual time=270638.774..270643.142 rows=7106 loops=1) Sort Key: rank(tab_ocr.zoneindex_test, q.q) - Nested Loop (cost=80.04..2345.22 rows=16 width=308) (actual time=40886.553..270619.730 rows=7106 loops=1) - Nested Loop (cost=80.04..1465.76 rows=392 width=308) (actual time=38209.193..173932.313 rows=272414 loops=1) - Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.006..0.007 rows=1 loops=1) - Bitmap Heap Scan on tab_ocr (cost=80.04..1460.85 rows=392 width=276) (actual time=38209.180..173507.052 rows=272414 loops=1) Filter: (tab_ocr.zoneindex_test @@ q.q) - Bitmap Index Scan on zoneindex_test_idx (cost=0.00..79.94 rows=392 width=0) (actual time=38204.261..38204.261 rows=283606 loops=1) Index Cond: (tab_ocr.zoneindex_test @@ q.q) - Index Scan using tab_chemin_label_index on tab_chemin (cost=0.00..2.23 rows=1 width=4) (actual time=0.036..0.036 rows=0 loops=272414) Index Cond: (tab_ocr.idstruct = tab_chemin.label) Filter: ((chemin)::text ~~ '%;2;%'::text) Total runtime: 270647.946 ms ** Could someone help me analyze this problem? I don't manage to see if the problem comes from bad tsearch tuning, postgresql configuration, or something else... Thanks. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Performance issue using Tsearch2
On 2008-02-05 Viviane Lestic wrote: QUERY PLAN - Sort (cost=2345.54..2345.58 rows=16 width=308) (actual time=270638.774..270643.142 rows=7106 loops=1) Sort Key: rank(tab_ocr.zoneindex_test, q.q) - Nested Loop (cost=80.04..2345.22 rows=16 width=308) (actual time=40886.553..270619.730 rows=7106 loops=1) - Nested Loop (cost=80.04..1465.76 rows=392 width=308) (actual time=38209.193..173932.313 rows=272414 loops=1) - Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.006..0.007 rows=1 loops=1) - Bitmap Heap Scan on tab_ocr (cost=80.04..1460.85 rows=392 width=276) (actual time=38209.180..173507.052 rows=272414 loops=1) Filter: (tab_ocr.zoneindex_test @@ q.q) - Bitmap Index Scan on zoneindex_test_idx (cost=0.00..79.94 rows=392 width=0) (actual time=38204.261..38204.261 rows=283606 loops=1) Index Cond: (tab_ocr.zoneindex_test @@ q.q) - Index Scan using tab_chemin_label_index on tab_chemin (cost=0.00..2.23 rows=1 width=4) (actual time=0.036..0.036 rows=0 loops=272414) Index Cond: (tab_ocr.idstruct = tab_chemin.label) Filter: ((chemin)::text ~~ '%;2;%'::text) Total runtime: 270647.946 ms ** Could someone help me analyze this problem? Your planner estimates are way off. Try increasing the statistics target for the columns used in this query and re-analyze the tables after doing so. Regards Ansgar Wiechers -- The Mac OS X kernel should never panic because, when it does, it seriously inconveniences the user. --http://developer.apple.com/technotes/tn2004/tn2118.html ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Performance issue using Tsearch2
On Feb 5, 2008 12:47 PM, Viviane Lestic [EMAIL PROTECTED] wrote: Could someone help me analyze this problem? I don't manage to see if the problem comes from bad tsearch tuning, postgresql configuration, or something else... Can you try to replace zoneindex_test @@ q with zoneindex_test @@ to_tsquery('partir')? Increasing the statistics for zoneindex_test may be a good idea too (see ALTER TABLE ... ALTER COLUMN doc). I'm surprised you have the word partir in so many documents? Do you use real data? -- Guillaume ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Performance issue using Tsearch2
2008/2/5, Ansgar -59cobalt- Wiechers wrote: Your planner estimates are way off. Try increasing the statistics target for the columns used in this query and re-analyze the tables after doing so. I first set STATISTICS to 1000 for column zoneindex_test and saw no significant improvement (with a vacuum full analyze in between). Then I set default_statistics_target to 1000: there is now an improvement, but the overall time is still way too long... (and the estimated costs didn't change...) Here are the results with default_statistics_target set to 1000: explain analyze SELECT idstruct, headline(zonetext, q), rank(zoneindex_test, q) FROM tab_ocr, tab_chemin, to_tsquery('partir') AS q WHERE tab_chemin.chemin like '%;2;%' AND tab_ocr.idstruct = tab_chemin.label AND zoneindex_test @@ q ORDER BY rank(zoneindex_test, q) DESC; QUERY PLAN - Sort (cost=2345.30..2345.32 rows=8 width=327) (actual time=229913.715..229918.172 rows=7106 loops=1) Sort Key: rank(tab_ocr.zoneindex_test, q.q) - Nested Loop (cost=80.04..2345.18 rows=8 width=327) (actual time=28159.626..229892.957 rows=7106 loops=1) - Nested Loop (cost=80.04..1465.76 rows=392 width=327) (actual time=26084.558..130979.395 rows=272414 loops=1) - Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.006..0.007 rows=1 loops=1) - Bitmap Heap Scan on tab_ocr (cost=80.04..1460.85 rows=392 width=295) (actual time=26084.544..130562.220 rows=272414 loops=1) Filter: (tab_ocr.zoneindex_test @@ q.q) - Bitmap Index Scan on zoneindex_test_idx (cost=0.00..79.94 rows=392 width=0) (actual time=26073.315..26073.315 rows=283606 loops=1) Index Cond: (tab_ocr.zoneindex_test @@ q.q) - Index Scan using tab_chemin_label_index on tab_chemin (cost=0.00..2.23 rows=1 width=4) (actual time=0.040..0.040 rows=0 loops=272414) Index Cond: (tab_ocr.idstruct = tab_chemin.label) Filter: ((chemin)::text ~~ '%;2;%'::text) Total runtime: 229922.864 ms 2008/2/5, Guillaume Smet wrote: Can you try to replace zoneindex_test @@ q with zoneindex_test @@ to_tsquery('partir')? The improvement seems negligible (with default_statistics_target back to 10, its default value): explain analyze SELECT idstruct, headline(zonetext, q), rank(zoneindex_test, q) FROM tab_ocr, tab_chemin, to_tsquery('partir') AS q WHERE tab_chemin.chemin like '%;2;%' AND tab_ocr.idstruct = tab_chemin.label AND zoneindex_test @@ to_tsquery('partir') ORDER BY rank(zoneindex_test, q) DESC; QUERY PLAN - Sort (cost=4358.91..4358.95 rows=16 width=308) (actual time=266489.667..266494.132 rows=7106 loops=1) Sort Key: rank(tab_ocr.zoneindex_test, q.q) - Nested Loop (cost=80.04..4358.59 rows=16 width=308) (actual time=42245.881..266469.644 rows=7106 loops=1) - Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.007..0.008 rows=1 loops=1) - Nested Loop (cost=80.04..4358.34 rows=16 width=276) (actual time=42239.570..178496.761 rows=7106 loops=1) - Bitmap Heap Scan on tab_ocr (cost=80.04..1461.83 rows=392 width=276) (actual time=38317.423..174188.779 rows=272414 loops=1) Filter: (zoneindex_test @@ '''partir'''::tsquery) - Bitmap Index Scan on zoneindex_test_idx (cost=0.00..79.94 rows=392 width=0) (actual time=38289.289..38289.289 rows=283606 loops=1) Index Cond: (zoneindex_test @@ '''partir'''::tsquery) - Index Scan using tab_chemin_label_index on tab_chemin (cost=0.00..7.38 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=272414) Index Cond: (tab_ocr.idstruct = tab_chemin.label) Filter: ((chemin)::text ~~ '%;2;%'::text) Total runtime: 266498.704 ms Increasing the statistics for zoneindex_test may be a good idea too (see ALTER TABLE ... ALTER COLUMN doc). I posted the results above. I'm surprised you have the word partir in so many documents? Do you use real data? I'm using real data. The indexed documents are extracted from newspapers, and partir (and its derivates) is quite a common verb in the French language, so I'm not that surprised to see it show up in many documents. ---(end of broadcast)--- TIP 6: explain analyze is your friend