Re: [PERFORM] Planner selects different execution plans depending on limit
Tom Lane mailto:t...@sss.pgh.pa.us writes: Bill Martin bill.mar...@communote.com writes: Tom Lane t...@sss.pgh.pa.us writes: He can do it without having to change his schema --- but it's the index column, not the underlying content column, that needs its statistics target adjusted. How can I adjust the statistics target of the index? Just pretend it's a table. ALTER TABLE index_name ALTER COLUMN column_name SET STATISTICS ... You'll need to look at the index (eg with \d) to see what the name of the desired column is, since index expressions have system-assigned column names. regards, tom lane I tried: ALTER TABLE ft_simple_core_content_content_idx ALTER column to_tsvector SET STATISTICS 1; ANALYZE; and REINDEX INDEX ft_simple_core_content_content_idx; All the trouble was for nothing. Are there any other possibilities to solve my problem? Best regards, Bill Martin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planner selects different execution plans depending on limit
Tom Lane tgl(at)sss(dot)pgh(dot)pa(dot)us writes: Bill Martin bill(dot)martin(at)communote(dot)com writes: I´ve created following table which contains one million records. ... Limit (cost=10091.09..19305.68 rows=3927 width=621) (actual time=0.255..0.255 rows=0 loops=1) - Bitmap Heap Scan on core_content content (cost=10091.09..57046.32 rows=20011 width=621) (actual time=0.254..0.254 rows=0 loops=1) Recheck Cond: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery) - Bitmap Index Scan on ft_simple_core_content_content_idx (cost=0.00..10086.09 rows=20011 width=0) (actual time=0.251..0.251 rows=0 loops=1) Index Cond: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery) Total runtime: 0.277 ms Is there any posibility to tune up the performance even if the limit is only 10? The problem is the way-off rowcount estimate (20011 rows when it's really none); with a smaller estimate there, the planner wouldn't decide to switch to a seqscan. Did you take the advice to increase the column's statistics target? Because 20011 looks suspiciously close to the default estimate that tsquery_opr_selec will fall back on if it hasn't got enough stats to come up with a trustworthy estimate for a *-pattern query. (I think there are probably some bugs in tsquery_opr_selec's estimate for this, as I just posted about on pgsql-hackers. But this number looks like you're not even getting to the estimation code, for lack of enough statistics entries.) The other thing that seems kind of weird here is that the cost estimate for the bitmap index scan seems out of line even given the 2-entries-to-fetch estimate. I'd have expected a cost estimate of a few hundred for that, not 1. Perhaps this index is really bloated, and it's time to REINDEX it? regards, tom lane Hi, thank you for helping me. I´ve tried different values for the statistics but it is all the same (the planner decide to switch to a seqscan if the limit is 10). ALTER TABLE core_content ALTER column content SET STATISTICS 1000; I also tried to reindex the index but the planner decide to switch to a seqscan. REINDEX INDEX ft_simple_core_content_content_idx; Disable the seqscan helps me but is this a good decision for all use cases? SET enable_seqscan = off; Are there any other possibilities to solve my problem? Best regards, Bill Martin
Re: [PERFORM] Planner selects different execution plans depending on limit
Bill Martin bill.mar...@communote.com writes: I´ve tried different values for the statistics but it is all the same (the planner decide to switch to a seqscan if the limit is 10). ALTER TABLE core_content ALTER column content SET STATISTICS 1000; Um, did you actually do an ANALYZE after changing that? regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planner selects different execution plans depending on limit
Tom Lane t...@sss.pgh.pa.us writes: Bill Martin bill.mar...@communote.com writes: I've tried different values for the statistics but it is all the same (the planner decide to switch to a seqscan if the limit is 10). ALTER TABLE core_content ALTER column content SET STATISTICS 1000; Um, did you actually do an ANALYZE after changing that? regards, tom lane Yes, I've run the ANALYZE command. Regards, Bill Martin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planner selects different execution plans depending on limit
On 13/09/12 16:42, Bill Martin wrote: Yes, I've run the ANALYZE command. Regards, Bill Martin The main problem in your case is actually that you dont store the tsvector in the table. If you store to_tsvector('simple',content.content) in a column in the database and search against that instead then you'll allow PG to garther statistics on the column and make the query-planner act according to that. Jesper -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planner selects different execution plans depending on limit
Jesper Krogh jes...@krogh.cc writes: On 13/09/12 16:42, Bill Martin wrote: Yes, I've run the ANALYZE command. Regards, Bill Martin The main problem in your case is actually that you dont store the tsvector in the table. Oh, duh, obviously I lack caffeine this morning. If you store to_tsvector('simple',content.content) in a column in the database and search against that instead then you'll allow PG to garther statistics on the column and make the query-planner act according to that. He can do it without having to change his schema --- but it's the index column, not the underlying content column, that needs its statistics target adjusted. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planner selects different execution plans depending on limit
Tom Lane t...@sss.pgh.pa.us writes: He can do it without having to change his schema --- but it's the index column, not the underlying content column, that needs its statistics target adjusted. regards, tom lane How can I adjust the statistics target of the index? -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planner selects different execution plans depending on limit
Bill Martin bill.mar...@communote.com writes: Tom Lane t...@sss.pgh.pa.us writes: He can do it without having to change his schema --- but it's the index column, not the underlying content column, that needs its statistics target adjusted. How can I adjust the statistics target of the index? Just pretend it's a table. ALTER TABLE index_name ALTER COLUMN column_name SET STATISTICS ... You'll need to look at the index (eg with \d) to see what the name of the desired column is, since index expressions have system-assigned column names. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Planner selects different execution plans depending on limit
On 10/09/12 16:24, bill_mar...@freenet.demailto:bill_mar...@freenet.de wrote: Hi All I´ve ft_simple_core_content_content_idx ON core_content USING gin (to_tsvector('simple'::regconfig, content) ); If I´m seaching for a word which is NOT in the column content the query plan and the execution time differs with the given limit. If I choose 3927 or any higher number the query execution took only few milliseconds. core_content content where to_tsvector('simple', content.content) @@ tsquery(plainto_tsquery('simple', 'asdasdadas') :: varchar || ':*')=true Limit 3927 Limit (cost=0.00..19302.23 rows=3926 width=621) (actual time=52147.149..52147.149 rows=0 loops=1) - Seq Scan on core_content content (cost=0.00..98384.34 rows=20011 width=621) (actual time=52147.147..52147.147 rows=0 loops=1) Filter: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery) Total runtime: 52147.173 ms Is there any posibility to improve the performance even if the limit is only 10? Is it possible to determine that the query optimizer takes only the fast bitmap heap scan instead of the slow seq scan? The big hammer is: set enable_seqscan = off, but if you tell which PG version you're on there may be something to do. I suggest you'd start by bumping the statistics target for the column to 1 and run analyze to see what that changes. -- Jesper Hi, my email client delete a lot of the content of the original thread message. Here is the full content: Hi All I´ve created following table which contains one million records. CREATE TABLE core_content ( id bigint NOT NULL, content text NOT NULL, short_content text, CONSTRAINT core_content_pkey PRIMARY KEY (id ) ) CREATE INDEX ft_simple_core_content_content_idx ON core_content USING gin (to_tsvector('simple'::regconfig, content) ); If I´m seaching for a word which is not in the column content the query plan and the execution time differs with the given limit. If I choose 3927 or any higher number the query execution took only few milliseconds. select * from core_content content where to_tsvector('simple', content.content) @@ tsquery(plainto_tsquery('simple', 'asdasdadas') :: varchar || ':*')=true Limit 3927 Limit (cost=10091.09..19305.68 rows=3927 width=621) (actual time=0.255..0.255 rows=0 loops=1) - Bitmap Heap Scan on core_content content (cost=10091.09..57046.32 rows=20011 width=621) (actual time=0.254..0.254 rows=0 loops=1) Recheck Cond: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery) - Bitmap Index Scan on ft_simple_core_content_content_idx (cost=0.00..10086.09 rows=20011 width=0) (actual time=0.251..0.251 rows=0 loops=1) Index Cond: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery) Total runtime: 0.277 ms If I choose 3926 or any lower number (e.g. 10) the query execution took more than fifty seconds. select * from core_content content where to_tsvector('simple', content.content) @@ tsquery(plainto_tsquery('simple', 'asdasdadas') :: varchar || ':*')=true Limit 3927 Limit (cost=0.00..19302.23 rows=3926 width=621) (actual time=52147.149..52147.149 rows=0 loops=1) - Seq Scan on core_content content (cost=0.00..98384.34 rows=20011 width=621) (actual time=52147.147..52147.147 rows=0 loops=1) Filter: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery) Total runtime: 52147.173 ms Is there any posibility to tune up the performance even if the limit is only 10? Is it possible to determine that the query optimizer takes only the fast bitmap heap scan instead of the slow seq scan? I use PostgreSQL 9.1.5.; Intel i5-2400 @ 3.1 GHz, 16GB; Windows 7 64 Bit Regards, Bill Martin
Re: [PERFORM] Planner selects different execution plans depending on limit
Bill Martin bill.mar...@communote.com writes: I´ve created following table which contains one million records. ... Limit (cost=10091.09..19305.68 rows=3927 width=621) (actual time=0.255..0.255 rows=0 loops=1) - Bitmap Heap Scan on core_content content (cost=10091.09..57046.32 rows=20011 width=621) (actual time=0.254..0.254 rows=0 loops=1) Recheck Cond: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery) - Bitmap Index Scan on ft_simple_core_content_content_idx (cost=0.00..10086.09 rows=20011 width=0) (actual time=0.251..0.251 rows=0 loops=1) Index Cond: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery) Total runtime: 0.277 ms Is there any posibility to tune up the performance even if the limit is only 10? The problem is the way-off rowcount estimate (20011 rows when it's really none); with a smaller estimate there, the planner wouldn't decide to switch to a seqscan. Did you take the advice to increase the column's statistics target? Because 20011 looks suspiciously close to the default estimate that tsquery_opr_selec will fall back on if it hasn't got enough stats to come up with a trustworthy estimate for a *-pattern query. (I think there are probably some bugs in tsquery_opr_selec's estimate for this, as I just posted about on pgsql-hackers. But this number looks like you're not even getting to the estimation code, for lack of enough statistics entries.) The other thing that seems kind of weird here is that the cost estimate for the bitmap index scan seems out of line even given the 2-entries-to-fetch estimate. I'd have expected a cost estimate of a few hundred for that, not 1. Perhaps this index is really bloated, and it's time to REINDEX it? regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Planner selects different execution plans depending on limit
Hi All I´ve ft_simple_core_content_content_idx ON core_content USING gin (to_tsvector('simple'::regconfig, content) ); If I´m seaching for a word which is NOT in the column content the query plan and the execution time differs with the given limit. If I choose 3927 or any higher number the query execution took only few milliseconds. core_content content where to_tsvector('simple', content.content) @@ tsquery(plainto_tsquery('simple', 'asdasdadas') :: varchar || ':*')=true Limit 3927 Limit (cost=0.00..19302.23 rows=3926 width=621) (actual time=52147.149..52147.149 rows=0 loops=1) - Seq Scan on core_content content (cost=0.00..98384.34 rows=20011 width=621) (actual time=52147.147..52147.147 rows=0 loops=1) Filter: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery) Total runtime: 52147.173 ms Is there any posibility to improve the performance even if the limit is only 10? Is it possible to determine that the query optimizer takes only the fast bitmap heap scan instead of the slow seq scan? Regards, Bill Martin --- E-Mail ist da wo du bist! Jetzt mit freenetMail ganz bequem auch unterwegs E-Mails verschicken. Am besten gleich informieren unter http://mail.freenet.de/mobile-email/index.html
Re: [PERFORM] Planner selects different execution plans depending on limit
On 10/09/12 16:24, bill_mar...@freenet.de wrote: Hi All I´ve ft_simple_core_content_content_idx ON core_content USING gin (to_tsvector('simple'::regconfig, content) ); If I´m seaching for a word which is NOT in the column content the query plan and the execution time differs with the given limit. If I choose 3927 or any higher number the query execution took only few milliseconds. core_content content where to_tsvector('simple', content.content) @@ tsquery(plainto_tsquery('simple', 'asdasdadas') :: varchar || ':*')=true Limit 3927 Limit (cost=0.00..19302.23 rows=3926 width=621) (actual time=52147.149..52147.149 rows=0 loops=1) - Seq Scan on core_content content (cost=0.00..98384.34 rows=20011 width=621) (actual time=52147.147..52147.147 rows=0 loops=1) Filter: (to_tsvector('simple'::regconfig, content) @@ '''asdasdadas'':*'::tsquery) Total runtime: 52147.173 ms Is there any posibility to improve the performance even if the limit is only 10? Is it possible to determine that the query optimizer takes only the fast bitmap heap scan instead of the slow seq scan? The big hammer is: set enable_seqscan = off, but if you tell which PG version you're on there may be something to do. I suggest you'd start by bumping the statistics target for the column to 1 and run analyze to see what that changes. -- Jesper