Re: [PERFORM] Planner selects different execution plans depending on limit

2012-09-18 Thread Bill Martin
Tom Lane mailto:t...@sss.pgh.pa.us writes:
 Bill Martin bill.mar...@communote.com writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 He can do it without having to change his schema --- but it's the 
 index column, not the underlying content column, that needs its 
 statistics target adjusted.

 How can I adjust the statistics target of the index?

 Just pretend it's a table.

   ALTER TABLE index_name ALTER COLUMN column_name SET STATISTICS ...

 You'll need to look at the index (eg with \d) to see what the name of the 
 desired column is, since index expressions have system-assigned
 column names.

   regards, tom lane

I tried: 
ALTER TABLE ft_simple_core_content_content_idx ALTER column to_tsvector SET 
STATISTICS 1;
ANALYZE;

and
REINDEX INDEX ft_simple_core_content_content_idx;

All the trouble was for nothing.

Are there any other possibilities to solve my problem?

Best regards,
Bill Martin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner selects different execution plans depending on limit

2012-09-13 Thread Bill Martin
Tom Lane tgl(at)sss(dot)pgh(dot)pa(dot)us writes:
 Bill Martin bill(dot)martin(at)communote(dot)com writes:
 I´ve created following table which contains one million records.
 ...

 Limit  (cost=10091.09..19305.68 rows=3927 width=621) (actual 
 time=0.255..0.255 rows=0 loops=1)
   -  Bitmap Heap Scan on core_content content  (cost=10091.09..57046.32 
 rows=20011 width=621) (actual time=0.254..0.254 rows=0 loops=1)
 Recheck Cond: (to_tsvector('simple'::regconfig, content) @@ 
 '''asdasdadas'':*'::tsquery)
 -  Bitmap Index Scan on ft_simple_core_content_content_idx  
 (cost=0.00..10086.09 rows=20011 width=0) (actual time=0.251..0.251 rows=0 
 loops=1)
   Index Cond: (to_tsvector('simple'::regconfig, content) @@ 
 '''asdasdadas'':*'::tsquery)
 Total runtime: 0.277 ms

 Is there any posibility to tune up the performance even if the limit is only 
 10?

 The problem is the way-off rowcount estimate (20011 rows when it's
 really none); with a smaller estimate there, the planner wouldn't decide
 to switch to a seqscan.

 Did you take the advice to increase the column's statistics target?
 Because 20011 looks suspiciously close to the default estimate that
 tsquery_opr_selec will fall back on if it hasn't got enough stats
 to come up with a trustworthy estimate for a *-pattern query.

 (I think there are probably some bugs in tsquery_opr_selec's estimate
 for this, as I just posted about on pgsql-hackers.  But this number
 looks like you're not even getting to the estimation code, for lack
 of enough statistics entries.)

 The other thing that seems kind of weird here is that the cost estimate
 for the bitmap index scan seems out of line even given the
 2-entries-to-fetch estimate.  I'd have expected a cost estimate of a
 few hundred for that, not 1.  Perhaps this index is really bloated,
 and it's time to REINDEX it?

  regards, tom lane

Hi,
thank you for helping me.

I´ve tried different values for the statistics but it is all the same (the 
planner decide to switch to a seqscan if the limit is 10).


ALTER TABLE core_content ALTER column content SET STATISTICS 1000;


I also tried to reindex the index but the planner decide to switch to a seqscan.


REINDEX INDEX ft_simple_core_content_content_idx;



Disable the seqscan helps me but is this a good decision for all use cases?



SET enable_seqscan = off;



Are there any other possibilities to solve my problem?



Best regards,

Bill Martin



Re: [PERFORM] Planner selects different execution plans depending on limit

2012-09-13 Thread Tom Lane
Bill Martin bill.mar...@communote.com writes:
 I´ve tried different values for the statistics but it is all the same (the 
 planner decide to switch to a seqscan if the limit is 10).

 ALTER TABLE core_content ALTER column content SET STATISTICS 1000;

Um, did you actually do an ANALYZE after changing that?

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner selects different execution plans depending on limit

2012-09-13 Thread Bill Martin
 Tom Lane t...@sss.pgh.pa.us writes:
 Bill Martin bill.mar...@communote.com writes:
 I've tried different values for the statistics but it is all the same (the 
 planner decide to switch to a seqscan if the limit is 10).

 ALTER TABLE core_content ALTER column content SET STATISTICS 1000;

 Um, did you actually do an ANALYZE after changing that?
 
   regards, tom lane

Yes, I've run the ANALYZE command.

Regards,
Bill Martin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner selects different execution plans depending on limit

2012-09-13 Thread Jesper Krogh

On 13/09/12 16:42, Bill Martin wrote:
Yes, I've run the ANALYZE command. Regards, Bill Martin 
The main problem in your case is actually that you dont store the 
tsvector in the table.


If you store to_tsvector('simple',content.content) in a column in
the database and search against that instead
then you'll allow PG to garther statistics on the column and make the
query-planner act according to that.

Jesper


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner selects different execution plans depending on limit

2012-09-13 Thread Tom Lane
Jesper Krogh jes...@krogh.cc writes:
 On 13/09/12 16:42, Bill Martin wrote:
 Yes, I've run the ANALYZE command. Regards, Bill Martin 

 The main problem in your case is actually that you dont store the 
 tsvector in the table.

Oh, duh, obviously I lack caffeine this morning.

 If you store to_tsvector('simple',content.content) in a column in
 the database and search against that instead
 then you'll allow PG to garther statistics on the column and make the
 query-planner act according to that.

He can do it without having to change his schema --- but it's the index
column, not the underlying content column, that needs its statistics
target adjusted.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner selects different execution plans depending on limit

2012-09-13 Thread Bill Martin
 Tom Lane t...@sss.pgh.pa.us writes:

 He can do it without having to change his schema --- but it's the index
 column, not the underlying content column, that needs its statistics
 target adjusted.

 regards, tom lane

How can I adjust the statistics target of the index?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner selects different execution plans depending on limit

2012-09-13 Thread Tom Lane
Bill Martin bill.mar...@communote.com writes:
 Tom Lane t...@sss.pgh.pa.us writes:
 He can do it without having to change his schema --- but it's the index
 column, not the underlying content column, that needs its statistics
 target adjusted.

 How can I adjust the statistics target of the index?

Just pretend it's a table.

ALTER TABLE index_name ALTER COLUMN column_name SET STATISTICS ...

You'll need to look at the index (eg with \d) to see what the name of
the desired column is, since index expressions have system-assigned
column names.

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Planner selects different execution plans depending on limit

2012-09-11 Thread Bill Martin
On 10/09/12 16:24, bill_mar...@freenet.demailto:bill_mar...@freenet.de wrote:

Hi All

I´ve ft_simple_core_content_content_idx
  ON core_content
  USING gin
  (to_tsvector('simple'::regconfig, content) );


If I´m seaching for a word which is NOT in the column content the query plan 
and the execution time differs with the given limit.
If I choose 3927 or any higher number the query execution took only few 
milliseconds.

core_content content where
to_tsvector('simple', content.content) @@ tsquery(plainto_tsquery('simple', 
'asdasdadas') :: varchar || ':*')=true
Limit 3927

Limit  (cost=0.00..19302.23 rows=3926 width=621) (actual 
time=52147.149..52147.149 rows=0 loops=1)
  -  Seq Scan on core_content content  (cost=0.00..98384.34 rows=20011 
width=621) (actual time=52147.147..52147.147 rows=0 loops=1)
Filter: (to_tsvector('simple'::regconfig, content) @@ 
'''asdasdadas'':*'::tsquery)
Total runtime: 52147.173 ms

Is there any posibility to improve the performance even if the limit is only 
10? Is it possible to determine that the query optimizer takes only the fast 
bitmap heap scan instead of the slow seq scan?

The big hammer is: set enable_seqscan = off, but if you tell which PG version 
you're on there may be something to do. I suggest you'd start by bumping the 
statistics target for the column to 1 and run analyze to see what that 
changes.

--
Jesper

Hi,
my email client delete a lot of the content of the original thread message. 
Here is the full content:

Hi All

I´ve created following table which contains one million records.

CREATE TABLE core_content
(
  id bigint NOT NULL,
  content text NOT NULL,
  short_content text,
  CONSTRAINT core_content_pkey PRIMARY KEY (id )
)

CREATE INDEX ft_simple_core_content_content_idx
  ON core_content
  USING gin
  (to_tsvector('simple'::regconfig, content) );


If I´m seaching for a word which is not in the column content the query plan 
and the execution time differs with the given limit.
If I choose 3927 or any higher number the query execution took only few 
milliseconds.

select * from core_content content where
to_tsvector('simple', content.content) @@ tsquery(plainto_tsquery('simple', 
'asdasdadas') :: varchar || ':*')=true
Limit 3927

Limit  (cost=10091.09..19305.68 rows=3927 width=621) (actual time=0.255..0.255 
rows=0 loops=1)
  -  Bitmap Heap Scan on core_content content  (cost=10091.09..57046.32 
rows=20011 width=621) (actual time=0.254..0.254 rows=0 loops=1)
Recheck Cond: (to_tsvector('simple'::regconfig, content) @@ 
'''asdasdadas'':*'::tsquery)
-  Bitmap Index Scan on ft_simple_core_content_content_idx  
(cost=0.00..10086.09 rows=20011 width=0) (actual time=0.251..0.251 rows=0 
loops=1)
  Index Cond: (to_tsvector('simple'::regconfig, content) @@ 
'''asdasdadas'':*'::tsquery)
Total runtime: 0.277 ms

If I choose 3926 or any lower number (e.g. 10) the query execution took more 
than fifty seconds.

select * from core_content content where
to_tsvector('simple', content.content) @@ tsquery(plainto_tsquery('simple', 
'asdasdadas') :: varchar || ':*')=true
Limit 3927

Limit  (cost=0.00..19302.23 rows=3926 width=621) (actual 
time=52147.149..52147.149 rows=0 loops=1)
  -  Seq Scan on core_content content  (cost=0.00..98384.34 rows=20011 
width=621) (actual time=52147.147..52147.147 rows=0 loops=1)
Filter: (to_tsvector('simple'::regconfig, content) @@ 
'''asdasdadas'':*'::tsquery)
Total runtime: 52147.173 ms

Is there any posibility to tune up the performance even if the limit is only 
10? Is it possible to determine that the query optimizer takes
only the fast bitmap heap scan instead of the slow seq scan?

I use PostgreSQL 9.1.5.; Intel i5-2400 @ 3.1 GHz, 16GB; Windows 7 64 Bit

Regards,
Bill Martin



Re: [PERFORM] Planner selects different execution plans depending on limit

2012-09-11 Thread Tom Lane
Bill Martin bill.mar...@communote.com writes:
 I´ve created following table which contains one million records.
 ...

 Limit  (cost=10091.09..19305.68 rows=3927 width=621) (actual 
 time=0.255..0.255 rows=0 loops=1)
   -  Bitmap Heap Scan on core_content content  (cost=10091.09..57046.32 
 rows=20011 width=621) (actual time=0.254..0.254 rows=0 loops=1)
 Recheck Cond: (to_tsvector('simple'::regconfig, content) @@ 
 '''asdasdadas'':*'::tsquery)
 -  Bitmap Index Scan on ft_simple_core_content_content_idx  
 (cost=0.00..10086.09 rows=20011 width=0) (actual time=0.251..0.251 rows=0 
 loops=1)
   Index Cond: (to_tsvector('simple'::regconfig, content) @@ 
 '''asdasdadas'':*'::tsquery)
 Total runtime: 0.277 ms

 Is there any posibility to tune up the performance even if the limit is only 
 10?

The problem is the way-off rowcount estimate (20011 rows when it's
really none); with a smaller estimate there, the planner wouldn't decide
to switch to a seqscan.

Did you take the advice to increase the column's statistics target?
Because 20011 looks suspiciously close to the default estimate that
tsquery_opr_selec will fall back on if it hasn't got enough stats
to come up with a trustworthy estimate for a *-pattern query.

(I think there are probably some bugs in tsquery_opr_selec's estimate
for this, as I just posted about on pgsql-hackers.  But this number
looks like you're not even getting to the estimation code, for lack
of enough statistics entries.)

The other thing that seems kind of weird here is that the cost estimate
for the bitmap index scan seems out of line even given the
2-entries-to-fetch estimate.  I'd have expected a cost estimate of a
few hundred for that, not 1.  Perhaps this index is really bloated,
and it's time to REINDEX it?

regards, tom lane


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Planner selects different execution plans depending on limit

2012-09-10 Thread bill_martin
Hi All

I´ve  ft_simple_core_content_content_idx
  ON core_content
  USING gin
  (to_tsvector('simple'::regconfig, content) );

  
If I´m seaching for a word which is NOT in the column content the query plan 
and the execution time differs with the given limit.
If I choose 3927 or any higher number the query execution took only few 
milliseconds. 
  
 core_content content where
to_tsvector('simple', content.content) @@ tsquery(plainto_tsquery('simple', 
'asdasdadas') :: varchar || ':*')=true
Limit 3927

Limit  (cost=0.00..19302.23 rows=3926 width=621) (actual 
time=52147.149..52147.149 rows=0 loops=1)
  -  Seq Scan on core_content content  (cost=0.00..98384.34 rows=20011 
width=621) (actual time=52147.147..52147.147 rows=0 loops=1)
    Filter: (to_tsvector('simple'::regconfig, content) @@ 
'''asdasdadas'':*'::tsquery)
Total runtime: 52147.173 ms

Is there any posibility to improve the performance even if the limit is only 
10? Is it possible to determine that the query optimizer takes only the fast 
bitmap heap scan instead of the slow seq scan?

Regards,
Bill Martin


---
E-Mail ist da wo du bist! Jetzt mit freenetMail ganz bequem auch unterwegs 
E-Mails verschicken.
Am besten gleich informieren unter 
http://mail.freenet.de/mobile-email/index.html

Re: [PERFORM] Planner selects different execution plans depending on limit

2012-09-10 Thread Jesper Krogh

On 10/09/12 16:24, bill_mar...@freenet.de wrote:


Hi All

I´ve ft_simple_core_content_content_idx
  ON core_content
  USING gin
  (to_tsvector('simple'::regconfig, content) );


If I´m seaching for a word which is NOT in the column content the 
query plan and the execution time differs with the given limit.
If I choose 3927 or any higher number the query execution took only 
few milliseconds.


core_content content where
to_tsvector('simple', content.content) @@ 
tsquery(plainto_tsquery('simple', 'asdasdadas') :: varchar || ':*')=true

Limit 3927

Limit  (cost=0.00..19302.23 rows=3926 width=621) (actual 
time=52147.149..52147.149 rows=0 loops=1)
  -  Seq Scan on core_content content  (cost=0.00..98384.34 
rows=20011 width=621) (actual time=52147.147..52147.147 rows=0 loops=1)
Filter: (to_tsvector('simple'::regconfig, content) @@ 
'''asdasdadas'':*'::tsquery)

Total runtime: 52147.173 ms

Is there any posibility to improve the performance even if the limit 
is only 10? Is it possible to determine that the query optimizer takes 
only the fast bitmap heap scan instead of the slow seq scan?




The big hammer is: set enable_seqscan = off, but if you tell which PG 
version you're on there may be something to do. I suggest you'd start by 
bumping the statistics target for the column to 1 and run analyze to 
see what that changes.


--
Jesper