Re: [PERFORM] text search: tablescan cost for a tsvector

2012-02-29 Thread Robert Haas
On Mon, Feb 6, 2012 at 6:05 AM, Marc Mamin m.ma...@intershop.de wrote:
 without analyze: http://explain.depesz.com/s/6At
 with analyze:    http://explain.depesz.com/s/r3B

I think this is the same issue complained about here:

http://archives.postgresql.org/message-id/4ed68eec.9030...@krogh.cc

And here:

http://archives.postgresql.org/message-id/CANxtv6XiuiqEkXRJU2vk=xkafxrlep7uvhgr-xmcyjgqz29...@mail.gmail.com

The problem seems to be that the cost estimator doesn't know that
detoasting is expensive.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] text search: tablescan cost for a tsvector

2012-02-29 Thread Marc Mamin
 Von: Robert Haas [mailto:robertmh...@gmail.com]
 Gesendet: Mi 2/29/2012 7:32

  
 On Mon, Feb 6, 2012 at 6:05 AM, Marc Mamin m.ma...@intershop.de wrote:
  without analyze: http://explain.depesz.com/s/6At
  with analyze:http://explain.depesz.com/s/r3B
... 
 The problem seems to be that the cost estimator doesn't know that
 detoasting is expensive.

Hello,

Tom Lane has started a follow up thread in the hacker list.
Detoasting is indeed the main obstacle, but I've repeated my test using plain 
storage
and the planer still choose (systematically?) the slowest query.
It seems that I bumped into 2 different issues at the same time.

http://archives.postgresql.org/pgsql-hackers/2012-02/msg00896.php

Backround: 
Our reporting system offers amongst others time histograms 
combined with a FTS filtering on error occurences (imported from error logs), 
It is hence not unusual that given search terms are found within a majority of 
the documents...

best regards,

Marc Mamin


[PERFORM] text search: tablescan cost for a tsvector

2012-02-06 Thread Marc Mamin
Hello,

I have quite systematically better performance with the text search when
I disable the statistics collection for the tsvector column.
So I wonder if such statistics ever make sense.

Here a testcase:

The table contains 200'000 tsvector, whereas the lexeme 'fooblablabla'
exists in all tsvector:
Without statistics, the planner decide as expected for the gin index.
After analyze, it switch to a table scan which is also expected, but the
query is 3 times slower.

My first thought was that the Bitmap Heap Scan was really fast as the
searched term is always at the first position.
So I repeated the test with an additional search term at the last
position, but without significant change:

(result from the 6. test below)

without analyze: http://explain.depesz.com/s/6At
with analyze:http://explain.depesz.com/s/r3B


best regards,

Marc Mamin




Here all my results, always one of the fastest from a few runs.


CREATE TABLE tsv_test
(
  id bigserial NOT NULL,
  v tsvector
);


The code to fill the table with test data can be found below


The test query:

explain analyze
select id from tsv_test where v @@ 'lexeme3179'::tsquery 
UNION ALL
select id from tsv_test where v @@ 'lexeme5'::tsquery
UNION ALL
select id from tsv_test where v @@ 'fooblablabla'::tsquery

The results

A) on first lexeme

1) without indexes without analyze:
   http://explain.depesz.com/s/bOv

2) alter table tsv_test add constraint tsv_test_pk primary key(id);
   http://explain.depesz.com/s/9QQ (same as previous);

3) create index tsv_gin on tsv_test using gin(v);
   http://explain.depesz.com/s/r4M = fastest

4) ANALYZE tsv_test (id);
   http://explain.depesz.com/s/MyC (same as previous);

5) ANALYZE tsv_test;
   http://explain.depesz.com/s/qu3S 
   

B) on lastlexeme   

6) create table  tsv_test2 as select id,
   v||'zzthisisalongerlexemethisisalongerlexeme'::tsvector 
   from tsv_test;
   
   explain analyze
   select id from tsv_test2 where v @@
'zzthisisalongerlexemethisisalongerlexeme'::tsquery 
   
   http://explain.depesz.com/s/6At 
   
   ANALYZE tsv_test2;
   
   http://explain.depesz.com/s/r3B 



test data:

insert into tsv_test (v) 
select
cast('fooblablabla' ||
' lexeme'||s%2|| ' lexeme'||s%3|| ' lexeme'||s%4||
' lexeme'||s%4|| ' lexeme'||s%5|| ' lexeme'||s%6||
' lexeme'||s%7|| ' lexeme'||s%8|| ' lexeme'||s%9||
' lexeme'||s%10 || ' lexeme2'||s%11 || ' lexeme3'||s%12 ||
' lexeme'||s%11 || ' lexeme2'||s%12 || ' lexeme3'||s%22 ||
' lexeme'||s%12 || ' lexeme2'||s%13 || ' lexeme3'||s%32 ||
' lexeme'||s%13 || ' lexeme2'||s%14 || ' lexeme3'||s%42 ||
' lexeme'||s%14 || ' lexeme2'||s%15 || ' lexeme3'||s%52 ||
' lexeme'||s%15 || ' lexeme2'||s%16 || ' lexeme3'||s%62 ||
' lexeme'||s%16 || ' lexeme2'||s%17 || ' lexeme3'||s%72 ||
' lexeme'||s%17 || ' lexeme2'||s%18 || ' lexeme3'||s%82 ||
' lexeme'||s%18 || ' lexeme2'||s%19 || ' lexeme3'||s%92 ||
' lexeme'||s%19 || ' lexeme2'||s%10 || ' lexeme3'||s%15 ||
' lexeme'||s%12 || ' lexeme2'||s%71 || ' lexeme3'||s%16 ||
' lexeme'||s%20 || ' lexeme2'||s%81 || ' lexeme3'||s%17 ||
' lexeme'||s%35 || ' lexeme2'||s%91 || ' lexeme3'||s%18 ||
' lexeme'||s%100 || ' lexeme2'||s%110 || ' lexeme3'||s%120 ||
' lexeme'||s%110 || ' lexeme2'||s%120 || ' lexeme3'||s%220 ||
' lexeme'||s%120 || ' lexeme2'||s%130 || ' lexeme3'||s%320 ||
' lexeme'||s%130 || ' lexeme2'||s%140 || ' lexeme3'||s%420 ||
' lexeme'||s%140 || ' lexeme2'||s%150 || ' lexeme3'||s%520 ||
' lexeme'||s%150 || ' lexeme2'||s%160 || ' lexeme3'||s%620 ||
' lexeme'||s%160 || ' lexeme2'||s%170 || ' lexeme3'||s%720 ||
' lexeme'||s%170 || ' lexeme2'||s%180 || ' lexeme3'||s%820 ||
' lexeme'||s%180 || ' lexeme2'||s%190 || ' lexeme3'||s%920 ||
' lexeme'||s%190 || ' lexeme2'||s%100 || ' lexeme3'||s%150 ||
' lexeme'||s%120 || ' lexeme2'||s%710 || ' lexeme3'||s%160 ||
' lexeme'||s%200 || ' lexeme2'||s%810 || ' lexeme3'||s%170 ||
' lexeme'||s%350 || ' lexeme2'||s%910 || ' lexeme3'||s%180 
as tsvector)
FROM generate_series(1,10) s
UNION ALL
select
cast('fooblablabla' ||
' thisisalongerlexemethisisalongerlexeme'||s%2|| '
thisisalongerlexemethisisalongerlexeme'||s%3|| '
thisisalongerlexemethisisalongerlexeme'||s%4||
' thisisalongerlexemethisisalongerlexeme'||s%4|| '
thisisalongerlexemethisisalongerlexeme'||s%5|| '
thisisalongerlexemethisisalongerlexeme'||s%6||
' thisisalongerlexemethisisalongerlexeme'||s%7|| '
thisisalongerlexemethisisalongerlexeme'||s%8|| '
thisisalongerlexemethisisalongerlexeme'||s%9||
' thisisalongerlexemethisisalongerlexeme'||s%10 || '
thisisalongerlexemethisisalongerlexeme2'||s%11 || '
thisisalongerlexemethisisalongerlexeme3'||s%12 ||
' thisisalongerlexemethisisalongerlexeme'||s%11 || '
thisisalongerlexemethisisalongerlexeme2'||s%12 || '
thisisalongerlexemethisisalongerlexeme3'||s%22 ||
' thisisalongerlexemethisisalongerlexeme'||s%12 || '
thisisalongerlexemethisisalongerlexeme2'||s%13 || '
thisisalongerlexemethisisalongerlexeme3'||s%32 ||
' thisisalongerlexemethisisalongerlexeme'||s%13 || '