Re: [PERFORM] FTS performance issue - planner problem identified (but only partially resolved)

Marc Mamin Sat, 20 Jul 2013 01:47:35 -0700


________________________________________
Von: Stefan Keller [[email protected]]
>Gesendet: Samstag, 20. Juli 2013 01:55
>
>Hi Marc
>
>Thanks a lot for your hint!
>
>You mean doing a "SET track_counts (true);" for the whole session?


No, 
I mean 

ALTER TABLE <table> ALTER  <ts_vector_column> SET STATISTICS 0;

And remove existing statistics

DELETE FROM pg_catalog.pg_statistic 
where starelid='<table>':: regclass
AND staattnum = (SELECT attnum FROM     pg_attribute
                WHERE attrelid = '<table>':: regclass
                AND  attname  =  '<ts_vector_column>'::name
                )

But you should first try to find out which proportion of your ts queries are 
faster 
when using a table scan as they will probably not happen anymore afterwards !
(Except if further columns on your table 'FullTextSearch' are considered by the 
planner)




>That would be ok if it would be possible just for the gin index.
>
>It's obviously an issue of the planner estimation costs.
>The data I'm speaking about ("movies") has a text attribute which has
>a length of more than 8K so it's obviously having to do with
>detoasting.
>But the thoughts about @@ operators together with this GIN index seem
>also to be valid.
>
>I hope this issue is being tracked in preparation for 9.3.
>
>Regards, Stefan
>
>
>2013/7/19 Marc Mamin <[email protected]>:
>>
>>> SELECT * FROM FullTextSearch WHERE content_tsv_gin @@
>>> plainto_tsquery('english', 'good');
>>>
>>> It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB 
>>> zipped).
>>> The planner obviously always chooses table scan
>>
>>
>> Hello,
>>
>> A probable reason for the time difference is the cost for decompressing 
>> toasted content.
>> At least in 8.3, the planner was not good at estimating it.
>>
>> I'm getting better overall performances since I've stopped collect statistic 
>> on tsvectors.
>> An alternative would have been to disallow compression on them.
>>
>> I'm aware this is a drastic way and would not recommend it without testing. 
>> The benefit may depend on the type of data you are indexing.
>> In our use case these are error logs with many java stack traces, hence with 
>> many lexemes poorly discriminative.
>>
>> see: http://www.postgresql.org/message-id/[email protected]
>> as a comment on
>> http://www.postgresql.org/message-id/c4dac901169b624f933534a26ed7df310861b...@jenmail01.ad.intershop.net
>>
>> regards,
>>
>> Marc Mamin


-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: [PERFORM] FTS performance issue - planner problem identified (but only partially resolved)

Reply via email to