Le 2012-10-09 à 17:38, Shane Hathaway a écrit :

> Hello,
> 
> The database has a text index of around 200,000 documents. Investigation 
> revealed that text queries are slow only when using ts_rank or ts_rank_cd.  
> Without a ts_rank function, any query is answered within 200ms or so; with 
> ts_rank function, queries take up to 30 seconds.  Deeper investigation using 
> gprof showed that the problem is probably not ts_rank or ts_rank_cd, but the 
> fact that those functions retrieve thousands of TOASTed tsvectors.

Is the query perhaps doing something like this:

SELECT ...
FROM table
WHERE tsvectorcol @@ plainto_tsquery('...')
ORDER BY ts_rank(...)

If so, ts_rank() is run for every document. What you should do instead is:

SELECT *
FROM (
    SELECT ...
    FROM table
    WHERE tsvectorcol @@ plainto_tsquery('...')) AS t1
ORDER BY ts_rank(...)

Notice the ts_rank() is on the outer query, which means it'll only run on the 
subset of documents which match the query. This is explicitly mentioned in the 
docs:

"""Ranking can be expensive since it requires consulting the tsvector of each 
matching document, which can be I/O bound and therefore slow. Unfortunately, it 
is almost impossible to avoid since practical queries often result in large 
numbers of matches."""

(last paragraph of) 
http://www.postgresql.org/docs/current/static/textsearch-controls.html#TEXTSEARCH-RANKING

Hope that helps!
François Beausoleil

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to