Re: [HACKERS] Term positions in GIN fulltext index

Florian Pflug Fri, 04 Nov 2011 04:17:02 -0700

On Nov 4, 2011, at 11:15, Yoann Moreau wrote:
> On 03/11/11 19:19, Florian Pflug wrote:
>> Postgres doesn't seem to contain such a function currently (don't believe
>> that, though - go and recheck the documentation. I don't know all thousands
>> of built-in functions by heart). But it's easy to add one. You could either
>> use PL/pgSQL to parse the tsvector's textual representation, or write a C
>> function. If you go the PL/pgSQL route, regexp_split_to_table() might come
>> in handy.
> 
> This seems easier to program than what I was thinking about, I'm going to do
> that. But I'm wondering about the size of the database with the GIN index plus
> the tsvector column, and about the performance of parsing the whole tsvector
> for each document I need positions from (as I need them for only a few terms).


AFAICS, the internal storage layout of tsvector should allow you to extract an
individual lexeme's positions quite efficiently (with time complexity O(log N),
where N is the number of lexemes in the tsvector). Doing so will require you to
implement your function in C, though - any solution that works from a tsvector's
textual representation will necessarily have time complexity of at least O(N).
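
To illustrate where the log(N) comes from, here is a rough standalone sketch of
the lookup. It is not the real tsvector code: LexemeEntry and find_lexeme() are
hypothetical names, and the struct is a simplified stand-in that only assumes
the entries are kept sorted (here in strcmp() order), so a single lexeme can be
found by binary search instead of scanning everything.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one entry of a tsvector.
 * The real on-disk layout differs; only the "sorted entries" idea matters. */
typedef struct {
    const char     *lexeme;     /* NUL-terminated lexeme text */
    const uint16_t *positions;  /* positions of the lexeme in the document */
    size_t          npos;       /* number of positions */
} LexemeEntry;

/*
 * Binary search over entries that are assumed to be sorted in strcmp() order.
 * Returns the matching entry, or NULL if the lexeme is absent.
 * Needs O(log n) string comparisons.
 */
static const LexemeEntry *
find_lexeme(const LexemeEntry *entries, size_t n, const char *key)
{
    size_t lo = 0, hi = n;      /* search the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int    cmp = strcmp(key, entries[mid].lexeme);

        if (cmp == 0)
            return &entries[mid];
        if (cmp < 0)
            hi = mid;           /* key sorts before entries[mid] */
        else
            lo = mid + 1;       /* key sorts after entries[mid] */
    }
    return NULL;
}
```

A real implementation would have to do the same kind of search against the
actual tsvector structures inside a C-language function, rather than against
this toy array.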

best regards,
Florian Pflug


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
