Tsearch was never intended for prefix search, and its index structure doesn't support any kind of prefix or suffix matching. You could write an extension to tsearch that searches by prefix, but such a solution will not use the index, only a sequential scan.


How efficient would tsearch be for really big expressions, where 'hu%'
would be expanded (using a btree index on a one-column word table) to the
tsearch equivalent of ( "human" or "humanity" or "humming" or "huge" or
...1000 words here... ) before passing the expression to tsearch?
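For illustration, a rough sketch of the expansion described above. It assumes a hypothetical one-column table named words and a PostgreSQL version that provides string_agg() and to_tsquery(); neither is part of the original message:

-- Collect all words starting with 'hu' from the (hypothetical) word table
-- and OR them together into a single tsquery.
select to_tsquery(string_agg(word, ' | ')) as expanded_query
from words
where word like 'hu%';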

The GiST index of tsearch doesn't support prefix search, so it will work only by seqscan, as we know :). Disk is much slower than the processor, so speed will be limited by the disk.



Prefix searches are easily realized with an inverted index, but it requires a lot of programming.
The simplest way is:
create table invidx (
        lexeme text not null primary key,
        ids    int[]
);


where ids is an array of identifiers of the documents that contain this lexeme.
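As a hedged usage sketch, a prefix lookup against such a table could look like the following. It assumes the btree index on lexeme is usable for the LIKE 'hu%' condition (e.g. C locale or a text_pattern_ops index) and a PostgreSQL version with unnest(); none of this is stated in the thread:

-- Find ids of documents containing any lexeme starting with 'hu'.
select distinct unnest(ids) as doc_id
from invidx
where lexeme like 'hu%';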


How hard (or sensible ;)) would creating such an index using GiST be?
As tsearch proves, GiST can cope well with many-to-many indexes.

Sorry, I don't understand. Do you mean that GiST can store one heap tuple in several index tuples? If so, then no :). GiST doesn't support that feature, and I don't think GiST can help in this situation.




create table invidx (
        lexeme   text not null,
        textdate date not null,
        ids      int[],
        primary key (lexeme, textdate)
);

which would partition the invidx table on textdate (or some other
suitable datum).
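A hedged sketch of a date-restricted lookup against this layout; the date range is illustrative only, and it again assumes unnest() and a usable index on lexeme:

-- Restrict the prefix lookup to one month so only the relevant
-- slice of invidx (or its partitions) needs to be scanned.
select distinct unnest(ids) as doc_id
from invidx
where lexeme like 'hu%'
  and textdate >= date '2004-01-01'
  and textdate <  date '2004-02-01';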


2. If a word is frequent, then a query with 'IN (select * from func())' may work slowly...
If it is often too slow, then creating a temp table and doing a plain
join may be faster.
A table structure like invidx reduces this problem.
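A minimal sketch of the temp-table alternative mentioned above, assuming a hypothetical documents table with an id column and the invidx layout from earlier; only the general idea comes from the thread:

-- Materialize the matching document ids once...
create temp table matched_ids as
    select distinct unnest(ids) as doc_id
    from invidx
    where lexeme like 'hu%';

-- ...then use a plain join instead of IN (subselect).
select d.*
from documents d
join matched_ids m on m.doc_id = d.id;
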
--
Teodor Sigaev                                  E-mail: [EMAIL PROTECTED]

