Re: [PATCHES] [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit

Teodor Sigaev Fri, 07 Mar 2008 05:57:37 -0800

To be precise about tsvector:

1) A GiST index is lossy for every kind of tsearch query. A GIN index is not lossy for the @@ operation, but it is lossy for @@@.

2) The number of positions per word is limited to 256. A larger number of positions does not help ranking, but it produces a big tsvector. If a word has a lot of positions in a document, it is close to being a stopword. We could easily increase this limit to 65536 positions.
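A minimal C sketch of a per-word position cap like the one described in point 2. The type and function names are hypothetical, and the assumption that positions beyond the cap are silently dropped (rather than raising an error) belongs to the sketch, not to the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cap mirroring the 256-positions-per-word limit. */
#define MAX_POSITIONS_PER_WORD 256

/* Hypothetical per-word position list (not the PostgreSQL struct). */
typedef struct
{
    uint16_t positions[MAX_POSITIONS_PER_WORD];
    size_t   npos;
} WordPositions;

/* Record one more occurrence of a word.
 * Returns 1 if the position was stored, 0 if it was dropped because
 * the word already holds the maximum number of positions. */
static int
add_position(WordPositions *w, uint16_t pos)
{
    assert(w != NULL);
    if (w->npos >= MAX_POSITIONS_PER_WORD)
        return 0;               /* assumption: extra positions are discarded */
    w->positions[w->npos++] = pos;
    return 1;
}
```

The effect is that a very frequent word, which adds little to ranking, cannot grow the tsvector beyond a fixed per-word budget.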

3) The maximum value of a position is 2^14, because positions are stored in a uint16 and 2 bits of that integer are reserved for the weight of the position. It's possible to widen the field from 16 to 32 bits, but that would double the size of a tsvector, which is impractical, I suppose. So the part of a document used for ranking contains the first 16384 words, which is about the first 50-100 kilobytes.
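To make the bit budget in point 3 concrete, here is a minimal C sketch that packs a 2-bit weight and a 14-bit position into one uint16. The macro and function names are hypothetical, not the ones in ts_type.h. Clamping out-of-range positions to the largest storable value (2^14 - 1 = 16383) is also an assumption of the sketch. The 50-100 kilobyte estimate corresponds to roughly 3 to 6 bytes of source text per word (16384 × 3 ≈ 48 KB, 16384 × 6 ≈ 96 KB).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout following the description: the top 2 bits hold
 * the weight (0..3) and the low 14 bits hold the position. */
#define POS_BITS  14
#define POS_MASK  ((uint16_t) ((1u << POS_BITS) - 1))   /* 0x3FFF */
#define MAX_POS   POS_MASK                              /* 16383 */

/* Pack a weight and a position into one 16-bit value. */
static uint16_t
pack_pos(unsigned weight, unsigned pos)
{
    assert(weight <= 3);        /* only 2 bits are available */
    if (pos > MAX_POS)
        pos = MAX_POS;          /* assumption: clamp instead of erroring */
    return (uint16_t) ((weight << POS_BITS) | pos);
}

/* Extract the 2-bit weight. */
static unsigned
pos_weight(uint16_t packed)
{
    return (unsigned) (packed >> POS_BITS);
}

/* Extract the 14-bit position. */
static unsigned
pos_value(uint16_t packed)
{
    return (unsigned) (packed & POS_MASK);
}
```

Widening the packed value to 32 bits would lift the position limit, but every stored position would then take 4 bytes instead of 2. That is the doubling of tsvector size mentioned above.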

4) The limit on the total size of a tsvector comes from the WordEntry->pos field (ts_type.h). It contains the number of bytes between the first lexeme in the tsvector and the lexeme in question.

So the limitation is on the total length of the lexemes plus their positional information.
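A rough C sketch of why point 4 limits the sum of lexemes and positions: each entry stores the byte offset of its lexeme from the first one, so every offset grows with everything stored before it. The field widths below (20-bit offset, 11-bit length) are an assumption. A 20-bit offset gives the roughly 1 MB ceiling named in the subject line, but the widths should be checked against ts_type.h. The struct and function names are hypothetical, and alignment padding is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry layout; field widths are assumed, not quoted. */
typedef struct
{
    unsigned int haspos : 1;   /* lexeme has positional information */
    unsigned int len    : 11;  /* lexeme length in bytes */
    unsigned int pos    : 20;  /* byte offset from the first lexeme */
} DemoWordEntry;

#define DEMO_MAX_LEN    ((1u << 11) - 1)   /* 2047 bytes */
#define DEMO_MAX_OFFSET ((1u << 20) - 1)   /* 1048575 bytes, about 1 MB */

/* Bytes one lexeme occupies in the data area: its text plus, if it has
 * positions, a uint16 count and one uint16 per position (no padding). */
static size_t
lexeme_bytes(size_t len, size_t npos)
{
    return len + (npos > 0 ? sizeof(uint16_t) * (1 + npos) : 0);
}

/* Lay out n lexemes back to back and record each one's offset.
 * Returns 1 on success, 0 if a length or an offset does not fit. */
static int
assign_offsets(DemoWordEntry *entries, const size_t *lens,
               const size_t *nposes, size_t n)
{
    size_t offset = 0;

    assert(n == 0 || (entries != NULL && lens != NULL && nposes != NULL));
    for (size_t i = 0; i < n; i++)
    {
        if (lens[i] > DEMO_MAX_LEN || offset > DEMO_MAX_OFFSET)
            return 0;
        entries[i].haspos = nposes[i] > 0;
        entries[i].len = (unsigned int) lens[i];
        entries[i].pos = (unsigned int) offset;
        offset += lexeme_bytes(lens[i], nposes[i]);
    }
    return 1;
}
```

Because each offset includes all earlier lexemes and their positions, a document with many distinct words can reach the ceiling even though no single lexeme is large.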


--
Teodor Sigaev                                   E-mail: [EMAIL PROTECTED]
                                                   WWW: http://www.sigaev.ru/

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches
