To be precise about tsvector:

1) A GiST index is lossy for any kind of tsearch query. A GIN index is not lossy for the @@ operation, but is lossy for @@@.

2) The number of positions per word is limited to 256 - a larger number of positions does not help ranking, but produces a big tsvector. If a word has a lot of positions in a document, then it is close to being a stopword. We could easily increase this limit to 65536 positions.

3) The maximum value of a position is 2^14, because we store each position in a uint16 and need to reserve 2 bits of that integer for the position's weight. It's possible to widen uint16 to uint32, but that would double the tsvector size, which is impractical, I suppose. So the part of a document used for ranking is its first 16384 words - that is, roughly the first 50-100 kilobytes.

4) The limit on the total size of a tsvector comes from the WordEntry->pos field (ts_type.h). It holds the number of bytes between the first lexeme in the tsvector and the lexeme in question.
So the limit applies to the total length of the lexemes plus their positional information.
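
As a rough illustration of points 2-4, here is a minimal Python sketch of the arithmetic involved. It is not the PostgreSQL code. The function names are hypothetical, the bit layout (weight in the top 2 bits of the 16-bit value) and the 20-bit width of the offset field are assumptions to be checked against ts_type.h, and the byte layout is simplified to "lexeme bytes plus 2 bytes per position".

```python
# Illustrative sketch only. The constants follow the numbers quoted above;
# the bit layout and OFFSET_BITS are assumptions, not taken from the message.

MAX_POSITIONS_PER_WORD = 256     # point 2: cap on positions kept per word
POS_BITS = 14                    # point 3: 16 bits minus 2 bits for weight
MAX_POS = (1 << POS_BITS) - 1    # 14 bits hold the values 0..16383
OFFSET_BITS = 20                 # ASSUMED width of the WordEntry offset field


def pack_position(weight, pos):
    """Pack a 2-bit weight and a 14-bit position into one 16-bit value."""
    if not 0 <= weight <= 3:
        raise ValueError("weight must fit in 2 bits")
    if not 0 <= pos <= MAX_POS:
        raise ValueError("position must fit in 14 bits")
    return (weight << POS_BITS) | pos


def unpack_position(packed):
    """Inverse of pack_position: return (weight, pos)."""
    return packed >> POS_BITS, packed & MAX_POS


def cap_positions(positions):
    """Keep only the first MAX_POSITIONS_PER_WORD positions of a word."""
    return sorted(positions)[:MAX_POSITIONS_PER_WORD]


def lexeme_offsets(entries):
    """Byte offset of each lexeme, measured from the first lexeme.

    entries is a list of (lexeme_bytes, number_of_positions). Simplified
    layout: each lexeme is followed by 2 bytes per stored position.
    """
    offsets = []
    offset = 0
    for lexeme, n_positions in entries:
        if offset >= (1 << OFFSET_BITS):
            raise OverflowError("offset does not fit in the offset field")
        offsets.append(offset)
        offset += len(lexeme) + 2 * n_positions
    return offsets


if __name__ == "__main__":
    print(hex(pack_position(3, MAX_POS)))
    print(lexeme_offsets([(b"cat", 2), (b"dog", 0), (b"fish", 1)]))
```

The last function shows why the limit in point 4 covers lexemes and positions together: every stored position moves the following lexemes further from the start, so both kinds of data use up the same offset range.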


--
Teodor Sigaev                                   E-mail: [EMAIL PROTECTED]
                                                   WWW: http://www.sigaev.ru/

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)