Euler Taveira de Oliveira <[EMAIL PROTECTED]> writes:
> The problem with this approach is how to select the part of the document 
> to index. How will you ensure you're not ignoring the most important 
> words of the document?

That's *always* a risk, anytime you do any sort of processing or
normalization on the text.  The question here is not whether or not
we will make tradeoffs, only which ones to make.

> IMHO Postgres shouldn't decide it; it would be good if a user could set 
> it at runtime and/or in postgresql.conf.

Well, there is exactly zero chance of that happening in 8.3.x, because
the bit allocations for on-disk tsvector representation are already
determined.  It's fairly hard to see a way of doing it in future
releases that would have acceptable costs, either.

But more to the point: no matter what the document length limit is,
why should it be a hard error to exceed it?  The downside of not
indexing words beyond the length limit is that searches won't find
documents in which the search terms occur only very far into the
document.  The downside of throwing an error is that we can't store such
documents at all, which surely guarantees that searches won't find
them.  How can you possibly argue that that option is better?
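The tradeoff can be illustrated with a toy word indexer (plain Python, nothing
to do with PostgreSQL's actual tsvector code; the function name and 5-word
limit are made up for the example). Indexing only the first N words means a
search term that occurs only past the cutoff won't be found, but the document
itself is still stored and findable by its earlier words:

```python
def build_index(doc_id, text, max_words, index):
    """Toy indexer: record only the first max_words words of a document.
    (Illustration only; not PostgreSQL's real truncation behavior.)"""
    for word in text.lower().split()[:max_words]:
        index.setdefault(word, set()).add(doc_id)

index = {}
# "needle" falls within the 5-word limit in doc 1, beyond it in doc 2.
build_index(1, "the needle is right here early", 5, index)
build_index(2, "one two three four five needle", 5, index)

print(index.get("needle", set()))  # only doc 1; doc 2's late occurrence was dropped
print(index.get("one", set()))     # doc 2 is still stored and searchable by its early words
```

Under a hard error instead of truncation, doc 2 could not be stored at all,
so even a search for "one" would miss it.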

                        regards, tom lane
