Euler Taveira de Oliveira <[EMAIL PROTECTED]> writes:
> The problem with this approach is how to select the part of the
> document to index. How will you ensure you're not ignoring the more
> important words of the document?
That's *always* a risk, anytime you do any sort of processing or
normalization on the text. The question here is not whether or not we
will make tradeoffs, only which ones to make.

> IMHO Postgres shouldn't decide it; it would be good if a user could
> set it at runtime and/or in postgresql.conf.

Well, there is exactly zero chance of that happening in 8.3.x, because
the bit allocations for the on-disk tsvector representation are already
determined. It's fairly hard to see a way of doing it in future
releases that would have acceptable costs, either.

But more to the point: no matter what the document length limit is, why
should it be a hard error to exceed it? The downside of not indexing
words beyond the length limit is that searches won't find documents in
which the search terms occur only very far into the document. The
downside of throwing an error is that we can't store such documents at
all, which surely guarantees that searches won't find them. How can you
possibly argue that that option is better?

			regards, tom lane
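For concreteness, a minimal sketch of the clamp-instead-of-error
behavior being argued for here (assuming a PostgreSQL where lexeme
positions past the limit are silently capped at the tsvector format's
maximum of 16383; the filler text and the word 'needle' are arbitrary
illustrative values):

    -- Build a tsvector from a document whose interesting word appears
    -- far past the position limit.  The lexeme is still indexed, with
    -- its position clamped to the maximum the on-disk format can
    -- represent, so a search can still find the document:
    SELECT to_tsvector('english', repeat('filler ', 20000) || 'needle')
           @@ to_tsquery('english', 'needle');   -- returns true

Under a hard-error regime, the INSERT or UPDATE carrying such a
document would fail outright, and no query could ever match it.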