Teodor Sigaev wrote:
>> The problem as I remember it is pg_trgm, not tsearch2 directly. I've
>> sent a self-contained test case directly to Teodor which shows the
>> error.
>> 'ERROR: index row requires 8792 bytes, maximum size is 8191'
> Uh, I see. But I'm really surprised that you use pg_trgm on big texts.
> pg_trgm is designed to find similar words and uses a technique known as
> trigrams. This works well on small pieces of text such as words or
> set expressions. But all big texts (in the same language) will be similar
> :(. So I didn't take care to guarantee the index tuple's size
> limitation. In principle, it's possible to modify pg_trgm to have such
> a guarantee, but the index becomes lossy: all tuples fetched from the
> index would have to be rechecked against the table's tuple.
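To make the point about trigrams concrete, here is a rough Python sketch of trigram similarity. It only approximates what pg_trgm does (lowercase, split into alphanumeric words, pad each word with two leading spaces and one trailing space, compare the resulting sets); the function names are made up for illustration and the ASCII-only word split is a simplification, not pg_trgm's actual code.

```python
import re


def trigrams(text):
    """Return the set of trigrams of `text`, roughly in the pg_trgm style.

    Assumption: words are runs of ASCII letters and digits; each word is
    padded with two leading spaces and one trailing space before slicing.
    """
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both inputs."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    # Short strings: the score reflects how alike the words really are.
    print(similarity("word", "words"))
    print(similarity("word", "table"))
```

The number of distinct trigrams grows with the length of the text, which is why long documents in the same language end up sharing most of their trigrams, and why a signature built from them can outgrow the index row limit quoted above.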
We are trying to get something faster than ~ '%foo%', which Tsearch2
does not give us :)

Joshua D. Drake

> If you want to search for similar documents, I can recommend having a
> look at the fingerprint technique (http://webglimpse.net/pubs/TR93-33.pdf).
> It's pretty close to trigrams and the similarity metric is the same, but
> it uses a different signature calculation. And there are some tips and
> tricks: removing HTML markup, removing punctuation, lowercasing text and
> so on. It's an interesting and complex task.

-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
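The preprocessing tips mentioned in the quoted text (removing HTML markup, removing punctuation, lowercasing) could look roughly like the Python sketch below. It is not the fingerprint algorithm from the linked report, only the cleanup step before any signature is computed; the function name `normalize` is hypothetical and the regex-based tag stripping is deliberately crude.

```python
import html
import re


def normalize(text):
    """Strip markup and punctuation, lowercase, and collapse whitespace.

    Assumption: a regex is good enough to drop tags; real HTML would
    need a proper parser.
    """
    text = re.sub(r"<[^>]+>", " ", text)   # drop anything that looks like a tag
    text = html.unescape(text)             # turn entities such as &amp; into characters
    text = re.sub(r"[^\w\s]", " ", text)   # replace punctuation with spaces
    text = text.lower()
    return " ".join(text.split())          # collapse runs of whitespace


if __name__ == "__main__":
    print(normalize("<p>Hello, <b>World</b>!</p>"))
```

Note that `\w` keeps underscores and non-ASCII letters; whether that is desirable depends on the documents being compared.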