Well, the index does, indeed, get bigger. But the searches get much faster, because the prefixes are indexed as ordinary terms and there's no term expansion going on at query time. It's another time/space tradeoff. I'm afraid you'll just have to experiment a bit to see whether this is an acceptable tradeoff in your particular situation...
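To make the tradeoff concrete, here is a toy model in plain Java, not Lucene code. It assumes a tiny in-memory "term dictionary" (a sorted set) and compares two approaches. In the first, a prefix query such as `the*` is expanded into every matching term at query time. In the second, every leading substring of each word is stored up front, so the prefix becomes one exact lookup. The class and method names (`PrefixTradeoff`, `expandPrefix`, `buildEdgeNGramIndex`) are hypothetical, made up for illustration.

```java
import java.util.*;

public class PrefixTradeoff {
    // Query-time approach: expand a prefix into every indexed term that starts with it.
    // The work grows with the number of matching terms ("the*" -> the, theme, theory, ...).
    public static List<String> expandPrefix(NavigableSet<String> terms, String prefix) {
        List<String> expanded = new ArrayList<>();
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) break; // sorted order: no later term can match
            expanded.add(t);
        }
        return expanded;
    }

    // Index-time approach: store every leading substring (edge n-gram) of each word,
    // so any prefix of length >= minGram is a single exact key. Costs more terms on disk.
    public static Map<String, Set<Integer>> buildEdgeNGramIndex(List<String> docs, int minGram) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String word : docs.get(id).toLowerCase().split("\\s+")) {
                for (int len = minGram; len <= word.length(); len++) {
                    index.computeIfAbsent(word.substring(0, len), k -> new TreeSet<>()).add(id);
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("the theory", "there they go", "a theme");

        NavigableSet<String> terms = new TreeSet<>();
        for (String d : docs) terms.addAll(Arrays.asList(d.split("\\s+")));

        // Time side: one prefix query turns into five term lookups.
        System.out.println("expanded: " + expandPrefix(terms, "the"));

        // Space side: twice as many distinct terms, but "the" is now one lookup.
        Map<String, Set<Integer>> ngrams = buildEdgeNGramIndex(docs, 1);
        System.out.println("plain terms: " + terms.size() + ", edge n-gram terms: " + ngrams.size());
        System.out.println("docs for 'the': " + ngrams.get("the"));
    }
}
```

Running it prints `expanded: [the, theme, theory, there, they]`, then `plain terms: 7, edge n-gram terms: 14`, then `docs for 'the': [0, 1, 2]`. The real numbers depend on your data and on the minimum/maximum gram sizes you configure, which is why experimenting is the only reliable answer.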
The real memory hit in Lucene comes from *sorting* on a field with many unique terms, and I don't think you'll be sorting on the NGram field... And disk space is cheap.

Best,
Erick

On Sat, May 29, 2010 at 3:44 AM, Gert Brinkmann <g...@netcologne.de> wrote:

>
> Thank you, Chris and Erick, for the answers,
>
> it was new to me that "the*" is expanded to all known the* words in the
> index. Good to know.
>
> And yes, the AND operation between the query terms is certainly the
> problem. (I would like to switch to OR instead. The result set will grow the
> more words you are searching for, but as the results are ordered by hit
> quality this would be ok. But the customer does not like this behaviour,
> because he thinks that the more words you are searching for, the smaller the
> result set should become. So this is not an option.)
>
>
> On 28.05.2010 22:06, Chris Hostetter wrote:
>
>> word2*) ..." in the client, that you instead consider using multiple
>> fields -- one "text" defined as you have it now, and one "text_prefix"
>> defined similarly but with an additional EdgeNGramTokenFilter used when
>> indexing to generate "prefix" tokens. Then search those fields using
>> dismax...
>>
>> q=word1 word2 word3&qf=text text_prefix&mm=100%&tie=0
>>
>
> Ok, I will think about this. But I wonder if this will be more efficient
> than just not filtering stopwords? (But I have to study the EdgeNGram thing
> first. AFAIK it indexes all WORDS as WORDS, WORD, WOR, WO. So the index will
> be blown up, too?)
>
> What I do not understand about your idea is why I should use a second
> text_prefix field. Wouldn't it work with just this text_prefix field and
> without the normal text field, since I always search for "word" and "word*"
> and never without the prefix?
>
> Thanks,
> Gert
>
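For readers following Chris's suggestion, the sketch below is a toy model of what `qf=text text_prefix&mm=100%&tie=0` asks dismax to do. It is plain Java, not Solr's implementation. For each query word, the word's score is the best score among the `qf` fields, plus `tie` times the scores from the other fields. With `mm=100%` a document only matches if *every* query word matches in at least one field, so adding words can only shrink the result set, which is the behaviour Gert's customer wants. The names (`DismaxSketch`, `edgeNGrams`, `score`) are hypothetical, and the binary 1.0/0.0 field score is a stand-in for Lucene's real relevance scoring.

```java
import java.util.*;

public class DismaxSketch {
    // Hypothetical helper: leading substrings from minGram up to the full word,
    // roughly what an edge n-gram filter emits at index time (WO, WOR, WORD, WORDS).
    public static Set<String> edgeNGrams(String word, int minGram) {
        Set<String> grams = new TreeSet<>();
        for (int len = minGram; len <= word.length(); len++) {
            grams.add(word.substring(0, len));
        }
        return grams;
    }

    // Binary stand-in for a relevance score: 1.0 if the field contains the token.
    static double fieldScore(Set<String> fieldTokens, String word) {
        return fieldTokens.contains(word) ? 1.0 : 0.0;
    }

    // Disjunction max across the qf fields: best field score plus tie * the others.
    public static double dismax(List<Set<String>> fields, String word, double tie) {
        double max = 0.0, sum = 0.0;
        for (Set<String> field : fields) {
            double s = fieldScore(field, word);
            max = Math.max(max, s);
            sum += s;
        }
        return max + tie * (sum - max);
    }

    // mm=100%: every query word must match in at least one field.
    // Returns the summed score, or -1.0 if some word matches nowhere (no hit).
    public static double score(List<Set<String>> fields, List<String> words, double tie) {
        double total = 0.0;
        for (String word : words) {
            double s = dismax(fields, word, tie);
            if (s == 0.0) return -1.0;
            total += s;
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> docWords = List.of("wordsmith", "workshop");
        Set<String> text = new HashSet<>(docWords);            // "text": whole words
        Set<String> textPrefix = new HashSet<>();              // "text_prefix": edge n-grams
        for (String w : docWords) textPrefix.addAll(edgeNGrams(w, 2));
        List<Set<String>> qf = List.of(text, textPrefix);

        System.out.println(score(qf, List.of("word", "work"), 0.0)); // 2.0: both prefixes match
        System.out.println(score(qf, List.of("word", "tool"), 0.0)); // -1.0: "tool" matches nowhere
        System.out.println(score(qf, List.of("workshop"), 0.1));     // 1.1: matches in both fields
    }
}
```

Note how `tie` only matters when a word matches in more than one field: with `tie=0` a word's score is simply its best single-field score.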