Well, the index does, indeed, get bigger. But the searches
get much faster because there's no term expansion going
on. It's another time/space tradeoff. I'm afraid you'll just have
to experiment a bit to see whether it's an acceptable tradeoff
in your particular situation....
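To make the tradeoff concrete, here is a toy sketch in Python. It is not Lucene code, and the helper names and the min_gram=2 setting are assumptions for illustration. A wildcard query like "the*" has to walk the sorted term list and expand to every matching term at search time. An edge-n-gram field stores the prefixes at index time instead, so the same query becomes a single term lookup against a larger term list.

```python
from bisect import bisect_left


def edge_ngrams(word, min_gram=2):
    """Prefix tokens for one word, e.g. 'words' -> ['wo', 'wor', 'word', 'words'].
    Loosely mimics an index-time edge n-gram filter; min_gram is an assumed setting."""
    return [word[:n] for n in range(min_gram, len(word) + 1)]


def build_indexes(docs, min_gram=2):
    """Build two toy inverted indexes (term -> set of doc ids):
    'plain' holds whole words, 'prefix' holds edge n-grams of every word."""
    plain, prefix = {}, {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            plain.setdefault(word, set()).add(doc_id)
            for gram in edge_ngrams(word, min_gram):
                prefix.setdefault(gram, set()).add(doc_id)
    return plain, prefix


def wildcard_lookup(plain, pattern_prefix):
    """Search-time expansion of 'prefix*': find every indexed term that starts
    with the prefix, then union their postings. Returns (hits, number of terms expanded)."""
    terms = sorted(plain)
    i = bisect_left(terms, pattern_prefix)
    expanded = []
    while i < len(terms) and terms[i].startswith(pattern_prefix):
        expanded.append(terms[i])
        i += 1
    hits = set()
    for term in expanded:
        hits |= plain[term]
    return hits, len(expanded)


def ngram_lookup(prefix_index, pattern_prefix):
    """Index-time approach: the prefix was indexed as a term itself, so one lookup suffices."""
    return prefix_index.get(pattern_prefix, set())


if __name__ == "__main__":
    docs = {1: "the theory", 2: "then there", 3: "other words"}
    plain, prefix = build_indexes(docs)
    print(len(plain), len(prefix))        # term counts of the two indexes
    print(wildcard_lookup(plain, "the"))  # hits plus how many terms were expanded
    print(ngram_lookup(prefix, "the"))
```

With these three tiny documents the prefix index holds 16 terms instead of 6, but answering "the*" no longer needs the four-term expansion (the, then, there, theory). In a real index the expansion can touch many thousands of terms, which is where the search-time cost comes from.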

The real memory hit in Lucene comes from *sorting* on a field
with many unique terms. I don't think you'll sort on the NGram
field, though... and disk space is cheap.

Best
Erick

On Sat, May 29, 2010 at 3:44 AM, Gert Brinkmann <g...@netcologne.de> wrote:

>
> Thank you, Chris and Erick, for the answers,
>
> it was new to me that "the*" is expanded to every term in the index that
> starts with "the". Good to know.
>
> And yes, the AND operation between the query terms is certainly the
> problem. (I would like to switch to OR instead. The result set would grow the
> more words you search for, but since the results are ordered by relevance
> score this would be OK. But the customer does not like this behaviour,
> because he thinks that the more words you search for, the smaller the
> result set should become. So this is not an option.)
>
>
> On 28.05.2010 22:06, Chris Hostetter wrote:
>
>> word2*) ..." in the client, that you instead consider using multiple
>> fields -- one "text" defined as you have it now, and one "text_prefix"
>> defined similarly but with an additional EdgeNGramTokenFilter used when
>> indexing to generate "prefix" tokens. Then search those fields using
>> dismax...
>>
>> q=word1 word2 word3&qf=text text_prefix&mm=100%&tie=0
>>
>
> OK, I will think about this. But I wonder whether this will be more efficient
> than simply not filtering stopwords. (I have to study EdgeNGram first,
> though. AFAIK it indexes a word like WORDS as WORDS, WORD, WOR, WO. So won't
> the index be blown up, too?)
>
> What I do not understand about your idea is why I should use a second
> text_prefix field. Wouldn't it work with just the text_prefix field, without
> the normal text field, since I always search for "word" and "word*" and
> never without the prefix?
>
> Thanks,
> Gert
>
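A toy model of the dismax request quoted above, again in Python rather than Solr, with hypothetical helper names (index_doc, dismax_matches) and an assumed min_gram=2. Each query word becomes one clause that matches if the word appears in any of the qf fields, and mm=100% requires every clause to match. That keeps the AND behaviour the customer expects, while a partial word such as "theo" can still hit "theory" through text_prefix. Scoring, and therefore tie, is left out of this sketch.

```python
def edge_ngrams(word, min_gram=2):
    """Prefix tokens for one word; stands in for an index-time edge n-gram filter."""
    return [word[:n] for n in range(min_gram, len(word) + 1)]


def index_doc(text, min_gram=2):
    """Index one document into two toy fields: 'text' keeps whole words,
    'text_prefix' keeps the edge n-grams of every word."""
    words = text.lower().split()
    return {
        "text": set(words),
        "text_prefix": {g for w in words for g in edge_ngrams(w, min_gram)},
    }


def dismax_matches(query, doc, qf=("text", "text_prefix"), mm_percent=100):
    """True if enough query words match. Each word is one clause that matches
    when the word is present in ANY of the qf fields; query words are not
    n-grammed, only looked up as-is."""
    words = query.lower().split()
    matched = sum(1 for w in words if any(w in doc[f] for f in qf))
    # Required clause count; the percentage is rounded down.
    required = len(words) * mm_percent // 100
    return matched >= required


if __name__ == "__main__":
    doc = index_doc("the theory of everything")
    print(dismax_matches("theo every", doc))                 # both words match via text_prefix
    print(dismax_matches("theo xyz", doc))                   # 'xyz' fails, so mm=100% rejects
    print(dismax_matches("theo xyz", doc, mm_percent=50))    # one of two clauses is enough
```

Lowering mm_percent gives the looser OR-like behaviour discussed above; at 100 every word must match somewhere.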
