I implemented a custom QueryComponent that issues the edismax query with
mm=100%, and if no results are found, it reissues the query with mm=1. This
doubled our query throughput (compared to mm=1 always), as we do some
expensive RankQuery processing. For your very long student queries, mm=100%
would obviously be too high, so you'd have to experiment.
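
For illustration, the same fallback can be sketched on the client side
rather than inside a QueryComponent. This is a minimal sketch, not the
component itself: `search_with_mm_fallback`, the `search` callable, and
`toy_search` are hypothetical names, and the toy index only stands in for
Solr so the example runs on its own. Only `defType`, `q`, and `mm` are real
request parameters.

```python
from typing import Callable, Dict, List

Params = Dict[str, str]
# Hypothetical search hook: takes request params, returns matching doc ids.
SearchFn = Callable[[Params], List[str]]


def search_with_mm_fallback(search: SearchFn, q: str,
                            strict_mm: str = "100%",
                            relaxed_mm: str = "1") -> List[str]:
    """Try the strict mm first; reissue with the relaxed mm only on zero hits."""
    base = {"defType": "edismax", "q": q}
    hits = search({**base, "mm": strict_mm})
    if hits:
        return hits
    return search({**base, "mm": relaxed_mm})


def toy_search(params: Params) -> List[str]:
    """Stand-in for Solr (assumption): understands only mm="100%" and mm="1"."""
    docs = {"d1": {"solr", "query"}, "d2": {"solr", "speed"}}
    terms = set(params["q"].split())
    need = len(terms) if params["mm"] == "100%" else 1
    return sorted(d for d, words in docs.items() if len(terms & words) >= need)
```

With very long queries the strict value would be something lower than
100%, passed in as `strict_mm`; the toy search above would need extending
to interpret other values.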

On Fri, Sep 5, 2014 at 1:34 PM, Walter Underwood <wun...@wunderwood.org>
wrote:

> Great!
>
> We have some very long queries, where students paste entire homework
> problems. One of them was 1051 words. Many of them are over 100 words. This
> could help.
>
> In the Jira discussion, I saw some comments about handling the most sparse
> lists first. We did something like that in the Infoseek Ultra engine about
> twenty years ago. Short termlists (documents matching a term) were
> processed first, which kept the in-memory lists of matching docs small. It
> also allowed early short-circuiting for no-hits queries.
>
> What would be a high mm value, 75%?
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/
>
>
> On Sep 4, 2014, at 11:52 PM, Mikhail Khludnev <mkhlud...@griddynamics.com>
> wrote:
>
> > Indeed: https://issues.apache.org/jira/browse/LUCENE-4571
> > My feeling is that it gives a significant gain at high mm values.
> >
> >
> >
> > On Fri, Sep 5, 2014 at 3:01 AM, Walter Underwood <wun...@wunderwood.org>
> > wrote:
> >
> >> Are there any speed advantages to using “mm”? I can imagine pruning the
> >> set of matching documents early, which could help, but is that (or
> >> something else) done?
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/
> >>
> >>
> >>
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > <http://www.griddynamics.com>
> > <mkhlud...@griddynamics.com>
>
>
