I implemented a custom QueryComponent that first issues the edismax query with mm=100% and, if that returns no results, reissues it with mm=1. This doubled our query throughput compared to always using mm=1, because we do some expensive RankQuery processing. For your very long student queries, mm=100% would obviously be too high, so you'd have to experiment.
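To make the control flow concrete, here is a minimal client-side sketch of the same strict-then-relaxed idea. It is not the QueryComponent itself: `run_query` is a hypothetical callable standing in for whatever sends the request to Solr and returns the matching documents, and the function name is made up for illustration. Only `defType`, `q`, and `mm` are real edismax request parameters.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the code that actually talks to Solr:
# takes request parameters, returns the list of matching documents.
SearchFn = Callable[[Dict[str, str]], List[dict]]


def search_with_mm_fallback(
    run_query: SearchFn,
    q: str,
    strict_mm: str = "100%",
    relaxed_mm: str = "1",
) -> List[dict]:
    """Try a strict mm first; reissue with a relaxed mm only on zero hits."""
    base = {"defType": "edismax", "q": q}

    # First pass: every clause must match, so few documents reach ranking.
    docs = run_query({**base, "mm": strict_mm})
    if docs:
        return docs

    # Second pass: only runs when the strict query found nothing.
    return run_query({**base, "mm": relaxed_mm})
```

For long queries you would pass a lower `strict_mm` (say "75%") and measure how often the second pass fires.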
On Fri, Sep 5, 2014 at 1:34 PM, Walter Underwood <wun...@wunderwood.org> wrote:

> Great!
>
> We have some very long queries, where students paste entire homework
> problems. One of them was 1051 words. Many of them are over 100 words. This
> could help.
>
> In the Jira discussion, I saw some comments about handling the most sparse
> lists first. We did something like that in the Infoseek Ultra engine about
> twenty years ago. Short termlists (documents matching a term) were
> processed first, which kept the in-memory lists of matching docs small. It
> also allowed early short-circuiting for no-hits queries.
>
> What would be a high mm value, 75%?
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/
>
> On Sep 4, 2014, at 11:52 PM, Mikhail Khludnev <mkhlud...@griddynamics.com>
> wrote:
>
> > indeed https://issues.apache.org/jira/browse/LUCENE-4571
> > my feeling is it gives a significant gain in mm high values.
> >
> > On Fri, Sep 5, 2014 at 3:01 AM, Walter Underwood <wun...@wunderwood.org>
> > wrote:
> >
> >> Are there any speed advantages to using “mm”? I can imagine pruning the
> >> set of matching documents early, which could help, but is that (or
> >> something else) done?
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > <http://www.griddynamics.com>
> > <mkhlud...@griddynamics.com>
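
For reference, a toy sketch of the "shortest termlists first" idea mentioned in the quoted message, as I understand it. It assumes every term is required (the mm=100% case) and uses plain in-memory sets; the `postings` mapping and the function name are made up for illustration, and this is not how Infoseek or Lucene actually implement it.

```python
from typing import Dict, List, Set


def match_all_terms(postings: Dict[str, Set[int]], terms: List[str]) -> Set[int]:
    """Intersect per-term doc-id sets, starting with the sparsest list."""
    # A term missing from the index contributes an empty list.
    lists = [postings.get(term, set()) for term in terms]
    if not lists:
        return set()

    # Shortest list first keeps the running candidate set small.
    lists.sort(key=len)

    candidates = set(lists[0])
    for doc_ids in lists[1:]:
        if not candidates:
            # Early short-circuit: no document can match all terms.
            break
        candidates &= doc_ids
    return candidates
```

With an unknown term in the query, the empty list sorts first and the loop exits before touching the long lists.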