Ah... I think there are two issues likely at play here. One is LUCENE-8531 <https://issues.apache.org/jira/browse/LUCENE-8531>, which reverts the Spans-based phrase query implementation to fix a bug related to SpanNearQuery semantics, causing possible query paths to be enumerated up front. Setting ps=0 (although perhaps not appropriate for some use cases) should address problems related to this issue.
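To illustrate why enumerating paths up front can get expensive, here is a minimal sketch of a toy token graph in Python. It is an assumption-laden model, not Lucene's actual QueryBuilder code: each position simply holds a list of alternative tokens (the names a1, b1, etc. are placeholders), and one phrase is produced per path.

```python
from itertools import product


def enumerate_paths(positions):
    """Return every token sequence through a simplified token graph.

    `positions` is a list of lists holding the alternative tokens at each
    position. This is a toy model (an assumption for illustration), not
    Lucene's graph token stream representation.
    """
    return [" ".join(path) for path in product(*positions)]


# Placeholder tokens: 2 alternatives, then 3, then 2.
positions = [
    ["a1", "a2"],
    ["b1", "b2", "b3"],
    ["c1", "c2"],
]

paths = enumerate_paths(positions)

# The path count is the product of the alternatives per position,
# so every extra synonym or word-delimiter split multiplies it.
print(len(paths))
```

In this model, a query with ten positions and three alternatives at each would already yield 3**10 = 59049 separate phrases, which is the kind of growth the enumerated approach is exposed to.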
The other (likely affecting Gregg, for whom ps=0 did not help) is SOLR-12243 <https://issues.apache.org/jira/browse/SOLR-12243>. Prior to 7.6, SpanNearQuery instances (generated for relatively complex "graph" tokenized queries, such as would be generated with WDGF, SynonymGraphFilter, etc.) were simply getting dropped. This was surely a bug, in that pf did not contribute at all to boosting such queries; but the silver lining was that performance was great ;-)

Markus, Gregg, could you send examples (parsed query toString()) of problematic queries (and perhaps relevant analysis chain configs)?

Michael

On Fri, Feb 22, 2019 at 11:00 AM Gregg Donovan <gregg...@gmail.com> wrote:

> FWIW: we have also seen serious Query of Death issues after our upgrade to Solr 7.6. Are there any open issues we can watch? Is Markus' findings around `pf` our best guess? We've seen these issues even with ps=0. We also use the WDF.
>
> On Fri, Feb 22, 2019 at 8:58 AM Markus Jelsma <markus.jel...@openindex.io> wrote:
>
> > Hello Michael,
> >
> > Sorry it took so long to get back to this, too many things to do.
> >
> > Anyway, yes, we have WDF on our query-time analysers. I uploaded two log files, both the same query of death with and without synonym filter enabled.
> >
> > https://mail.openindex.io/export/solr-8983-console.log 23 MB
> > https://mail.openindex.io/export/solr-8983-console-without-syns.log 1.9 MB
> >
> > Without the synonym we still see a huge number of entries. Many different parts of our analyser chain contribute to the expansion of queries, but pf itself really turns the problem on or off.
> >
> > Since SOLR-12243 is new in 7.6, does anyone know that SOLR-12243 could have this side-effect?
> >
> > Thanks,
> > Markus
> >
> > -----Original message-----
> > > From:Michael Gibney <mich...@michaelgibney.net>
> > > Sent: Friday 8th February 2019 17:19
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Query of Death Lucene/Solr 7.6
> > >
> > > Hi Markus,
> > > As of 7.6, LUCENE-8531 <https://issues.apache.org/jira/browse/LUCENE-8531> reverted a graph/Spans-based phrase query implementation (introduced in 6.5 -- LUCENE-7699 <https://issues.apache.org/jira/browse/LUCENE-7699>) to an implementation that builds a separate phrase query for each possible enumerated path through the graph described by a parsed query.
> > > The potential for combinatoric explosion of the enumerated approach was (as far as I can tell) one of the main motivations for introducing the Spans-based implementation. Some real-world use cases would be good to explore. Markus, could you send (as an attachment) the debug toString() for the queries with/without synonyms enabled? I'm also guessing you may have WordDelimiterGraphFilter on the query analyzer?
> > > As an alternative to disabling pf, LUCENE-8531 only reverts to the enumerated approach for phrase queries where slop>0, so setting ps=0 would probably also help.
> > > Michael
> > >
> > > On Fri, Feb 8, 2019 at 5:57 AM Markus Jelsma <markus.jel...@openindex.io> wrote:
> > >
> > > > Hello (apologies for cross-posting),
> > > >
> > > > While working on SOLR-12743, using 7.6 on two nodes and 7.2.1 on the remaining four, we stumbled upon a situation where the 7.6 nodes quickly succumb when a 'Query-of-Death' is issued, 7.2.1 up to 7.5 are all unaffected (tested and confirmed).
> > > >
> > > > Following Smiley's suggestion i used Eclipse MAT to find the problem in the heap dump i obtained, this fantastic tool revealed within minutes that a query thread ate 65 % of all resources, in the class variables i could find the query, and reproduce the problem.
> > > >
> > > > The problematic query is 'dubbele dijk/rijke dijkproject in het dijktracé eemshaven-delfzijl', on 7.6 this input produces a 40+ MB toString() output in edismax' newFieldQuery. If the node survives it takes 2+ seconds for the query to run (150 ms otherwise). If i disable all query time SynonymGraphFilters it still takes a second and produces just a 9 MB toString() for the query.
> > > >
> > > > I could not find anything like this in Jira. I did think of LUCENE-8479 and LUCENE-8531 but they were about graphs, this problem looked related though.
> > > >
> > > > I think i tracked it further down to LUCENE-8589 or SOLR-12243. When i leave Solr's edismax' pf parameter empty, everything runs fast. When all fields are configured for pf, the node dies.
> > > >
> > > > I am now unsure whether this is a Solr or a Lucene issue.
> > > >
> > > > Please let me know.
> > > >
> > > > Many thanks,
> > > > Markus
> > > >
> > > > ps. in Solr i even got an 'Impossible Exception', my first!
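
For reference, the two workarounds discussed in this thread (leaving pf empty, or keeping pf and setting ps=0) can be sketched as edismax request parameters. This is only a sketch under assumptions: the field names `title` and `content` are hypothetical, and the query is shortened from the one reported above.

```python
from urllib.parse import urlencode

# Hypothetical field names; defType, q, qf, pf and ps are the
# edismax request parameters referred to in the thread.
base_params = {
    "defType": "edismax",
    "q": "dubbele dijk",
    "qf": "title content",
}

# Workaround 1: leave pf unset, so no phrase-boost queries are built.
no_pf = dict(base_params)

# Workaround 2: keep pf but set phrase slop to 0, which per the thread
# avoids the enumerated approach that LUCENE-8531 uses when slop > 0.
with_ps0 = dict(base_params, pf="title content", ps="0")

print(urlencode(no_pf))
print(urlencode(with_ps0))
```

Note that, as reported above, ps=0 did not help in every case, so the second variant may only address the LUCENE-8531 side of the problem.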