Hmm... if you're manually constructing phrase queries during
pre-parsing, and you have sow=true and
autogeneratePhraseQueries=true set, then despite the lack of pf, phrase
queries could still be key to this. Would any of the phrase queries
explicitly introduced by your pre-parsing actually trigger
autogeneratePhraseQueries to kick in? (i.e., would any of the
whitespace-separated tokens in your phrases be further split by your
Solr-internal analysis chain -- WordDelimiter, (Solr-internal)
Synonym, etc.?). Would you be able to share the analysis chain on the
relevant fields, and perhaps (forgiving readability challenges) an
example of pre-parsed input that suffers particularly from performance
degradation?
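One way to check this is Solr's field analysis handler (/analysis/field). It shows how a field's query-time chain tokenizes a given string. Below is a minimal Python sketch that builds such a request, plus an edismax debugQuery request for the same input. It assumes the handler is at its default path. The host and collection name are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical host and collection name; substitute your own.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "mycollection"


def field_analysis_url(field: str, text: str) -> str:
    """URL asking the field analysis handler to run `field`'s query-time
    analyzer over `text`. If a single whitespace-separated token comes
    back as several tokens, autoGeneratePhraseQueries may apply to it."""
    params = {
        "analysis.fieldname": field,
        "analysis.query": text,  # analyzed with the query-time chain
        "wt": "json",
    }
    return f"{SOLR_BASE}/{COLLECTION}/analysis/field?{urlencode(params)}"


def edismax_debug_url(q: str, qf: str) -> str:
    """URL for an edismax query with sow=true and debugQuery=true, so the
    parsedquery can be compared between Solr versions."""
    params = {
        "defType": "edismax",
        "q": q,
        "qf": qf,
        "sow": "true",
        "debugQuery": "true",
        "wt": "json",
    }
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(field_analysis_url("title", "big apple"))
    print(edismax_debug_url('"new york" OR ny', "wkxmlsource title"))
```

Running the first URL for each phrase your pre-parsing emits, against title and wkxmlsource, would show whether any single token is split further by the chain.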

On Thu, Aug 20, 2020 at 2:28 PM Elaine Cario <etca...@gmail.com> wrote:
>
> Thanks Michael, I took a look, but we don't have any pf or pf1,2,3 phrase
> params set at all.  Also, we don't add synonyms through Solr filters,
> rather we parse the user's query in our own application and add synonyms
> there, before it gets to Solr.
>
> Some additional info:  we have sow=true (to be compatible with Solr 6), and
> autogeneratePhraseQueries=true.  In our A/B testing, we didn't see any
> difference in search results (aside from some minor scoring variations), so
> functionally everything is working fine.
>
> I compared the debugQuery results between Solr 6 and 8 on a somewhat
> simplified query (they quickly become unreadable otherwise):
>
> Solr 6:
>   <str name="parsedquery">(+(DisjunctionMaxQuery((wkxmlsource:"new york" |
> title:"new york")~1.0) DisjunctionMaxQuery((wkxmlsource:ny | title:ny)~1.0)
> DisjunctionMaxQuery((wkxmlsource:"big apple" | title:"big
> apple")~1.0)))/no_coord</str>
>   <str name="parsedquery_toString">+((wkxmlsource:"new york" | title:"new
> york")~1.0 (wkxmlsource:ny | title:ny)~1.0 (wkxmlsource:"big apple" |
> title:"big apple")~1.0)</str>
>
> Solr 8:
>   <str name="parsedquery">+(DisjunctionMaxQuery((wkxmlsource:"new york" |
> title:"new york")~1.0) DisjunctionMaxQuery((wkxmlsource:ny | title:ny)~1.0)
> DisjunctionMaxQuery((wkxmlsource:"big apple" | title:"big
> apple")~1.0))</str>
>   <str name="parsedquery_toString">+((wkxmlsource:"new york" | title:"new
> york")~1.0 (wkxmlsource:ny | title:ny)~1.0 (wkxmlsource:"big apple" |
> title:"big apple")~1.0)</str>
>
> The only substantial difference is the removal of /no_coord (which is
> probably a result of LUCENE-7347 and likely accounts also for scoring
> variations).
>
> We do see generally higher CPU load with Solr 8 (although it is well within
> tolerance), and we do see much higher thread count (60 for Solr 6 vs 150
> for Solr 8 on average) even on a relatively quiet system.  That seems an
> interesting statistic, but we're not sure what it signifies.  We take the
> OOTB defaults for almost everything, and config changes were
> minimal, mostly to maintain Solr 6 query behavior (uf=*_query_, sow=true).
>
> On Wed, Aug 19, 2020 at 5:46 PM Michael Gibney <mich...@michaelgibney.net>
> wrote:
>
> > Hi Elaine,
> > I'm curious what happens if you remove "pf" (phrase field) setting
> > from your edismax config?
> >
> > This question brought to mind
> >
> > https://issues.apache.org/jira/browse/SOLR-12243?focusedCommentId=16836448#comment-16836448
> > and https://issues.apache.org/jira/browse/LUCENE-8531. This *could*
> > have directly explained the behavior you're observing, except for the
> > fact that pre-6.5.0, analyzeGraphPhrase(...) generated a
> > fully-enumerated Lucene "GraphQuery" (since removed, but afaict
> > similar to MultiPhraseQuery). But the direct topic of SOLR-12243 was
> > that SpanNearQuery, never mind its performance characteristics, was
> > getting completely ignored by edismax. Curious about your case, I
> > looked at ExtendedDismaxQParser for 6.4.2, and it appears that
> > GraphQuery was similarly ignored?:
> >
> >
> > https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.4.2/solr/core/src/java/org/apache/solr/search/ExtendedDismaxQParser.java#L1219-L1252
> >
> > If this is in fact the case (and I could well be overlooking
> > something), then it's possible that 6.4.2 was more performant mainly
> > because edismax was completely ignoring the more complex phrase
> > queries generated by analyzeGraphPhrase(...).
> >
> > I'll be curious to hear what you find, and eager to be corrected if
> > the above speculation is off-base!
> >
> > Michael
> >
> >
> > On Wed, Aug 19, 2020 at 10:56 AM Elaine Cario <etca...@gmail.com> wrote:
> > >
> > > Hi Solr experts,
> > >
> > > We're in the process of upgrading SolrCloud from 6.4.2 to 8.3.1, and our
> > > performance testing is consistently showing that search latencies are
> > > measurably higher in 8.3.1; for certain kinds of queries they may be as
> > > much as 200 ms higher on average.
> > >
> > > We've seen this now in 2 different environments.  In one environment, we
> > > effectively doubled the OS memory for Solr 8 (by removing a replica set),
> > > and saw little improvement.
> > >
> > > The specs on the VMs we're using are the same for Solr 6 and 8, and the
> > > index sizes and shard distribution are also the same.  We reviewed garbage
> > > collection logs, and didn't see any red flags there.  We're still using
> > > Java 8 (sorry!).  Content was re-fed into Solr 8 from scratch.
> > >
> > > We re-ran queries removing all the usual suspects for high latencies:
> > > grouping, faceting, highlighting. We saw some improvement (as we would
> > > expect), but nothing approaching the average Solr 6 latencies with all
> > > those features turned on.
> > >
> > > We've narrowed the largest overall latencies to queries which contain many
> > > terms OR'd together (essentially synonyms we add to the query ourselves);
> > > there may be as many as 0-38 or more quoted phrases OR'd together.
> > > Latencies increase the more synonyms we add (we always knew this), but it
> > > seems much worse in Solr 8. (It is an unfortunate quirk of our content that
> > > these terms often have pretty high frequencies).  But it's not clear if
> > > this is just amplifying an underlying issue, or if something fundamental
> > > changed in the way Solr (or Lucene) resolves queries with OR'd terms.  We
> > > use a custom variant of edismax (but we also modified the queries to enable
> > > use of OOTB edismax, and still saw no improvement).
> > >
> > > We also noted that 0-term queries (*:*) with lots of facets perform as well
> > > as Solr 6, so it definitely seems related to searching for terms.
> > >
> > > I'm out of ideas here.  Has anyone experienced similar degradation from
> > > older Solr versions?
> > >
> > > Thanks in advance for any help you can provide.
> >
