Forwarding the note to users. Thanks Uwe for sharing your observations. Thanks to Mr Woodward who brought intervals to the party.
On Fri, Dec 16, 2022 at 7:33 PM Uwe Schindler wrote:

> Spans seem to have the problem of creating huge "List<Something>" during
> query iteration to track some stuff. I never understood the code, but to me
> it was always crazy to have Lists populated during execution. We replaced
> all SpanQueries by Intervals in patent search and speed is much faster and
> heap usage is tiny.
>
> A span/phrase with inOrder=false can always be replaced by a phrase with
> slop. The slop is always without order, as it is an "edit distance" only
> (see documentation). If you need in order, an interval is required.
>
> Phrases are only in order for "slop=0". Compare to "slop=1" which means
> "next to each other" and is no longer in order.
>
> Uwe
>
> On 15.12.2022 at 16:44, Mikhail Khludnev wrote:
>
> Michael, thanks for stepping in!
>
> > it seems that simple phrase
> > queries would suffice here in place of spanNear?
>
> I think it wouldn't. It seems to me 4 is slop, and false is inOrder.
> Sjoerd, can you comment about the particular span queries you use?
> Also, do you have any heap dump summary to confirm high memory consumption
> by spans?
>
> On Thu, Dec 15, 2022 at 5:33 PM Michael Gibney wrote:
>
>> I don't think that nested boolean disjunctions consisting of isolated
>> spanNear queries at the leaves should have memory issues (as opposed
>> to nested spanNear queries around disjunctions, which might well do).
>> Am I misreading the string representation of that query? A little bit
>> more explicit information about how the query is built, so that we can
>> be certain of what we're dealing with, would be helpful.
>>
>> It'd certainly be worth trying IntervalsQuery -- but part of what
>> makes me think I must be missing something in interpreting the string
>> representation of the query provided: it seems that simple phrase
>> queries would suffice here in place of spanNear?
>>
>> Regarding SpanQuery vs. IntervalsQuery performance and
>> characteristics, there's some possibly-relevant discussion on
>> LUCENE-9204:
>>
>> https://issues.apache.org/jira/browse/LUCENE-9204?focusedCommentId=17352589#comment-17352589
>>
>> Michael
>>
>>
>> On Wed, Dec 14, 2022 at 1:27 PM Mikhail Khludnev wrote:
>> >
>> > Developers,
>> > Is it expected for Spans? Can IntervalsQuery help here?
>> >
>> > On Wed, Dec 14, 2022 at 5:41 PM Sjoerd Smeets wrote:
>> >>
>> >> Hi,
>> >>
>> >> I've implemented a Span Query parser and when running the below query, I'm
>> >> seeing Heap Size Space messages on certain shards:
>> >>
>> >> o.a.s.s.HttpSolrCall null:java.lang.RuntimeException:
>> >> java.lang.OutOfMemoryError: Java heap space
>> >>
>> >> The span query that I'm running is the following:
>> >>
>> >> ((spanNear([unstemmed_text:charge, unstemmed_text:account], 4, false)
>> >> spanNear([unstemmed_text:pledge, unstemmed_text:account], 4, false))
>> >> spanNear([unstemmed_text:pledge, unstemmed_text:deposit], 4, false))
>> >> spanNear([unstemmed_text:charge, unstemmed_text:deposit], 4, false)
>> >>
>> >> The heap size at the moment is set to 48Gb. We are running 4 shards in 1
>> >> JVM and the 4 shards combined have 24M docs evenly distributed across the
>> >> shards. We do use the collapse feature as well.
>> >>
>> >> This is on Solr 8.6.0
>> >>
>> >> What are the considerations for running Span Queries and heap sizes?
>> >>
>> >> Any suggestions are welcome
>> >>
>> >> Sjoerd
>> >
>> > --
>> > Sincerely yours
>> > Mikhail Khludnev
>
> --
> Sincerely yours
> Mikhail Khludnev
>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de

--
Sincerely yours
Mikhail Khludnev
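
P.S. To make the ordered versus unordered distinction discussed above concrete, here is a toy model in plain Java. It is only a sketch and not Lucene code: `ProximityDemo`, `positions` and `near` are hypothetical names, the sample sentence is made up, and "gaps" is assumed to mean the number of tokens strictly between the two terms. In Lucene itself, the unordered case for the first clause of Sjoerd's query would presumably be written as `Intervals.maxgaps(4, Intervals.unordered(Intervals.term("charge"), Intervals.term("account")))` wrapped in an `IntervalsQuery` on `unstemmed_text`; whether that matches the spanNear behaviour exactly should be checked against the Lucene version in use.

```java
import java.util.ArrayList;
import java.util.List;

public class ProximityDemo {

    /** Collects every token position at which the given term occurs. */
    static List<Integer> positions(String[] tokens, String term) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                result.add(i);
            }
        }
        return result;
    }

    /**
     * Returns true if terms a and b occur with at most maxGaps tokens
     * strictly between them. When ordered is true, a must precede b.
     * (Assumed gap definition; a simplification of what Lucene does.)
     */
    static boolean near(String[] tokens, String a, String b,
                        int maxGaps, boolean ordered) {
        for (int pa : positions(tokens, a)) {
            for (int pb : positions(tokens, b)) {
                if (pa == pb) {
                    continue; // same token cannot match both terms
                }
                if (ordered && pa > pb) {
                    continue; // wrong order for an in-order match
                }
                int gaps = Math.abs(pa - pb) - 1;
                if (gaps <= maxGaps) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up sample text: "account" at position 1, "charge" at position 6.
        String[] doc = "the account is subject to a charge".split(" ");

        System.out.println(near(doc, "charge", "account", 4, false)); // true
        System.out.println(near(doc, "charge", "account", 4, true));  // false
        System.out.println(near(doc, "account", "charge", 4, true));  // true
    }
}
```

With inOrder=false, as in the query above, the first call is the relevant one: the terms match in either order as long as no more than four tokens sit between them.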
