Forwarding the note to users. Thanks Uwe for sharing your observations.
Thanks to Mr Woodward who brought intervals to the party.
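
For anyone following along, here is a rough sketch of the ordered vs.
unordered distinction discussed below. It is a toy model over a list of
tokens, not Lucene code: the names positions() and near() are made up for
illustration, and "max_gaps" simply counts the tokens sitting between the two
terms. In Lucene itself the interval counterpart would, I believe, be built
from Intervals.unordered(...) or Intervals.ordered(...) wrapped in
Intervals.maxgaps(...), but the snippet does not depend on that.

```python
from itertools import product


def positions(tokens, term):
    """Return every position at which `term` occurs in `tokens`."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def near(tokens, first, second, max_gaps, in_order):
    """Toy proximity check (hypothetical helper, not a Lucene API).

    True if `first` and `second` occur with at most `max_gaps` tokens
    between them. With in_order=True, `first` must precede `second`.
    """
    for p, q in product(positions(tokens, first), positions(tokens, second)):
        if p == q:
            continue  # same token cannot match both terms
        if in_order and p > q:
            continue  # wrong order for an ordered match
        if abs(p - q) - 1 <= max_gaps:
            return True
    return False


doc = "the deposit was used to pledge the account".split()

# Analogue of spanNear([pledge, deposit], 4, false): order does not matter.
unordered_hit = near(doc, "pledge", "deposit", 4, in_order=False)

# Same terms, but requiring "pledge" before "deposit".
ordered_hit = near(doc, "pledge", "deposit", 4, in_order=True)
```

In this document "deposit" comes three tokens before "pledge", so only the
unordered check matches; requiring order rejects it.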


On Fri, Dec 16, 2022 at 7:33 PM Uwe Schindler <[email protected]> wrote:

> Spans seem to have the problem of creating huge "List<Something>" during
> query iteration to track some stuff. I never understood the code, but to me
> it was always crazy to have Lists populated during execution. We replaced
> all SpanQueries by Intervals in patent search and speed is much faster and
> heap usage is tiny.
>
> A span/phrase with inOrder=false can always be replaced by a phrase with
> slop. The slop is always without order, as it is an "edit distance" only
> (see documentation). If you need in order, an interval is required.
>
> Phrases are only in order for "slop=0". Compare to "slop=1" which means
> "next to each other" and is no longer in order.
>
> Uwe
> On 15.12.2022 at 16:44, Mikhail Khludnev wrote:
>
> Michael, thanks for stepping in!
>
> > it seems that simple phrase
> > queries would suffice here in place of spanNear?
>
> I think it wouldn't. It seems to me 4 is slop, and false is inOrder.
> Sjoerd, can you comment on the particular span queries you use?
> Also, do you have any heap dump summary to confirm high memory consumption
> by spans?
>
> On Thu, Dec 15, 2022 at 5:33 PM Michael Gibney <[email protected]>
> wrote:
>
>> I don't think that nested boolean disjunctions consisting of isolated
>> spanNear queries at the leaves should have memory issues (as opposed
>> to nested spanNear queries around disjunctions, which might well do).
>> Am I misreading the string representation of that query? A little bit
>> more explicit information about how the query is built, so that we can
>> be certain of what we're dealing with, would be helpful.
>>
>> It'd certainly be worth trying IntervalsQuery -- but part of what
>> makes me think I must be missing something in interpreting the string
>> representation of the query provided: it seems that simple phrase
>> queries would suffice here in place of spanNear?
>>
>> Regarding SpanQuery vs. IntervalsQuery performance and
>> characteristics, there's some possibly-relevant discussion on
>> LUCENE-9204:
>>
>>
>> https://issues.apache.org/jira/browse/LUCENE-9204?focusedCommentId=17352589#comment-17352589
>>
>> Michael
>>
>>
>> On Wed, Dec 14, 2022 at 1:27 PM Mikhail Khludnev <[email protected]> wrote:
>> >
>> > Developers,
>> > Is this expected for Spans? Can IntervalsQuery help here?
>> >
>> > On Wed, Dec 14, 2022 at 5:41 PM Sjoerd Smeets <[email protected]>
>> wrote:
>> >>
>> >> Hi,
>> >>
>> >> I've implemented a Span Query parser and when running the below query, I'm
>> >> seeing Heap Size Space messages on certain shards:
>> >>
>> >> o.a.s.s.HttpSolrCall null:java.lang.RuntimeException:
>> >> java.lang.OutOfMemoryError: Java heap space
>> >>
>> >> The span query that I'm running is the following:
>> >>
>> >> ((spanNear([unstemmed_text:charge, unstemmed_text:account], 4, false)
>> >> spanNear([unstemmed_text:pledge, unstemmed_text:account], 4, false))
>> >> spanNear([unstemmed_text:pledge, unstemmed_text:deposit], 4, false))
>> >> spanNear([unstemmed_text:charge, unstemmed_text:deposit], 4, false)
>> >>
>> >> The heap size at the moment is set to 48Gb. We are running 4 shards in 1
>> >> JVM and the 4 shards combined have 24M docs evenly distributed across the
>> >> shards. We do use the collapse feature as well.
>> >>
>> >> This is on Solr 8.6.0
>> >>
>> >> What are the considerations for running Span Queries and heap sizes?
>> >>
>> >> Any suggestions are welcome
>> >>
>> >> Sjoerd
>> >
>> >
>> >
>> > --
>> > Sincerely yours
>> > Mikhail Khludnev
>>
>>
>>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: [email protected]
>
>

-- 
Sincerely yours
Mikhail Khludnev
