I created a little test that fires off fuzzy queries built from random UUID
strings for 5 minutes:
*FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"*

The change in heap usage is really severe.

On 8.5.1, Solr went OOM almost immediately on a 512MB heap, and with a 4GB
heap it only just stayed alive.
On 8.3.1 it was completely happy.

I'm guessing that the memory is being leaked because the FuzzyQuery
objects (which now hold the automata) are referenced from the query cache,
while the FuzzyTermsEnum instances would not have been.
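The pattern I suspect can be illustrated with a stand-in sketch (this is not Lucene code; HeavyQuery and its byte[] are hypothetical stand-ins for FuzzyQuery and its automata):

```java
import java.util.ArrayList;
import java.util.List;

// If each cached query object now owns a large structure (like the automata
// LUCENE-9068 moved into FuzzyQuery's constructor), a query cache pins that
// memory for as long as the entry lives instead of it being short-lived.
public class CacheRetentionSketch {
    static class HeavyQuery {
        final byte[] automata = new byte[1 << 20]; // ~1MB stand-in for per-query automata
    }

    public static void main(String[] args) {
        List<HeavyQuery> cache = new ArrayList<>(); // stand-in for a query cache
        for (int i = 0; i < 100; i++) {
            cache.add(new HeavyQuery()); // each entry retains ~1MB until evicted
        }
        System.out.println("Retained MB ~= " + cache.size());
    }
}
```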

I'm going to raise an issue.


On Wed, 22 Apr 2020 at 19:44, Colvin Cowie <colvin.cowie....@gmail.com>
wrote:

> Hello,
>
> I'm moving our product from 8.3.1 to 8.5.1 in dev and we've got tests
> failing because Solr is getting OOMEs with a 512MB heap where it was
> previously fine.
>
> I ran our tests on both versions with jconsole to track the heap usage.
> Here's a little comparison. 8.5.1 dies part way through
> https://drive.google.com/open?id=113Ujts-lzv9ZBJOUB78LA2Qw5PsIsajO
>
> We have our own query parser as an extension to Solr, and we do various
> things with user queries, including generating FuzzyQuery instances. Our
> implementation of org.apache.solr.search.QParser.parse() isn't stateful; it
> parses the qstr and returns new Query objects each time it's called.
> With JProfiler on I can see that the majority of the heap is being
> allocated through FuzzyQuery's constructor.
> https://issues.apache.org/jira/browse/LUCENE-9068 moved construction of
> the automata from the FuzzyTermsEnum to the FuzzyQuery's constructor.
>
> When profiling on 8.3.1 we still have a fairly large number of
> FuzzyTermsEnums created at times, but that accounts for roughly 40MB of
> the heap for a few seconds, rather than the 100MB to 300MB of continual
> allocation for FuzzyQuery I'm seeing in 8.5.
>
> It's definitely possible that we're doing something wrong in our extension
> (which I can't share the source of) but it seems like the memory cost of
> FuzzyQuery now is totally disproportionate to what it was before. We've not
> had issues like this with our extension before (which doesn't mean that our
> parser is flawless, but it's not been causing noticeable problems for the
> last 4 years).
>
>
> So I suppose the question is, are we misusing FuzzyQuery in some way (hard
> for you to say without seeing the source), or are the recent changes using
> more memory than they should?
>
> I will investigate further into what we're doing. But I could maybe use
> some help to create a stress test for Lucene itself that compares the
> memory consumption of the old FuzzyQuery vs the new, to see whether it's
> fundamentally bad for memory or if it's just how we're using it.
>
> Regards,
> Colvin
>
