https://issues.apache.org/jira/browse/SOLR-14428
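The query pattern discussed in the thread below can be sketched as a small plain-Java snippet. This is a hypothetical reconstruction: the field name "id" and the class/method names are placeholders, and the actual Lucene FuzzyQuery construction is omitted.

```java
import java.util.UUID;

/**
 * Sketch of the fuzzy query strings described below: a random
 * 32-character hex UUID (dashes stripped) with edit distance 2
 * appended. Field name "id" is a placeholder, not from the test.
 */
public class FuzzyQueryStrings {
    static String randomFuzzyQuery(String fieldName) {
        String uuid = UUID.randomUUID().toString().replace("-", "");
        return fieldName + ":" + uuid + "~2";
    }

    public static void main(String[] args) {
        // e.g. id:3f2504e04f8911d39a0c0305e82c3301~2
        System.out.println(randomFuzzyQuery("id"));
    }
}
```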



On Thu, 23 Apr 2020 at 08:45, Colvin Cowie <colvin.cowie....@gmail.com>
wrote:

> I created a little test that fires off fuzzy queries from random UUID
> strings for 5 minutes
> *FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"*
>
> The change in heap usage is really severe.
>
> On 8.5.1, Solr went OOM almost immediately on a 512MB heap, and with a 4GB
> heap it only just stayed alive.
> On 8.3.1 it was completely happy.
>
> My guess is that the memory is being retained because the FuzzyQuery
> objects are referenced from the query cache, whereas the short-lived
> FuzzyTermsEnum objects never would have been.
>
> I'm going to raise an issue
>
>
> On Wed, 22 Apr 2020 at 19:44, Colvin Cowie <colvin.cowie....@gmail.com>
> wrote:
>
>> Hello,
>>
>> I'm moving our product from 8.3.1 to 8.5.1 in dev and we've got tests
>> failing because Solr is getting OOMEs with a 512mb heap where it was
>> previously fine.
>>
>> I ran our tests on both versions with jconsole to track the heap usage.
>> Here's a little comparison. 8.5.1 dies part way through
>> https://drive.google.com/open?id=113Ujts-lzv9ZBJOUB78LA2Qw5PsIsajO
>>
>> We have our own query parser as an extension to Solr, and we do various
>> things with user queries, including generating FuzzyQuery instances. Our
>> implementation of org.apache.solr.search.QParser.parse() isn't stateful;
>> it parses the qstr and returns new Query objects each time it's called.
>> With JProfiler on I can see that the majority of the heap is being
>> allocated through FuzzyQuery's constructor.
>> https://issues.apache.org/jira/browse/LUCENE-9068 moved construction of
>> the automata from the FuzzyTermsEnum to the FuzzyQuery's constructor.
>>
>> When profiling on 8.3.1 we still see a fairly large number of
>> FuzzyTermsEnum instances created at times, but they account for ~40MB of
>> the heap for a few seconds, rather than the 100MB to 300MB of continual
>> allocation for FuzzyQuery I'm seeing in 8.5.
>>
>> It's definitely possible that we're doing something wrong in our
>> extension (which I can't share the source of) but it seems like the memory
>> cost of FuzzyQuery now is totally disproportionate to what it was before.
>> We've not had issues like this with our extension before (which doesn't
>> mean that our parser is flawless, but it's not been causing noticeable
>> problems for the last 4 years).
>>
>>
>> So I suppose the question is, are we misusing FuzzyQuery in some way
>> (hard for you to say without seeing the source), or are the recent changes
>> using more memory than they should?
>>
>> I will investigate further into what we're doing. But I could maybe use
>> some help to create a stress test for Lucene itself that compares the
>> memory consumption of the old FuzzyQuery vs the new, to see whether it's
>> fundamentally bad for memory or if it's just how we're using it.
>>
>> Regards,
>> Colvin
>>
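As a starting point for the stress test mentioned above, a stdlib-only harness like the following could compare retained heap across versions. This is a sketch under assumptions: buildQuery is a placeholder for where a real test would construct new FuzzyQuery(new Term(field, term), 2), and the heap figures from Runtime are rough, not a substitute for a profiler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/**
 * Minimal sketch of a retained-heap comparison harness. Retains the
 * query objects in a list (mimicking a query cache holding references),
 * then reports approximate heap growth.
 */
public class HeapHarness {
    // Placeholder: a real test would return
    // new FuzzyQuery(new Term("id", term), 2)
    static Object buildQuery(String term) {
        return term + "~2";
    }

    // Rough used-heap estimate after requesting a GC.
    static long usedHeapAfterGc() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapAfterGc();
        List<Object> retained = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            String term = UUID.randomUUID().toString().replace("-", "");
            retained.add(buildQuery(term));
        }
        long after = usedHeapAfterGc();
        System.out.println("retained ~" + ((after - before) / (1024 * 1024))
                + " MB for " + retained.size() + " queries");
    }
}
```

Running the same harness against Lucene 8.3.1 and 8.5.1 jars (with the real FuzzyQuery in buildQuery) should make the per-query cost difference visible without the rest of Solr.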
