https://issues.apache.org/jira/browse/SOLR-14428
On Thu, 23 Apr 2020 at 08:45, Colvin Cowie <colvin.cowie....@gmail.com> wrote:

> I created a little test that fires off fuzzy queries built from random UUID
> strings for 5 minutes:
> *FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"*
>
> The change in heap usage is really severe.
>
> On 8.5.1, Solr went OOM almost immediately on a 512MB heap, and with a 4GB
> heap it only just stayed alive. On 8.3.1 it was completely happy.
>
> I'm guessing that the memory might be being leaked if the FuzzyQuery
> objects are referenced from the cache, while the FuzzyTermsEnum would not
> have been.
>
> I'm going to raise an issue.
>
>
> On Wed, 22 Apr 2020 at 19:44, Colvin Cowie <colvin.cowie....@gmail.com>
> wrote:
>
>> Hello,
>>
>> I'm moving our product from 8.3.1 to 8.5.1 in dev, and we've got tests
>> failing because Solr is getting OOMEs with a 512MB heap where it was
>> previously fine.
>>
>> I ran our tests on both versions with jconsole to track the heap usage.
>> Here's a little comparison; 8.5.1 dies part way through:
>> https://drive.google.com/open?id=113Ujts-lzv9ZBJOUB78LA2Qw5PsIsajO
>>
>> We have our own query parser as an extension to Solr, and we do various
>> things with user queries, including generating FuzzyQuery-s. Our
>> implementation of org.apache.solr.search.QParser.parse() isn't stateful:
>> it parses the qstr and returns new Query objects each time it's called.
>> With JProfiler attached I can see that the majority of the heap is being
>> allocated through FuzzyQuery's constructor.
>> https://issues.apache.org/jira/browse/LUCENE-9068 moved construction of
>> the automata from FuzzyTermsEnum into FuzzyQuery's constructor.
>>
>> When profiling on 8.3.1 we still see a fairly large number of
>> FuzzyTermsEnums created at times, but they account for only ~40MB of
>> the heap for a few seconds, rather than the 100MB to 300MB of continual
>> allocation for FuzzyQuery that I'm seeing in 8.5.1.
>>
>> It's definitely possible that we're doing something wrong in our
>> extension (which I can't share the source of), but it seems like the
>> memory cost of FuzzyQuery now is totally disproportionate to what it was
>> before. We've not had issues like this with our extension before (which
>> doesn't mean that our parser is flawless, but it hasn't been causing
>> noticeable problems for the last 4 years).
>>
>> So I suppose the question is: are we misusing FuzzyQuery in some way
>> (hard for you to say without seeing the source), or are the recent
>> changes using more memory than they should?
>>
>> I will investigate further into what we're doing. But I could maybe use
>> some help to create a stress test for Lucene itself that compares the
>> memory consumption of the old FuzzyQuery vs the new, to see whether it's
>> fundamentally bad for memory or if it's just how we're using it.
>>
>> Regards,
>> Colvin
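For reference, the test described in the first message above can be sketched as a small standalone Java program. This is a hedged, hypothetical sketch, not the author's actual test: `FIELD_NAME` ("content"), the query count, and the crude heap-delta measurement via `Runtime` are all placeholders I've invented for illustration. A real stress test would send each query string to Solr (or construct Lucene `FuzzyQuery` objects directly) instead of just holding the strings.

```java
import java.util.UUID;

// Hypothetical sketch of the stress test described in the thread:
// build fuzzy query strings from random UUIDs and report a rough heap delta.
// FIELD_NAME and the loop bound are made-up placeholders.
public class FuzzyQueryStressSketch {
    static final String FIELD_NAME = "content"; // placeholder field name

    // Builds e.g. "content:3f2a9c...~2" -- a 32-char hex term, edit distance 2,
    // matching the pattern quoted in the original message.
    static String randomFuzzyQuery() {
        return FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();

        // Hold all query strings live so their cost shows up in the heap delta.
        // In the real test each query would be executed against Solr/Lucene.
        String[] queries = new String[100_000];
        for (int i = 0; i < queries.length; i++) {
            queries[i] = randomFuzzyQuery();
        }

        System.gc();
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("queries held: " + queries.length);
        System.out.println("approx heap delta (bytes): " + (after - before));
    }
}
```

Note that `System.gc()` plus `totalMemory() - freeMemory()` only gives a rough estimate; for the comparison the thread asks about, a heap profiler (as used above with JProfiler) or JMH with `-prof gc` would give far more reliable numbers.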