Hello,

I'm moving our product from Solr 8.3.1 to 8.5.1 in dev, and we've got tests
failing because Solr is hitting OutOfMemoryErrors with a 512 MB heap where it
was previously fine.

I ran our tests on both versions with JConsole to track heap usage. Here's a
little comparison; 8.5.1 dies partway through:
https://drive.google.com/open?id=113Ujts-lzv9ZBJOUB78LA2Qw5PsIsajO

We have our own query parser as an extension to Solr, and we do various
things with user queries, including generating FuzzyQuery instances. Our
implementation of org.apache.solr.search.QParser.parse() isn't stateful: it
parses the qstr and returns new Query objects each time it's called.
With JProfiler attached I can see that the majority of the heap is being
allocated through FuzzyQuery's constructor.
https://issues.apache.org/jira/browse/LUCENE-9068 moved construction of the
automata from FuzzyTermsEnum into FuzzyQuery's constructor.
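
To make that concrete, our parser is roughly the following shape (a
simplified, hypothetical sketch rather than our actual code; the class name,
field name and fuzzy parameters here are made up for illustration):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.FuzzyQuery;
    import org.apache.lucene.search.Query;
    import org.apache.solr.common.params.SolrParams;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.search.QParser;
    import org.apache.solr.search.SyntaxError;

    public class ExampleFuzzyQParser extends QParser {

      public ExampleFuzzyQParser(String qstr, SolrParams localParams,
                                 SolrParams params, SolrQueryRequest req) {
        super(qstr, localParams, params, req);
      }

      @Override
      public Query parse() throws SyntaxError {
        // Stateless: new Query objects are built on every call, nothing is cached.
        BooleanQuery.Builder builder = new BooleanQuery.Builder();
        for (String token : getString().split("\\s+")) {
          // maxEdits=2, prefixLength=0 is the most expensive automaton to build
          builder.add(new FuzzyQuery(new Term("text", token), 2, 0),
                      BooleanClause.Occur.SHOULD);
        }
        return builder.build();
      }
    }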

When profiling on 8.3.1 we still see a fairly large number of FuzzyTermsEnums
created at times, but those account for roughly 40 MB of heap for a few
seconds, rather than the 100-300 MB of continual allocation for FuzzyQuery
I'm seeing on 8.5.1.

It's definitely possible that we're doing something wrong in our extension
(which I can't share the source of), but the memory cost of FuzzyQuery now
seems totally disproportionate to what it was before. We've not had issues
like this with our extension before, which doesn't mean our parser is
flawless, but it hasn't caused noticeable problems for the last 4 years.


So I suppose the question is: are we misusing FuzzyQuery in some way (hard
for you to say without seeing our source), or are the recent changes using
more memory than they should?

I will investigate further into what we're doing, but I could use some help
creating a stress test for Lucene itself that compares the memory consumption
of the old FuzzyQuery with the new, to see whether it's fundamentally heavy
on memory or whether it's just how we're using it.
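
For example, something along these lines might be a starting point: a rough
standalone sketch with assumed names and sizes, measuring through Runtime
rather than a proper benchmark harness, run once against Lucene 8.3.1 and
once against 8.5.1:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.FuzzyQuery;

    public class FuzzyQueryMemoryStress {

      public static void main(String[] args) {
        int numQueries = 10_000; // adjust along with -Xmx as needed
        FuzzyQuery[] queries = new FuzzyQuery[numQueries];

        long before = usedHeap();
        for (int i = 0; i < numQueries; i++) {
          // maxEdits=2, prefixLength=0: worst case for per-query automata in 8.5.1
          queries[i] = new FuzzyQuery(new Term("text", "term" + (i % 1000)), 2, 0);
        }
        long after = usedHeap();

        System.out.printf("%,d FuzzyQuery instances retain ~%,d MB%n",
            numQueries, (after - before) / (1024 * 1024));
        // Keep the array reachable so queries aren't collected before measuring.
        System.out.println(queries.length);
      }

      private static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // best effort; a real test would use JMH or a profiler
        return rt.totalMemory() - rt.freeMemory();
      }
    }

If I've read LUCENE-9068 right, on 8.3.1 the constructor should be nearly
free (the automata were only built in FuzzyTermsEnum), so the difference
between the two runs ought to isolate the cost that moved into the
constructor.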

Regards,
Colvin
