[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095689#comment-17095689 ] Andrzej Bialecki commented on SOLR-14428: - Alan, seems you're on top of things here, feel free to assign the issue to yourself. :) > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, SOLR-14428-WeakReferences.patch, > image-2020-04-23-09-18-06-070.png, image-2020-04-24-20-09-31-179.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17095452#comment-17095452 ] Alan Woodward commented on SOLR-14428: -- I opened https://github.com/apache/lucene-solr/pull/1467 - we need to restore the weird and hacky 'cache automata on a shared attribute source' behaviour. I'm not brilliantly happy with this, but the mechanics of MultiTermQuery don't really make it feasible to share the automata anywhere else. > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, SOLR-14428-WeakReferences.patch, > image-2020-04-23-09-18-06-070.png, image-2020-04-24-20-09-31-179.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094850#comment-17094850 ] Mike Drob commented on SOLR-14428: -- bq. I don't think we need to give queries the ability to produce cache keys. Queries are abstractions for an information need, which is typically something that users would type in a search box. If users can express an information need in a few characters, then there is no reason why the Query representation would take much more memory I've been thinking about this and I think this is the most intuitive path. bq. I think we can try building the automata in `getTermsEnum()`, and adding some laziness in `visit()` to prevent it being built unnecessarily. I'll open a PR. My PR mostly does this already, in addition to the cache key piece. If you take a look, it might make for a reasonable starting point. > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, SOLR-14428-WeakReferences.patch, > image-2020-04-23-09-18-06-070.png, image-2020-04-24-20-09-31-179.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094411#comment-17094411 ] Alan Woodward commented on SOLR-14428: -- I think we can try building the automata in `getTermsEnum()`, and adding some laziness in `visit()` to prevent it being built unnecessarily. I'll open a PR. > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, SOLR-14428-WeakReferences.patch, > image-2020-04-23-09-18-06-070.png, image-2020-04-24-20-09-31-179.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094401#comment-17094401 ] Adrien Grand commented on SOLR-14428: - We currently store automata on the Query because it's more convenient given the state of the MultiTermQuery API, but they really belong to a Weight? If we fixed it then we wouldn't have memory issues with the cache anymore. I don't think we need to give queries the ability to produce cache keys. Queries are abstractions for an information need, which is typically something that users would type in a search box. If users can express an information need in a few characters, then there is no reason why the Query representation would take much more memory? > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, SOLR-14428-WeakReferences.patch, > image-2020-04-23-09-18-06-070.png, image-2020-04-24-20-09-31-179.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094309#comment-17094309 ] Colvin Cowie commented on SOLR-14428: - {quote}The QueryResultKey could cache by the original string input instead of the Query object, thus it wouldn't be affected. This would short-circuit query parsing and be a performance benefit as well? {quote} On this point specifically, I think the downside to that would be that any variations of query strings which can be rewritten to Query objects that are equal() wouldn't get hits that they currently do, e.g. 'A or B' and 'B or A' could create equal() Queries. And if the behaviour of the query parser is modified by params - rather than local params - then the cached results may be wrong if they're only keyed by the query string, right? Or do you mean only use the String in the case of excessivly large Query objects? In which case the first point is a reasonable compromise. I'm not sure about the second though. {quote}BTW This issue should probably be a Lucene JIRA issue but let's see where we go with this thread further. {quote} I don't think I can change it now myself. But maybe there should be separate issue for doing some nice and generic to deal with caching (large) Query objects which I imagine might be a broader effort, and this/another for resolving the immediate issue with FuzzyQuery? > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, SOLR-14428-WeakReferences.patch, > image-2020-04-23-09-18-06-070.png, image-2020-04-24-20-09-31-179.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17094304#comment-17094304 ] Alan Woodward commented on SOLR-14428: -- There's some prior art on this one: https://issues.apache.org/jira/browse/LUCENE-8855 > Or maybe we just accept that sometimes, queries can be big I think we have to go with this, really - for example, TermInSetQuery can get very big. FuzzyQuery is probably particularly prone to this because it builds multiple Levenshtein Automata, which are sizeable in and of themselves. We've generally dealt with this in the past by saying that large queries shouldn't be cacheable, but I think that's the wrong trade-off in this situation - as a heavy query that can take a lot of resources to compute, FuzzyQuery is a really good candidate for caching. I like the idea of having a separate cache key for large queries; we can have a default implementation on Query that just returns `this`, and then heavy queries can have a specialised class that they return. > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, SOLR-14428-WeakReferences.patch, > image-2020-04-23-09-18-06-070.png, image-2020-04-24-20-09-31-179.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093979#comment-17093979 ] David Smiley commented on SOLR-14428: - BTW This issue should probably be a Lucene JIRA issue but let's see where we go with this thread further. This is a tricky issue with no easy solutions. I think I lean towards a SoftReference providing the best trade-off. Query objects ought to be light-weight in general, so in that respect, the change to FuzzyQuery is disappointing. Maybe the choice of FuzzyQuery computing the automaton up-front should be an option? It could be a boolean; that's the simplest thing. Or imagine a Function supplied in the constructor that is potentially memo-izable. Someone could even plug in a cache. This is all a bit complicated and the query parser is in charge of the choice. No parsers have an option for this hypothetical choice yet. Or maybe we just accept that sometimes, queries can be big, and thus Lucene users like Solr just have to deal with it. If a Query is beyond the size of some threshold, maybe we don't cache it by default unless the user explicitly chooses to. That's a nice generic solution. The QueryResultKey could cache by the original string input instead of the Query object, thus it wouldn't be affected. This would short-circuit query parsing and be a performance benefit as well? > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, SOLR-14428-WeakReferences.patch, > image-2020-04-23-09-18-06-070.png, image-2020-04-24-20-09-31-179.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093864#comment-17093864 ] Mike Drob commented on SOLR-14428: -- That's a pretty good approach, I'm glad you thought of it! The structure of your changes and the reference usage looks correct to me. I probably would leave termLength as final and just worry about the CompiledAutomaton array. I think SoftReferences might be better - the problem with WeakReferences will be that the Automata get rebuilt on every pass through a query visitor. [~romseygeek] WDYT? > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, SOLR-14428-WeakReferences.patch, > image-2020-04-23-09-18-06-070.png, image-2020-04-24-20-09-31-179.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17093772#comment-17093772 ] Colvin Cowie commented on SOLR-14428: - That's fair. I think that having a distinction between the (Query) key used for cache lookups and the Query object used for actual querying is definitely a _good thing_. But I don't know enough about the internals of it all, so happy to hear what other people think. The one thing I have tried doing just now is using a {{SoftReference}} and a {{WeakReference}} on the {{automata}} so that it can be GCed and rebuilt if it's gone. With SoftReferences the OOMEs stop, but the heap usage stays higher, and with WeakReferences the heap usage looks pretty much the same as in 8.3.1. I'm not sure though whether it provides any real improvement on how things were before the changes made in LUCENE-9068, when the automata was just built on demand. Plus I have never actually used SoftReference or WeakReference before, so can't claim to truely know the ins and outs of them. [^SOLR-14428-WeakReferences.patch] > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, SOLR-14428-WeakReferences.patch, > image-2020-04-23-09-18-06-070.png, image-2020-04-24-20-09-31-179.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091912#comment-17091912 ] Mike Drob commented on SOLR-14428: -- I'm going to pause on the path with {{interface Minimizable}} because after trying to start on it, it ends up being a huge mess of generic types and covariant types and I'm not sure it actually makes anything better or more readable. Let's wait to get some more feedback from the other folks involved? > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > image-2020-04-24-20-09-31-179.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091900#comment-17091900 ] Mike Drob commented on SOLR-14428: -- So another approach here would be to intercept the keys in CaffeineCache.put and if they implement some marker interface then we can call {{toCacheKey}}. If it's more generic in this way, then it probably makes sense to do {code} public interface Minimizable { /** return an object equal to this object, but possibly with a minimized memory footprint */ Minimizable minimize(); } {code} Then we don't have to make it invasive on {{Query}} and can stick to the places where it makes a difference. Which would be FuzzyQuery, took a glance and maybe AutomataQuery, and then probably anything that wraps around other queries. BooleanQuery? Are there others, I haven't looked yet. > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > image-2020-04-24-20-09-31-179.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091882#comment-17091882 ] Colvin Cowie commented on SOLR-14428: - I did the same thing with {{filters}} and that's ok... Now I'm thinking, since some Queries must hold references to other queries when they are constructed, toCacheKey will need to be implemented on those in order to retrieve the cache key variants of the Queries they are constructed with. e.g. the clause sets in BooleanQuery {noformat} private final Map> clauseSets; // used for equals/hashcode{noformat} And any third-party Query implementations that hold on to references to other Queries would need updating too. Just seems like there might be quite a lot of fallout from this. Also the {{filterCache}} doesn't use QueryResultKey, it's just keyed by the query SolrCache, so need to update the puts to it too. > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > image-2020-04-24-20-09-31-179.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091869#comment-17091869 ] Colvin Cowie commented on SOLR-14428: - Oh yes, you're absolutely right > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > image-2020-04-24-20-09-31-179.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091862#comment-17091862 ] Mike Drob commented on SOLR-14428: -- Good catch. I think we might also need to do a similar operation on {{filters}} now that I'm looking at it more closely. > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > image-2020-04-24-20-09-31-179.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091843#comment-17091843 ] Colvin Cowie commented on SOLR-14428: - Ah, the statistics plugin uses RamUsageQueryVisitor, which triggers the building of the automata > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > image-2020-04-24-20-09-31-179.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091838#comment-17091838 ] Colvin Cowie commented on SOLR-14428: - Thanks, I patched it in. The heap usage looks a lot closer to how it was before on my stress test. !image-2020-04-24-20-09-31-179.png! I'm surprised to see the cache statistics are still reporting high ramBytesUsed though, e.g. a search for "field_s:e41848af85d24ac197c71db6888e17bc~2" still results in a ramBytesUsed of 648863 {code:java} this.ramBytesUsed = BASE_RAM_BYTES + term.ramBytesUsed(); {code} in the new constructor looks like it should do the right thing > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > image-2020-04-24-20-09-31-179.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091754#comment-17091754 ] Mike Drob commented on SOLR-14428: -- I've created https://github.com/apache/lucene-solr/pull/1455 for this, but that I haven't really tested it yet. Wanted to get feedback on the approach before spending too much time on it. [~ab] WDYT? [~cjcowie] do you think this is something you're able to try in your environment as well? > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091714#comment-17091714 ] Mike Drob commented on SOLR-14428: -- bq. Maybe there is an elegant way to store a stripped down FuzzyQuery in the cache? I think this runs into problems with auto warming based on the contents of the cache when we open a new searcher. Possibly need a way to rebuild it afterwards, which feels like we're going backwards from LUCENE-9068. Related, [~romseygeek] - should [BASE_RAM_BYTES|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/FuzzyQuery.java#L58] use {{FuzzyQuery.class}} instead of {{AutomatonQuery.class}}? > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090877#comment-17090877 ] Mike Drob commented on SOLR-14428: -- Adding some notes here as I've been diving through this... The queryResultsCache is built in [SolrIndexSearcher|https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java#L278], and uses a {{QueryResultKey}}, which stores the full {{Query}} object, which in this case means the FuzzyQuery with the automata already built. However, we don't really need to store the automata that we built, since they aren't used for the equality comparison. Maybe there is an elegant way to store a stripped down FuzzyQuery in the cache? > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090721#comment-17090721 ] Mike Drob commented on SOLR-14428: -- If there are no terms in the index, then the fuzzy query should be collapsing pretty quickly and there would be no reason for it to take up so much memory. Do we only do that at query processing time now? I thought we would be doing that as aggressively as possible. > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090609#comment-17090609 ] Colvin Cowie commented on SOLR-14428: - Thanks, we'll just stick on 8.3.1 for the time being. Though I will look at moving to CaffeineCache in general since I see the other caches are being removed anyway. Cheers > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090564#comment-17090564 ] Andrzej Bialecki commented on SOLR-14428: - [~cjcowie] as a temporary workaround you can switch to using {{CaffeineCache}} and see if it behaves differently. Also, configuring the queryResultCache using {{maxRamMB}} instead of {{maxSize}} should cap the max RAM usage. Of course, these are just stopgaps, they don't address the underlying issue. I'll look into this in a few days (need to wrap up other stuff). > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090534#comment-17090534 ] Alan Woodward commented on SOLR-14428: -- Hi [~cjcowie], thanks for opening this and for the thorough investigation! I'm afraid I don't really know much about the Solr query caches choose which queries to cache - [~ab] might have a better idea of how to fix this? > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090482#comment-17090482 ] Colvin Cowie commented on SOLR-14428: - Hi [~romseygeek], what are your thoughts on this? Thanks > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org