[GitHub] [lucene-solr] tflobbe commented on pull request #1456: SOLR-13289: Support for BlockMax WAND
tflobbe commented on pull request #1456: URL: https://github.com/apache/lucene-solr/pull/1456#issuecomment-619253348

Ah, yes, that makes sense.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Issue Comment Deleted] (LUCENE-8278) UAX29URLEmailTokenizer is not detecting some tokens as URL type
[ https://issues.apache.org/jira/browse/LUCENE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe updated LUCENE-8278:

Comment: was deleted (was: spam link)

> UAX29URLEmailTokenizer is not detecting some tokens as URL type
> ---
>
> Key: LUCENE-8278
> URL: https://issues.apache.org/jira/browse/LUCENE-8278
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Junte Zhang
> Assignee: Steven Rowe
> Priority: Minor
> Fix For: 7.4, 8.0
>
> Attachments: LUCENE-8278.patch, patched.png, unpatched.png
>
> We are using the UAX29URLEmailTokenizer so we can use the token types in our plugins.
> However, I noticed that the tokenizer is not detecting certain URLs as <URL> but as <ALPHANUM> instead.
> Examples that are not working:
> * example.com is <ALPHANUM>
> * example.net is <ALPHANUM>
> But:
> * https://example.com is <URL>
> * as is https://example.net
> Examples that work:
> * example.ch is <URL>
> * example.co.uk is <URL>
> * example.nl is <URL>
> I have checked this JIRA and could not find an existing issue. I have tested this on Lucene (Solr) 6.4.1 and 7.3.
> Could someone confirm my findings and advise what I could do to (help) resolve this issue?

-- This message was sent by Atlassian Jira (v8.3.4#803005)
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091912#comment-17091912 ] Mike Drob commented on SOLR-14428:

I'm going to pause on the path with {{interface Minimizable}}, because after trying to start on it, it ends up being a huge mess of generic and covariant types, and I'm not sure it actually makes anything better or more readable. Let's wait to get some more feedback from the other folks involved?

> FuzzyQuery has severe memory usage in 8.5
> ---
>
> Key: SOLR-14428
> URL: https://issues.apache.org/jira/browse/SOLR-14428
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Affects Versions: 8.5, 8.5.1
> Reporter: Colvin Cowie
> Assignee: Andrzej Bialecki
> Priority: Major
> Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, image-2020-04-24-20-09-31-179.png, screenshot-2.png, screenshot-3.png, screenshot-4.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I sent this to the mailing list.
> I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors while running our normal tests. After profiling, it was clear that the majority of the heap was allocated through FuzzyQuery.
> LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the FuzzyQuery's constructor.
> I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries from random UUID strings for 5 minutes:
> {code}
> FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"
> {code}
> When running against a vanilla Solr 8.3.1 and 8.4.1 there is no problem, while the memory usage has increased drastically on 8.5.0 and 8.5.1.
> Comparison of heap usage while running the attached test against Solr 8.3.1 and 8.5.1 with a single (empty) shard and 4GB heap:
> !image-2020-04-23-09-18-06-070.png!
> And with 4 shards on 8.4.1 and 8.5.0:
> !screenshot-2.png!
> I'm guessing that the memory might be being leaked if the FuzzyQuery objects are referenced from the cache, while the FuzzyTermsEnum would not have been.
> Query Result Cache on 8.5.1:
> !screenshot-3.png!
> ~316mb in the cache
> QRC on 8.3.1:
> !screenshot-4.png!
> <1mb
> With an empty cache, running the query _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory allocation:
> {noformat}
> 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520
> 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed: 648855
> {noformat}
> ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520.
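The query-generation snippet quoted in the issue above can be fleshed out into a tiny self-contained program. This is only a sketch: {{FIELD_NAME}} ("field_s") is assumed from the example query later in the issue, and the 5-minute timing loop and HTTP requests of the real FuzzyHammer attachment are omitted.

```java
import java.util.UUID;

public class FuzzyQueryStrings {
    // Field name assumed from the example query in the issue; the real
    // FuzzyHammer attachment may use a different one.
    static final String FIELD_NAME = "field_s";

    // Build one fuzzy query string from a random UUID, exactly as the
    // snippet in the issue describes: strip the dashes, append "~2".
    static String nextQuery() {
        return FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2";
    }

    public static void main(String[] args) {
        // The real test would fire these at Solr for 5 minutes; here we
        // just print a few to show their shape.
        for (int i = 0; i < 3; i++) {
            System.out.println(nextQuery());
        }
    }
}
```

Each generated string is a single-field fuzzy query with edit distance 2, which is what forces the expensive automata construction moved by LUCENE-9068.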
[GitHub] [lucene-solr] jpountz commented on pull request #1456: SOLR-13289: Support for BlockMax WAND
jpountz commented on pull request #1456: URL: https://github.com/apache/lucene-solr/pull/1456#issuecomment-619245099

> Any number > Integer.MAX_VALUE means that each shard will reply with accurate count, right? so the sum of them all will also be accurate.

Yes indeed. The perspective I had was rather that, since this parameter is a number of hits, as a user I would expect any legal number of hits to also be a legal value for this parameter. So it would be a shame to fail, or worse to silently cast to an int, if a user passes 3B as a value? I believe it would be fine to keep it an integer internally while carefully accepting longs when parsing URL parameters, i.e. all values greater than Integer.MAX_VALUE would get converted to Integer.MAX_VALUE?
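The parse-and-clamp idea above can be sketched in a few lines. This is a minimal, illustrative helper, not Solr's actual parameter-parsing code; the parameter name follows the PR discussion.

```java
public class MinExactHitsParam {
    // Parse the URL parameter as a long, then clamp to int range: any value
    // above Integer.MAX_VALUE already implies fully accurate hit counts, so
    // clamping loses nothing, and nothing fails or silently overflows.
    static int parseMinExactHits(String raw) {
        long value = Long.parseLong(raw); // throws NumberFormatException on junk
        return (int) Math.min(value, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(parseMinExactHits("3000000000")); // 3B clamps to 2147483647
        System.out.println(parseMinExactHits("1000"));       // small values pass through
    }
}
```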
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091900#comment-17091900 ] Mike Drob commented on SOLR-14428:

So another approach here would be to intercept the keys in CaffeineCache.put and, if they implement some marker interface, call {{toCacheKey}}. If it's more generic in this way, then it probably makes sense to do

{code}
public interface Minimizable {
  /** return an object equal to this object, but possibly with a minimized memory footprint */
  Minimizable minimize();
}
{code}

Then we don't have to make it invasive on {{Query}} and can stick to the places where it makes a difference. That would be FuzzyQuery; from a glance, maybe AutomatonQuery too, and then probably anything that wraps other queries. BooleanQuery? Are there others? I haven't looked yet.
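The marker-interface idea discussed above can be sketched without any Solr dependencies. A plain HashMap stands in for CaffeineCache, and a hypothetical {{HeavyKey}} stands in for FuzzyQuery; only the {{Minimizable}} interface itself comes from the comment, everything else is illustrative.

```java
import java.util.HashMap;
import java.util.Map;

interface Minimizable {
    /** Return an object equal to this one, but possibly with a smaller memory footprint. */
    Minimizable minimize();
}

// Stand-in for CaffeineCache.put: swap any Minimizable key for its
// minimized form before storing, so the cache never pins the heavy copy.
class MinimizingCache<K, V> {
    private final Map<K, V> map = new HashMap<>();

    @SuppressWarnings("unchecked")
    void put(K key, V value) {
        if (key instanceof Minimizable) {
            key = (K) ((Minimizable) key).minimize(); // equal key, lighter footprint
        }
        map.put(key, value);
    }

    V get(K key) {
        return map.get(key);
    }
}

// Hypothetical heavy key, loosely analogous to a FuzzyQuery holding automata:
// the payload is expensive but not part of equals/hashCode, so the minimized
// copy is still "equal" to the original and cache lookups keep working.
class HeavyKey implements Minimizable {
    final String term;
    final byte[] automata; // stand-in for the large precomputed automata

    HeavyKey(String term, byte[] automata) {
        this.term = term;
        this.automata = automata;
    }

    @Override
    public Minimizable minimize() {
        return new HeavyKey(term, new byte[0]); // drop the payload
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof HeavyKey && ((HeavyKey) o).term.equals(term);
    }

    @Override
    public int hashCode() {
        return term.hashCode();
    }
}

public class MinimizableDemo {
    public static void main(String[] args) {
        MinimizingCache<HeavyKey, String> cache = new MinimizingCache<>();
        cache.put(new HeavyKey("abc~2", new byte[1 << 20]), "results");
        // Lookup with an equal key still hits, but the stored key no longer
        // references the megabyte payload.
        System.out.println(cache.get(new HeavyKey("abc~2", new byte[0]))); // results
    }
}
```

The point of the design is that only the cache's put path and the few query classes that hold heavy state need to know about the interface; {{Query}} itself stays untouched.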
[GitHub] [lucene-solr] tflobbe commented on pull request #1456: SOLR-13289: Support for BlockMax WAND
tflobbe commented on pull request #1456: URL: https://github.com/apache/lucene-solr/pull/1456#issuecomment-619239862

> Does minExactHits need to accept longs since the number of hits across shards might exceed Integer.MAX_VALUE?

Good point, I hadn't thought about the distributed aspect yet. However, does it make sense to you? Any number > Integer.MAX_VALUE means that each shard will reply with an accurate count, right? So the sum of them all will also be accurate.

> Also I see you chose to make BMW an opt-in, I hope we'll find ways to enable it by default eventually. :)

Yes, I think this should be the default in master. I was planning to make that change in another PR, to keep it explicit.
[jira] [Comment Edited] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091882#comment-17091882 ] Colvin Cowie edited comment on SOLR-14428 at 4/24/20, 8:52 PM:

I did the same thing with {{filters}} and that's OK... Now I'm thinking: since some Queries hold references to other Queries when they are constructed, toCacheKey will need to be implemented on those as well, in order to retrieve the cache-key variants of the Queries they are constructed with, e.g. the clause sets in BooleanQuery:

{noformat}
private final Map<Occur, Collection<Query>> clauseSets; // used for equals/hashcode
{noformat}

And any third-party Query implementations that hold references to other Queries would need updating too. Just seems like there might be quite a lot of fallout from this, though it might still be best in the long run.

Also, the {{filterCache}} doesn't use QueryResultKey; it's keyed by the Query itself, so the puts to it need updating too. Calling toCacheKey on all the puts to the {{filterCache}} in SolrIndexSearcher has got the memory usage sorted out again with me firing the fuzzy queries as fq.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091882#comment-17091882 ] Colvin Cowie edited comment on SOLR-14428 at 4/24/20, 8:45 PM:

I did the same thing with {{filters}} and that's OK... Now I'm thinking: since some Queries hold references to other Queries when they are constructed, toCacheKey will need to be implemented on those as well, in order to retrieve the cache-key variants of the Queries they are constructed with, e.g. the clause sets in BooleanQuery:

{noformat}
private final Map<Occur, Collection<Query>> clauseSets; // used for equals/hashcode
{noformat}

And any third-party Query implementations that hold references to other Queries would need updating too. Just seems like there might be quite a lot of fallout from this.

Also, the {{filterCache}} doesn't use QueryResultKey; it's keyed by the Query itself, so the puts to it need updating too. Calling toCacheKey on all the puts to the {{filterCache}} in SolrIndexSearcher has got the memory usage sorted out again with me firing the fuzzy queries as fq.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091882#comment-17091882 ] Colvin Cowie commented on SOLR-14428:

I did the same thing with {{filters}} and that's OK... Now I'm thinking: since some Queries hold references to other Queries when they are constructed, toCacheKey will need to be implemented on those as well, in order to retrieve the cache-key variants of the Queries they are constructed with, e.g. the clause sets in BooleanQuery:

{noformat}
private final Map<Occur, Collection<Query>> clauseSets; // used for equals/hashcode
{noformat}

And any third-party Query implementations that hold references to other Queries would need updating too. Just seems like there might be quite a lot of fallout from this.

Also, the {{filterCache}} doesn't use QueryResultKey; it's keyed by the Query itself, so the puts to it need updating too.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (SOLR-14437) Remove/refactor "ApiSupport" interface? (for V2 API)
David Smiley created SOLR-14437:

Summary: Remove/refactor "ApiSupport" interface? (for V2 API)
Key: SOLR-14437
URL: https://issues.apache.org/jira/browse/SOLR-14437
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Components: v2 API
Affects Versions: master (9.0)
Reporter: David Smiley

ApiSupport.java is an interface relating to the V2 API that is implemented by all request handlers, both those at a core level and others. It's essentially this (comments removed):

{code:java}
public interface ApiSupport {

  Collection<Api> getApis();

  default Boolean registerV1() {
    return Boolean.TRUE;
  }

  default Boolean registerV2() {
    return Boolean.FALSE;
  }
}
{code}

Firstly, let's assume that the handler will always be registered in V2. All implementations I've seen explicitly return true here; maybe I'm missing something though.

Secondly, getApis() seems problematic for the ability to lazily load request handlers. Can we assume, at least for core-level request handlers, that there is exactly one API and, where necessary, rely on the "spec" JSON definition -- see org.apache.solr.api.ApiBag#registerLazy?

-- This message was sent by Atlassian Jira (v8.3.4#803005)
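The simplification floated above would leave a single-method interface. A hedged sketch, with a stub {{Api}} type so it compiles standalone; this is not the actual Solr code, and the names beyond {{getApis()}} are assumptions:

```java
import java.util.Collection;
import java.util.Collections;

// Stub standing in for org.apache.solr.api.Api so the sketch is self-contained.
interface Api {}

// If V2 registration is unconditional, the registerV1()/registerV2() defaults
// disappear and only getApis() remains.
interface ApiSupportSketch {
    Collection<Api> getApis();
}

public class ApiSupportDemo {
    public static void main(String[] args) {
        // With one abstract method, a handler can even be a lambda; a lazily
        // loaded handler could defer building its Api list until first call.
        ApiSupportSketch handler = () -> Collections.emptyList();
        System.out.println(handler.getApis().isEmpty()); // true
    }
}
```

A side effect worth noting: dropping the two Boolean defaults makes the interface functional, which makes lazy registration wrappers easier to write.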
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091869#comment-17091869 ] Colvin Cowie commented on SOLR-14428:

Oh yes, you're absolutely right.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (SOLR-14436) When using Synonym Graph Filter, Solr does not tokenize query-string if it has multi-word synonym
[ https://issues.apache.org/jira/browse/SOLR-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091868#comment-17091868 ] Atin commented on SOLR-14436:

Hi, thank you for your response. I had put it on the user list; however, I received no resolution. One user responded that a multi-word synonym cannot be tokenized further. Now, how can it be decided whether it is a code issue, since nobody on the user list had a solution?

> When using Synonym Graph Filter, Solr does not tokenize query-string if it has multi-word synonym
> ---
>
> Key: SOLR-14436
> URL: https://issues.apache.org/jira/browse/SOLR-14436
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Components: query, Schema and Analysis
> Affects Versions: 8.3.1
> Reporter: Atin
> Priority: Major
> Attachments: Scenario1.png, Scenario2.png
>
> While using Synonym Graph Filter, if the query string contains a multi-word synonym, it considers that multi-word synonym as a single term and does not tokenize it further.
> For example, *soap powder* is a search *query* which is also a _multi-word synonym_ in the synonym file:
> {quote}s(104254535,1,'soap powder',n,1,1).
> s(104254535,2,'built-soap powder',n,1,0).
> s(104254535,3,'washing powder',n,1,0).{quote}
> There are 2 documents containing _soap_ (2) and _powder_ (1) altogether:
> doc1: "Sunny Berlin breast tumors soap powder"
> doc2: "She is in soap Berlin today"
> +Scenario 1 (screenshot attached)+
> *without* Synonym Graph Filter => 2 docs returned, as it checks for *"soap"* and *"powder"* separately.
> +Scenario 2 (screenshot attached)+
> *with* Synonym Graph Filter => only 1 doc returned, but 2 were expected. Here only *"soap powder"* is being checked; it is not tokenized into "soap" and "powder" and searched further.
> Is it possible to expand the query string *soap powder* as:
> Synonym(soap powder) + Synonym(soap) + Synonym(powder)
> Thank you.
[jira] [Issue Comment Deleted] (SOLR-11960) Add collection level properties
[ https://issues.apache.org/jira/browse/SOLR-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomas Eduardo Fernandez Lobbe updated SOLR-11960:

Comment: was deleted (was: spam link)

> Add collection level properties
> ---
>
> Key: SOLR-11960
> URL: https://issues.apache.org/jira/browse/SOLR-11960
> Project: Solr
> Issue Type: New Feature
> Reporter: Peter Rusko
> Assignee: Tomas Eduardo Fernandez Lobbe
> Priority: Blocker
> Fix For: 7.3, 8.0
>
> Attachments: SOLR-11960.patch, SOLR-11960.patch, SOLR-11960.patch, SOLR-11960.patch, SOLR-11960.patch, SOLR-11960_2.patch
>
> Solr has cluster properties, but no easy and extendable way of defining properties that affect a single collection. Collection properties could be stored in a single zookeeper node per collection, making it possible to trigger zookeeper watchers for only those Solr nodes that have cores of that collection.
[jira] [Resolved] (SOLR-14436) When using Synonym Graph Filter, Solr does not tokenize query-string if it has multi-word synonym
[ https://issues.apache.org/jira/browse/SOLR-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-14436.

Resolution: Invalid

This is more a usage question than a bug/code issue; it would be best to raise this question on the user's list. See: http://lucene.apache.org/solr/community.html#mailing-lists-irc -- there are links to both Lucene and Solr mailing lists there. A _lot_ more people will see your question on that list and may be able to help more quickly.

If it's determined that this really is a code issue or enhancement to Lucene or Solr, and not a configuration/usage problem, we can raise a new JIRA or reopen this one.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091862#comment-17091862 ] Mike Drob commented on SOLR-14428: -- Good catch. I think we might also need to do a similar operation on {{filters}} now that I'm looking at it more closely. > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > image-2020-04-24-20-09-31-179.png, screenshot-2.png, screenshot-3.png, > screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against vanilla Solr 8.3.1 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! > I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. 
> Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
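The FuzzyHammer test above builds each query string from a random UUID so that no two requests ever hit the same cache entry. As a sketch of that generation step (the class and helper names here are ours, and `field_s` is borrowed from the reporter's example):

```java
import java.util.UUID;

public class FuzzyHammerSketch {
    // Build a fuzzy query string the way the reporter's test does:
    // a random 32-char hex term with an edit distance of 2.
    static String fuzzyQuery(String fieldName) {
        return fieldName + ":" + UUID.randomUUID().toString().replace("-", "") + "~2";
    }

    public static void main(String[] args) {
        // Each call yields a brand-new term, so the query result cache
        // can never get a hit and keeps accumulating FuzzyQuery entries.
        System.out.println(fuzzyQuery("field_s"));
    }
}
```

Because every generated term is unique, sustained load fills the query result cache with distinct FuzzyQuery objects, which is exactly the situation where per-query memory overhead compounds.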
[jira] [Comment Edited] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091843#comment-17091843 ] Colvin Cowie edited comment on SOLR-14428 at 4/24/20, 7:47 PM: --- -Ah, I assume the statistics plugin uses RamUsageQueryVisitor, which triggers the building of the automata- It's because QueryResultKey still gets the ramBytesUsed from the original query {code:java} ramBytesUsed = BASE_RAM_BYTES_USED + ramSfields + RamUsageEstimator.sizeOfObject(query, RamUsageEstimator.QUERY_DEFAULT_RAM_BYTES_USED) + RamUsageEstimator.sizeOfObject(filters, RamUsageEstimator.QUERY_DEFAULT_RAM_BYTES_USED); {code} was (Author: cjcowie): Ah, I assume the statistics plugin uses RamUsageQueryVisitor, which triggers the building of the automata > FuzzyQuery has severe memory usage in 8.5
[jira] [Comment Edited] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091843#comment-17091843 ] Colvin Cowie edited comment on SOLR-14428 at 4/24/20, 7:43 PM: --- Ah, I assume the statistics plugin uses RamUsageQueryVisitor, which triggers the building of the automata was (Author: cjcowie): Ah, the statistics plugin uses RamUsageQueryVisitor, which triggers the building of the automata > FuzzyQuery has severe memory usage in 8.5
[GitHub] [lucene-solr] jpountz commented on pull request #1456: SOLR-13289: Support for BlockMax WAND
jpountz commented on pull request #1456: URL: https://github.com/apache/lucene-solr/pull/1456#issuecomment-619206053 Does minExactHits need to accept longs since the number of hits across shards might exceed Integer.MAX_VALUE? Also I see you chose to make BMW an opt-in, I hope we'll find ways to enable it by default eventually. :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
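The overflow concern behind making minExactHits a long can be sketched as follows (the class and method names are ours): summing per-shard hit counts in int arithmetic can wrap past Integer.MAX_VALUE, while accumulating into a long keeps the merged total correct.

```java
public class HitCountMergeSketch {
    // Merge per-shard hit counts into a single total. Each int is
    // widened to long before adding, so the sum cannot overflow even
    // when the combined count exceeds Integer.MAX_VALUE.
    static long mergeHitCounts(int[] perShardHits) {
        long total = 0;
        for (int hits : perShardHits) {
            total += hits; // int widened to long here
        }
        return total;
    }

    public static void main(String[] args) {
        // Two shards near the int limit: the true total exceeds Integer.MAX_VALUE.
        int[] shards = {2_000_000_000, 2_000_000_000};
        System.out.println(mergeHitCounts(shards)); // prints 4000000000
    }
}
```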
[jira] [Commented] (SOLR-14435) createNodeSet and createNodeSet.shuffle parameters missing from Collection Restore RefGuide
[ https://issues.apache.org/jira/browse/SOLR-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091844#comment-17091844 ] Lucene/Solr QA commented on SOLR-14435: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || || || || || {color:brown} master Compile Tests {color} || || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 0m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 0m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate ref guide {color} | {color:green} 0m 3s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:black}{color} | {color:black} {color} | {color:black} 1m 14s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | SOLR-14435 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/13001059/SOLR-14435-01.patch | | Optional Tests | ratsources validatesourcepatterns validaterefguide | | uname | Linux lucene2-us-west.apache.org 4.4.0-170-generic #199-Ubuntu SMP Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ant | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh | | git revision | master / ecc98e8 | | ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 | | modules | C: solr/solr-ref-guide U: solr/solr-ref-guide | | Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/742/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. 
> createNodeSet and createNodeSet.shuffle parameters missing from Collection > Restore RefGuide > --- > > Key: SOLR-14435 > URL: https://issues.apache.org/jira/browse/SOLR-14435 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Reporter: Andras Salamon >Priority: Minor > Attachments: SOLR-14435-01.patch > > > Although {{createNodeSet}} and {{createNodeSet.shuffle}} parameters are > supported by the Collection RESTORE command (I've tested it), they are > missing from the documentation: > [https://lucene.apache.org/solr/guide/8_5/collection-management.html#collection-management] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091843#comment-17091843 ] Colvin Cowie commented on SOLR-14428: - Ah, the statistics plugin uses RamUsageQueryVisitor, which triggers the building of the automata > FuzzyQuery has severe memory usage in 8.5
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091838#comment-17091838 ] Colvin Cowie commented on SOLR-14428: - Thanks, I patched it in. The heap usage looks a lot closer to how it was before on my stress test. !image-2020-04-24-20-09-31-179.png! I'm surprised to see the cache statistics are still reporting a high ramBytesUsed though, e.g. a search for "field_s:e41848af85d24ac197c71db6888e17bc~2" still results in a ramBytesUsed of 648863 {code:java} this.ramBytesUsed = BASE_RAM_BYTES + term.ramBytesUsed(); {code} in the new constructor looks like it should do the right thing > FuzzyQuery has severe memory usage in 8.5
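Why the cache statistic can stay high even after the automata are built lazily can be shown with a stripped-down mimic of cache-key size accounting (all names here are ours; this is not Solr's actual QueryResultKey): the key's reported size is a base cost plus whatever the query object itself claims, so an inflated per-query estimate inflates every cache entry that holds that query.

```java
public class CacheKeySketch {
    // A query that can report its own in-memory size, loosely
    // mirroring Lucene's Accountable idea.
    interface SizedQuery { long ramBytesUsed(); }

    static final long BASE_KEY_BYTES = 168;       // assumed per-key base overhead
    static final long DEFAULT_QUERY_BYTES = 1024; // assumed fallback for unsized queries

    // The key's reported bytes are base cost plus the query's
    // self-reported size (or a default when it reports nothing).
    static long keyBytes(SizedQuery query) {
        long q = (query == null) ? DEFAULT_QUERY_BYTES : query.ramBytesUsed();
        return BASE_KEY_BYTES + q;
    }

    public static void main(String[] args) {
        // A query whose estimate still includes automata-sized numbers
        // inflates the cache statistic for every entry that caches it.
        System.out.println(keyBytes(() -> 648_855L)); // 8.5.1-style estimate
        System.out.println(keyBytes(() -> 1_520L));   // 8.3.1-style estimate
    }
}
```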
[jira] [Updated] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colvin Cowie updated SOLR-14428: Attachment: image-2020-04-24-20-09-31-179.png > FuzzyQuery has severe memory usage in 8.5
[GitHub] [lucene-solr] tflobbe opened a new pull request #1456: SOLR-13289: Support for BlockMax WAND
tflobbe opened a new pull request #1456: URL: https://github.com/apache/lucene-solr/pull/1456 This is still very much WIP. Some SolrJ tests are failing.
[jira] [Commented] (SOLR-13289) Support for BlockMax WAND
[ https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091835#comment-17091835 ] Tomas Eduardo Fernandez Lobbe commented on SOLR-13289: -- Here is my current progress: https://github.com/apache/lucene-solr/pull/1456 > Support for BlockMax WAND > - > > Key: SOLR-13289 > URL: https://issues.apache.org/jira/browse/SOLR-13289 > Project: Solr > Issue Type: New Feature >Reporter: Ishan Chattopadhyaya >Assignee: Ishan Chattopadhyaya >Priority: Major > Attachments: SOLR-13289.patch, SOLR-13289.patch > > Time Spent: 10m > Remaining Estimate: 0h > > LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to > expose this via Solr. When enabled, the numFound returned will not be exact. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14436) When using Synonym Graph Filter, Solr does not tokenize query-string if it has multi-word synonym
Atin created SOLR-14436: --- Summary: When using Synonym Graph Filter, Solr does not tokenize query-string if it has multi-word synonym Key: SOLR-14436 URL: https://issues.apache.org/jira/browse/SOLR-14436 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: query, Schema and Analysis Affects Versions: 8.3.1 Reporter: Atin Attachments: Scenario1.png, Scenario2.png While using Synonym Graph Filter, if the query string contains a multi-word synonym, it considers that multi-word synonym as a single term and does not tokenize it further. For example- *soap powder* is a search *query* which is also a _multi-word synonym_ in the synonym file as- {quote}s(104254535,1,'soap powder',n,1,1). s(104254535,2,'built-soap powder',n,1,0). s(104254535,3,'washing powder',n,1,0).{quote} There are 2 documents having _soap_(2) and _powder_(1) altogether. doc1: "Sunny Berlin breast tumors soap powder" doc2: "She is in soap Berlin today" +Scenario 1 (screenshot attached)+ *without* Synonym Graph Filter => 2 docs returned , as it checks for *"soap"* and *"powder"* separately. +Scenario 2 (screenshot attached)+ *with* Synonym Graph Filter => only 1 doc returned, but 2 were expected. Here only *"soap powder"* is being checked and it is not tokenized into "soap" and "powder" and searched further. Is it possible to expand query string - *soap powder* as: Synonym(soap powder) + Synonym(soap) + Synonym(powder) Thank You. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
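The expansion the reporter asks for can be sketched as a plain query rewrite (this illustrates the request only, not how SynonymGraphFilter actually behaves; the class and helper names are ours): the multi-word term is kept whole and each of its parts is also emitted as its own synonym lookup.

```java
import java.util.ArrayList;
import java.util.List;

public class SynonymExpansionSketch {
    // Build the requested expansion: the whole multi-word term plus
    // each single-word part, each wrapped as a synonym lookup clause.
    static List<String> expand(String multiWordTerm) {
        List<String> clauses = new ArrayList<>();
        clauses.add("Synonym(" + multiWordTerm + ")");
        for (String part : multiWordTerm.split("\\s+")) {
            clauses.add("Synonym(" + part + ")");
        }
        return clauses;
    }

    public static void main(String[] args) {
        // → [Synonym(soap powder), Synonym(soap), Synonym(powder)]
        System.out.println(expand("soap powder"));
    }
}
```

With such a rewrite, both documents in the example would match, since "soap" and "powder" are searched individually as well as together.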
[jira] [Commented] (SOLR-13289) Support for BlockMax WAND
[ https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091785#comment-17091785 ] Tomas Eduardo Fernandez Lobbe commented on SOLR-13289: -- Thanks Ishan, I've made some progress too. I can look at your changes and merge what's needed. > Support for BlockMax WAND
[jira] [Commented] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects
[ https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091784#comment-17091784 ] ASF subversion and git services commented on LUCENE-7788: - Commit ecc98e8698a3ce8efa51712686697c0f33afab4d in lucene-solr's branch refs/heads/master from Erick Erickson [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ecc98e8 ] LUCENE-7788: fail precommit on unparameterised log messages and examine for wasted work/objects > fail precommit on unparameterised log messages and examine for wasted > work/objects > -- > > Key: LUCENE-7788 > URL: https://issues.apache.org/jira/browse/LUCENE-7788 > Project: Lucene - Core > Issue Type: Task >Reporter: Christine Poerschke >Assignee: Erick Erickson >Priority: Minor > Attachments: LUCENE-7788.patch, LUCENE-7788.patch, gradle_only.patch, > gradle_only.patch > > Time Spent: 50m > Remaining Estimate: 0h > > SOLR-10415 would be removing existing unparameterised log.trace messages use > and once that is in place then this ticket's one-line change would be for > 'ant precommit' to reject any future unparameterised log.trace message use. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
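The wasted work that parameterised log messages avoid can be shown with a toy logger (our sketch, not SLF4J): an unparameterised call builds its message string before the logger can even check the level, while the parameterised form defers formatting until the level is known to be enabled.

```java
public class LogParamSketch {
    // A toy logger: trace() with a pre-built string vs. a parameterised
    // form that only formats when tracing is enabled.
    static class Logger {
        final boolean traceEnabled;
        int messagesFormatted = 0;
        Logger(boolean traceEnabled) { this.traceEnabled = traceEnabled; }

        void trace(String message) {            // message was already built by the caller
            if (traceEnabled) System.out.println(message);
        }

        void trace(String format, Object arg) { // formatting deferred to the logger
            if (traceEnabled) {
                messagesFormatted++;
                System.out.println(format.replace("{}", String.valueOf(arg)));
            }
        }
    }

    public static void main(String[] args) {
        Logger log = new Logger(false); // trace disabled, as in production
        Object costly = new Object();
        // Unparameterised: this concatenation runs regardless of the level.
        log.trace("state=" + costly);
        // Parameterised: no formatting happens because trace is disabled.
        log.trace("state={}", costly);
        System.out.println(log.messagesFormatted); // prints 0
    }
}
```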
[jira] [Commented] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects
[ https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091783#comment-17091783 ] ASF subversion and git services commented on LUCENE-7788: - Commit 83f090877b0590a1d99c79cfeec076dfed963076 in lucene-solr's branch refs/heads/branch_8x from Erick Erickson [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=83f0908 ] LUCENE-7788: fail precommit on unparameterised log messages and examine for wasted work/objects > fail precommit on unparameterised log messages and examine for wasted > work/objects
[jira] [Commented] (SOLR-13886) HDFSSyncSliceTest and SyncSliceTest started failing frequently
[ https://issues.apache.org/jira/browse/SOLR-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091773#comment-17091773 ] Kevin Risden commented on SOLR-13886: - Thanks [~erickerickson] yea I've been checking about once a day too and haven't seen any failures. It also fixed local Jenkins run failures I had seen for this test. > HDFSSyncSliceTest and SyncSliceTest started failing frequently > -- > > Key: SOLR-13886 > URL: https://issues.apache.org/jira/browse/SOLR-13886 > Project: Solr > Issue Type: Bug > Components: Tests >Reporter: Tomas Eduardo Fernandez Lobbe >Assignee: Kevin Risden >Priority: Major > Fix For: 8.6 > > Attachments: SOLR-13886.patch, SOLR-13886_jenkins_log.txt.gz > > > While I can see some failures of this test in the past, they weren't frequent > and were usually things like port bindings (maybe SOLR-13871) or timeouts. > I've started this failure in Jenkins (and locally) frequently: > {noformat} > Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/5410/ > Java: 64bit/jdk-13 -XX:-UseCompressedOops -XX:+UseParallelGC > 2 tests failed. 
> FAILED: org.apache.solr.cloud.SyncSliceTest.test > Error Message: > expected:<5> but was:<4> > Stack Trace: > java.lang.AssertionError: expected:<5> but was:<4> > at > __randomizedtesting.SeedInfo.seed([F8E3B768E16E848D:70B788B24F92E975]:0) > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:834) > at org.junit.Assert.assertEquals(Assert.java:645) > at org.junit.Assert.assertEquals(Assert.java:631) > at org.apache.solr.cloud.SyncSliceTest.test(SyncSliceTest.java:150) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:567) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988) > at > org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:1082) > at > org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:1054) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817) > at > com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894) > at > com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > com.carrotsearch.randomize
[jira] [Commented] (SOLR-13886) HDFSSyncSliceTest and SyncSliceTest started failing frequently
[ https://issues.apache.org/jira/browse/SOLR-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091766#comment-17091766 ] Erick Erickson commented on SOLR-13886: --- BTW, I checked Hoss' rollup yesterday and today for the previous 24 hours (48 hours total) and there are no failures reported for either of these, whereas there are for the last 7 days. So that's encouraging. Missed looking on Wednesday The BadApple report next Monday will still pick up some runs from before this was checked in, so don't panic if you see that. > HDFSSyncSliceTest and SyncSliceTest started failing frequently > -- > > Key: SOLR-13886 > URL: https://issues.apache.org/jira/browse/SOLR-13886 > Project: Solr > Issue Type: Bug > Components: Tests >Reporter: Tomas Eduardo Fernandez Lobbe >Assignee: Kevin Risden >Priority: Major > Fix For: 8.6 > > Attachments: SOLR-13886.patch, SOLR-13886_jenkins_log.txt.gz > > > While I can see some failures of this test in the past, they weren't frequent > and were usually things like port bindings (maybe SOLR-13871) or timeouts. > I've started this failure in Jenkins (and locally) frequently: > {noformat} > Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/5410/ > Java: 64bit/jdk-13 -XX:-UseCompressedOops -XX:+UseParallelGC > 2 tests failed. 
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091754#comment-17091754 ] Mike Drob commented on SOLR-14428: -- I've created https://github.com/apache/lucene-solr/pull/1455 for this, but that I haven't really tested it yet. Wanted to get feedback on the approach before spending too much time on it. [~ab] WDYT? [~cjcowie] do you think this is something you're able to try in your environment as well? > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > Time Spent: 10m > Remaining Estimate: 0h > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. > Comparison of heap usage while running the attached test against Solr 8.3.1 > and 8.5.1 with a single (empty) shard and 4GB heap: > !image-2020-04-23-09-18-06-070.png! > And with 4 shards on 8.4.1 and 8.5.0: > !screenshot-2.png! 
> I'm guessing that the memory might be being leaked if the FuzzyQuery objects > are referenced from the cache, while the FuzzyTermsEnum would not have been. > Query Result Cache on 8.5.1: > !screenshot-3.png! > ~316mb in the cache > QRC on 8.3.1 > !screenshot-4.png! > <1mb > With an empty cache, running this query > _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory > allocation > {noformat} > 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed: 1520 > 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855 > {noformat} > ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
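The query generator quoted in the {code} block above is self-contained; spelled out as a runnable sketch (field name arbitrary):

```java
import java.util.UUID;

// Builds fuzzy query strings of the shape used by the FuzzyHammer test:
// <field>:<32-hex-uuid>~2. Since LUCENE-9068, each such query constructs
// its Levenshtein automata up front in the FuzzyQuery constructor.
public class FuzzyQueryStrings {
    static String randomFuzzyQuery(String fieldName) {
        return fieldName + ":" + UUID.randomUUID().toString().replace("-", "") + "~2";
    }

    public static void main(String[] args) {
        System.out.println(randomFuzzyQuery("field_s"));
    }
}
```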
[GitHub] [lucene-solr] madrob opened a new pull request #1455: SOLR-14428 minimize memory footprint of fuzzy query
madrob opened a new pull request #1455: URL: https://github.com/apache/lucene-solr/pull/1455 https://issues.apache.org/jira/browse/SOLR-14428 Make the automata of a fuzzy query mutable so that we don't always have to store them. However, there will be cases when we need to recompute them anyway. Will add unit tests if this approach makes sense. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
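The trade-off the PR describes, not always storing the automata at the cost of occasionally recomputing them, is essentially a droppable memo. A generic sketch under that reading (illustrative only, not the actual FuzzyQuery change):

```java
import java.util.function.Supplier;

// A value that can be computed on demand, cached, and dropped to save
// memory; dropping forces recomputation on the next access.
public class DroppableMemo<T> {
    private final Supplier<T> compute;
    private T cached;        // null means "not currently stored"
    int recomputations = 0;  // visible for the demo below

    public DroppableMemo(Supplier<T> compute) { this.compute = compute; }

    public synchronized T get() {
        if (cached == null) {
            cached = compute.get();
            recomputations++;
        }
        return cached;
    }

    // Release the memory; callers pay to rebuild if they ask again.
    public synchronized void drop() { cached = null; }
}
```

A cached query entry that holds only the terms plus such a memo stays small while idle, which addresses the queryResultCache numbers reported in the issue.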
[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5
[ https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091714#comment-17091714 ] Mike Drob commented on SOLR-14428: -- bq. Maybe there is an elegant way to store a stripped down FuzzyQuery in the cache? I think this runs into problems with auto warming based on the contents of the cache when we open a new searcher. Possibly need a way to rebuild it afterwards, which feels like we're going backwards from LUCENE-9068. Related, [~romseygeek] - should [BASE_RAM_BYTES|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/FuzzyQuery.java#L58] use {{FuzzyQuery.class}} instead of {{AutomatonQuery.class}}? > FuzzyQuery has severe memory usage in 8.5 > - > > Key: SOLR-14428 > URL: https://issues.apache.org/jira/browse/SOLR-14428 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.5, 8.5.1 >Reporter: Colvin Cowie >Assignee: Andrzej Bialecki >Priority: Major > Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, > screenshot-2.png, screenshot-3.png, screenshot-4.png > > > I sent this to the mailing list > I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors > while running our normal tests. After profiling it was clear that the > majority of the heap was allocated through FuzzyQuery. > LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the > FuzzyQuery's constructor. > I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries > from random UUID strings for 5 minutes > {code} > FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2" > {code} > When running against a vanilla Solr 8.31 and 8.4.1 there is no problem, while > the memory usage has increased drastically on 8.5.0 and 8.5.1. 
[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post
[ https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091696#comment-17091696 ] Christian Beikov commented on SOLR-12798: - I'm having a URI too long issue when doing a {{ContentStreamUpdateRequest.}} I guess my params are too big(contain metadata). What do I need to do to switch to multipart POST? > Structural changes in SolrJ since version 7.0.0 have effectively disabled > multipart post > > > Key: SOLR-12798 > URL: https://issues.apache.org/jira/browse/SOLR-12798 > Project: Solr > Issue Type: Improvement > Components: SolrJ >Affects Versions: 7.4 >Reporter: Karl Wright >Assignee: Karl Wright >Priority: Major > Attachments: HOT Balloon Trip_Ultra HD.jpg, > SOLR-12798-approach.patch, SOLR-12798-reproducer.patch, > SOLR-12798-workaround.patch, SOLR-12798.patch, SOLR-12798.patch, > SOLR-12798.patch, no params in url.png, solr-update-request.txt > > > Project ManifoldCF uses SolrJ to post documents to Solr. When upgrading from > SolrJ 7.0.x to SolrJ 7.4, we encountered significant structural changes to > SolrJ's HttpSolrClient class that seemingly disable any use of multipart > post. This is critical because ManifoldCF's documents often contain metadata > in excess of 4K that therefore cannot be stuffed into a URL. > The changes in question seem to have been performed by Paul Noble on > 10/31/2017, with the introduction of the RequestWriter mechanism. Basically, > if a request has a RequestWriter, it is used exclusively to write the > request, and that overrides the stream mechanism completely. I haven't > chased it back to a specific ticket. > ManifoldCF's usage of SolrJ involves the creation of > ContentStreamUpdateRequests for all posts meant for Solr Cell, and the > creation of UpdateRequests for posts not meant for Solr Cell (as well as for > delete and commit requests). 
For our release cycle that is taking place > right now, we're shipping a modified version of HttpSolrClient that ignores > the RequestWriter when dealing with ContentStreamUpdateRequests. We > apparently cannot use multipart for all requests because on the Solr side we > get "pfountz Should not get here!" errors on the Solr side when we do, which > generate HTTP error code 500 responses. That should not happen either, in my > opinion. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
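The root constraint in both reports is the same: URL-encoded parameters past a server's header limit must move into the request body. A hedged sketch of that decision (the 8192-byte limit is a typical container default, not a SolrJ constant):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Decides whether request parameters still fit in the URL or must be
// sent in the request body (e.g. as a multipart POST).
public class RequestSizing {
    static final int URL_LIMIT = 8192; // typical servlet-container limit; an assumption

    static boolean needsBodyEncoding(String baseUrl, Map<String, String> params) {
        StringBuilder url = new StringBuilder(baseUrl);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            url.append(sep)
               .append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
               .append('=')
               .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
            sep = '&';
        }
        return url.length() > URL_LIMIT;
    }
}
```

Document metadata of the size ManifoldCF describes easily crosses such a limit, which is why losing the multipart path is a functional regression and not just an inconvenience.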
[jira] [Commented] (LUCENE-9322) Discussing a unified vectors format API
[ https://issues.apache.org/jira/browse/LUCENE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091684#comment-17091684 ] Mayya Sharipova commented on LUCENE-9322: - > It is implemented by enum with {{distance()}} function. Also, I think it >would be good to persist (in the codec) which distance metric we use for the >field. May be for now, it is worth to keep the API simple and use euclidean distance. Both ann approaches we would like to pursue: HNSW and Clustering based approach use euclidean distance. > Discussing a unified vectors format API > --- > > Key: LUCENE-9322 > URL: https://issues.apache.org/jira/browse/LUCENE-9322 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Julie Tibshirani >Priority: Major > > Two different approximate nearest neighbor approaches are currently being > developed, one based on HNSW ([#LUCENE-9004]) and another based on coarse > quantization ([#LUCENE-9136]). Each prototype proposes to add a new format to > handle vectors. In LUCENE-9136 we discussed the possibility of a unified API > that could support both approaches. The two ANN strategies give different > trade-offs in terms of speed, memory, and complexity, and it’s likely that > we’ll want to support both. Vector search is also an active research area, > and it would be great to be able to prototype and incorporate new approaches > without introducing more formats. > To me it seems like a good time to begin discussing a unified API. The > prototype for coarse quantization > ([https://github.com/apache/lucene-solr/pull/1314]) could be ready to commit > soon (this depends on everyone's feedback of course). The approach is simple > and shows solid search performance, as seen > [here|https://github.com/apache/lucene-solr/pull/1314#issuecomment-608645326]. > I think this API discussion is an important step in moving that > implementation forward. 
> The goals of the API would be > # Support for storing and retrieving individual float vectors. > # Support for approximate nearest neighbor search -- given a query vector, > return the indexed vectors that are closest to it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
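The "enum with a distance() function" mentioned in the comment could look like the following sketch (a shape for discussion, not committed Lucene code):

```java
// Sketch of a pluggable distance metric for the vectors API discussion:
// an enum whose constants implement distance(), with EUCLIDEAN as the
// metric both ANN prototypes (HNSW, clustering) already rely on.
public enum VectorDistance {
    EUCLIDEAN {
        @Override
        public float distance(float[] a, float[] b) {
            float sum = 0;
            for (int i = 0; i < a.length; i++) {
                float d = a[i] - b[i];
                sum += d * d;
            }
            return (float) Math.sqrt(sum);
        }
    };

    public abstract float distance(float[] a, float[] b);
}
```

Persisting the metric in the codec, as suggested, would then amount to writing the enum constant's name into the field's metadata and resolving it with valueOf at read time.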
[jira] [Comment Edited] (LUCENE-9322) Discussing a unified vectors format API
[ https://issues.apache.org/jira/browse/LUCENE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091684#comment-17091684 ] Mayya Sharipova edited comment on LUCENE-9322 at 4/24/20, 3:57 PM: --- > It is implemented by enum with {{distance()}} function. Also, I think it >would be good to persist (in the codec) which distance metric we use for the >field. May be for now, it is worth to keep the API simple and use euclidean distance. Both ann approaches we would like to pursue: HNSW and Clustering based approach use euclidean distance. was (Author: mayyas): > It is implemented by enum with {{distance()}} function. Also, I think it >would be good to persist (in the codec) which distance metric we use for the >field. May be for now, it is worth to keep the API simple and use euclidean distance. Both ann approaches we would like to pursue: HNSW and Clustering based approach use euclidean distance. > Discussing a unified vectors format API > --- > > Key: LUCENE-9322 > URL: https://issues.apache.org/jira/browse/LUCENE-9322 > Project: Lucene - Core > Issue Type: New Feature >Reporter: Julie Tibshirani >Priority: Major > > Two different approximate nearest neighbor approaches are currently being > developed, one based on HNSW ([#LUCENE-9004]) and another based on coarse > quantization ([#LUCENE-9136]). Each prototype proposes to add a new format to > handle vectors. In LUCENE-9136 we discussed the possibility of a unified API > that could support both approaches. The two ANN strategies give different > trade-offs in terms of speed, memory, and complexity, and it’s likely that > we’ll want to support both. Vector search is also an active research area, > and it would be great to be able to prototype and incorporate new approaches > without introducing more formats. > To me it seems like a good time to begin discussing a unified API. 
The > prototype for coarse quantization > ([https://github.com/apache/lucene-solr/pull/1314]) could be ready to commit > soon (this depends on everyone's feedback of course). The approach is simple > and shows solid search performance, as seen > [here|https://github.com/apache/lucene-solr/pull/1314#issuecomment-608645326]. > I think this API discussion is an important step in moving that > implementation forward. > The goals of the API would be > # Support for storing and retrieving individual float vectors. > # Support for approximate nearest neighbor search -- given a query vector, > return the indexed vectors that are closest to it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13839) MaxScore is returned as NAN when group.query doesn't match any docs
[ https://issues.apache.org/jira/browse/SOLR-13839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091675#comment-17091675 ] Mike Drob commented on SOLR-13839: -- There are a couple more Float.NaN in that method later, do we need to take care of those too? I'm not sure what code paths will lead to each one. > MaxScore is returned as NAN when group.query doesn't match any docs > --- > > Key: SOLR-13839 > URL: https://issues.apache.org/jira/browse/SOLR-13839 > Project: Solr > Issue Type: Bug > Components: search >Reporter: Munendra S N >Priority: Minor > Attachments: SOLR-13839.patch > > Time Spent: 10m > Remaining Estimate: 0h > > When the main query matches some products but group.query doesn't match any > docs then maxScore=NAN would be returned in the response. > * This happens only in standalone/single shard mode > * score needs to fetched in the response to encounter this issue -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
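The bug boils down to taking a max over zero scores, which has no defined value, so Float.NaN leaks into the response. A minimal guard in the spirit of the attached patch (a sketch, not the actual grouping code):

```java
// When a group.query matches no documents there is no score to take a
// max over; returning NaN leaks into the response, so the guard maps
// "no matches" to an absent maxScore instead.
public class MaxScoreGuard {
    static Float maxScoreOrNull(float[] scores) {
        if (scores.length == 0) return null;   // omit maxScore from the response
        float max = Float.NEGATIVE_INFINITY;
        for (float s : scores) max = Math.max(max, s);
        return Float.isNaN(max) ? null : max;  // defensive: never emit NaN
    }
}
```

Each remaining Float.NaN site in the method would need the same treatment, which is the question Mike Drob raises above.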
[GitHub] [lucene-solr] madrob commented on pull request #1330: LUCENE-9267 Replace getQueryBuildTime time unit from ms to ns
madrob commented on pull request #1330: URL: https://github.com/apache/lucene-solr/pull/1330#issuecomment-619086921 I took care of the squash for you, thanks! Also I added an entry to CHANGES - committed in 013e983
[jira] [Resolved] (LUCENE-9267) The documentation of getQueryBuildTime function reports a wrong time unit.
[ https://issues.apache.org/jira/browse/LUCENE-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob resolved LUCENE-9267. --- Fix Version/s: master (9.0) Assignee: Mike Drob Resolution: Fixed Thanks for the patch! Committed to master! > The documentation of getQueryBuildTime function reports a wrong time unit. > -- > > Key: LUCENE-9267 > URL: https://issues.apache.org/jira/browse/LUCENE-9267 > Project: Lucene - Core > Issue Type: Task > Components: modules/other >Affects Versions: 8.2, 8.3, 8.4 >Reporter: Pierre-Luc Perron >Assignee: Mike Drob >Priority: Trivial > Labels: documentation, newbie, pull-request-available > Fix For: master (9.0) > > Attachments: LUCENE-9267.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > As per documentation, the > [MatchingQueries|https://lucene.apache.org/core/8_4_1/monitor/org/apache/lucene/monitor/MatchingQueries.html] > class returns both getQueryBuildTime and getSearchTime in milliseconds. The > code shows > [searchTime|https://github.com/apache/lucene-solr/blob/320578274be74a18ce150b604d28a740545fde48/lucene/monitor/src/java/org/apache/lucene/monitor/CandidateMatcher.java#L112] > returning milliseconds. However, the code shows > [buildTime|https://github.com/apache/lucene-solr/blob/320578274be74a18ce150b604d28a740545fde48/lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java#L280] > returning nanoseconds. > The patch changes the documentation of getQueryBuildTime to report > nanoseconds instead of milliseconds. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9267) The documentation of getQueryBuildTime function reports a wrong time unit.
[ https://issues.apache.org/jira/browse/LUCENE-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091671#comment-17091671 ] ASF subversion and git services commented on LUCENE-9267: - Commit 013e98347a011664bff18f72d7c24eb97b1201d9 in lucene-solr's branch refs/heads/master from Pierre-Luc Perron [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=013e983 ] LUCENE-9267 Replace getQueryBuildTime time unit from ms to ns > The documentation of getQueryBuildTime function reports a wrong time unit. > -- > > Key: LUCENE-9267 > URL: https://issues.apache.org/jira/browse/LUCENE-9267 > Project: Lucene - Core > Issue Type: Task > Components: modules/other >Affects Versions: 8.2, 8.3, 8.4 >Reporter: Pierre-Luc Perron >Priority: Trivial > Labels: documentation, newbie, pull-request-available > Attachments: LUCENE-9267.patch > > Time Spent: 0.5h > Remaining Estimate: 0h > > As per documentation, the > [MatchingQueries|https://lucene.apache.org/core/8_4_1/monitor/org/apache/lucene/monitor/MatchingQueries.html] > class returns both getQueryBuildTime and getSearchTime in milliseconds. The > code shows > [searchTime|https://github.com/apache/lucene-solr/blob/320578274be74a18ce150b604d28a740545fde48/lucene/monitor/src/java/org/apache/lucene/monitor/CandidateMatcher.java#L112] > returning milliseconds. However, the code shows > [buildTime|https://github.com/apache/lucene-solr/blob/320578274be74a18ce150b604d28a740545fde48/lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java#L280] > returning nanoseconds. > The patch changes the documentation of getQueryBuildTime to report > nanoseconds instead of milliseconds. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
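The fix is documentation-only, so callers mixing the two accessors still need to convert: buildTime is nanoseconds while searchTime is milliseconds, a six-orders-of-magnitude difference. A small helper makes the conversion explicit:

```java
import java.util.concurrent.TimeUnit;

// getQueryBuildTime is reported in nanoseconds while getSearchTime is
// in milliseconds; normalizing to one unit avoids comparison bugs.
public class TimeUnits {
    static long buildTimeMillis(long queryBuildTimeNanos) {
        return TimeUnit.NANOSECONDS.toMillis(queryBuildTimeNanos);
    }
}
```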
[GitHub] [lucene-solr] madrob commented on pull request #1355: LUCENE-9279: Update dictionary version for Ukrainian analyzer
madrob commented on pull request #1355: URL: https://github.com/apache/lucene-solr/pull/1355#issuecomment-619079133 Fixed in 7fe6f9c57d
[GitHub] [lucene-solr] madrob commented on a change in pull request #1371: SOLR-14333: print readable version of CollapsedPostFilter query
madrob commented on a change in pull request #1371: URL: https://github.com/apache/lucene-solr/pull/1371#discussion_r414654670 ## File path: solr/core/src/java/org/apache/solr/search/CollapsingQParserPlugin.java ## @@ -218,7 +245,7 @@ public static GroupHeadSelector build(final SolrParams localParams) { public String hint; private boolean needsScores = true; private boolean needsScores4Collapsing = false; -private int nullPolicy; +private NullPolicy nullPolicy; private Set boosted; // ordered by "priority" public static final int NULL_POLICY_IGNORE = 0; Review comment: These constants can be removed and everything routed through the NullPolicy enum. Same for the `NULL_COLLAPSE` string above and others.
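The consolidation suggested in the review comment could look roughly like this (the IGNORE code mirrors NULL_POLICY_IGNORE = 0 from the diff; the other codes and names are assumptions):

```java
import java.util.Locale;

// One enum replaces the paired NULL_POLICY_* int constants and their
// matching strings, so parsing and the numeric code live in one place.
public enum NullPolicy {
    IGNORE(0), COLLAPSE(1), EXPAND(2);

    private final int code;

    NullPolicy(int code) { this.code = code; }

    public int getCode() { return code; }

    public static NullPolicy fromString(String s) {
        return valueOf(s.toUpperCase(Locale.ROOT));
    }
}
```

Call sites that previously compared against the int constants compare against the enum directly, and the string constants collapse into fromString.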
[GitHub] [lucene-solr] madrob commented on a change in pull request #1391: SOLR-14014 Add a disable Admin UI Flag
madrob commented on a change in pull request #1391: URL: https://github.com/apache/lucene-solr/pull/1391#discussion_r414646287 ## File path: solr/core/src/java/org/apache/solr/servlet/LoadAdminUiServlet.java ## @@ -24,6 +24,7 @@ import org.apache.solr.core.CoreContainer; import org.apache.solr.core.SolrCore; +import javax.servlet.ServletException; Review comment: This looks like an unused import to me?
[jira] [Commented] (SOLR-9679) Exception when removing zk node /security.json
[ https://issues.apache.org/jira/browse/SOLR-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091630#comment-17091630 ] Jan Høydahl commented on SOLR-9679: --- Any further comments? Probably needs a test... > Exception when removing zk node /security.json > -- > > Key: SOLR-9679 > URL: https://issues.apache.org/jira/browse/SOLR-9679 > Project: Solr > Issue Type: Bug > Components: Authentication >Reporter: Jan Høydahl >Assignee: Jan Høydahl >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > To reproduce: > # Upload {{security.json}} to zk > # {{bin/solr zk rm zk:/security.json -z localhost:9983}} > {noformat} > 2016-10-22 22:17:32.264 DEBUG (main-EventThread) [ ] o.a.s.c.c.SolrZkClient > Submitting job to respond to event WatchedEvent state:SyncConnected > type:NodeDeleted path:/security.json > 2016-10-22 22:17:32.265 DEBUG > (zkCallback-3-thread-1-processing-n:192.168.0.11:8983_solr) [ ] > o.a.s.c.c.ZkStateReader Updating [/security.json] ... 
> 2016-10-22 22:17:32.266 ERROR > (zkCallback-3-thread-1-processing-n:192.168.0.11:8983_solr) [ ] > o.a.s.c.c.ZkStateReader A ZK error has occurred > org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = > NoNode for /security.json > at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) > at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) > at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) > at > org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356) > at > org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353) > at > org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60) > at > org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353) > at > org.apache.solr.common.cloud.ZkStateReader$3.process(ZkStateReader.java:455) > at > org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:268) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > I'm not sure what should happen, but it would be sweet to be able to disable > security by simply removing the znode... [~noble.paul] ? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
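The behavior the report asks for can be sketched with stand-in types (not the real ZooKeeper/Solr classes): when the watcher fires for a deleted `/security.json`, treat `NoNodeException` as "security disabled" rather than surfacing an error.

```java
// Sketch only: stand-in NoNodeException and getData(); the real fix would
// live around SolrZkClient.getData() in ZkStateReader's watcher callback.
class SecurityConfDemo {
  static class NoNodeException extends Exception {}

  // Simulates fetching /security.json from ZooKeeper.
  static byte[] getData(boolean nodeExists) throws NoNodeException {
    if (!nodeExists) throw new NoNodeException();
    return "{\"authentication\":{}}".getBytes();
  }

  static String loadSecurityConf(boolean nodeExists) {
    try {
      return new String(getData(nodeExists));
    } catch (NoNodeException e) {
      // Node was deleted: fall back to "no security" instead of an ERROR log.
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(loadSecurityConf(false)); // prints null
  }
}
```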
[GitHub] [lucene-solr] s1monw opened a new pull request #1454: Consolidate all IW locking inside IndexWriter
s1monw opened a new pull request #1454: URL: https://github.com/apache/lucene-solr/pull/1454 Today we still have one class that runs some tricky logic that should be in the IndexWriter in the first place, since it requires locking on the IndexWriter itself. This change inverts the API: FrozenBufferedUpdates no longer gets the IndexWriter passed in; instead, the IndexWriter owns most of the logic and executes on a FrozenBufferedUpdates object. This prevents locking on IndexWriter outside of the writer itself and paves the way to simplify some concurrency down the road.
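The inversion described in the PR, in toy form (stand-in types, not the actual Lucene classes): instead of the updates object receiving the writer and synchronizing on it from outside, the writer takes its own lock and calls into the updates object.

```java
// Before (conceptually): updates.apply(writer) locked on the writer externally.
// After (sketched): the writer owns the lock and drives the updates object.
class WriterDemo {
  static class FrozenUpdates {
    int applied;
    void forceApply() { applied++; }   // no writer reference needed anymore
  }

  static class Writer {
    private final Object lock = new Object();
    void apply(FrozenUpdates updates) {
      synchronized (lock) {            // locking stays inside the writer
        updates.forceApply();
      }
    }
  }

  public static void main(String[] args) {
    Writer w = new Writer();
    FrozenUpdates u = new FrozenUpdates();
    w.apply(u);
    System.out.println(u.applied); // prints 1
  }
}
```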
[jira] [Resolved] (LUCENE-9345) Separate IndexWriter from MergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-9345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-9345. - Fix Version/s: 8.6 master (9.0) Lucene Fields: New,Patch Available (was: New) Assignee: Simon Willnauer Resolution: Fixed > Separate IndexWriter from MergeScheduler > > > Key: LUCENE-9345 > URL: https://issues.apache.org/jira/browse/LUCENE-9345 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: master (9.0) >Reporter: Simon Willnauer >Assignee: Simon Willnauer >Priority: Major > Fix For: master (9.0), 8.6 > > Time Spent: 40m > Remaining Estimate: 0h > > MergeScheduler is tightly coupled with IndexWriter which causes IW to expose > unnecessary methods. For instance only the scheduler should call > IW#getNextMerge() but it's a public method. With some refactorings we can > nicely separate the two.
[jira] [Commented] (LUCENE-9345) Separate IndexWriter from MergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-9345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091575#comment-17091575 ] ASF subversion and git services commented on LUCENE-9345: - Commit 9598d43bb629b0434e7ce557fe12ea88c19a5d00 in lucene-solr's branch refs/heads/branch_8x from Simon Willnauer [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9598d43 ] LUCENE-9345: Separate MergeSchedulder from IndexWriter (#1451) This change extracts the methods that are used by MergeScheduler into a MergeSource interface. This allows IndexWriter to better ensure locking, hide internal methods and removes the tight coupling between the two complex classes. This will also improve future testing. > Separate IndexWriter from MergeScheduler > > > Key: LUCENE-9345 > URL: https://issues.apache.org/jira/browse/LUCENE-9345 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: master (9.0) >Reporter: Simon Willnauer >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > MergeScheduler is tightly coupled with IndexWriter which causes IW to expose > unnecessary methods. For instance only the scheduler should call > IW#getNextMerge() but it's a public method. With some refactorings we can > nicely separate the two.
[jira] [Commented] (LUCENE-9345) Separate IndexWriter from MergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-9345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091549#comment-17091549 ] ASF subversion and git services commented on LUCENE-9345: - Commit d7e0b906abcbc43d3737224cadcda7d2c795ccb0 in lucene-solr's branch refs/heads/master from Simon Willnauer [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d7e0b90 ] LUCENE-9345: Separate MergeSchedulder from IndexWriter (#1451) This change extracts the methods that are used by MergeScheduler into a MergeSource interface. This allows IndexWriter to better ensure locking, hide internal methods and removes the tight coupling between the two complex classes. This will also improve future testing. > Separate IndexWriter from MergeScheduler > > > Key: LUCENE-9345 > URL: https://issues.apache.org/jira/browse/LUCENE-9345 > Project: Lucene - Core > Issue Type: Improvement >Affects Versions: master (9.0) >Reporter: Simon Willnauer >Priority: Major > Time Spent: 40m > Remaining Estimate: 0h > > MergeScheduler is tightly coupled with IndexWriter which causes IW to expose > unnecessary methods. For instance only the scheduler should call > IW#getNextMerge() but it's a public method. With some refactorings we can > nicely separate the two.
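The shape of the LUCENE-9345 refactor, reduced to a toy (the real `MergeSource` interface has more methods and deals with `MergePolicy.OneMerge`, not strings): the scheduler depends only on a narrow interface, so the writer no longer has to expose methods like `getNextMerge()` publicly.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy illustration of extracting a MergeSource interface between a writer
// and its merge scheduler. Names loosely follow the commit message.
class MergeSourceDemo {
  interface MergeSource {
    String getNextMerge();   // real method returns MergePolicy.OneMerge
    void merge(String merge);
  }

  static class Writer implements MergeSource {
    private final Queue<String> pending = new ArrayDeque<>();
    int merged;
    void registerMerge(String m) { pending.add(m); }   // writer-internal
    public String getNextMerge() { return pending.poll(); }
    public void merge(String m) { merged++; }
  }

  // The scheduler only ever sees the narrow interface.
  static void runScheduler(MergeSource source) {
    for (String m = source.getNextMerge(); m != null; m = source.getNextMerge()) {
      source.merge(m);
    }
  }

  public static void main(String[] args) {
    Writer w = new Writer();
    w.registerMerge("seg1+seg2");
    w.registerMerge("seg3+seg4");
    runScheduler(w);
    System.out.println(w.merged); // prints 2
  }
}
```

This decoupling is also what makes the scheduler easy to test against a fake `MergeSource`, as the commit message notes.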
[jira] [Updated] (LUCENE-9338) Clean up type safety in SimpleBindings
[ https://issues.apache.org/jira/browse/LUCENE-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward updated LUCENE-9338: -- Fix Version/s: 8.6 > Clean up type safety in SimpleBindings > -- > > Key: LUCENE-9338 > URL: https://issues.apache.org/jira/browse/LUCENE-9338 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Fix For: 8.6 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > SimpleBindings holds its bindings as a Map, and then casts > things when it builds its value sources. We can instead store a map of > Supplier and avoid casts entirely.
[jira] [Resolved] (LUCENE-9340) Deprecate and remove the SimpleBindings.add(SortField) method
[ https://issues.apache.org/jira/browse/LUCENE-9340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward resolved LUCENE-9340. --- Fix Version/s: 8.6 Resolution: Fixed > Deprecate and remove the SimpleBindings.add(SortField) method > - > > Key: LUCENE-9340 > URL: https://issues.apache.org/jira/browse/LUCENE-9340 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Fix For: 8.6 > > Time Spent: 0.5h > Remaining Estimate: 0h > > This method is trappy, in that it only works for certain types of SortField > and you only find out which at runtime. We should deprecate it and encourage > users to pass an equivalent DoubleValuesSource instead.
[jira] [Resolved] (LUCENE-9338) Clean up type safety in SimpleBindings
[ https://issues.apache.org/jira/browse/LUCENE-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward resolved LUCENE-9338. --- Resolution: Fixed > Clean up type safety in SimpleBindings > -- > > Key: LUCENE-9338 > URL: https://issues.apache.org/jira/browse/LUCENE-9338 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > SimpleBindings holds its bindings as a Map, and then casts > things when it builds its value sources. We can instead store a map of > Supplier and avoid casts entirely.
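The LUCENE-9338 idea, as a toy (stand-in types; the real class maps names to `DoubleValuesSource` producers): replace a `Map<String, Object>` plus downcasts with a map of typed suppliers, so resolution needs no `instanceof` or casts at all.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleSupplier;

// Toy SimpleBindings: values are typed suppliers, so there is no Object
// storage and no casting when a value source is built.
class BindingsDemo {
  private final Map<String, DoubleSupplier> bindings = new HashMap<>();

  void add(String name, DoubleSupplier source) { bindings.put(name, source); }

  double resolve(String name) {
    DoubleSupplier s = bindings.get(name);
    if (s == null) throw new IllegalArgumentException("Unknown binding: " + name);
    return s.getAsDouble();
  }

  public static void main(String[] args) {
    BindingsDemo b = new BindingsDemo();
    b.add("score", () -> 1.5);
    b.add("boosted", () -> b.resolve("score") * 2);  // bindings may reference each other
    System.out.println(b.resolve("boosted")); // prints 3.0
  }
}
```

Note the real change also adds explicit cycle detection for bindings that reference each other, per the commit message, which this sketch omits.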
[jira] [Commented] (LUCENE-9340) Deprecate and remove the SimpleBindings.add(SortField) method
[ https://issues.apache.org/jira/browse/LUCENE-9340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091485#comment-17091485 ] ASF subversion and git services commented on LUCENE-9340: - Commit 5eb117f561ab691f34409943ae1f85781735f8e0 in lucene-solr's branch refs/heads/master from Alan Woodward [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5eb117f ] LUCENE-9340: Remove deprecated SimpleBindings#add(SortField) method > Deprecate and remove the SimpleBindings.add(SortField) method > - > > Key: LUCENE-9340 > URL: https://issues.apache.org/jira/browse/LUCENE-9340 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > This method is trappy, in that it only works for certain types of SortField > and you only find out which at runtime. We should deprecate it and encourage > users to pass an equivalent DoubleValuesSource instead.
[jira] [Commented] (LUCENE-9340) Deprecate and remove the SimpleBindings.add(SortField) method
[ https://issues.apache.org/jira/browse/LUCENE-9340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091483#comment-17091483 ] ASF subversion and git services commented on LUCENE-9340: - Commit 72888bced33ff6c85102b162ff2a7303a17e253f in lucene-solr's branch refs/heads/branch_8x from Alan Woodward [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=72888bc ] LUCENE-9340: Deprecate SimpleBindings#add(SortField) (#1447) This method is trappy; it doesn't work for all SortField types, but doesn't tell you that until runtime. This commit deprecates it, and removes all other callsites in the codebase. > Deprecate and remove the SimpleBindings.add(SortField) method > - > > Key: LUCENE-9340 > URL: https://issues.apache.org/jira/browse/LUCENE-9340 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > This method is trappy, in that it only works for certain types of SortField > and you only find out which at runtime. We should deprecate it and encourage > users to pass an equivalent DoubleValuesSource instead.
[jira] [Commented] (LUCENE-9340) Deprecate and remove the SimpleBindings.add(SortField) method
[ https://issues.apache.org/jira/browse/LUCENE-9340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091472#comment-17091472 ] ASF subversion and git services commented on LUCENE-9340: - Commit f6462ee35056f92bcfeed5f251d5372506e66b57 in lucene-solr's branch refs/heads/master from Alan Woodward [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f6462ee ] LUCENE-9340: Deprecate SimpleBindings#add(SortField) (#1447) This method is trappy; it doesn't work for all SortField types, but doesn't tell you that until runtime. This commit deprecates it, and removes all other callsites in the codebase. > Deprecate and remove the SimpleBindings.add(SortField) method > - > > Key: LUCENE-9340 > URL: https://issues.apache.org/jira/browse/LUCENE-9340 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > This method is trappy, in that it only works for certain types of SortField > and you only find out which at runtime. We should deprecate it and encourage > users to pass an equivalent DoubleValuesSource instead.
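Why the deprecated overload is "trappy" can be modeled in miniature (stand-in types, not Lucene's `SortField`/`DoubleValuesSource`): the type-based overload compiles for every input but only works for some at runtime, while the supplier-based overload cannot be misused.

```java
import java.util.function.DoubleSupplier;

// Toy model of LUCENE-9340: a runtime-failing overload vs. a typed one.
class TrappyDemo {
  enum SortType { INT, DOUBLE, CUSTOM }

  // Legacy-style overload: accepts any SortType, blows up late for some.
  static DoubleSupplier fromSortType(SortType t) {
    switch (t) {
      case INT: return () -> 0;
      case DOUBLE: return () -> 0.0;
      default: throw new UnsupportedOperationException("unsupported: " + t);
    }
  }

  // Preferred style: the caller supplies the values source directly, so
  // there is no unsupported case to discover at runtime.
  static DoubleSupplier fromSource(DoubleSupplier source) { return source; }

  public static void main(String[] args) {
    System.out.println(fromSortType(SortType.INT).getAsDouble());  // fine
    // fromSortType(SortType.CUSTOM) compiles, but would only fail here,
    // at runtime - which is exactly the trap the issue describes.
  }
}
```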
[jira] [Commented] (LUCENE-8811) Add maximum clause count check to IndexSearcher rather than BooleanQuery
[ https://issues.apache.org/jira/browse/LUCENE-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091408#comment-17091408 ] Stamatis Zampetakis commented on LUCENE-8811: - So if I interpret your response correctly, [~romseygeek], you are saying that {{TermInSetQuery}} should accept an unlimited number of terms. Is that correct? Looking again into the summary and discussion of this issue, I see that the goal was to "make this check more consistent across queries". I don't clearly see why {{TermInSetQuery}} should remain unbounded. On the other hand, if we wanted to enforce the check only for a {{BooleanQuery}}, then why use the {{getNumClausesCheckVisitor}} visitor on every query? I see that {{TermInSetQuery#visit}}, for example, already iterates through all terms, so if we don't need it then we are just wasting CPU cycles without a very good reason. Sorry to insist on this, but it is a change that will likely break our current implementation in the downstream project and I guess it will also affect quite a few others. > Add maximum clause count check to IndexSearcher rather than BooleanQuery > > > Key: LUCENE-8811 > URL: https://issues.apache.org/jira/browse/LUCENE-8811 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Adrien Grand >Assignee: Alan Woodward >Priority: Minor > Fix For: master (9.0) > > Attachments: LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch, > LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch > > > Currently we only check whether boolean queries have too many clauses. > However there are other ways that queries may have too many clauses, for > instance if you have boolean queries that have themselves inner boolean > queries. > Could we use the new Query visitor API to move this check from BooleanQuery > to IndexSearcher in order to make this check more consistent across queries?
> See for instance LUCENE-8810 where a rewrite rule caused the maximum clause > count to be hit even though the total number of leaf queries remained the > same.
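The visitor-based counting the issue proposes can be sketched over a toy query tree (the types below are illustrative, not Lucene's `Query`/`QueryVisitor`): count leaf clauses across the whole tree at the top level, so nested boolean queries cannot evade the limit.

```java
import java.util.Arrays;
import java.util.List;

// Toy clause-count visitor: the counter lives at the top (IndexSearcher)
// level and is threaded through every node, nested booleans included.
class ClauseCountDemo {
  interface Query { void visit(int[] counter, int max); }

  static class TermQuery implements Query {
    public void visit(int[] counter, int max) {
      if (++counter[0] > max) throw new IllegalStateException("too many clauses");
    }
  }

  static class BooleanQuery implements Query {
    final List<Query> clauses;
    BooleanQuery(Query... clauses) { this.clauses = Arrays.asList(clauses); }
    public void visit(int[] counter, int max) {
      for (Query q : clauses) q.visit(counter, max);  // recurse into children
    }
  }

  static int countClauses(Query q, int max) {
    int[] counter = {0};
    q.visit(counter, max);
    return counter[0];
  }

  public static void main(String[] args) {
    Query q = new BooleanQuery(new TermQuery(),
        new BooleanQuery(new TermQuery(), new TermQuery()));
    System.out.println(countClauses(q, 1024)); // prints 3
  }
}
```

The commenter's concern maps directly onto this sketch: a query whose `visit` enumerates every term (as `TermInSetQuery#visit` does) pays the full iteration cost on each search if the counter visits it, whether or not it is bounded.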
[jira] [Updated] (SOLR-14435) createNodeSet and createNodeSet.shuffle parameters missing from Collection Restore RefGuide
[ https://issues.apache.org/jira/browse/SOLR-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Salamon updated SOLR-14435: -- Attachment: SOLR-14435-01.patch Status: Open (was: Open) > createNodeSet and createNodeSet.shuffle parameters missing from Collection > Restore RefGuide > --- > > Key: SOLR-14435 > URL: https://issues.apache.org/jira/browse/SOLR-14435 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Reporter: Andras Salamon >Priority: Minor > Attachments: SOLR-14435-01.patch > > > Although {{createNodeSet}} and {{createNodeSet.shuffle}} parameters are > supported by the Collection RESTORE command (I've tested it), they are > missing from the documentation: > [https://lucene.apache.org/solr/guide/8_5/collection-management.html#collection-management]
[jira] [Updated] (SOLR-14435) createNodeSet and createNodeSet.shuffle parameters missing from Collection Restore RefGuide
[ https://issues.apache.org/jira/browse/SOLR-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andras Salamon updated SOLR-14435: -- Status: Patch Available (was: Open) > createNodeSet and createNodeSet.shuffle parameters missing from Collection > Restore RefGuide > --- > > Key: SOLR-14435 > URL: https://issues.apache.org/jira/browse/SOLR-14435 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: documentation >Reporter: Andras Salamon >Priority: Minor > Attachments: SOLR-14435-01.patch > > > Although {{createNodeSet}} and {{createNodeSet.shuffle}} parameters are > supported by the Collection RESTORE command (I've tested it), they are > missing from the documentation: > [https://lucene.apache.org/solr/guide/8_5/collection-management.html#collection-management]
[jira] [Created] (SOLR-14435) createNodeSet and createNodeSet.shuffle parameters missing from Collection Restore RefGuide
Andras Salamon created SOLR-14435: - Summary: createNodeSet and createNodeSet.shuffle parameters missing from Collection Restore RefGuide Key: SOLR-14435 URL: https://issues.apache.org/jira/browse/SOLR-14435 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: documentation Reporter: Andras Salamon Although {{createNodeSet}} and {{createNodeSet.shuffle}} parameters are supported by the Collection RESTORE command (I've tested it), they are missing from the documentation: [https://lucene.apache.org/solr/guide/8_5/collection-management.html#collection-management]
[jira] [Commented] (LUCENE-9338) Clean up type safety in SimpleBindings
[ https://issues.apache.org/jira/browse/LUCENE-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091388#comment-17091388 ] ASF subversion and git services commented on LUCENE-9338: - Commit b66b970d1b06e56b0d685613855be5ef6d1a2c60 in lucene-solr's branch refs/heads/branch_8x from Alan Woodward [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b66b970 ] LUCENE-9338: Clean up type safety in SimpleBindings (#1444) Replaces SimpleBindings' Map with a map of Function to improve type safety, and reworks cycle detection and validation to avoid catching StackOverflowException > Clean up type safety in SimpleBindings > -- > > Key: LUCENE-9338 > URL: https://issues.apache.org/jira/browse/LUCENE-9338 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > SimpleBindings holds its bindings as a Map, and then casts > things when it builds its value sources. We can instead store a map of > Supplier and avoid casts entirely.
[jira] [Commented] (LUCENE-9338) Clean up type safety in SimpleBindings
[ https://issues.apache.org/jira/browse/LUCENE-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091372#comment-17091372 ] ASF subversion and git services commented on LUCENE-9338: - Commit ed3caab2d86b69ec4b3ed8e787827c0931b43d1b in lucene-solr's branch refs/heads/master from Alan Woodward [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ed3caab ] LUCENE-9338: Clean up type safety in SimpleBindings (#1444) Replaces SimpleBindings' Map with a map of Function to improve type safety, and reworks cycle detection and validation to avoid catching StackOverflowException > Clean up type safety in SimpleBindings > -- > > Key: LUCENE-9338 > URL: https://issues.apache.org/jira/browse/LUCENE-9338 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Alan Woodward >Assignee: Alan Woodward >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > SimpleBindings holds its bindings as a Map, and then casts > things when it builds its value sources. We can instead store a map of > Supplier and avoid casts entirely.
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents
jpountz commented on a change in pull request #1351: URL: https://github.com/apache/lucene-solr/pull/1351#discussion_r414392251 ## File path: lucene/core/src/java/org/apache/lucene/search/FilterLeafCollector.java ## @@ -53,4 +53,8 @@ public String toString() { return name + "(" + in + ")"; } + @Override + public DocIdSetIterator competitiveIterator() { +return in.competitiveIterator(); + } Review comment: We've had endless discussions about whether or not to delegate in FilterXXX classes and I think that the consensus is that we should only delegate abstract methods. Since this one has a default implementation, let's not delegate and look for extensions of FilterCollector that should delegate it? (e.g. asserting collectors) ## File path: lucene/core/src/java/org/apache/lucene/search/FilteringFieldComparator.java ## @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.search; + +import java.io.IOException; + +/** + * Decorates a wrapped FieldComparator to add a functionality to skip over non-competitive docs. 
+ * FilteringFieldComparator provides two additional functions for a FieldComparator: + * 1) {@code competitiveIterator()} that returns an iterator over + * competitive docs that are stronger than already collected docs. + * 2) {@code setCanUpdateIterator()} that notifies the comparator when it is ok to start updating its internal iterator. + * This method is called from a collector to inform the comparator to start updating its iterator. + */ +public abstract class FilteringFieldComparator extends FieldComparator { +final FieldComparator in; + +public FilteringFieldComparator(FieldComparator in) { +this.in = in; +} + +protected abstract DocIdSetIterator competitiveIterator(); + +protected abstract void setCanUpdateIterator() throws IOException; Review comment: can you add javadocs? ## File path: lucene/core/src/java/org/apache/lucene/search/FilteringFieldComparator.java ## @@ -0,0 +1,83 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.search; + +import java.io.IOException; + +/** + * Decorates a wrapped FieldComparator to add a functionality to skip over non-competitive docs. 
+ * FilteringFieldComparator provides two additional functions for a FieldComparator: + * 1) {@code competitiveIterator()} that returns an iterator over + * competitive docs that are stronger than already collected docs. + * 2) {@code setCanUpdateIterator()} that notifies the comparator when it is ok to start updating its internal iterator. + * This method is called from a collector to inform the comparator to start updating its iterator. + */ +public abstract class FilteringFieldComparator extends FieldComparator { +final FieldComparator in; + +public FilteringFieldComparator(FieldComparator in) { +this.in = in; +} + +protected abstract DocIdSetIterator competitiveIterator(); Review comment: Let's only have this method on LeafFieldCompatarors, e.g. by doing something like this? FieldComparators are top-level objects so it doesn't make sense to have leaf-level objects defined on them like DocIdSetIterators. ```suggestion @Override public abstract FilteringLeafFieldComparator getLeafComparator(LeafReaderContext context) throws IOException; // covariant return ty
[jira] [Updated] (LUCENE-9087) Should the BKD tree use a fixed maxPointsInLeafNode?
[ https://issues.apache.org/jira/browse/LUCENE-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ignacio Vera updated LUCENE-9087: - Description: Currently the BKD tree uses a fixed maxPointsInLeafNode provided in the constructor. For the current default codec the value is set to 1024. This is a good compromise between memory usage and performance of the BKD tree. Lowering this value can increase search performance but it has a penalty in memory usage. Now that the BKD tree can be load off-heap, this can be less of a concern. Note that lowering too much that value can hurt performance as well as the tree becomes too deep and benefits are gone. For data types that use the tree as an effective R-tree (ranges and shapes datatypes) the benefits are larger as it can minimise the overlap between leaf nodes. Finally, creating too many leaf nodes can be dangerous at write time as memory usage depends on the number of leaf nodes created. The writer creates a long array of length = numberOfLeafNodes. What I am wondering here is if we can improve this situation in order to create the most efficient tree? My current ideas are: * We can adapt the points per leaf depending on that number so we create a tree with the best depth and best points per leaf. Note that for the for 1D case we have an upper estimation of the number of points that the tree will be indexing. * Add a mechanism so field types can easily define their best points per leaf. In the case, field types like ranges or shapes can define its own value to minimise overlap. * Maybe the default is just too high now that we can load the tree off-heap. Any thoughts? was: Currently the BKD tree uses a fixed maxPointsInLeafNode provided in the constructor. For the current default codec the value is set to 1200. This is a good compromise between memory usage and performance of the BKD tree. Lowering this value can increase search performance but it has a penalty in memory usage. 
Now that the BKD tree can be loaded off-heap, this can be less of a concern. Note that lowering that value too much can hurt performance as well, as the tree becomes too deep and the benefits are gone. For data types that use the tree as an effective R-tree (ranges and shapes datatypes) the benefits are larger as it can minimise the overlap between leaf nodes. Finally, creating too many leaf nodes can be dangerous at write time as memory usage depends on the number of leaf nodes created. The writer creates a long array of length = numberOfLeafNodes. What I am wondering here is if we can improve this situation in order to create the most efficient tree? My current ideas are: * We can adapt the points per leaf depending on that number so we create a tree with the best depth and best points per leaf. Note that for the 1D case we have an upper estimation of the number of points that the tree will be indexing. * Add a mechanism so field types can easily define their best points per leaf. In this case, field types like ranges or shapes can define their own value to minimise overlap. * Maybe the default is just too high now that we can load the tree off-heap. Any thoughts? > Should the BKD tree use a fixed maxPointsInLeafNode? > - > > Key: LUCENE-9087 > URL: https://issues.apache.org/jira/browse/LUCENE-9087 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Priority: Major > > Currently the BKD tree uses a fixed maxPointsInLeafNode provided in the > constructor. For the current default codec the value is set to 1024. This is > a good compromise between memory usage and performance of the BKD tree. > Lowering this value can increase search performance but it has a penalty in > memory usage. Now that the BKD tree can be loaded off-heap, this can be less of > a concern. Note that lowering that value too much can hurt performance as > well, as the tree becomes too deep and the benefits are gone.
> For data types that use the tree as an effective R-tree (ranges and shapes > datatypes) the benefits are larger as it can minimise the overlap between > leaf nodes. > Finally, creating too many leaf nodes can be dangerous at write time as > memory usage depends on the number of leaf nodes created. The writer creates > a long array of length = numberOfLeafNodes. > What I am wondering here is if we can improve this situation in order to > create the most efficient tree? My current ideas are: > > * We can adapt the points per leaf depending on that number so we create a > tree with the best depth and best points per leaf. Note that for the for 1D > case we have an upper estimation of the number of points that the tree will > be indexing. > * Add a mechanism so field types can easily de
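The write-time trade-off described in the issue is easy to put rough numbers on: halving the points per leaf doubles the leaf count and hence the size of the writer's `long[numberOfLeafNodes]` array. The figures below are just this arithmetic for an assumed point count, not measured Lucene numbers.

```java
// Back-of-the-envelope: leaf count and long[] footprint for a fixed point
// count at several maxPointsInLeafNode settings.
class BkdLeafMathDemo {
  static long leaves(long numPoints, int pointsPerLeaf) {
    return (numPoints + pointsPerLeaf - 1) / pointsPerLeaf;  // ceiling division
  }

  public static void main(String[] args) {
    long numPoints = 1_000_000_000L;  // assumed index size, for illustration
    for (int leafSize : new int[] {128, 512, 1024}) {
      long n = leaves(numPoints, leafSize);
      System.out.println(leafSize + " points/leaf -> " + n + " leaves, "
          + (n * 8 / (1024 * 1024)) + " MB for the long[]");
    }
  }
}
```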
[jira] [Commented] (LUCENE-9346) WANDScorer should support minimumNumberShouldMatch
[ https://issues.apache.org/jira/browse/LUCENE-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091320#comment-17091320 ]

Adrien Grand commented on LUCENE-9346:
--

In terms of implementation, I wonder whether it should be mostly about making sure that {{WANDScorer.tailSize}} never gets greater than or equal to {{minimumNumberShouldMatch}}. I don't have plans to work on it in the near future, so feel free to take it if you're interested; I can help with the reviews.

> WANDScorer should support minimumNumberShouldMatch
> -
>
> Key: LUCENE-9346
> URL: https://issues.apache.org/jira/browse/LUCENE-9346
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Priority: Minor
>
> Currently we deoptimize when a minimumNumberShouldMatch is provided and fall back to a scorer that doesn't dynamically prune hits based on scores.
> Given how similar WANDScorer and MinShouldMatchSumScorer are, I wonder if we could remove MinShouldMatchSumScorer once WANDScorer supports minimumNumberShouldMatch. Then any improvements we bring to WANDScorer, like two-phase support (LUCENE-8806), would automatically cover more queries.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
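To make the semantics concrete: a set of SHOULD clauses with minimumNumberShouldMatch = n only matches documents hit by at least n clauses. Below is a toy counting model of that rule over plain doc-id sets (`MinShouldMatch` is a hypothetical helper for illustration, not Lucene's WANDScorer, which evaluates the same predicate while also skipping documents whose summed score upper bounds cannot beat the current threshold):

```java
import java.util.*;

// Toy model of minimumNumberShouldMatch semantics, not Lucene code:
// a document matches iff at least n of the SHOULD clauses contain it.
public class MinShouldMatch {

    static List<Integer> matches(List<Set<Integer>> clauseDocs, int n) {
        // Count how many clauses match each doc (TreeMap keeps doc-id order).
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Set<Integer> docs : clauseDocs) {
            for (int doc : docs) {
                counts.merge(doc, 1, Integer::sum);
            }
        }
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() >= n) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Set<Integer>> clauses = List.of(
            Set.of(1, 2, 3), Set.of(2, 3, 5), Set.of(3, 5, 8));
        System.out.println(matches(clauses, 2)); // [2, 3, 5] — hit by >= 2 clauses
        System.out.println(matches(clauses, 3)); // [3] — hit by all 3 clauses
    }
}
```

This also suggests why the invariant on {{tailSize}} matters: clauses parked in WANDScorer's tail are temporarily not advanced, so if the tail ever held n or more clauses, a candidate could be accepted or rejected without enough clauses having been checked.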
[jira] [Created] (LUCENE-9346) WANDScorer should support minimumNumberShouldMatch
Adrien Grand created LUCENE-9346:
--

Summary: WANDScorer should support minimumNumberShouldMatch
Key: LUCENE-9346
URL: https://issues.apache.org/jira/browse/LUCENE-9346
Project: Lucene - Core
Issue Type: Improvement
Reporter: Adrien Grand

Currently we deoptimize when a minimumNumberShouldMatch is provided and fall back to a scorer that doesn't dynamically prune hits based on scores.

Given how similar WANDScorer and MinShouldMatchSumScorer are, I wonder if we could remove MinShouldMatchSumScorer once WANDScorer supports minimumNumberShouldMatch. Then any improvements we bring to WANDScorer, like two-phase support (LUCENE-8806), would automatically cover more queries.
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1444: LUCENE-9338: Clean up type safety in SimpleBindings
jpountz commented on a change in pull request #1444:
URL: https://github.com/apache/lucene-solr/pull/1444#discussion_r414350099

## File path: lucene/expressions/src/java/org/apache/lucene/expressions/SimpleBindings.java

@@ -96,24 +90,51 @@ public DoubleValuesSource getDoubleValuesSource(String name) {
       case SCORE:
         return DoubleValuesSource.SCORES;
       default:
-        throw new UnsupportedOperationException();
+        throw new UnsupportedOperationException();
     }
   }

-  /**
-   * Traverses the graph of bindings, checking there are no cycles or missing references
-   * @throws IllegalArgumentException if the bindings is inconsistent
+  @Override
+  public DoubleValuesSource getDoubleValuesSource(String name) {
+    if (map.containsKey(name) == false) {
+      throw new IllegalArgumentException("Invalid reference '" + name + "'");
+    }
+    return map.get(name).apply(this);
+  }
+
+  /**
+   * Traverses the graph of bindings, checking there are no cycles or missing references
+   * @throws IllegalArgumentException if the bindings is inconsistent
    */
   public void validate() {
-    for (Object o : map.values()) {
-      if (o instanceof Expression) {
-        Expression expr = (Expression) o;
-        try {
-          expr.getDoubleValuesSource(this);
-        } catch (StackOverflowError e) {
-          throw new IllegalArgumentException("Recursion Error: Cycle detected originating in (" + expr.sourceText + ")");
-        }
+    for (String origin : map.keySet()) {

Review comment: nit: use entrySet() since you consume both keys and values?

## File path: lucene/expressions/src/test/org/apache/lucene/expressions/TestExpressionValidation.java

@@ -110,4 +110,15 @@ public void testCoRecursion4() throws Exception {
     });
     assertTrue(expected.getMessage().contains("Cycle detected"));
   }
+
+  public void testCoRecursion42() throws Exception {

Review comment: I provided this test, but I don't think we should add it, as it relies on iteration order and might get defeated on some JVMs or future versions of Java.
I'd suggest either not adding this test, or changing the map in SimpleBindings from a HashMap to a TreeMap so that such cases can be tested reliably (in which case we'd have to swap `cycle0`/`cycle2`); this could be done in a follow-up too.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
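Both review points generalize beyond SimpleBindings, and can be shown in a small standalone sketch (`MapIterationDemo` is a name invented for illustration): iterating `entrySet()` yields key and value together instead of a `map.get(key)` lookup per key, and a `TreeMap` gives a deterministic, sorted iteration order where `HashMap` order may vary across JVM versions:

```java
import java.util.*;

// Standalone sketch of the two review suggestions; not SimpleBindings itself.
public class MapIterationDemo {

    // entrySet() yields key and value together: no per-key map.get() lookup.
    static String render(Map<String, Integer> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // TreeMap iterates in sorted key order, so a test that depends on the
        // order bindings are visited stays stable across JVMs; HashMap makes
        // no such guarantee.
        Map<String, Integer> bindings = new TreeMap<>();
        bindings.put("cycle2", 2);
        bindings.put("cycle0", 0);
        bindings.put("cycle1", 1);
        System.out.println(render(bindings)); // cycle0=0 cycle1=1 cycle2=2
    }
}
```

With a HashMap in place of the TreeMap, the rendered order (and hence which cycle a validation pass detects first) would be an implementation detail, which is exactly why the iteration-order-dependent test is fragile.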