[GitHub] [lucene-solr] tflobbe commented on pull request #1456: SOLR-13289: Support for BlockMax WAND

2020-04-24 Thread GitBox


tflobbe commented on pull request #1456:
URL: https://github.com/apache/lucene-solr/pull/1456#issuecomment-619253348


   ah, yes, that makes sense



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Issue Comment Deleted] (LUCENE-8278) UAX29URLEmailTokenizer is not detecting some tokens as URL type

2020-04-24 Thread Steven Rowe (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-8278:

Comment: was deleted

(was: The most complete online gambling site portal
https://www.caradaftarsbobetterbaru.com/)

> UAX29URLEmailTokenizer is not detecting some tokens as URL type
> ---
>
> Key: LUCENE-8278
> URL: https://issues.apache.org/jira/browse/LUCENE-8278
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Junte Zhang
>Assignee: Steven Rowe
>Priority: Minor
> Fix For: 7.4, 8.0
>
> Attachments: LUCENE-8278.patch, patched.png, unpatched.png
>
>
> We are using the UAX29URLEmailTokenizer so we can use the token types in our 
> plugins.
> However, I noticed that the tokenizer is not detecting certain URLs as <URL> 
> but <ALPHANUM> instead.
> Examples that are not working:
>  * example.com is <ALPHANUM>
>  * example.net is <ALPHANUM>
> But:
>  * https://example.com is <URL>
>  * as is https://example.net
> Examples that work:
>  * example.ch is <URL>
>  * example.co.uk is <URL>
>  * example.nl is <URL>
> I have checked this JIRA, and could not find an issue. I have tested this on 
> Lucene (Solr) 6.4.1 and 7.3.
> Could someone confirm my findings and advise what I could do to (help) 
> resolve this issue?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5

2020-04-24 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091912#comment-17091912
 ] 

Mike Drob commented on SOLR-14428:
--

I'm going to pause on the path with {{interface Minimizable}}, because after 
trying to start on it, it turns into a huge mess of generic and covariant 
types, and I'm not sure it actually makes anything better or more readable.

Let's wait for more feedback from the other folks involved.

> FuzzyQuery has severe memory usage in 8.5
> -
>
> Key: SOLR-14428
> URL: https://issues.apache.org/jira/browse/SOLR-14428
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.5, 8.5.1
>Reporter: Colvin Cowie
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, 
> image-2020-04-24-20-09-31-179.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I sent this to the mailing list
> I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors 
> while running our normal tests. After profiling it was clear that the 
> majority of the heap was allocated through FuzzyQuery.
> LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the 
> FuzzyQuery's constructor.
> I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries 
> from random UUID strings for 5 minutes
> {code}
> FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"
> {code}
> When running against a vanilla Solr 8.3.1 and 8.4.1 there is no problem, while 
> the memory usage has increased drastically on 8.5.0 and 8.5.1.
> Comparison of heap usage while running the attached test against Solr 8.3.1 
> and 8.5.1 with a single (empty) shard and 4GB heap:
> !image-2020-04-23-09-18-06-070.png! 
> And with 4 shards on 8.4.1 and 8.5.0:
>  !screenshot-2.png! 
> I'm guessing that the memory might be being leaked if the FuzzyQuery objects 
> are referenced from the cache, while the FuzzyTermsEnum would not have been.
> Query Result Cache on 8.5.1:
>  !screenshot-3.png! 
> ~316mb in the cache
> QRC on 8.3.1
>  !screenshot-4.png! 
> <1mb
> With an empty cache, running this query 
> _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory 
> allocation
> {noformat}
> 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed:   1520
> 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed: 648855
> {noformat}
> ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520
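The query-generation pattern in the quoted description can be sketched as a self-contained snippet (hypothetical class and method names; the actual attached FuzzyHammer.java is not reproduced here):

```java
import java.util.UUID;

// Hypothetical sketch of the load pattern described above: build fuzzy query
// strings from random UUIDs. A 32-char hex term with edit distance 2 forces
// FuzzyQuery to build large Levenshtein automata for every query.
public class FuzzyQueryStrings {
    static String nextQuery(String fieldName) {
        String term = UUID.randomUUID().toString().replace("-", "");
        return fieldName + ":" + term + "~2"; // ~2 = maximum edit distance
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(nextQuery("field_s"));
        }
    }
}
```

Since each generated term is unique, none of the queries hit the cache, so every request adds a new heavyweight entry.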






[GitHub] [lucene-solr] jpountz commented on pull request #1456: SOLR-13289: Support for BlockMax WAND

2020-04-24 Thread GitBox


jpountz commented on pull request #1456:
URL: https://github.com/apache/lucene-solr/pull/1456#issuecomment-619245099


   > Any number > Integer.MAX_VALUE means that each shard will reply with 
accurate count, right? so the sum of them all will also be accurate.
   
   Yes indeed. My perspective was rather that since this parameter is a number 
of hits, as a user I would expect any legal number of hits to also be a legal 
value for this parameter, so it would be a shame to fail, or worse, silently 
cast to an int if a user passes 3B as a value. I believe it would be fine to 
keep it an integer internally while carefully accepting longs when parsing URL 
parameters, i.e. all values greater than Integer.MAX_VALUE would get converted 
to Integer.MAX_VALUE?
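A minimal sketch of that parsing rule (hypothetical method and class names, not Solr's actual parameter code): accept a long and clamp anything above Integer.MAX_VALUE rather than failing or overflowing.

```java
// Hypothetical sketch only: clamp a long-valued hit-count parameter to int.
public class MinExactHitsParam {
    static int parseMinExactHits(String raw) {
        long value = Long.parseLong(raw); // accept any legal number of hits
        if (value < 0) {
            throw new IllegalArgumentException("minExactHits must be >= 0: " + raw);
        }
        // Anything above Integer.MAX_VALUE already means "count exactly",
        // so clamping preserves the semantics instead of overflowing.
        return (int) Math.min(value, Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(parseMinExactHits("3000000000")); // 3B clamps
        System.out.println(parseMinExactHits("1000"));
    }
}
```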






[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5

2020-04-24 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091900#comment-17091900
 ] 

Mike Drob commented on SOLR-14428:
--

So another approach here would be to intercept the keys in CaffeineCache.put, 
and if they implement some marker interface then we can call {{toCacheKey}}.

If it's made more generic in this way, then it probably makes sense to do

{code}
public interface Minimizable {
  /** return an object equal to this object, but possibly with a minimized 
memory footprint */
  Minimizable minimize();
}
{code}

Then we don't have to make it invasive on {{Query}} and can stick to the places 
where it makes a difference: FuzzyQuery, maybe AutomatonQuery (from a quick 
glance), and then probably anything that wraps other queries, such as 
BooleanQuery. Are there others? I haven't looked yet.
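A rough sketch of that marker-interface idea (all class names here are hypothetical stand-ins, not actual Solr or Caffeine code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the marker-interface proposal; none of these classes
// are actual Solr code. A cache key can opt in to being swapped for an equal
// but memory-minimized variant at put time.
interface Minimizable {
    /** Return an object equal to this one, but possibly with a smaller memory footprint. */
    Minimizable minimize();
}

// Stand-in for a cache put path (e.g. CaffeineCache.put): minimize opted-in keys.
class MinimizingCache<K, V> {
    private final Map<K, V> delegate = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    V put(K key, V value) {
        if (key instanceof Minimizable) {
            key = (K) ((Minimizable) key).minimize(); // equal key, less memory
        }
        return delegate.put(key, value);
    }

    V get(K key) { return delegate.get(key); }
}

// Example key: equality depends only on the term, so the minimized form can
// drop the large array (standing in for FuzzyQuery's compiled automata).
class TermKey implements Minimizable {
    final String term;
    final int[] heavy;

    TermKey(String term, int[] heavy) { this.term = term; this.heavy = heavy; }

    @Override public Minimizable minimize() { return new TermKey(term, new int[0]); }
    @Override public boolean equals(Object o) { return o instanceof TermKey && ((TermKey) o).term.equals(term); }
    @Override public int hashCode() { return term.hashCode(); }
}
```

Because the minimized key is equal to the original, later lookups with a freshly built query still hit the cached entry.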







[GitHub] [lucene-solr] tflobbe commented on pull request #1456: SOLR-13289: Support for BlockMax WAND

2020-04-24 Thread GitBox


tflobbe commented on pull request #1456:
URL: https://github.com/apache/lucene-solr/pull/1456#issuecomment-619239862


   > Does minExactHits need to accept longs since the number of hits across 
shards might exceed Integer.MAX_VALUE?
   
   Good point, I hadn't thought about the distributed aspect yet. However, does 
it make sense to you? Any number > Integer.MAX_VALUE means that each shard will 
reply with an accurate count, right? So the sum of them all will also be 
accurate.
   
   > Also I see you chose to make BMW an opt-in, I hope we'll find ways to 
enable it by default eventually. :)
   
   Yes, I think this should be the default in master. I was planning to make 
that change in another PR, to keep it explicit.






[jira] [Comment Edited] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5

2020-04-24 Thread Colvin Cowie (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091882#comment-17091882
 ] 

Colvin Cowie edited comment on SOLR-14428 at 4/24/20, 8:52 PM:
---

I did the same thing with {{filters}} and that's ok...

Now I'm thinking, since some Queries must hold references to other queries when 
they are constructed, toCacheKey will need to be implemented on those in order 
to retrieve the cache key variants of the Queries they are constructed with.
 e.g. the clause sets in BooleanQuery
{noformat}
private final Map<Occur, Collection<Query>> clauseSets; // used for equals/hashcode{noformat}
And any third-party Query implementations that hold on to references to other 
Queries would need updating too.

Just seems like there might be quite a lot of fallout from this. Though it 
might still be best in the long run.

 

 

Also the {{filterCache}} doesn't use QueryResultKey; it's just keyed by the 
query ({{SolrCache<Query, DocSet>}}), so the puts to it need updating too.

Calling toCacheKey on all the puts to the {{filterCache}} in SolrIndexSearcher 
has got the memory usage sorted out again with me firing the Fuzzy Queries as fq

 


was (Author: cjcowie):
I did the same thing with {{filters}} and that's ok...

Now I'm thinking, since some Queries must hold references to other queries when 
they are constructed, toCacheKey will need to be implemented on those in order 
to retrieve the cache key variants of the Queries they are constructed with.
 e.g. the clause sets in BooleanQuery
{noformat}
private final Map<Occur, Collection<Query>> clauseSets; // used for equals/hashcode{noformat}
And any third-party Query implementations that hold on to references to other 
Queries would need updating too.

Just seems like there might be quite a lot of fallout from this.

 

 

Also the {{filterCache}} doesn't use QueryResultKey; it's just keyed by the 
query ({{SolrCache<Query, DocSet>}}), so the puts to it need updating too.

Calling toCacheKey on all the puts to the {{filterCache}} in SolrIndexSearcher 
has got the memory usage sorted out again with me firing the Fuzzy Queries as fq

 







[jira] [Comment Edited] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5

2020-04-24 Thread Colvin Cowie (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091882#comment-17091882
 ] 

Colvin Cowie edited comment on SOLR-14428 at 4/24/20, 8:45 PM:
---

I did the same thing with {{filters}} and that's ok...

Now I'm thinking, since some Queries must hold references to other queries when 
they are constructed, toCacheKey will need to be implemented on those in order 
to retrieve the cache key variants of the Queries they are constructed with.
 e.g. the clause sets in BooleanQuery
{noformat}
private final Map<Occur, Collection<Query>> clauseSets; // used for equals/hashcode{noformat}
And any third-party Query implementations that hold on to references to other 
Queries would need updating too.

Just seems like there might be quite a lot of fallout from this.

 

 

Also the {{filterCache}} doesn't use QueryResultKey; it's just keyed by the 
query ({{SolrCache<Query, DocSet>}}), so the puts to it need updating too.

Calling toCacheKey on all the puts to the {{filterCache}} in SolrIndexSearcher 
has got the memory usage sorted out again with me firing the Fuzzy Queries as fq

 


was (Author: cjcowie):
I did the same thing with {{filters}} and that's ok...

Now I'm thinking, since some Queries must hold references to other queries when 
they are constructed, toCacheKey will need to be implemented on those in order 
to retrieve the cache key variants of the Queries they are constructed with.
e.g. the clause sets in BooleanQuery
{noformat}
private final Map<Occur, Collection<Query>> clauseSets; // used for equals/hashcode{noformat}
And any third-party Query implementations that hold on to references to other 
Queries would need updating too.

Just seems like there might be quite a lot of fallout from this.

 

 

Also the {{filterCache}} doesn't use QueryResultKey; it's just keyed by the 
query ({{SolrCache<Query, DocSet>}}), so the puts to it need updating too.

 







[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5

2020-04-24 Thread Colvin Cowie (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091882#comment-17091882
 ] 

Colvin Cowie commented on SOLR-14428:
-

I did the same thing with {{filters}} and that's ok...

Now I'm thinking: since some Queries hold references to other Queries when they 
are constructed, toCacheKey will need to be implemented on those too, in order 
to retrieve the cache key variants of the Queries they were constructed with.
e.g. the clause sets in BooleanQuery
{noformat}
private final Map<Occur, Collection<Query>> clauseSets; // used for equals/hashcode{noformat}
And any third-party Query implementations that hold on to references to other 
Queries would need updating too.

Just seems like there might be quite a lot of fallout from this.

 

 

Also the {{filterCache}} doesn't use QueryResultKey; it's just keyed by the 
query ({{SolrCache<Query, DocSet>}}), so the puts to it need updating too.
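The recursion concern can be illustrated with a hedged sketch (hypothetical types, not Lucene's Query classes): a wrapper must delegate toCacheKey to its children, or the nested heavyweight state survives inside the cached key.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only; these are not Lucene's Query classes.
interface CacheableQuery {
    /** Return an equal but lighter-weight variant suitable as a cache key. */
    CacheableQuery toCacheKey();
}

class HeavyQuery implements CacheableQuery {
    final String term;
    final int[] heavyState; // stands in for e.g. FuzzyQuery's compiled automata

    HeavyQuery(String term) { this(term, new int[1024]); }
    HeavyQuery(String term, int[] state) { this.term = term; this.heavyState = state; }

    @Override public CacheableQuery toCacheKey() { return new HeavyQuery(term, new int[0]); }
}

class WrapperQuery implements CacheableQuery {
    final List<CacheableQuery> clauses;

    WrapperQuery(List<CacheableQuery> clauses) { this.clauses = clauses; }

    // The wrapper must recurse, or nested heavy queries keep their state.
    @Override public CacheableQuery toCacheKey() {
        return new WrapperQuery(clauses.stream()
                .map(CacheableQuery::toCacheKey)
                .collect(Collectors.toList()));
    }
}
```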

 







[jira] [Created] (SOLR-14437) Remove/refactor "ApiSupport" interface? (for V2 API)

2020-04-24 Thread David Smiley (Jira)
David Smiley created SOLR-14437:
---

 Summary: Remove/refactor "ApiSupport" interface? (for V2 API)
 Key: SOLR-14437
 URL: https://issues.apache.org/jira/browse/SOLR-14437
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: v2 API
Affects Versions: master (9.0)
Reporter: David Smiley


ApiSupport.java is an interface relating to the V2 API that is implemented by 
all request handlers, both those at a core level and others.  It's essentially 
this: (comments removed)
{code:java}
public interface ApiSupport {
  Collection<Api> getApis();
  default Boolean registerV1() { return Boolean.TRUE; }
  default Boolean registerV2() { return Boolean.FALSE; }
}
{code}

Firstly, let's just assume that the handler will always be registered in V2.  
All implementations I've seen explicitly return true here; maybe I'm missing 
something, though.

Secondly, getApis() seems problematic for the ability to lazily load request 
handlers.  Can we assume, at least for core-level request handlers, that there 
is exactly one API, and where necessary rely on the "spec" JSON definition -- 
see org.apache.solr.api.ApiBag#registerLazy?
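For comparison, a lazy registration shape (hypothetical names, loosely inspired by the registerLazy idea; this is not Solr's actual ApiBag API) that avoids needing an instantiated handler up front:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch, not Solr's ApiBag: register a spec path eagerly while
// deferring handler construction until first lookup. getApis() prevents this
// today because it requires an already-instantiated handler.
class LazyApiRegistry {
    private final Map<String, Supplier<Object>> factories = new HashMap<>();
    private final Map<String, Object> instances = new HashMap<>();

    /** Record the spec path now; build the handler only when first needed. */
    void registerLazy(String specPath, Supplier<Object> handlerFactory) {
        factories.put(specPath, handlerFactory);
    }

    Object lookup(String specPath) {
        return instances.computeIfAbsent(specPath, p -> factories.get(p).get());
    }
}
```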






[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5

2020-04-24 Thread Colvin Cowie (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091869#comment-17091869
 ] 

Colvin Cowie commented on SOLR-14428:
-

Oh yes, you're absolutely right







[jira] [Commented] (SOLR-14436) When using Synonym Graph Filter, Solr does not tokenize query-string if it has multi-word synonym

2020-04-24 Thread Atin (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091868#comment-17091868
 ] 

Atin commented on SOLR-14436:
-

Hi,
Thank you for your response.
I had put it on the user list; however, I received no resolution. One user
responded that a multi-word synonym cannot be tokenized further.

Now, how can it be decided if it is a code issue, since nobody on the user
list had a solution?




> When using Synonym Graph Filter, Solr does not tokenize query-string if it 
> has multi-word synonym
> -
>
> Key: SOLR-14436
> URL: https://issues.apache.org/jira/browse/SOLR-14436
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query, Schema and Analysis
>Affects Versions: 8.3.1
>Reporter: Atin
>Priority: Major
> Attachments: Scenario1.png, Scenario2.png
>
>
>  
> While using Synonym Graph Filter,  if the query string contains a multi-word 
> synonym, it considers that multi-word synonym as a single term and does not 
> tokenize it further.
>  
> For example- *soap powder* is a search *query* which is also a _multi-word 
> synonym_ in the synonym file as-
> {quote}s(104254535,1,'soap powder',n,1,1).
> s(104254535,2,'built-soap powder',n,1,0).
> s(104254535,3,'washing powder',n,1,0).{quote}
>  
> There are 2 documents having _soap_(2) and _powder_(1) altogether.
> doc1: "Sunny Berlin breast tumors soap powder"
> doc2: "She is in soap Berlin today"
>  
>  
> +Scenario 1 (screenshot attached)+ 
>  *without* Synonym Graph Filter => 2 docs returned, as it checks for 
> *"soap"* and *"powder"* separately.
>  
> +Scenario 2 (screenshot attached)+ 
> *with* Synonym Graph Filter => only 1 doc returned, but 2 were expected. Here 
> only *"soap powder"* is being checked and it is not tokenized into "soap" and 
> "powder" and searched further.
> Is it possible to expand the query string *soap powder* as:
> Synonym(soap powder) + Synonym(soap) + Synonym(powder)
>  
> Thank You.






[jira] [Issue Comment Deleted] (SOLR-11960) Add collection level properties

2020-04-24 Thread Tomas Eduardo Fernandez Lobbe (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomas Eduardo Fernandez Lobbe updated SOLR-11960:
-
Comment: was deleted

(was: How to Register for Sbobet

https://www.caradaftarsbobetterbaru.com/)

> Add collection level properties
> ---
>
> Key: SOLR-11960
> URL: https://issues.apache.org/jira/browse/SOLR-11960
> Project: Solr
>  Issue Type: New Feature
>Reporter: Peter Rusko
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Blocker
> Fix For: 7.3, 8.0
>
> Attachments: SOLR-11960.patch, SOLR-11960.patch, SOLR-11960.patch, 
> SOLR-11960.patch, SOLR-11960.patch, SOLR-11960_2.patch
>
>
> Solr has cluster properties, but no easy and extendable way of defining 
> properties that affect a single collection. Collection properties could be 
> stored in a single zookeeper node per collection, making it possible to 
> trigger zookeeper watchers for only those Solr nodes that have cores of that 
> collection.






[jira] [Resolved] (SOLR-14436) When using Synonym Graph Filter, Solr does not tokenize query-string if it has multi-word synonym

2020-04-24 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-14436.
---
Resolution: Invalid

This is more a usage question than a bug/code issue; it would be best to raise 
this question on the user's list.

See: 
http://lucene.apache.org/solr/community.html#mailing-lists-irc where there are 
links to both the Lucene and Solr mailing lists.

A _lot_ more people will see your question on that list and may be able to help 
more quickly.

If it's determined that this really is a code issue or enhancement to Lucene or 
Solr and not a configuration/usage problem, we can raise a new JIRA or reopen 
this one.



> When using Synonym Graph Filter, Solr does not tokenize query-string if it 
> has multi-word synonym
> -
>
> Key: SOLR-14436
> URL: https://issues.apache.org/jira/browse/SOLR-14436
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query, Schema and Analysis
>Affects Versions: 8.3.1
>Reporter: Atin
>Priority: Major
> Attachments: Scenario1.png, Scenario2.png
>
>
>  
> While using Synonym Graph Filter,  if the query string contains a multi-word 
> synonym, it considers that multi-word synonym as a single term and does not 
> tokenize it further.
>  
> For example- *soap powder* is a search *query* which is also a _multi-word 
> synonym_ in the synonym file as-
> {quote}s(104254535,1,'soap powder',n,1,1).
> s(104254535,2,'built-soap powder',n,1,0).
> s(104254535,3,'washing powder',n,1,0).{quote}
>  
> There are 2 documents having _soap_(2) and _powder_(1) altogether.
> doc1: "Sunny Berlin breast tumors soap powder"
> doc2: "She is in soap Berlin today"
>  
>  
> +Scenario 1 (screenshot attached)+ 
>  *without* Synonym Graph Filter => 2 docs returned, as it checks for *"soap"* 
> and *"powder"* separately.
>  
> +Scenario 2 (screenshot attached)+ 
> *with* Synonym Graph Filter => only 1 doc returned, but 2 were expected. Here 
> only *"soap powder"* is being checked and it is not tokenized into "soap" and 
> "powder" and searched further.
> Is it possible to expand query string - *soap powder* as:
> Synonym(soap powder) + Synonym(soap) + Synonym(powder)
>  
> Thank You.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5

2020-04-24 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091862#comment-17091862
 ] 

Mike Drob commented on SOLR-14428:
--

Good catch. I think we might also need to do a similar operation on {{filters}} 
now that I'm looking at it more closely.
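A rough sketch of the pattern being discussed here, estimating RAM for the main query and each entry of {{filters}}. This is plain Java with stand-in names and a stand-in default size, not Solr's actual QueryResultKey code:

```java
import java.util.List;

public class RamEstimateSketch {
    // stand-in for RamUsageEstimator.QUERY_DEFAULT_RAM_BYTES_USED (value is illustrative)
    static final long QUERY_DEFAULT_RAM_BYTES_USED = 1024;

    // stand-in for Lucene's Accountable interface
    interface Accountable { long ramBytesUsed(); }

    // use the object's own estimate when it can report one, else fall back to the default
    static long sizeOfObject(Object o, long defaultSize) {
        return (o instanceof Accountable) ? ((Accountable) o).ramBytesUsed() : defaultSize;
    }

    // accumulate the estimate over the main query and every cached filter
    static long estimate(Object query, List<Object> filters) {
        long bytes = sizeOfObject(query, QUERY_DEFAULT_RAM_BYTES_USED);
        if (filters != null) {
            for (Object f : filters) {
                bytes += sizeOfObject(f, QUERY_DEFAULT_RAM_BYTES_USED);
            }
        }
        return bytes;
    }
}
```

The point of the fallback is that a cache key should never have to walk an arbitrarily expensive query tree just to produce a size estimate.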

> FuzzyQuery has severe memory usage in 8.5
> -
>
> Key: SOLR-14428
> URL: https://issues.apache.org/jira/browse/SOLR-14428
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.5, 8.5.1
>Reporter: Colvin Cowie
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, 
> image-2020-04-24-20-09-31-179.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I sent this to the mailing list
> I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors 
> while running our normal tests. After profiling it was clear that the 
> majority of the heap was allocated through FuzzyQuery.
> LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the 
> FuzzyQuery's constructor.
> I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries 
> from random UUID strings for 5 minutes
> {code}
> FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"
> {code}
> When running against a vanilla Solr 8.3.1 and 8.4.1 there is no problem, while 
> the memory usage has increased drastically on 8.5.0 and 8.5.1.
> Comparison of heap usage while running the attached test against Solr 8.3.1 
> and 8.5.1 with a single (empty) shard and 4GB heap:
> !image-2020-04-23-09-18-06-070.png! 
> And with 4 shards on 8.4.1 and 8.5.0:
>  !screenshot-2.png! 
> I'm guessing that the memory might be being leaked if the FuzzyQuery objects 
> are referenced from the cache, while the FuzzyTermsEnum would not have been.
> Query Result Cache on 8.5.1:
>  !screenshot-3.png! 
> ~316mb in the cache
> QRC on 8.3.1
>  !screenshot-4.png! 
> <1mb
> With an empty cache, running this query 
> _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory 
> allocation
> {noformat}
> 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed:  1520
> 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855
> {noformat}
> ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520
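The quoted expression from the description can be reconstructed as a tiny standalone helper; this is a sketch of the query strings the test generates (the field name is illustrative), not the attached FuzzyHammer itself:

```java
import java.util.UUID;

public class FuzzyQueryStrings {
    // build a fuzzy query string: a random 32-hex-character term with edit distance 2
    static String randomFuzzyQuery(String field) {
        return field + ":" + UUID.randomUUID().toString().replace("-", "") + "~2";
    }
}
```

Every call yields a distinct term, so each request misses the query result cache and adds a new FuzzyQuery-backed entry, which is what makes the test a good probe for per-query memory cost.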



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8278) UAX29URLEmailTokenizer is not detecting some tokens as URL type

2020-04-24 Thread agen hoqbet (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091861#comment-17091861
 ] 

agen hoqbet commented on LUCENE-8278:
-

Portal situs judi online terlengkap
https://www.caradaftarsbobetterbaru.com/

> UAX29URLEmailTokenizer is not detecting some tokens as URL type
> ---
>
> Key: LUCENE-8278
> URL: https://issues.apache.org/jira/browse/LUCENE-8278
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Junte Zhang
>Assignee: Steven Rowe
>Priority: Minor
> Fix For: 7.4, 8.0
>
> Attachments: LUCENE-8278.patch, patched.png, unpatched.png
>
>
> We are using the UAX29URLEmailTokenizer so we can use the token types in our 
> plugins.
> However, I noticed that the tokenizer is not detecting certain URLs as  
> but  instead.
> Examples that are not working:
>  * example.com is 
>  * example.net is 
> But:
>  * https://example.com is 
>  * as is https://example.net
> Examples that work:
>  * example.ch is 
>  * example.co.uk is 
>  * example.nl is 
> I have checked this JIRA, and could not find an issue. I have tested this on 
> Lucene (Solr) 6.4.1 and 7.3.
> Could someone confirm my findings and advise what I could do to (help) 
> resolve this issue?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5

2020-04-24 Thread Colvin Cowie (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091843#comment-17091843
 ] 

Colvin Cowie edited comment on SOLR-14428 at 4/24/20, 7:47 PM:
---

-Ah, I assume the statistics plugin uses RamUsageQueryVisitor, which triggers 
the building of the automata-

 

It's because QueryResultKey gets the ramBytesUsed from the original query still
{code:java}
ramBytesUsed = BASE_RAM_BYTES_USED + ramSfields +
    RamUsageEstimator.sizeOfObject(query, RamUsageEstimator.QUERY_DEFAULT_RAM_BYTES_USED) +
    RamUsageEstimator.sizeOfObject(filters, RamUsageEstimator.QUERY_DEFAULT_RAM_BYTES_USED);
{code}


was (Author: cjcowie):
Ah, I assume the statistics plugin uses RamUsageQueryVisitor, which triggers 
the building of the automata

> FuzzyQuery has severe memory usage in 8.5
> -
>
> Key: SOLR-14428
> URL: https://issues.apache.org/jira/browse/SOLR-14428
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.5, 8.5.1
>Reporter: Colvin Cowie
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, 
> image-2020-04-24-20-09-31-179.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I sent this to the mailing list
> I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors 
> while running our normal tests. After profiling it was clear that the 
> majority of the heap was allocated through FuzzyQuery.
> LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the 
> FuzzyQuery's constructor.
> I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries 
> from random UUID strings for 5 minutes
> {code}
> FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"
> {code}
> When running against a vanilla Solr 8.3.1 and 8.4.1 there is no problem, while 
> the memory usage has increased drastically on 8.5.0 and 8.5.1.
> Comparison of heap usage while running the attached test against Solr 8.3.1 
> and 8.5.1 with a single (empty) shard and 4GB heap:
> !image-2020-04-23-09-18-06-070.png! 
> And with 4 shards on 8.4.1 and 8.5.0:
>  !screenshot-2.png! 
> I'm guessing that the memory might be being leaked if the FuzzyQuery objects 
> are referenced from the cache, while the FuzzyTermsEnum would not have been.
> Query Result Cache on 8.5.1:
>  !screenshot-3.png! 
> ~316mb in the cache
> QRC on 8.3.1
>  !screenshot-4.png! 
> <1mb
> With an empty cache, running this query 
> _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory 
> allocation
> {noformat}
> 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed:  1520
> 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855
> {noformat}
> ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-11960) Add collection level properties

2020-04-24 Thread agen hoqbet (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-11960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091860#comment-17091860
 ] 

agen hoqbet commented on SOLR-11960:


Cara Daftar Sbobet

https://www.caradaftarsbobetterbaru.com/

> Add collection level properties
> ---
>
> Key: SOLR-11960
> URL: https://issues.apache.org/jira/browse/SOLR-11960
> Project: Solr
>  Issue Type: New Feature
>Reporter: Peter Rusko
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Blocker
> Fix For: 7.3, 8.0
>
> Attachments: SOLR-11960.patch, SOLR-11960.patch, SOLR-11960.patch, 
> SOLR-11960.patch, SOLR-11960.patch, SOLR-11960_2.patch
>
>
> Solr has cluster properties, but no easy and extendable way of defining 
> properties that affect a single collection. Collection properties could be 
> stored in a single zookeeper node per collection, making it possible to 
> trigger zookeeper watchers for only those Solr nodes that have cores of that 
> collection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5

2020-04-24 Thread Colvin Cowie (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091843#comment-17091843
 ] 

Colvin Cowie edited comment on SOLR-14428 at 4/24/20, 7:43 PM:
---

Ah, I assume the statistics plugin uses RamUsageQueryVisitor, which triggers 
the building of the automata


was (Author: cjcowie):
Ah, the statistics plugin uses RamUsageQueryVisitor, which triggers the 
building of the automata

> FuzzyQuery has severe memory usage in 8.5
> -
>
> Key: SOLR-14428
> URL: https://issues.apache.org/jira/browse/SOLR-14428
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.5, 8.5.1
>Reporter: Colvin Cowie
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, 
> image-2020-04-24-20-09-31-179.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I sent this to the mailing list
> I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors 
> while running our normal tests. After profiling it was clear that the 
> majority of the heap was allocated through FuzzyQuery.
> LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the 
> FuzzyQuery's constructor.
> I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries 
> from random UUID strings for 5 minutes
> {code}
> FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"
> {code}
> When running against a vanilla Solr 8.3.1 and 8.4.1 there is no problem, while 
> the memory usage has increased drastically on 8.5.0 and 8.5.1.
> Comparison of heap usage while running the attached test against Solr 8.3.1 
> and 8.5.1 with a single (empty) shard and 4GB heap:
> !image-2020-04-23-09-18-06-070.png! 
> And with 4 shards on 8.4.1 and 8.5.0:
>  !screenshot-2.png! 
> I'm guessing that the memory might be being leaked if the FuzzyQuery objects 
> are referenced from the cache, while the FuzzyTermsEnum would not have been.
> Query Result Cache on 8.5.1:
>  !screenshot-3.png! 
> ~316mb in the cache
> QRC on 8.3.1
>  !screenshot-4.png! 
> <1mb
> With an empty cache, running this query 
> _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory 
> allocation
> {noformat}
> 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed:  1520
> 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855
> {noformat}
> ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on pull request #1456: SOLR-13289: Support for BlockMax WAND

2020-04-24 Thread GitBox


jpountz commented on pull request #1456:
URL: https://github.com/apache/lucene-solr/pull/1456#issuecomment-619206053


   Does minExactHits need to accept longs since the number of hits across 
shards might exceed Integer.MAX_VALUE?
   
   Also I see you chose to make BMW an opt-in, I hope we'll find ways to enable 
it by default eventually. :)
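A minimal illustration of the overflow concern (assumed merge logic, not Solr's actual code): each shard can report a hit count that fits in an int, but the merged total across shards needs to be accumulated as a long.

```java
public class HitMerge {
    // sum per-shard hit counts; accumulating into a long avoids int overflow
    static long mergeHits(int[] perShardHits) {
        long total = 0;
        for (int hits : perShardHits) {
            total += hits;
        }
        return total;
    }
}
```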



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14435) createNodeSet and createNodeSet.shuffle parameters missing from Collection Restore RefGuide

2020-04-24 Thread Lucene/Solr QA (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091844#comment-17091844
 ] 

Lucene/Solr QA commented on SOLR-14435:
---

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
|| || || || {color:brown} master Compile Tests {color} ||
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | 
{color:green}  0m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | 
{color:green}  0m  3s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate ref guide {color} | 
{color:green}  0m  3s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:black}{color} | {color:black} {color} | {color:black}  1m 14s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-14435 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13001059/SOLR-14435-01.patch |
| Optional Tests |  ratsources  validatesourcepatterns  validaterefguide  |
| uname | Linux lucene2-us-west.apache.org 4.4.0-170-generic #199-Ubuntu SMP 
Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh
 |
| git revision | master / ecc98e8 |
| ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 |
| modules | C: solr/solr-ref-guide U: solr/solr-ref-guide |
| Console output | 
https://builds.apache.org/job/PreCommit-SOLR-Build/742/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> createNodeSet and createNodeSet.shuffle parameters missing from Collection 
> Restore RefGuide
> ---
>
> Key: SOLR-14435
> URL: https://issues.apache.org/jira/browse/SOLR-14435
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Andras Salamon
>Priority: Minor
> Attachments: SOLR-14435-01.patch
>
>
> Although {{createNodeSet}} and {{createNodeSet.shuffle}} parameters are 
> supported by the Collection RESTORE command (I've tested it), they are 
> missing from the documentation:
> [https://lucene.apache.org/solr/guide/8_5/collection-management.html#collection-management]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5

2020-04-24 Thread Colvin Cowie (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091843#comment-17091843
 ] 

Colvin Cowie commented on SOLR-14428:
-

Ah, the statistics plugin uses RamUsageQueryVisitor, which triggers the 
building of the automata

> FuzzyQuery has severe memory usage in 8.5
> -
>
> Key: SOLR-14428
> URL: https://issues.apache.org/jira/browse/SOLR-14428
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.5, 8.5.1
>Reporter: Colvin Cowie
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, 
> image-2020-04-24-20-09-31-179.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I sent this to the mailing list
> I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors 
> while running our normal tests. After profiling it was clear that the 
> majority of the heap was allocated through FuzzyQuery.
> LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the 
> FuzzyQuery's constructor.
> I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries 
> from random UUID strings for 5 minutes
> {code}
> FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"
> {code}
> When running against a vanilla Solr 8.3.1 and 8.4.1 there is no problem, while 
> the memory usage has increased drastically on 8.5.0 and 8.5.1.
> Comparison of heap usage while running the attached test against Solr 8.3.1 
> and 8.5.1 with a single (empty) shard and 4GB heap:
> !image-2020-04-23-09-18-06-070.png! 
> And with 4 shards on 8.4.1 and 8.5.0:
>  !screenshot-2.png! 
> I'm guessing that the memory might be being leaked if the FuzzyQuery objects 
> are referenced from the cache, while the FuzzyTermsEnum would not have been.
> Query Result Cache on 8.5.1:
>  !screenshot-3.png! 
> ~316mb in the cache
> QRC on 8.3.1
>  !screenshot-4.png! 
> <1mb
> With an empty cache, running this query 
> _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory 
> allocation
> {noformat}
> 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed:  1520
> 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855
> {noformat}
> ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5

2020-04-24 Thread Colvin Cowie (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091838#comment-17091838
 ] 

Colvin Cowie commented on SOLR-14428:
-

Thanks, I patched it in. The heap usage looks a lot closer to how it was before 
on my stress test.

!image-2020-04-24-20-09-31-179.png!

I'm surprised to see the cache statistics are still reporting high ramBytesUsed 
though, e.g. a search for "field_s:e41848af85d24ac197c71db6888e17bc~2" still 
results in a ramBytesUsed of 648863
{code:java}
this.ramBytesUsed = BASE_RAM_BYTES + term.ramBytesUsed();
{code}
in the new constructor looks like it should do the right thing
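If the remaining cost comes from eagerly building the automata for query objects that end up cached, one sketch of the deferred-construction idea (plain Java under stated assumptions, not the actual Lucene fix) is a memoizing supplier, so a cached query stays cheap until it is actually executed:

```java
import java.util.function.Supplier;

// defer an expensive structure until first use, and build it at most once per
// instance in the common case (a benign race may rebuild it concurrently)
public class Lazy<T> implements Supplier<T> {
    private final Supplier<T> builder;
    private volatile T value;

    public Lazy(Supplier<T> builder) { this.builder = builder; }

    @Override
    public T get() {
        T v = value;
        if (v == null) {
            v = builder.get();
            value = v;
        }
        return v;
    }
}
```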

> FuzzyQuery has severe memory usage in 8.5
> -
>
> Key: SOLR-14428
> URL: https://issues.apache.org/jira/browse/SOLR-14428
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.5, 8.5.1
>Reporter: Colvin Cowie
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, 
> image-2020-04-24-20-09-31-179.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I sent this to the mailing list
> I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors 
> while running our normal tests. After profiling it was clear that the 
> majority of the heap was allocated through FuzzyQuery.
> LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the 
> FuzzyQuery's constructor.
> I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries 
> from random UUID strings for 5 minutes
> {code}
> FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"
> {code}
> When running against a vanilla Solr 8.3.1 and 8.4.1 there is no problem, while 
> the memory usage has increased drastically on 8.5.0 and 8.5.1.
> Comparison of heap usage while running the attached test against Solr 8.3.1 
> and 8.5.1 with a single (empty) shard and 4GB heap:
> !image-2020-04-23-09-18-06-070.png! 
> And with 4 shards on 8.4.1 and 8.5.0:
>  !screenshot-2.png! 
> I'm guessing that the memory might be being leaked if the FuzzyQuery objects 
> are referenced from the cache, while the FuzzyTermsEnum would not have been.
> Query Result Cache on 8.5.1:
>  !screenshot-3.png! 
> ~316mb in the cache
> QRC on 8.3.1
>  !screenshot-4.png! 
> <1mb
> With an empty cache, running this query 
> _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory 
> allocation
> {noformat}
> 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed:  1520
> 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855
> {noformat}
> ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5

2020-04-24 Thread Colvin Cowie (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colvin Cowie updated SOLR-14428:

Attachment: image-2020-04-24-20-09-31-179.png

> FuzzyQuery has severe memory usage in 8.5
> -
>
> Key: SOLR-14428
> URL: https://issues.apache.org/jira/browse/SOLR-14428
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.5, 8.5.1
>Reporter: Colvin Cowie
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, 
> image-2020-04-24-20-09-31-179.png, screenshot-2.png, screenshot-3.png, 
> screenshot-4.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I sent this to the mailing list
> I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors 
> while running our normal tests. After profiling it was clear that the 
> majority of the heap was allocated through FuzzyQuery.
> LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the 
> FuzzyQuery's constructor.
> I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries 
> from random UUID strings for 5 minutes
> {code}
> FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"
> {code}
> When running against a vanilla Solr 8.3.1 and 8.4.1 there is no problem, while 
> the memory usage has increased drastically on 8.5.0 and 8.5.1.
> Comparison of heap usage while running the attached test against Solr 8.3.1 
> and 8.5.1 with a single (empty) shard and 4GB heap:
> !image-2020-04-23-09-18-06-070.png! 
> And with 4 shards on 8.4.1 and 8.5.0:
>  !screenshot-2.png! 
> I'm guessing that the memory might be being leaked if the FuzzyQuery objects 
> are referenced from the cache, while the FuzzyTermsEnum would not have been.
> Query Result Cache on 8.5.1:
>  !screenshot-3.png! 
> ~316mb in the cache
> QRC on 8.3.1
>  !screenshot-4.png! 
> <1mb
> With an empty cache, running this query 
> _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory 
> allocation
> {noformat}
> 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed:  1520
> 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855
> {noformat}
> ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] tflobbe opened a new pull request #1456: SOLR-13289: Support for BlockMax WAND

2020-04-24 Thread GitBox


tflobbe opened a new pull request #1456:
URL: https://github.com/apache/lucene-solr/pull/1456


   This is still very much WIP. Some SolrJ tests are failing.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13289) Support for BlockMax WAND

2020-04-24 Thread Tomas Eduardo Fernandez Lobbe (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091835#comment-17091835
 ] 

Tomas Eduardo Fernandez Lobbe commented on SOLR-13289:
--

Here is my current progress: https://github.com/apache/lucene-solr/pull/1456

> Support for BlockMax WAND
> -
>
> Key: SOLR-13289
> URL: https://issues.apache.org/jira/browse/SOLR-13289
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-13289.patch, SOLR-13289.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to 
> expose this via Solr. When enabled, the numFound returned will not be exact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-14436) When using Synonym Graph Filter, Solr does not tokenize query-string if it has multi-word synonym

2020-04-24 Thread Atin (Jira)
Atin created SOLR-14436:
---

 Summary: When using Synonym Graph Filter, Solr does not tokenize 
query-string if it has multi-word synonym
 Key: SOLR-14436
 URL: https://issues.apache.org/jira/browse/SOLR-14436
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: query, Schema and Analysis
Affects Versions: 8.3.1
Reporter: Atin
 Attachments: Scenario1.png, Scenario2.png

 

While using Synonym Graph Filter,  if the query string contains a multi-word 
synonym, it considers that multi-word synonym as a single term and does not 
tokenize it further.

 

For example- *soap powder* is a search *query* which is also a _multi-word 
synonym_ in the synonym file as-
{quote}s(104254535,1,'soap powder',n,1,1).
s(104254535,2,'built-soap powder',n,1,0).
s(104254535,3,'washing powder',n,1,0).{quote}
 
There are 2 documents having _soap_(2) and _powder_(1) altogether.
doc1: "Sunny Berlin breast tumors soap powder"
doc2: "She is in soap Berlin today"
 
 
+Scenario 1 (screenshot attached)+ 
 *without* Synonym Graph Filter => 2 docs returned, as it checks for *"soap"* 
 and *"powder"* separately.
 
+Scenario 2 (screenshot attached)+ 
*with* Synonym Graph Filter => only 1 doc returned, but 2 were expected. Here 
only *"soap powder"* is being checked and it is not tokenized into "soap" and 
"powder" and searched further.

Is it possible to expand query string - *soap powder* as:

Synonym(soap powder) + Synonym(soap) + Synonym(powder)

 

Thank You.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13289) Support for BlockMax WAND

2020-04-24 Thread Tomas Eduardo Fernandez Lobbe (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091785#comment-17091785
 ] 

Tomas Eduardo Fernandez Lobbe commented on SOLR-13289:
--

Thanks Ishan, I've made some progress too. I can look at your changes and merge 
what's needed.

> Support for BlockMax WAND
> -
>
> Key: SOLR-13289
> URL: https://issues.apache.org/jira/browse/SOLR-13289
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-13289.patch, SOLR-13289.patch
>
>
> LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to 
> expose this via Solr. When enabled, the numFound returned will not be exact.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects

2020-04-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091784#comment-17091784
 ] 

ASF subversion and git services commented on LUCENE-7788:
-

Commit ecc98e8698a3ce8efa51712686697c0f33afab4d in lucene-solr's branch 
refs/heads/master from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ecc98e8 ]

LUCENE-7788: fail precommit on unparameterised log messages and examine for 
wasted work/objects


> fail precommit on unparameterised log messages and examine for wasted 
> work/objects
> --
>
> Key: LUCENE-7788
> URL: https://issues.apache.org/jira/browse/LUCENE-7788
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Erick Erickson
>Priority: Minor
> Attachments: LUCENE-7788.patch, LUCENE-7788.patch, gradle_only.patch, 
> gradle_only.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> SOLR-10415 would be removing existing unparameterised log.trace messages use 
> and once that is in place then this ticket's one-line change would be for 
> 'ant precommit' to reject any future unparameterised log.trace message use.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7788) fail precommit on unparameterised log messages and examine for wasted work/objects

2020-04-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091783#comment-17091783
 ] 

ASF subversion and git services commented on LUCENE-7788:
-

Commit 83f090877b0590a1d99c79cfeec076dfed963076 in lucene-solr's branch 
refs/heads/branch_8x from Erick Erickson
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=83f0908 ]

LUCENE-7788: fail precommit on unparameterised log messages and examine for 
wasted work/objects


> fail precommit on unparameterised log messages and examine for wasted 
> work/objects
> --
>
> Key: LUCENE-7788
> URL: https://issues.apache.org/jira/browse/LUCENE-7788
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Erick Erickson
>Priority: Minor
> Attachments: LUCENE-7788.patch, LUCENE-7788.patch, gradle_only.patch, 
> gradle_only.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> SOLR-10415 would remove existing unparameterised log.trace message use, and 
> once that is in place, this ticket's one-line change would make 'ant 
> precommit' reject any future unparameterised log.trace message use.






[jira] [Commented] (SOLR-13886) HDFSSyncSliceTest and SyncSliceTest started failing frequently

2020-04-24 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091773#comment-17091773
 ] 

Kevin Risden commented on SOLR-13886:
-

Thanks [~erickerickson], yeah, I've been checking about once a day too and 
haven't seen any failures. It also fixed the local Jenkins run failures I had 
seen for this test.

> HDFSSyncSliceTest and SyncSliceTest started failing frequently
> --
>
> Key: SOLR-13886
> URL: https://issues.apache.org/jira/browse/SOLR-13886
> Project: Solr
>  Issue Type: Bug
>  Components: Tests
>Reporter: Tomas Eduardo Fernandez Lobbe
>Assignee: Kevin Risden
>Priority: Major
> Fix For: 8.6
>
> Attachments: SOLR-13886.patch, SOLR-13886_jenkins_log.txt.gz
>
>
> While I can see some failures of this test in the past, they weren't frequent 
> and were usually things like port bindings (maybe SOLR-13871) or timeouts. 
> I've started seeing this failure in Jenkins (and locally) frequently:
> {noformat}
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/5410/
> Java: 64bit/jdk-13 -XX:-UseCompressedOops -XX:+UseParallelGC
> 2 tests failed.
> FAILED:  org.apache.solr.cloud.SyncSliceTest.test
> Error Message:
> expected:<5> but was:<4>
> Stack Trace:
> java.lang.AssertionError: expected:<5> but was:<4>
> at 
> __randomizedtesting.SeedInfo.seed([F8E3B768E16E848D:70B788B24F92E975]:0)
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:645)
> at org.junit.Assert.assertEquals(Assert.java:631)
> at org.apache.solr.cloud.SyncSliceTest.test(SyncSliceTest.java:150)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:567)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
> at 
> org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:1082)
> at 
> org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:1054)
> at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
> at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> com.carrotsearch.randomize

[jira] [Commented] (SOLR-13886) HDFSSyncSliceTest and SyncSliceTest started failing frequently

2020-04-24 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091766#comment-17091766
 ] 

Erick Erickson commented on SOLR-13886:
---

BTW, I checked Hoss' rollup yesterday and today for the previous 24 hours (48 
hours total) and there are no failures reported for either of these, whereas 
there are for the last 7 days. So that's encouraging. I missed looking on 
Wednesday.

The BadApple report next Monday will still pick up some runs from before this 
was checked in, so don't panic if you see that.

> HDFSSyncSliceTest and SyncSliceTest started failing frequently
> --
>
> Key: SOLR-13886
> URL: https://issues.apache.org/jira/browse/SOLR-13886
> Project: Solr
>  Issue Type: Bug
>  Components: Tests
>Reporter: Tomas Eduardo Fernandez Lobbe
>Assignee: Kevin Risden
>Priority: Major
> Fix For: 8.6
>
> Attachments: SOLR-13886.patch, SOLR-13886_jenkins_log.txt.gz
>
>
> While I can see some failures of this test in the past, they weren't frequent 
> and were usually things like port bindings (maybe SOLR-13871) or timeouts. 
> I've started seeing this failure in Jenkins (and locally) frequently:
> {noformat}
> Build: https://jenkins.thetaphi.de/job/Lucene-Solr-master-MacOSX/5410/
> Java: 64bit/jdk-13 -XX:-UseCompressedOops -XX:+UseParallelGC
> 2 tests failed.
> FAILED:  org.apache.solr.cloud.SyncSliceTest.test
> Error Message:
> expected:<5> but was:<4>
> Stack Trace:
> java.lang.AssertionError: expected:<5> but was:<4>
> at 
> __randomizedtesting.SeedInfo.seed([F8E3B768E16E848D:70B788B24F92E975]:0)
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.failNotEquals(Assert.java:834)
> at org.junit.Assert.assertEquals(Assert.java:645)
> at org.junit.Assert.assertEquals(Assert.java:631)
> at org.apache.solr.cloud.SyncSliceTest.test(SyncSliceTest.java:150)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.base/java.lang.reflect.Method.invoke(Method.java:567)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1750)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:938)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:974)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:988)
> at 
> org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsFixedStatement.callStatement(BaseDistributedSearchTestCase.java:1082)
> at 
> org.apache.solr.BaseDistributedSearchTestCase$ShardsRepeatRule$ShardsStatement.evaluate(BaseDistributedSearchTestCase.java:1054)
> at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:57)
> at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:49)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
> at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:368)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:817)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:468)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:947)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:832)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:883)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:894)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesR

[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5

2020-04-24 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091754#comment-17091754
 ] 

Mike Drob commented on SOLR-14428:
--

I've created https://github.com/apache/lucene-solr/pull/1455 for this, but I 
haven't really tested it yet. I wanted to get feedback on the approach before 
spending too much time on it. [~ab] WDYT?

[~cjcowie] do you think this is something you're able to try in your 
environment as well?

> FuzzyQuery has severe memory usage in 8.5
> -
>
> Key: SOLR-14428
> URL: https://issues.apache.org/jira/browse/SOLR-14428
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.5, 8.5.1
>Reporter: Colvin Cowie
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, 
> screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I sent this to the mailing list
> I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors 
> while running our normal tests. After profiling it was clear that the 
> majority of the heap was allocated through FuzzyQuery.
> LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the 
> FuzzyQuery's constructor.
> I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries 
> from random UUID strings for 5 minutes
> {code}
> FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"
> {code}
> When running against vanilla Solr 8.3.1 and 8.4.1 there is no problem, while 
> the memory usage has increased drastically on 8.5.0 and 8.5.1.
> Comparison of heap usage while running the attached test against Solr 8.3.1 
> and 8.5.1 with a single (empty) shard and 4GB heap:
> !image-2020-04-23-09-18-06-070.png! 
> And with 4 shards on 8.4.1 and 8.5.0:
>  !screenshot-2.png! 
> I'm guessing that the memory might be being leaked if the FuzzyQuery objects 
> are referenced from the cache, while the FuzzyTermsEnum would not have been.
> Query Result Cache on 8.5.1:
>  !screenshot-3.png! 
> ~316mb in the cache
> QRC on 8.3.1
>  !screenshot-4.png! 
> <1mb
> With an empty cache, running this query 
> _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory 
> allocation
> {noformat}
> 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed:  1520
> 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855
> {noformat}
> ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520
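The query-string construction quoted in the description can be reproduced as a runnable snippet (the surrounding FuzzyHammer harness is in the Jira attachment; only the string building is reconstructed here):

```java
// Rebuilds the fuzzy query strings described above: a 32-char hex UUID
// (dashes stripped) with an edit distance of 2 against a single field.
import java.util.UUID;

public class FuzzyQueryStringDemo {
    static String fuzzyQuery(String fieldName) {
        return fieldName + ":" + UUID.randomUUID().toString().replace("-", "") + "~2";
    }

    public static void main(String[] args) {
        String q = fuzzyQuery("field_s");
        System.out.println(q);           // e.g. field_s:e41848af85d24ac197c71db6888e17bc~2
        System.out.println(q.length());  // prints 42 (8-char prefix + 32 hex chars + "~2")
    }
}
```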






[GitHub] [lucene-solr] madrob opened a new pull request #1455: SOLR-14428 minimize memory footprint of fuzzy query

2020-04-24 Thread GitBox


madrob opened a new pull request #1455:
URL: https://github.com/apache/lucene-solr/pull/1455


   https://issues.apache.org/jira/browse/SOLR-14428
   
   Make the automata of a fuzzy query mutable so that we don't always have to 
store them. However, there will be cases when we need to recompute them anyway.
   
   Will add unit tests if this approach makes sense.
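The PR's idea can be sketched in plain Java; the names below are hypothetical stand-ins, not Lucene's actual FuzzyQuery or CompiledAutomaton types:

```java
// Hypothetical sketch of the approach described in the PR: keep the
// compiled automata in a nullable field so a cached query stays small,
// and rebuild them on demand when a later step needs them again.
public class LazyAutomataDemo {
    static int compileCount = 0;

    static final class Query {
        final String term;
        private int[] automata;                // stand-in for CompiledAutomaton[]

        Query(String term) { this.term = term; }

        int[] getAutomata() {                  // (re)compute lazily
            if (automata == null) {
                compileCount++;
                automata = new int[] { term.length() };  // fake "compilation"
            }
            return automata;
        }

        void discardAutomata() { automata = null; }      // shrink before caching
    }

    public static void main(String[] args) {
        Query q = new Query("lucene");
        q.getAutomata();       // compiled once
        q.getAutomata();       // reused, no recompile
        q.discardAutomata();   // e.g. when the query is put into a cache
        q.getAutomata();       // recomputed on demand
        System.out.println(compileCount);  // prints 2
    }
}
```

The trade-off is exactly the one the PR notes: memory stays low while cached, at the cost of occasionally recomputing.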



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14428) FuzzyQuery has severe memory usage in 8.5

2020-04-24 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091714#comment-17091714
 ] 

Mike Drob commented on SOLR-14428:
--

bq. Maybe there is an elegant way to store a stripped down FuzzyQuery in the 
cache?

I think this runs into problems with auto warming based on the contents of the 
cache when we open a new searcher. Possibly need a way to rebuild it 
afterwards, which feels like we're going backwards from LUCENE-9068.

Related, [~romseygeek] - should 
[BASE_RAM_BYTES|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/FuzzyQuery.java#L58]
 use {{FuzzyQuery.class}} instead of {{AutomatonQuery.class}}?
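For context on the BASE_RAM_BYTES question above: a shallow-size constant computed from a superclass misses any fields the subclass adds. A plain-Java sketch (the two classes are illustrative stand-ins, not the real Lucene classes) counting instance fields reflectively:

```java
// Why the class passed to a shallow-size estimate matters: a subclass adds
// fields that a superclass-based constant never counts.
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class ShallowSizeDemo {
    static class AutomatonQueryLike { Object automaton; String field; }

    static class FuzzyQueryLike extends AutomatonQueryLike {
        int maxEdits; int prefixLength; boolean transpositions; String term;
    }

    // Counts instance fields over the whole class hierarchy.
    static int instanceFieldCount(Class<?> c) {
        int n = 0;
        for (; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (!Modifier.isStatic(f.getModifiers())) n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(instanceFieldCount(AutomatonQueryLike.class)); // prints 2
        System.out.println(instanceFieldCount(FuzzyQueryLike.class));     // prints 6
    }
}
```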

> FuzzyQuery has severe memory usage in 8.5
> -
>
> Key: SOLR-14428
> URL: https://issues.apache.org/jira/browse/SOLR-14428
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.5, 8.5.1
>Reporter: Colvin Cowie
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: FuzzyHammer.java, image-2020-04-23-09-18-06-070.png, 
> screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>
> I sent this to the mailing list
> I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors 
> while running our normal tests. After profiling it was clear that the 
> majority of the heap was allocated through FuzzyQuery.
> LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the 
> FuzzyQuery's constructor.
> I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries 
> from random UUID strings for 5 minutes
> {code}
> FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"
> {code}
> When running against vanilla Solr 8.3.1 and 8.4.1 there is no problem, while 
> the memory usage has increased drastically on 8.5.0 and 8.5.1.
> Comparison of heap usage while running the attached test against Solr 8.3.1 
> and 8.5.1 with a single (empty) shard and 4GB heap:
> !image-2020-04-23-09-18-06-070.png! 
> And with 4 shards on 8.4.1 and 8.5.0:
>  !screenshot-2.png! 
> I'm guessing that the memory might be being leaked if the FuzzyQuery objects 
> are referenced from the cache, while the FuzzyTermsEnum would not have been.
> Query Result Cache on 8.5.1:
>  !screenshot-3.png! 
> ~316mb in the cache
> QRC on 8.3.1
>  !screenshot-4.png! 
> <1mb
> With an empty cache, running this query 
> _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory 
> allocation
> {noformat}
> 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed:  1520
> 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:648855
> {noformat}
> ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520






[jira] [Commented] (SOLR-12798) Structural changes in SolrJ since version 7.0.0 have effectively disabled multipart post

2020-04-24 Thread Christian Beikov (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091696#comment-17091696
 ] 

Christian Beikov commented on SOLR-12798:
-

I'm having a "URI too long" issue when doing a {{ContentStreamUpdateRequest}}. 
I guess my params are too big (they contain metadata). What do I need to do to 
switch to multipart POST?

> Structural changes in SolrJ since version 7.0.0 have effectively disabled 
> multipart post
> 
>
> Key: SOLR-12798
> URL: https://issues.apache.org/jira/browse/SOLR-12798
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrJ
>Affects Versions: 7.4
>Reporter: Karl Wright
>Assignee: Karl Wright
>Priority: Major
> Attachments: HOT Balloon Trip_Ultra HD.jpg, 
> SOLR-12798-approach.patch, SOLR-12798-reproducer.patch, 
> SOLR-12798-workaround.patch, SOLR-12798.patch, SOLR-12798.patch, 
> SOLR-12798.patch, no params in url.png, solr-update-request.txt
>
>
> Project ManifoldCF uses SolrJ to post documents to Solr.  When upgrading from 
> SolrJ 7.0.x to SolrJ 7.4, we encountered significant structural changes to 
> SolrJ's HttpSolrClient class that seemingly disable any use of multipart 
> post.  This is critical because ManifoldCF's documents often contain metadata 
> in excess of 4K that therefore cannot be stuffed into a URL.
> The changes in question seem to have been performed by Paul Noble on 
> 10/31/2017, with the introduction of the RequestWriter mechanism.  Basically, 
> if a request has a RequestWriter, it is used exclusively to write the 
> request, and that overrides the stream mechanism completely.  I haven't 
> chased it back to a specific ticket.
> ManifoldCF's usage of SolrJ involves the creation of 
> ContentStreamUpdateRequests for all posts meant for Solr Cell, and the 
> creation of UpdateRequests for posts not meant for Solr Cell (as well as for 
> delete and commit requests).  For our release cycle that is taking place 
> right now, we're shipping a modified version of HttpSolrClient that ignores 
> the RequestWriter when dealing with ContentStreamUpdateRequests.  We 
> apparently cannot use multipart for all requests because on the Solr side we 
> get "pfountz Should not get here!" errors on the Solr side when we do, which 
> generate HTTP error code 500 responses.  That should not happen either, in my 
> opinion.






[jira] [Commented] (LUCENE-9322) Discussing a unified vectors format API

2020-04-24 Thread Mayya Sharipova (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091684#comment-17091684
 ] 

Mayya Sharipova commented on LUCENE-9322:
-

> It is implemented by enum with {{distance()}} function. Also, I think it 
>would be good to persist (in the codec) which distance metric we use for the 
>field.

 

Maybe for now it is worth keeping the API simple and using Euclidean distance. 
Both ANN approaches we would like to pursue, HNSW and the clustering-based 
approach, use Euclidean distance.
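The "enum with a distance() function" shape quoted above can be sketched as follows, restricted to Euclidean distance as the comment suggests; the names are illustrative only, not an actual Lucene API:

```java
// Sketch of a distance-metric enum with constant-specific behaviour,
// as floated in the LUCENE-9322 discussion. Hypothetical names.
public class DistanceDemo {
    enum DistanceMetric {
        EUCLIDEAN {
            @Override
            double distance(float[] a, float[] b) {
                double sum = 0;
                for (int i = 0; i < a.length; i++) {
                    double d = a[i] - b[i];
                    sum += d * d;          // squared component difference
                }
                return Math.sqrt(sum);
            }
        };

        abstract double distance(float[] a, float[] b);
    }

    public static void main(String[] args) {
        float[] a = {0f, 3f};
        float[] b = {4f, 0f};
        // Classic 3-4-5 right triangle:
        System.out.println(DistanceMetric.EUCLIDEAN.distance(a, b)); // prints 5.0
    }
}
```

Persisting which constant a field uses (as the quoted comment suggests) would then be a matter of writing the enum name into the codec.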

> Discussing a unified vectors format API
> ---
>
> Key: LUCENE-9322
> URL: https://issues.apache.org/jira/browse/LUCENE-9322
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Julie Tibshirani
>Priority: Major
>
> Two different approximate nearest neighbor approaches are currently being 
> developed, one based on HNSW ([#LUCENE-9004]) and another based on coarse 
> quantization ([#LUCENE-9136]). Each prototype proposes to add a new format to 
> handle vectors. In LUCENE-9136 we discussed the possibility of a unified API 
> that could support both approaches. The two ANN strategies give different 
> trade-offs in terms of speed, memory, and complexity, and it’s likely that 
> we’ll want to support both. Vector search is also an active research area, 
> and it would be great to be able to prototype and incorporate new approaches 
> without introducing more formats.
> To me it seems like a good time to begin discussing a unified API. The 
> prototype for coarse quantization 
> ([https://github.com/apache/lucene-solr/pull/1314]) could be ready to commit 
> soon (this depends on everyone's feedback of course). The approach is simple 
> and shows solid search performance, as seen 
> [here|https://github.com/apache/lucene-solr/pull/1314#issuecomment-608645326].
>  I think this API discussion is an important step in moving that 
> implementation forward.
> The goals of the API would be
> # Support for storing and retrieving individual float vectors.
> # Support for approximate nearest neighbor search -- given a query vector, 
> return the indexed vectors that are closest to it.






[jira] [Comment Edited] (LUCENE-9322) Discussing a unified vectors format API

2020-04-24 Thread Mayya Sharipova (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091684#comment-17091684
 ] 

Mayya Sharipova edited comment on LUCENE-9322 at 4/24/20, 3:57 PM:
---

> It is implemented by enum with {{distance()}} function. Also, I think it 
>would be good to persist (in the codec) which distance metric we use for the 
>field.

Maybe for now it is worth keeping the API simple and using Euclidean distance. 
Both ANN approaches we would like to pursue, HNSW and the clustering-based 
approach, use Euclidean distance.


was (Author: mayyas):
> It is implemented by enum with {{distance()}} function. Also, I think it 
>would be good to persist (in the codec) which distance metric we use for the 
>field.

 

May be for now, it is worth to keep the API simple and use euclidean distance.  
Both ann approaches we would like to pursue: HNSW and Clustering based approach 
use euclidean distance.

> Discussing a unified vectors format API
> ---
>
> Key: LUCENE-9322
> URL: https://issues.apache.org/jira/browse/LUCENE-9322
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Julie Tibshirani
>Priority: Major
>
> Two different approximate nearest neighbor approaches are currently being 
> developed, one based on HNSW ([#LUCENE-9004]) and another based on coarse 
> quantization ([#LUCENE-9136]). Each prototype proposes to add a new format to 
> handle vectors. In LUCENE-9136 we discussed the possibility of a unified API 
> that could support both approaches. The two ANN strategies give different 
> trade-offs in terms of speed, memory, and complexity, and it’s likely that 
> we’ll want to support both. Vector search is also an active research area, 
> and it would be great to be able to prototype and incorporate new approaches 
> without introducing more formats.
> To me it seems like a good time to begin discussing a unified API. The 
> prototype for coarse quantization 
> ([https://github.com/apache/lucene-solr/pull/1314]) could be ready to commit 
> soon (this depends on everyone's feedback of course). The approach is simple 
> and shows solid search performance, as seen 
> [here|https://github.com/apache/lucene-solr/pull/1314#issuecomment-608645326].
>  I think this API discussion is an important step in moving that 
> implementation forward.
> The goals of the API would be
> # Support for storing and retrieving individual float vectors.
> # Support for approximate nearest neighbor search -- given a query vector, 
> return the indexed vectors that are closest to it.






[jira] [Commented] (SOLR-13839) MaxScore is returned as NAN when group.query doesn't match any docs

2020-04-24 Thread Mike Drob (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091675#comment-17091675
 ] 

Mike Drob commented on SOLR-13839:
--

There are a couple more Float.NaN occurrences in that method later; do we need 
to take care of those too? I'm not sure what code paths will lead to each one.
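The hazard being discussed can be shown with a plain-Java sketch (illustrative, not Solr's actual grouping code): a running max seeded with Float.NaN stays NaN when no scores arrive, so the response-building step needs an explicit guard.

```java
// Sketch of the NaN propagation behind this issue: every comparison with
// NaN is false, so an empty group's max score never gets overwritten.
public class MaxScoreDemo {
    static float maxScore(float[] scores) {
        float max = Float.NaN;                         // "no match yet" sentinel
        for (float s : scores) {
            if (Float.isNaN(max) || s > max) max = s;  // NaN compares false, so guard
        }
        return max;
    }

    public static void main(String[] args) {
        float empty = maxScore(new float[0]);          // NaN: group matched nothing
        // Guard before serialising, e.g. omit maxScore rather than emit NaN:
        System.out.println(Float.isNaN(empty) ? "omit maxScore" : Float.toString(empty));
        System.out.println(maxScore(new float[] {1.5f, 0.5f})); // prints 1.5
    }
}
```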

> MaxScore is returned as NAN when group.query doesn't match any docs
> ---
>
> Key: SOLR-13839
> URL: https://issues.apache.org/jira/browse/SOLR-13839
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Reporter: Munendra S N
>Priority: Minor
> Attachments: SOLR-13839.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the main query matches some products but group.query doesn't match any 
> docs, then maxScore=NaN would be returned in the response.
> * This happens only in standalone/single shard mode
> * score needs to fetched in the response to encounter this issue






[GitHub] [lucene-solr] madrob commented on pull request #1330: LUCENE-9267 Replace getQueryBuildTime time unit from ms to ns

2020-04-24 Thread GitBox


madrob commented on pull request #1330:
URL: https://github.com/apache/lucene-solr/pull/1330#issuecomment-619086921


   I took care of the squash for you, thanks! Also I added an entry to CHANGES 
- committed in 013e983






[jira] [Resolved] (LUCENE-9267) The documentation of getQueryBuildTime function reports a wrong time unit.

2020-04-24 Thread Mike Drob (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob resolved LUCENE-9267.
---
Fix Version/s: master (9.0)
 Assignee: Mike Drob
   Resolution: Fixed

Thanks for the patch! Committed to master!

> The documentation of getQueryBuildTime function reports a wrong time unit.
> --
>
> Key: LUCENE-9267
> URL: https://issues.apache.org/jira/browse/LUCENE-9267
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/other
>Affects Versions: 8.2, 8.3, 8.4
>Reporter: Pierre-Luc Perron
>Assignee: Mike Drob
>Priority: Trivial
>  Labels: documentation, newbie, pull-request-available
> Fix For: master (9.0)
>
> Attachments: LUCENE-9267.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As per documentation, the 
> [MatchingQueries|https://lucene.apache.org/core/8_4_1/monitor/org/apache/lucene/monitor/MatchingQueries.html]
>  class returns both getQueryBuildTime and getSearchTime in milliseconds. The 
> code shows 
> [searchTime|https://github.com/apache/lucene-solr/blob/320578274be74a18ce150b604d28a740545fde48/lucene/monitor/src/java/org/apache/lucene/monitor/CandidateMatcher.java#L112]
>  returning milliseconds. However, the code shows 
> [buildTime|https://github.com/apache/lucene-solr/blob/320578274be74a18ce150b604d28a740545fde48/lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java#L280]
>  returning nanoseconds.
> The patch changes the documentation of getQueryBuildTime to report 
> nanoseconds instead of milliseconds.
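Since the fix above changes documentation only, callers that had assumed milliseconds still need to convert explicitly; java.util.concurrent.TimeUnit makes the unit visible at the call site:

```java
// getQueryBuildTime returns nanoseconds (per the corrected javadoc);
// TimeUnit converts to milliseconds without a magic 1_000_000 constant.
import java.util.concurrent.TimeUnit;

public class BuildTimeDemo {
    public static void main(String[] args) {
        long buildTimeNanos = 2_500_000L;  // hypothetical value from getQueryBuildTime()
        long millis = TimeUnit.NANOSECONDS.toMillis(buildTimeNanos);
        System.out.println(millis);        // prints 2 (truncation, not rounding)
    }
}
```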






[jira] [Commented] (LUCENE-9267) The documentation of getQueryBuildTime function reports a wrong time unit.

2020-04-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091671#comment-17091671
 ] 

ASF subversion and git services commented on LUCENE-9267:
-

Commit 013e98347a011664bff18f72d7c24eb97b1201d9 in lucene-solr's branch 
refs/heads/master from Pierre-Luc Perron
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=013e983 ]

LUCENE-9267 Replace getQueryBuildTime time unit from ms to ns


> The documentation of getQueryBuildTime function reports a wrong time unit.
> --
>
> Key: LUCENE-9267
> URL: https://issues.apache.org/jira/browse/LUCENE-9267
> Project: Lucene - Core
>  Issue Type: Task
>  Components: modules/other
>Affects Versions: 8.2, 8.3, 8.4
>Reporter: Pierre-Luc Perron
>Priority: Trivial
>  Labels: documentation, newbie, pull-request-available
> Attachments: LUCENE-9267.patch
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> As per documentation, the 
> [MatchingQueries|https://lucene.apache.org/core/8_4_1/monitor/org/apache/lucene/monitor/MatchingQueries.html]
>  class returns both getQueryBuildTime and getSearchTime in milliseconds. The 
> code shows 
> [searchTime|https://github.com/apache/lucene-solr/blob/320578274be74a18ce150b604d28a740545fde48/lucene/monitor/src/java/org/apache/lucene/monitor/CandidateMatcher.java#L112]
>  returning milliseconds. However, the code shows 
> [buildTime|https://github.com/apache/lucene-solr/blob/320578274be74a18ce150b604d28a740545fde48/lucene/monitor/src/java/org/apache/lucene/monitor/QueryIndex.java#L280]
>  returning nanoseconds.
> The patch changes the documentation of getQueryBuildTime to report 
> nanoseconds instead of milliseconds.






[GitHub] [lucene-solr] madrob commented on pull request #1355: LUCENE-9279: Update dictionary version for Ukrainian analyzer

2020-04-24 Thread GitBox


madrob commented on pull request #1355:
URL: https://github.com/apache/lucene-solr/pull/1355#issuecomment-619079133


   Fixed in 7fe6f9c57d



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org






[GitHub] [lucene-solr] madrob commented on a change in pull request #1371: SOLR-14333: print readable version of CollapsedPostFilter query

2020-04-24 Thread GitBox


madrob commented on a change in pull request #1371:
URL: https://github.com/apache/lucene-solr/pull/1371#discussion_r414654670



##
File path: 
solr/core/src/java/org/apache/solr/search/CollapsingQParserPlugin.java
##
@@ -218,7 +245,7 @@ public static GroupHeadSelector build(final SolrParams 
localParams) {
 public String hint;
 private boolean needsScores = true;
 private boolean needsScores4Collapsing = false;
-private int nullPolicy;
+private NullPolicy nullPolicy;
 private Set boosted; // ordered by "priority"
 public static final int NULL_POLICY_IGNORE = 0;

Review comment:
   These constants can be removed and everything routed through the 
NullPolicy enum. Same for the `NULL_COLLAPSE` string above and others.
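The refactor suggested above can be sketched as follows; the enum values mirror the NULL_POLICY_* constants quoted in the diff, but the exact shape of the enum in the PR may differ:

```java
// Sketch: route the old int constants and their string names through one enum,
// so parsing and validation live in a single place.
public class NullPolicyDemo {
    enum NullPolicy {
        IGNORE("ignore"), COLLAPSE("collapse"), EXPAND("expand");

        final String label;
        NullPolicy(String label) { this.label = label; }

        // Replaces string comparisons scattered through the parser.
        static NullPolicy fromString(String s) {
            for (NullPolicy p : values()) {
                if (p.label.equals(s)) return p;
            }
            throw new IllegalArgumentException("Invalid nullPolicy: " + s);
        }
    }

    public static void main(String[] args) {
        System.out.println(NullPolicy.fromString("collapse")); // COLLAPSE
    }
}
```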








[GitHub] [lucene-solr] madrob commented on a change in pull request #1391: SOLR-14014 Add a disable Admin UI Flag

2020-04-24 Thread GitBox


madrob commented on a change in pull request #1391:
URL: https://github.com/apache/lucene-solr/pull/1391#discussion_r414646287



##
File path: solr/core/src/java/org/apache/solr/servlet/LoadAdminUiServlet.java
##
@@ -24,6 +24,7 @@
 import org.apache.solr.core.CoreContainer;
 import org.apache.solr.core.SolrCore;
 
+import javax.servlet.ServletException;

Review comment:
   This looks like an unused import to me?








[jira] [Commented] (SOLR-9679) Exception when removing zk node /security.json

2020-04-24 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-9679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091630#comment-17091630
 ] 

Jan Høydahl commented on SOLR-9679:
---

Any further comments? Probably needs a test...

> Exception when removing zk node /security.json
> --
>
> Key: SOLR-9679
> URL: https://issues.apache.org/jira/browse/SOLR-9679
> Project: Solr
>  Issue Type: Bug
>  Components: Authentication
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> To reproduce:
> # Upload {{security.json}} to zk
> # {{bin/solr zk rm zk:/security.json -z localhost:9983}}
> {noformat}
> 2016-10-22 22:17:32.264 DEBUG (main-EventThread) [   ] o.a.s.c.c.SolrZkClient 
> Submitting job to respond to event WatchedEvent state:SyncConnected 
> type:NodeDeleted path:/security.json
> 2016-10-22 22:17:32.265 DEBUG 
> (zkCallback-3-thread-1-processing-n:192.168.0.11:8983_solr) [   ] 
> o.a.s.c.c.ZkStateReader Updating [/security.json] ... 
> 2016-10-22 22:17:32.266 ERROR 
> (zkCallback-3-thread-1-processing-n:192.168.0.11:8983_solr) [   ] 
> o.a.s.c.c.ZkStateReader A ZK error has occurred
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for /security.json
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>   at 
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356)
>   at 
> org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353)
>   at 
> org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>   at 
> org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353)
>   at 
> org.apache.solr.common.cloud.ZkStateReader$3.process(ZkStateReader.java:455)
>   at 
> org.apache.solr.common.cloud.SolrZkClient$3.lambda$process$0(SolrZkClient.java:268)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> I'm not sure what should happen, but it would be sweet to be able to disable 
> security by simply removing the znode... [~noble.paul] ?
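For illustration, a minimal sketch of the behavior the reporter asks for: treat a missing /security.json as "security disabled" instead of surfacing a ZK error. The ZkReader interface and class names here are hypothetical stand-ins, not Solr's actual ZkStateReader/SolrZkClient API:

```java
import java.util.Optional;

public class SecurityNodeWatcher {
    // Stand-in for a ZK client call that throws (e.g. KeeperException.NoNodeException)
    // when the znode has been deleted.
    interface ZkReader { byte[] getData(String path) throws Exception; }

    // When the watcher fires for a deleted node, an empty Optional means
    // "security off" rather than an ERROR-level stack trace.
    static Optional<byte[]> readSecurityJson(ZkReader zk) {
        try {
            return Optional.ofNullable(zk.getData("/security.json"));
        } catch (Exception e) { // node gone: not an error, security is disabled
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        ZkReader deleted = path -> { throw new Exception("NoNode for " + path); };
        System.out.println(readSecurityJson(deleted).isPresent()); // false
    }
}
```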






[GitHub] [lucene-solr] s1monw opened a new pull request #1454: Consolidate all IW locking inside IndexWriter

2020-04-24 Thread GitBox


s1monw opened a new pull request #1454:
URL: https://github.com/apache/lucene-solr/pull/1454


   Today we still have one class that runs some tricky logic that should
be in the IndexWriter in the first place, since it requires locking on
the IndexWriter itself. This change inverts the API: FrozenBufferedUpdates
no longer gets the IndexWriter passed in; instead the IndexWriter owns most
of the logic and executes on a FrozenBufferedUpdates object. This prevents
locking on IndexWriter outside of the writer itself and paves the way to
simplifying some concurrency down the road.






[jira] [Resolved] (LUCENE-9345) Separate IndexWriter from MergeScheduler

2020-04-24 Thread Simon Willnauer (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-9345.
-
Fix Version/s: 8.6
   master (9.0)
Lucene Fields: New,Patch Available  (was: New)
 Assignee: Simon Willnauer
   Resolution: Fixed

> Separate IndexWriter from MergeScheduler
> 
>
> Key: LUCENE-9345
> URL: https://issues.apache.org/jira/browse/LUCENE-9345
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Simon Willnauer
>Assignee: Simon Willnauer
>Priority: Major
> Fix For: master (9.0), 8.6
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> MergeScheduler is tightly coupled with IndexWriter which causes IW to expose 
> unnecessary methods. For instance only the scheduler should call 
> IW#getNextMerge() but it's a public method. With some refactorings we can 
> nicely separate the two. 
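The refactoring pattern behind this change can be illustrated with a toy example: extract only the methods the scheduler needs into a narrow MergeSource interface, so the scheduler never depends on the writer itself. "MergeSource" matches the commit message; the method set below is illustrative, not Lucene's actual interface:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MergeSourceDemo {
    // Narrow interface: the scheduler sees only these two operations,
    // instead of the writer's full public surface.
    interface MergeSource {
        String getNextMerge();
        void merge(String oneMerge);
    }

    static class Writer implements MergeSource {
        private final Deque<String> pending = new ArrayDeque<>();
        final StringBuilder log = new StringBuilder();
        void registerMerge(String m) { pending.add(m); }
        public String getNextMerge() { return pending.poll(); }
        public void merge(String m) { log.append(m).append(';'); }
    }

    // The scheduler depends on MergeSource, never on Writer itself.
    static void runScheduler(MergeSource source) {
        String m;
        while ((m = source.getNextMerge()) != null) source.merge(m);
    }

    public static void main(String[] args) {
        Writer w = new Writer();
        w.registerMerge("seg1+seg2");
        w.registerMerge("seg3+seg4");
        runScheduler(w);
        System.out.println(w.log); // seg1+seg2;seg3+seg4;
    }
}
```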






[jira] [Commented] (LUCENE-9345) Separate IndexWriter from MergeScheduler

2020-04-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091575#comment-17091575
 ] 

ASF subversion and git services commented on LUCENE-9345:
-

Commit 9598d43bb629b0434e7ce557fe12ea88c19a5d00 in lucene-solr's branch 
refs/heads/branch_8x from Simon Willnauer
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9598d43 ]

LUCENE-9345: Separate MergeSchedulder from IndexWriter (#1451)

This change extracts the methods that are used by MergeScheduler into
a MergeSource interface. This allows IndexWriter to better ensure
locking, hide internal methods and removes the tight coupling between the two
complex classes. This will also improve future testing.

> Separate IndexWriter from MergeScheduler
> 
>
> Key: LUCENE-9345
> URL: https://issues.apache.org/jira/browse/LUCENE-9345
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Simon Willnauer
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> MergeScheduler is tightly coupled with IndexWriter which causes IW to expose 
> unnecessary methods. For instance only the scheduler should call 
> IW#getNextMerge() but it's a public method. With some refactorings we can 
> nicely separate the two. 






[jira] [Commented] (LUCENE-9345) Separate IndexWriter from MergeScheduler

2020-04-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091549#comment-17091549
 ] 

ASF subversion and git services commented on LUCENE-9345:
-

Commit d7e0b906abcbc43d3737224cadcda7d2c795ccb0 in lucene-solr's branch 
refs/heads/master from Simon Willnauer
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d7e0b90 ]

LUCENE-9345: Separate MergeSchedulder from IndexWriter (#1451)

This change extracts the methods that are used by MergeScheduler into
a MergeSource interface. This allows IndexWriter to better ensure
locking, hide internal methods and removes the tight coupling between the two
complex classes. This will also improve future testing.

> Separate IndexWriter from MergeScheduler
> 
>
> Key: LUCENE-9345
> URL: https://issues.apache.org/jira/browse/LUCENE-9345
> Project: Lucene - Core
>  Issue Type: Improvement
>Affects Versions: master (9.0)
>Reporter: Simon Willnauer
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> MergeScheduler is tightly coupled with IndexWriter which causes IW to expose 
> unnecessary methods. For instance only the scheduler should call 
> IW#getNextMerge() but it's a public method. With some refactorings we can 
> nicely separate the two. 






[jira] [Updated] (LUCENE-9338) Clean up type safety in SimpleBindings

2020-04-24 Thread Alan Woodward (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward updated LUCENE-9338:
--
Fix Version/s: 8.6

> Clean up type safety in SimpleBindings
> --
>
> Key: LUCENE-9338
> URL: https://issues.apache.org/jira/browse/LUCENE-9338
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 8.6
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> SimpleBindings holds its bindings as a Map<String, Object>, and then casts 
> things when it builds its value sources.  We can instead store a map of 
> Supplier<DoubleValuesSource> and avoid casts entirely.






[jira] [Resolved] (LUCENE-9340) Deprecate and remove the SimpleBindings.add(SortField) method

2020-04-24 Thread Alan Woodward (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward resolved LUCENE-9340.
---
Fix Version/s: 8.6
   Resolution: Fixed

> Deprecate and remove the SimpleBindings.add(SortField) method
> -
>
> Key: LUCENE-9340
> URL: https://issues.apache.org/jira/browse/LUCENE-9340
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 8.6
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This method is trappy, in that it only works for certain types of SortField 
> and you only find out which at runtime.  We should deprecate it and encourage 
> users to pass an equivalent DoubleValuesSource instead.
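A toy illustration of why such an overload is "trappy": it compiles for every SortField but fails at runtime for unsupported types, whereas accepting the value source directly leaves nothing to reject. The names below are simplified stand-ins, not Lucene's actual SimpleBindings API:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.DoubleSupplier;

public class BindingsDemo {
    static class SortField { final String type; SortField(String t) { type = t; } }

    final Map<String, DoubleSupplier> bindings = new HashMap<>();

    // Trappy: compiles for any SortField, but throws for most of them at runtime.
    void add(String name, SortField sf) {
        if (!"SCORE".equals(sf.type))
            throw new IllegalArgumentException("unsupported SortField: " + sf.type);
        bindings.put(name, () -> 0.0);
    }

    // Safe: the caller supplies the value source, so there is nothing to reject.
    void add(String name, DoubleSupplier source) { bindings.put(name, source); }

    public static void main(String[] args) {
        BindingsDemo b = new BindingsDemo();
        b.add("boost", () -> 2.0);               // always fine
        try {
            b.add("pop", new SortField("LONG")); // blows up only when run
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());  // unsupported SortField: LONG
        }
    }
}
```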






[jira] [Resolved] (LUCENE-9338) Clean up type safety in SimpleBindings

2020-04-24 Thread Alan Woodward (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Woodward resolved LUCENE-9338.
---
Resolution: Fixed

> Clean up type safety in SimpleBindings
> --
>
> Key: LUCENE-9338
> URL: https://issues.apache.org/jira/browse/LUCENE-9338
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> SimpleBindings holds its bindings as a Map<String, Object>, and then casts 
> things when it builds its value sources.  We can instead store a map of 
> Supplier<DoubleValuesSource> and avoid casts entirely.






[jira] [Commented] (LUCENE-9340) Deprecate and remove the SimpleBindings.add(SortField) method

2020-04-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091485#comment-17091485
 ] 

ASF subversion and git services commented on LUCENE-9340:
-

Commit 5eb117f561ab691f34409943ae1f85781735f8e0 in lucene-solr's branch 
refs/heads/master from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5eb117f ]

LUCENE-9340: Remove deprecated SimpleBindings#add(SortField) method


> Deprecate and remove the SimpleBindings.add(SortField) method
> -
>
> Key: LUCENE-9340
> URL: https://issues.apache.org/jira/browse/LUCENE-9340
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This method is trappy, in that it only works for certain types of SortField 
> and you only find out which at runtime.  We should deprecate it and encourage 
> users to pass an equivalent DoubleValuesSource instead.






[jira] [Commented] (LUCENE-9340) Deprecate and remove the SimpleBindings.add(SortField) method

2020-04-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091483#comment-17091483
 ] 

ASF subversion and git services commented on LUCENE-9340:
-

Commit 72888bced33ff6c85102b162ff2a7303a17e253f in lucene-solr's branch 
refs/heads/branch_8x from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=72888bc ]

LUCENE-9340: Deprecate SimpleBindings#add(SortField) (#1447)

This method is trappy; it doesn't work for all SortField types, but doesn't tell
you that until runtime. This commit deprecates it, and removes all other
callsites in the codebase.


> Deprecate and remove the SimpleBindings.add(SortField) method
> -
>
> Key: LUCENE-9340
> URL: https://issues.apache.org/jira/browse/LUCENE-9340
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This method is trappy, in that it only works for certain types of SortField 
> and you only find out which at runtime.  We should deprecate it and encourage 
> users to pass an equivalent DoubleValuesSource instead.






[jira] [Commented] (LUCENE-9340) Deprecate and remove the SimpleBindings.add(SortField) method

2020-04-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091472#comment-17091472
 ] 

ASF subversion and git services commented on LUCENE-9340:
-

Commit f6462ee35056f92bcfeed5f251d5372506e66b57 in lucene-solr's branch 
refs/heads/master from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f6462ee ]

LUCENE-9340: Deprecate SimpleBindings#add(SortField) (#1447)

This method is trappy; it doesn't work for all SortField types, but doesn't tell
you that until runtime. This commit deprecates it, and removes all other
callsites in the codebase.

> Deprecate and remove the SimpleBindings.add(SortField) method
> -
>
> Key: LUCENE-9340
> URL: https://issues.apache.org/jira/browse/LUCENE-9340
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> This method is trappy, in that it only works for certain types of SortField 
> and you only find out which at runtime.  We should deprecate it and encourage 
> users to pass an equivalent DoubleValuesSource instead.






[jira] [Commented] (LUCENE-8811) Add maximum clause count check to IndexSearcher rather than BooleanQuery

2020-04-24 Thread Stamatis Zampetakis (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091408#comment-17091408
 ] 

Stamatis Zampetakis commented on LUCENE-8811:
-

So if I interpret your response correctly, [~romseygeek], you are saying that 
{{TermInSetQuery}} should accept an unlimited number of terms. Is that correct?

Looking again into the summary and discussion of this issue, I see that the 
goal was to "make this check more consistent across queries". I don't clearly 
see why {{TermInSetQuery}} should remain unbounded. 

On the other hand, if we wanted to enforce the check only for a 
{{BooleanQuery}}, then why use the {{getNumClausesCheckVisitor}} visitor on 
every query? I see that {{TermInSetQuery#visit}}, for example, already iterates 
through all terms, so if we don't need the count we are just wasting CPU cycles 
without a very good reason.

Sorry to insist on this but it is a change that will likely break our current 
implementation in the downstream project and I guess it will also affect quite 
a few others. 

> Add maximum clause count check to IndexSearcher rather than BooleanQuery
> 
>
> Key: LUCENE-8811
> URL: https://issues.apache.org/jira/browse/LUCENE-8811
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: master (9.0)
>
> Attachments: LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch, 
> LUCENE-8811.patch, LUCENE-8811.patch, LUCENE-8811.patch
>
>
> Currently we only check whether boolean queries have too many clauses. 
> However there are other ways that queries may have too many clauses, for 
> instance if you have boolean queries that have themselves inner boolean 
> queries.
> Could we use the new Query visitor API to move this check from BooleanQuery 
> to IndexSearcher in order to make this check more consistent across queries? 
> See for instance LUCENE-8810 where a rewrite rule caused the maximum clause 
> count to be hit even though the total number of leaf queries remained the 
> same.
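The idea of making the check consistent via the visitor API can be sketched with a toy query tree: count leaf clauses wherever they occur, regardless of boolean nesting depth. The classes below are simplified stand-ins for Lucene's Query/QueryVisitor, not the real API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ClauseCountDemo {
    // Stand-in for Query#visit(QueryVisitor).
    interface Query { void visit(AtomicInteger leafCount); }

    static Query term(String t) { return count -> count.incrementAndGet(); }

    // A boolean query only recurses; it does not count itself, so rewrites
    // that merely re-nest clauses do not change the total.
    static Query bool(Query... clauses) {
        List<Query> cs = Arrays.asList(clauses);
        return count -> cs.forEach(c -> c.visit(count));
    }

    // A searcher-level check would compare this total against maxClauseCount.
    static int countLeaves(Query q) {
        AtomicInteger count = new AtomicInteger();
        q.visit(count);
        return count.get();
    }

    public static void main(String[] args) {
        // Nested boolean: 3 leaf terms total, regardless of nesting.
        Query q = bool(term("a"), bool(term("b"), term("c")));
        System.out.println(countLeaves(q)); // 3
    }
}
```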






[jira] [Updated] (SOLR-14435) createNodeSet and createNodeSet.shuffle parameters missing from Collection Restore RefGuide

2020-04-24 Thread Andras Salamon (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Salamon updated SOLR-14435:
--
Attachment: SOLR-14435-01.patch
Status: Open  (was: Open)

> createNodeSet and createNodeSet.shuffle parameters missing from Collection 
> Restore RefGuide
> ---
>
> Key: SOLR-14435
> URL: https://issues.apache.org/jira/browse/SOLR-14435
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Andras Salamon
>Priority: Minor
> Attachments: SOLR-14435-01.patch
>
>
> Although {{createNodeSet}} and {{createNodeSet.shuffle}} parameters are 
> supported by the Collection RESTORE command (I've tested it), they are 
> missing from the documentation:
> [https://lucene.apache.org/solr/guide/8_5/collection-management.html#collection-management]






[jira] [Updated] (SOLR-14435) createNodeSet and createNodeSet.shuffle parameters missing from Collection Restore RefGuide

2020-04-24 Thread Andras Salamon (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andras Salamon updated SOLR-14435:
--
Status: Patch Available  (was: Open)

> createNodeSet and createNodeSet.shuffle parameters missing from Collection 
> Restore RefGuide
> ---
>
> Key: SOLR-14435
> URL: https://issues.apache.org/jira/browse/SOLR-14435
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Andras Salamon
>Priority: Minor
> Attachments: SOLR-14435-01.patch
>
>
> Although {{createNodeSet}} and {{createNodeSet.shuffle}} parameters are 
> supported by the Collection RESTORE command (I've tested it), they are 
> missing from the documentation:
> [https://lucene.apache.org/solr/guide/8_5/collection-management.html#collection-management]






[jira] [Created] (SOLR-14435) createNodeSet and createNodeSet.shuffle parameters missing from Collection Restore RefGuide

2020-04-24 Thread Andras Salamon (Jira)
Andras Salamon created SOLR-14435:
-

 Summary: createNodeSet and createNodeSet.shuffle parameters 
missing from Collection Restore RefGuide
 Key: SOLR-14435
 URL: https://issues.apache.org/jira/browse/SOLR-14435
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: documentation
Reporter: Andras Salamon


Although {{createNodeSet}} and {{createNodeSet.shuffle}} parameters are 
supported by the Collection RESTORE command (I've tested it), they are missing 
from the documentation:

[https://lucene.apache.org/solr/guide/8_5/collection-management.html#collection-management]






[jira] [Commented] (LUCENE-9338) Clean up type safety in SimpleBindings

2020-04-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091388#comment-17091388
 ] 

ASF subversion and git services commented on LUCENE-9338:
-

Commit b66b970d1b06e56b0d685613855be5ef6d1a2c60 in lucene-solr's branch 
refs/heads/branch_8x from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b66b970 ]

LUCENE-9338: Clean up type safety in SimpleBindings (#1444)

Replaces SimpleBindings' Map<String, Object> with a map of
Function<Bindings, DoubleValuesSource> to improve type safety, and
reworks cycle detection and validation to avoid catching 
StackOverflowException

> Clean up type safety in SimpleBindings
> --
>
> Key: LUCENE-9338
> URL: https://issues.apache.org/jira/browse/LUCENE-9338
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> SimpleBindings holds its bindings as a Map<String, Object>, and then casts 
> things when it builds its value sources.  We can instead store a map of 
> Supplier<DoubleValuesSource> and avoid casts entirely.






[jira] [Commented] (LUCENE-9338) Clean up type safety in SimpleBindings

2020-04-24 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091372#comment-17091372
 ] 

ASF subversion and git services commented on LUCENE-9338:
-

Commit ed3caab2d86b69ec4b3ed8e787827c0931b43d1b in lucene-solr's branch 
refs/heads/master from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ed3caab ]

LUCENE-9338: Clean up type safety in SimpleBindings (#1444)

Replaces SimpleBindings' Map<String, Object> with a map of
Function<Bindings, DoubleValuesSource> to improve type safety, and
reworks cycle detection and validation to avoid catching 
StackOverflowException

> Clean up type safety in SimpleBindings
> --
>
> Key: LUCENE-9338
> URL: https://issues.apache.org/jira/browse/LUCENE-9338
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> SimpleBindings holds its bindings as a Map<String, Object>, and then casts 
> things when it builds its value sources.  We can instead store a map of 
> Supplier<DoubleValuesSource> and avoid casts entirely.






[GitHub] [lucene-solr] jpountz commented on a change in pull request #1351: LUCENE-9280: Collectors to skip noncompetitive documents

2020-04-24 Thread GitBox


jpountz commented on a change in pull request #1351:
URL: https://github.com/apache/lucene-solr/pull/1351#discussion_r414392251



##
File path: 
lucene/core/src/java/org/apache/lucene/search/FilterLeafCollector.java
##
@@ -53,4 +53,8 @@ public String toString() {
 return name + "(" + in + ")";
   }
 
+  @Override
+  public DocIdSetIterator competitiveIterator() {
+return in.competitiveIterator();
+  }

Review comment:
   We've had endless discussions about whether or not to delegate in 
FilterXXX classes and I think that the consensus is that we should only 
delegate abstract methods. Since this one has a default implementation, let's 
not delegate and look for extensions of FilterCollector that should delegate 
it? (e.g. asserting collectors)
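The delegate-only-abstract-methods convention can be shown with a toy filter: the abstract method is forwarded, while the default method is deliberately left alone, so subclasses that need it must opt in explicitly. The interfaces below are simplified stand-ins, not Lucene's Collector API:

```java
public class FilterDemo {
    interface Collector {
        void collect(int doc);                                 // abstract: delegate
        default String competitiveIterator() { return null; } // default: don't
    }

    static class FilterCollector implements Collector {
        final Collector in;
        FilterCollector(Collector in) { this.in = in; }
        public void collect(int doc) { in.collect(doc); }      // delegated
        // competitiveIterator() intentionally NOT delegated: inherits the default,
        // matching the convention of only delegating abstract methods.
    }

    public static void main(String[] args) {
        Collector inner = new Collector() {
            public void collect(int doc) {}
            public String competitiveIterator() { return "fast-path"; }
        };
        // The wrapper hides the inner fast path unless it explicitly delegates.
        System.out.println(new FilterCollector(inner).competitiveIterator()); // null
    }
}
```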

##
File path: 
lucene/core/src/java/org/apache/lucene/search/FilteringFieldComparator.java
##
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import java.io.IOException;
+
+/**
+ * Decorates a wrapped FieldComparator to add functionality to skip over 
non-competitive docs.
+ * FilteringFieldComparator provides two additional functions for a 
FieldComparator:
+ * 1) {@code competitiveIterator()} that returns an iterator over
+ *  competitive docs that are stronger than already collected docs.
+ * 2) {@code setCanUpdateIterator()} that notifies the comparator when it is 
ok to start updating its internal iterator.
+ *  This method is called from a collector to inform the comparator to start 
updating its iterator.
+ */
+public abstract class FilteringFieldComparator<T> extends FieldComparator<T> {
+final FieldComparator<T> in;
+
+public FilteringFieldComparator(FieldComparator<T> in) {
+this.in = in;
+}
+
+protected abstract DocIdSetIterator competitiveIterator();
+
+protected abstract void setCanUpdateIterator() throws IOException;

Review comment:
   can you add javadocs?

##
File path: 
lucene/core/src/java/org/apache/lucene/search/FilteringFieldComparator.java
##
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import java.io.IOException;
+
+/**
+ * Decorates a wrapped FieldComparator to add functionality to skip over 
non-competitive docs.
+ * FilteringFieldComparator provides two additional functions for a 
FieldComparator:
+ * 1) {@code competitiveIterator()} that returns an iterator over
+ *  competitive docs that are stronger than already collected docs.
+ * 2) {@code setCanUpdateIterator()} that notifies the comparator when it is 
ok to start updating its internal iterator.
+ *  This method is called from a collector to inform the comparator to start 
updating its iterator.
+ */
+public abstract class FilteringFieldComparator<T> extends FieldComparator<T> {
+final FieldComparator<T> in;
+
+public FilteringFieldComparator(FieldComparator<T> in) {
+this.in = in;
+}
+
+protected abstract DocIdSetIterator competitiveIterator();

Review comment:
   Let's only have this method on LeafFieldComparators, e.g. by doing 
something like this? FieldComparators are top-level objects, so it doesn't make 
sense to have leaf-level objects such as DocIdSetIterators defined on them.
   
   ```suggestion
 @Override
 public abstract FilteringLeafFieldComparator 
getLeafComparator(LeafReaderContext context) throws IOException; // covariant 
return ty
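The covariant-return pattern suggested in this review comment can be sketched in isolation; the class names below are illustrative stand-ins, not the actual Lucene types:

```java
public class CovariantDemo {
    static class LeafComparator {}
    static class FilteringLeafComparator extends LeafComparator {}

    static class Comparator {
        LeafComparator getLeafComparator() {
            return new LeafComparator();
        }
    }

    static class FilteringComparator extends Comparator {
        // Covariant override: the return type is narrowed to the subclass,
        // so callers holding a FilteringComparator need no cast to reach
        // the leaf-level filtering API.
        @Override
        FilteringLeafFieldComparatorStandIn getLeafComparator() {
            return new FilteringLeafFieldComparatorStandIn();
        }
    }

    // Stand-in for the proposed FilteringLeafFieldComparator type.
    static class FilteringLeafFieldComparatorStandIn extends LeafComparator {}

    public static void main(String[] args) {
        // Static type of the call is already the narrowed subclass type.
        FilteringLeafFieldComparatorStandIn leaf =
            new FilteringComparator().getLeafComparator();
        System.out.println(leaf.getClass().getSimpleName());
    }
}
```

This mirrors the suggestion: declaring the override with the narrower return type keeps leaf-level objects off the top-level comparator while still being a legal override.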

[jira] [Updated] (LUCENE-9087) Should the BKD tree use a fixed maxPointsInLeafNode?

2020-04-24 Thread Ignacio Vera (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ignacio Vera updated LUCENE-9087:
-
Description: 
Currently the BKD tree uses a fixed maxPointsInLeafNode provided in the 
constructor. For the current default codec the value is set to 1024. This is a 
good compromise between memory usage and performance of the BKD tree.

Lowering this value can increase search performance, but it has a penalty in 
memory usage. Now that the BKD tree can be loaded off-heap, this can be less of 
a concern. Note that lowering that value too much can hurt performance as well, 
as the tree becomes too deep and the benefits are gone.

For data types that use the tree as an effective R-tree (ranges and shapes 
datatypes) the benefits are larger as it can minimise the overlap between leaf 
nodes. 

Finally, creating too many leaf nodes can be dangerous at write time as memory 
usage depends on the number of leaf nodes created. The writer creates a long 
array of length = numberOfLeafNodes.

What I am wondering here is whether we can improve this situation in order to 
create the most efficient tree. My current ideas are:

 
 * We can adapt the points per leaf to the estimated number of points so we 
create a tree with the best depth and best points per leaf. Note that for the 
1D case we have an upper estimate of the number of points that the tree will be 
indexing. 
 * Add a mechanism so field types can easily define their best points per leaf. 
In that case, field types like ranges or shapes can define their own value to 
minimise overlap.
 * Maybe the default is just too high now that we can load the tree off-heap.

 

Any thoughts?
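The first idea above (adapting points per leaf to an estimated point count so the tree depth stays bounded) could be sketched roughly as follows. The method and constants (computePointsPerLeaf, MAX_DEPTH, MIN_POINTS_PER_LEAF) are hypothetical illustrations, not existing Lucene API:

```java
public class AdaptiveLeafSize {
    // Cap the tree at this many levels: at most 2^MAX_DEPTH leaves, which
    // bounds the long[] of length numberOfLeafNodes the writer allocates.
    static final int MAX_DEPTH = 20;
    // Below this leaf size the tree gets too deep and the benefits are gone.
    static final int MIN_POINTS_PER_LEAF = 128;

    static int computePointsPerLeaf(long estimatedPointCount) {
        long maxLeaves = 1L << MAX_DEPTH;
        // Smallest leaf size that keeps the leaf count under maxLeaves
        // (ceiling division), floored at MIN_POINTS_PER_LEAF.
        long perLeaf = (estimatedPointCount + maxLeaves - 1) / maxLeaves;
        return (int) Math.max(MIN_POINTS_PER_LEAF, perLeaf);
    }

    public static void main(String[] args) {
        // Small segments get the minimum leaf size; very large segments grow
        // the leaf size so the number of leaves stays bounded.
        System.out.println(computePointsPerLeaf(1_000_000L));
        System.out.println(computePointsPerLeaf(10_000_000_000L));
    }
}
```

Under this sketch, only segments with more than roughly 2^MAX_DEPTH * MIN_POINTS_PER_LEAF points would see a leaf size above the minimum.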

  was:
Currently the BKD tree uses a fixed maxPointsInLeafNode provided in the 
constructor. For the current default codec the value is set to 1200. This is a 
good compromise between memory usage and performance of the BKD tree.

Lowering this value can increase search performance, but it has a penalty in 
memory usage. Now that the BKD tree can be loaded off-heap, this can be less of 
a concern. Note that lowering that value too much can hurt performance as well, 
as the tree becomes too deep and the benefits are gone.

For data types that use the tree as an effective R-tree (ranges and shapes 
datatypes) the benefits are larger as it can minimise the overlap between leaf 
nodes. 

Finally, creating too many leaf nodes can be dangerous at write time as memory 
usage depends on the number of leaf nodes created. The writer creates a long 
array of length = numberOfLeafNodes.

What I am wondering here is whether we can improve this situation in order to 
create the most efficient tree. My current ideas are:

 
 * We can adapt the points per leaf to the estimated number of points so we 
create a tree with the best depth and best points per leaf. Note that for the 
1D case we have an upper estimate of the number of points that the tree will be 
indexing. 
 * Add a mechanism so field types can easily define their best points per leaf. 
In that case, field types like ranges or shapes can define their own value to 
minimise overlap.
 * Maybe the default is just too high now that we can load the tree off-heap.

 

Any thoughts?


> Should the BKD tree use a fixed maxPointsInLeafNode? 
> -
>
> Key: LUCENE-9087
> URL: https://issues.apache.org/jira/browse/LUCENE-9087
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
>
> Currently the BKD tree uses a fixed maxPointsInLeafNode provided in the 
> constructor. For the current default codec the value is set to 1024. This is 
> a good compromise between memory usage and performance of the BKD tree.
> Lowering this value can increase search performance, but it has a penalty in 
> memory usage. Now that the BKD tree can be loaded off-heap, this can be less 
> of a concern. Note that lowering that value too much can hurt performance as 
> well, as the tree becomes too deep and the benefits are gone.
> For data types that use the tree as an effective R-tree (ranges and shapes 
> datatypes) the benefits are larger as it can minimise the overlap between 
> leaf nodes. 
> Finally, creating too many leaf nodes can be dangerous at write time as 
> memory usage depends on the number of leaf nodes created. The writer creates 
> a long array of length = numberOfLeafNodes.
> What I am wondering here is whether we can improve this situation in order to 
> create the most efficient tree. My current ideas are:
>  
>  * We can adapt the points per leaf to the estimated number of points so we 
> create a tree with the best depth and best points per leaf. Note that for the 
> 1D case we have an upper estimate of the number of points that the tree will 
> be indexing. 
>  * Add a mechanism so field types can easily de

[jira] [Commented] (LUCENE-9346) WANDScorer should support minimumNumberShouldMatch

2020-04-24 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091320#comment-17091320
 ] 

Adrien Grand commented on LUCENE-9346:
--

In terms of implementation, I suspect it should mostly be a matter of making 
sure that {{WANDScorer.tailSize}} never becomes greater than or equal to 
{{minimumNumberShouldMatch}}. I don't have plans to work on it in the near 
future, so feel free to take it if you're interested; I can help with the 
reviews.
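For reference, the contract being optimized here is simple to state independently of WAND's pruning; a minimal sketch (not Lucene code, class and method names are illustrative):

```java
import java.util.List;

public class MinShouldMatchDemo {
    // A doc is a hit only if at least minShouldMatch of the optional
    // (SHOULD) clauses match it. WAND-style pruning must preserve exactly
    // this set of hits while skipping non-competitive docs.
    static boolean matches(List<Boolean> clauseMatches, int minShouldMatch) {
        long matching = clauseMatches.stream().filter(b -> b).count();
        return matching >= minShouldMatch;
    }

    public static void main(String[] args) {
        // Three optional clauses, minShouldMatch = 2.
        System.out.println(matches(List.of(true, false, true), 2));
        System.out.println(matches(List.of(true, false, false), 2));
    }
}
```

The tailSize idea above fits this contract: once too many clauses sit in the non-advancing tail, a doc could no longer accumulate minShouldMatch matching clauses, so the tail must stay small enough to keep candidates provable.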

> WANDScorer should support minimumNumberShouldMatch
> --
>
> Key: LUCENE-9346
> URL: https://issues.apache.org/jira/browse/LUCENE-9346
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> Currently we deoptimize when a minimumNumberShouldMatch is provided and fall 
> back to a scorer that doesn't dynamically prune hits based on scores.
> Given how similar WANDScorer and MinShouldMatchSumScorer are, I wonder if we 
> could remove MinShouldMatchSumScorer once WANDScorer supports 
> minimumNumberShouldMatch. Then any improvements we bring to WANDScorer, like 
> two-phase support (LUCENE-8806), would automatically cover more queries.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9346) WANDScorer should support minimumNumberShouldMatch

2020-04-24 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-9346:


 Summary: WANDScorer should support minimumNumberShouldMatch
 Key: LUCENE-9346
 URL: https://issues.apache.org/jira/browse/LUCENE-9346
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand


Currently we deoptimize when a minimumNumberShouldMatch is provided and fall 
back to a scorer that doesn't dynamically prune hits based on scores.

Given how similar WANDScorer and MinShouldMatchSumScorer are, I wonder if we 
could remove MinShouldMatchSumScorer once WANDScorer supports 
minimumNumberShouldMatch. Then any improvements we bring to WANDScorer, like 
two-phase support (LUCENE-8806), would automatically cover more queries.






[GitHub] [lucene-solr] jpountz commented on a change in pull request #1444: LUCENE-9338: Clean up type safety in SimpleBindings

2020-04-24 Thread GitBox


jpountz commented on a change in pull request #1444:
URL: https://github.com/apache/lucene-solr/pull/1444#discussion_r414350099



##
File path: 
lucene/expressions/src/java/org/apache/lucene/expressions/SimpleBindings.java
##
@@ -96,24 +90,51 @@ public DoubleValuesSource getDoubleValuesSource(String 
name) {
   case SCORE:
 return DoubleValuesSource.SCORES;
   default:
-throw new UnsupportedOperationException(); 
+throw new UnsupportedOperationException();
 }
   }
   
-  /** 
-   * Traverses the graph of bindings, checking there are no cycles or missing 
references 
-   * @throws IllegalArgumentException if the bindings is inconsistent 
+  @Override
+  public DoubleValuesSource getDoubleValuesSource(String name) {
+if (map.containsKey(name) == false) {
+  throw new IllegalArgumentException("Invalid reference '" + name + "'");
+}
+return map.get(name).apply(this);
+  }
+
+  /**
+   * Traverses the graph of bindings, checking there are no cycles or missing 
references
+   * @throws IllegalArgumentException if the bindings is inconsistent
*/
   public void validate() {
-for (Object o : map.values()) {
-  if (o instanceof Expression) {
-Expression expr = (Expression) o;
-try {
-  expr.getDoubleValuesSource(this);
-} catch (StackOverflowError e) {
-  throw new IllegalArgumentException("Recursion Error: Cycle detected 
originating in (" + expr.sourceText + ")");
-}
+for (String origin : map.keySet()) {

Review comment:
   nit: use entrySet() since you consume both keys and values?
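   A minimal illustration of the nit (not the SimpleBindings code itself): iterating `entrySet()` yields key and value together, avoiding a second hash lookup per key that `map.get(key)` inside a `keySet()` loop would cost:

```java
import java.util.HashMap;
import java.util.Map;

public class EntrySetDemo {
    // Consume both keys and values from one iteration over entrySet(),
    // instead of iterating keySet() and calling map.get(key) per key.
    static int sumValues(Map<String, Integer> map) {
        int sum = 0;
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            sum += e.getValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        System.out.println(sumValues(map));
    }
}
```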

##
File path: 
lucene/expressions/src/test/org/apache/lucene/expressions/TestExpressionValidation.java
##
@@ -110,4 +110,15 @@ public void testCoRecursion4() throws Exception {
 });
 assertTrue(expected.getMessage().contains("Cycle detected"));
   }
+
+  public void testCoRecursion42() throws Exception {

Review comment:
   I provided this test, but I don't think we should add it as it relies on 
iteration order and might get defeated on some JVMs or future versions of Java. 
I'd suggest not adding this test, or changing the map in SimpleBindings from a 
HashMap to a TreeMap to better test such cases (in which case we'd have to swap 
`cycle0`/`cycle2`); this could be done in a follow-up too.
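   A small sketch of why the TreeMap variant would make such a test deterministic (the binding keys below are illustrative, matching the test's naming):

```java
import java.util.Map;
import java.util.TreeMap;

public class TreeMapOrderDemo {
    // TreeMap iterates in sorted key order on every JVM, so a validation
    // check that depends on which binding is visited first is deterministic,
    // unlike HashMap whose iteration order is unspecified.
    static String bindingOrder() {
        Map<String, String> bindings = new TreeMap<>();
        bindings.put("cycle2", "expr");
        bindings.put("cycle0", "expr");
        bindings.put("cycle1", "expr");
        return String.join(",", bindings.keySet());
    }

    public static void main(String[] args) {
        System.out.println(bindingOrder());
    }
}
```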




