[jira] [Commented] (LUCENE-8808) Introduce Optional Cap on Number Of Threads Per Query

2019-05-24 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847715#comment-16847715
 ] 

Adrien Grand commented on LUCENE-8808:
--

Sorry I don't. This is a hard problem. Your approach makes the assumption that 
it's ok to make slow queries slower if that helps fast queries remain fast, 
which I don't think would be the right assumption for most users?

> Introduce Optional Cap on Number Of Threads Per Query
> -
>
> Key: LUCENE-8808
> URL: https://issues.apache.org/jira/browse/LUCENE-8808
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Atri Sharma
>Priority: Major
>
> With the presence of https://issues.apache.org/jira/browse/LUCENE-8757 , a 
> natural progression is to allow advanced users to specify a cap on the number 
> of threads a query can use. This is especially useful for long polled 
> IndexSearcher instances, where the same instance is used to fire queries for 
> multiple runs and there are multiple concurrent IndexSearcher instances on 
> the same node.
>  
> This will be an optional parameter and local only to the IndexSearcher 
> instance being configured during construction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8808) Introduce Optional Cap on Number Of Threads Per Query

2019-05-24 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847400#comment-16847400
 ] 

Adrien Grand commented on LUCENE-8808:
--

In general I'm reluctant to add more configuration options. It increases the 
API surface area, which in turn makes Lucene harder to use. My gut feeling 
right now is that this new setting would not help enough to warrant the 
introduction of a new option?

> Introduce Optional Cap on Number Of Threads Per Query
> -
>
> Key: LUCENE-8808
> URL: https://issues.apache.org/jira/browse/LUCENE-8808
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Atri Sharma
>Priority: Major
>
> With the presence of https://issues.apache.org/jira/browse/LUCENE-8757 , a 
> natural progression is to allow advanced users to specify a cap on the number 
> of threads a query can use. This is especially useful for long polled 
> IndexSearcher instances, where the same instance is used to fire queries for 
> multiple runs and there are multiple concurrent IndexSearcher instances on 
> the same node.
>  
> This will be an optional parameter and local only to the IndexSearcher 
> instance being configured during construction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-7386) Flatten nested disjunctions

2019-05-24 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-7386.
--
   Resolution: Fixed
Fix Version/s: master (9.0)
   8.1

> Flatten nested disjunctions
> ---
>
> Key: LUCENE-7386
> URL: https://issues.apache.org/jira/browse/LUCENE-7386
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 8.1, master (9.0)
>
> Attachments: LUCENE-7386.patch, LUCENE-7386.patch, LUCENE-7386.patch
>
>
> Now that coords are gone it became easier to flatten nested disjunctions. It 
> might sound weird to write nested disjunctions in the first place, but 
> disjunctions can be created implicitly by other queries such as 
> more-like-this, LatLonPoint.newBoxQuery, non-scoring synonym queries, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8810) Flattening of nested disjunctions does not take into account number of clause limitation of builder

2019-05-24 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847343#comment-16847343
 ] 

Adrien Grand commented on LUCENE-8810:
--

Doing instanceof checks feels too fragile to me: this won't work if the 
BooleanQuery is wrapped under a ConstantScoreQuery or a BoostQuery. We could 
consider using the visitor API instead, but this would make BooleanQuery 
construction run in time quadratic in the depth of the query (if you have a 
boolean query BQ1 that wraps BQ2, which itself wraps BQ3, then the clause count 
of BQ2 will be checked by BQ1 and BQ2, and the clause count of BQ3 will be 
checked by BQ1, BQ2 and BQ3, etc.), which is why I thought of IndexSearcher, 
which is the place where we would have access to the top-level query.
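
For illustration, a rough sketch of what a single top-level, visitor-based check 
could look like (hypothetical code, not an actual patch):

{code:java}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryVisitor;

// Counts leaf queries once over the fully-built top-level query, e.g. from
// IndexSearcher, instead of re-checking clause counts at every BooleanQuery
// construction (which is what makes the per-builder approach quadratic).
final class ClauseCountingVisitor extends QueryVisitor {
  int leafCount;

  @Override
  public void visitLeaf(Query query) {
    leafCount++;
  }

  @Override
  public void consumeTerms(Query query, Term... terms) {
    leafCount++; // term-based leaves (TermQuery, SynonymQuery, ...) come through here
  }

  @Override
  public QueryVisitor getSubVisitor(BooleanClause.Occur occur, Query parent) {
    return this; // also descend into MUST_NOT and FILTER clauses
  }

  static void checkMaxClauses(Query topLevelQuery) {
    ClauseCountingVisitor visitor = new ClauseCountingVisitor();
    topLevelQuery.visit(visitor);
    if (visitor.leafCount > BooleanQuery.getMaxClauseCount()) {
      throw new BooleanQuery.TooManyClauses();
    }
  }
}
{code}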

> Flattening of nested disjunctions does not take into account number of clause 
> limitation of builder
> ---
>
> Key: LUCENE-8810
> URL: https://issues.apache.org/jira/browse/LUCENE-8810
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 8.0
>Reporter: Mickaël Sauvée
>Priority: Minor
> Fix For: 8.1.1
>
> Attachments: LUCENE-8810.patch
>
>
> In org.apache.lucene.search.BooleanQuery, at the end of the function 
> rewrite(IndexReader reader), the query is rewritten to flatten nested 
> disjunctions.
> This does not take into account the limitation on the number of clauses in a 
> builder (1024).
> In some circumstances, this limit can be reached, and an exception is 
> thrown.
> Here is a unit test that highlights this.
> {code:java}
>   public void testFlattenInnerDisjunctionsWithMoreThan1024Terms() throws 
> IOException {
> IndexSearcher searcher = newSearcher(new MultiReader());
> BooleanQuery.Builder builder1024 = new BooleanQuery.Builder();
> for(int i = 0; i < 1024; i++) {
>   builder1024.add(new TermQuery(new Term("foo", "bar-" + i)), 
> Occur.SHOULD);
> }
> Query inner = builder1024.build();
> Query query = new BooleanQuery.Builder()
> .add(inner, Occur.SHOULD)
> .add(new TermQuery(new Term("foo", "baz")), Occur.SHOULD)
> .build();
> searcher.rewrite(query);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8810) Flattening of nested disjunctions does not take into account number of clause limitation of builder

2019-05-24 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16847320#comment-16847320
 ] 

Adrien Grand commented on LUCENE-8810:
--

I'm expecting that this would mostly be an issue for users who use inner 
boolean queries as a way to work around the maximum clause count. That said I 
understand how this change can be surprising, and it should be easy enough to 
check the clause count in the rewrite rule so I'd be ok with doing this. Would 
you like to work on a patch?
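
For illustration, a rough sketch of the kind of guard the flattening rewrite 
could apply (simplified and hypothetical, not an actual patch):

{code:java}
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;

final class FlattenGuard {
  // Only inline an inner pure disjunction if the combined clause count stays
  // within the limit; otherwise keep it as a single nested clause so that a
  // query which was legal before the rewrite does not trip TooManyClauses.
  static void addFlattened(BooleanQuery.Builder builder, int outerClauseCount,
                           BooleanQuery innerDisjunction) {
    int combined = outerClauseCount - 1 + innerDisjunction.clauses().size();
    if (combined <= BooleanQuery.getMaxClauseCount()) {
      for (BooleanClause clause : innerDisjunction.clauses()) {
        builder.add(clause);
      }
    } else {
      builder.add(innerDisjunction, Occur.SHOULD);
    }
  }
}
{code}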

I opened LUCENE-8811 to maybe rethink how we check the number of clauses of 
queries in a more consistent way.

bq. I do not know what is "block-max WAND" (line 479 of BooleanQuery).

It is an optimized way to retrieve top hits of disjunctions (boolean queries 
with only SHOULD clauses) by decreasing score. It works by ignoring low-scoring 
clauses, and works better when disjunctions are inlined since this gives more 
information to the algorithm.
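
For context, this is the optimization that kicks in when top hits are collected 
without requiring an exact total hit count, e.g. (illustrative only, Lucene 8 
APIs):

{code:java}
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopScoreDocCollector;

final class WandExample {
  // With a totalHitsThreshold of 10, the collector may raise its minimum
  // competitive score once 10 hits have been collected, which lets block-max
  // WAND skip clauses that cannot produce a competitive score.
  static TopDocs topTen(IndexSearcher searcher, Query disjunction) throws IOException {
    TopScoreDocCollector collector = TopScoreDocCollector.create(10, 10);
    searcher.search(disjunction, collector);
    return collector.topDocs();
  }
}
{code}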

> Flattening of nested disjunctions does not take into account number of clause 
> limitation of builder
> ---
>
> Key: LUCENE-8810
> URL: https://issues.apache.org/jira/browse/LUCENE-8810
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 8.0
>Reporter: Mickaël Sauvée
>Priority: Minor
> Fix For: 8.1.1
>
>
> In org.apache.lucene.search.BooleanQuery, at the end of the function 
> rewrite(IndexReader reader), the query is rewritten to flatten nested 
> disjunctions.
> This does not take into account the limitation on the number of clauses in a 
> builder (1024).
> In some circumstances, this limit can be reached, and an exception is 
> thrown.
> Here is a unit test that highlights this.
> {code:java}
>   public void testFlattenInnerDisjunctionsWithMoreThan1024Terms() throws 
> IOException {
> IndexSearcher searcher = newSearcher(new MultiReader());
> BooleanQuery.Builder builder1024 = new BooleanQuery.Builder();
> for(int i = 0; i < 1024; i++) {
>   builder1024.add(new TermQuery(new Term("foo", "bar-" + i)), 
> Occur.SHOULD);
> }
> Query inner = builder1024.build();
> Query query = new BooleanQuery.Builder()
> .add(inner, Occur.SHOULD)
> .add(new TermQuery(new Term("foo", "baz")), Occur.SHOULD)
> .build();
> searcher.rewrite(query);
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8811) Add maximum clause count check to IndexSearcher rather than BooleanQuery

2019-05-24 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-8811:


 Summary: Add maximum clause count check to IndexSearcher rather 
than BooleanQuery
 Key: LUCENE-8811
 URL: https://issues.apache.org/jira/browse/LUCENE-8811
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand


Currently we only check whether boolean queries have too many clauses. However, 
there are other ways that queries may end up with too many clauses, for instance 
if you have boolean queries that themselves contain inner boolean queries.

Could we use the new Query visitor API to move this check from BooleanQuery to 
IndexSearcher in order to make this check more consistent across queries? See 
for instance LUCENE-8810 where a rewrite rule caused the maximum clause count 
to be hit even though the total number of leaf queries remained the same.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8808) Introduce Optional Cap on Number Of Threads Per Query

2019-05-22 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845828#comment-16845828
 ] 

Adrien Grand commented on LUCENE-8808:
--

I don't think it really fixes the problem; it only makes it occur a bit later? 
If there is a single bad query, it makes the issue less problematic for the 
well-behaved queries. But what if 10% of your queries are bad? Then these 
queries would likely still hog all available threads (and probably even fill 
the queue), and none of the well-behaved queries may run.

> Introduce Optional Cap on Number Of Threads Per Query
> -
>
> Key: LUCENE-8808
> URL: https://issues.apache.org/jira/browse/LUCENE-8808
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Atri Sharma
>Priority: Major
>
> With the presence of https://issues.apache.org/jira/browse/LUCENE-8757 , a 
> natural progression is to allow advanced users to specify a cap on the number 
> of threads a query can use. This is especially useful for long polled 
> IndexSearcher instances, where the same instance is used to fire queries for 
> multiple runs and there are multiple concurrent IndexSearcher instances on 
> the same node.
>  
> This will be an optional parameter and local only to the IndexSearcher 
> instance being configured during construction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8362) Add DocValue support for RangeFields

2019-05-22 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845816#comment-16845816
 ] 

Adrien Grand commented on LUCENE-8362:
--

I agree this could be useful in conjunction with IndexOrDocValuesQuery. Some 
thoughts:
* other classes handle points and doc values in separate classes (e.g. 
LongPoint/NumericDocValuesField, LatLonPoint/LatLonDocValuesField); we should 
do the same for consistency?
* we should document the limitation that there can be at most one value per 
document due to the use of binary doc values
* can we have factory methods for doc-value queries, so that something can be 
done with these doc-value fields, preferably with "Slow" in the name of the 
factory method, like NumericDocValuesField#newSlowRangeQuery (see the example 
below)
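
For reference, this is the existing pattern for numeric fields that such factory 
methods would enable for ranges as well (the "price" field is just an example):

{code:java}
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.search.IndexOrDocValuesQuery;
import org.apache.lucene.search.Query;

final class RangeQueryExample {
  // The points query is good at driving iteration when the range matches many
  // documents; the "slow" doc-values query is good at verifying candidates when
  // another clause is more selective. IndexOrDocValuesQuery picks whichever
  // strategy is cheaper per segment.
  static Query priceBetween(long min, long max) {
    Query pointsQuery = LongPoint.newRangeQuery("price", min, max);
    Query dvQuery = NumericDocValuesField.newSlowRangeQuery("price", min, max);
    return new IndexOrDocValuesQuery(pointsQuery, dvQuery);
  }
}
{code}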

> Add DocValue support for RangeFields 
> -
>
> Key: LUCENE-8362
> URL: https://issues.apache.org/jira/browse/LUCENE-8362
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Nicholas Knize
>Priority: Minor
> Attachments: LUCENE-8362.patch
>
>
> I'm opening this issue to discuss adding DocValue support to 
> {{\{Int|Long|Float|Double\}Range}} field types. Since existing numeric range 
> fields already provide the methods for encoding ranges as a byte array I 
> think this could be as simple as adding syntactic sugar to existing range 
> fields that simply build an instance of {{BinaryDocValues}} using that same 
> encoding. I'm envisioning something like 
> {{doc.add(IntRange.newDocValuesField("intDV", 100)}} But I'd like to solicit 
> other ideas or potential drawbacks to this approach.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8808) Introduce Optional Cap on Number Of Threads Per Query

2019-05-22 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845802#comment-16845802
 ] 

Adrien Grand commented on LUCENE-8808:
--

Can you explain the use-case a bit more? What does it buy compared to having an 
executor with a fixed number of threads that is shared across all IndexSearcher 
instances?
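
For reference, this is the kind of setup I have in mind (sketch only, names made 
up): a single bounded pool shared by every searcher, so the cap applies globally 
rather than per searcher.

{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;

final class SharedSearchPool {
  // One fixed-size pool shared by all IndexSearcher instances on the node: no
  // single query can use more threads than the pool has, and the bound holds
  // across concurrent searchers rather than per IndexSearcher instance.
  static final ExecutorService SEARCH_POOL = Executors.newFixedThreadPool(8);

  static IndexSearcher newSearcher(DirectoryReader reader) {
    return new IndexSearcher(reader, SEARCH_POOL);
  }
}
{code}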

> Introduce Optional Cap on Number Of Threads Per Query
> -
>
> Key: LUCENE-8808
> URL: https://issues.apache.org/jira/browse/LUCENE-8808
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Atri Sharma
>Priority: Major
>
> With the presence of https://issues.apache.org/jira/browse/LUCENE-8757 , a 
> natural progression is to allow advanced users to specify a cap on the number 
> of threads a query can use. This is especially useful for long polled 
> IndexSearcher instances, where the same instance is used to fire queries for 
> multiple runs and there are multiple concurrent IndexSearcher instances on 
> the same node.
>  
> This will be an optional parameter and local only to the IndexSearcher 
> instance being configured during construction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-21 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8757.
--
   Resolution: Fixed
Fix Version/s: 8.2
   master (9.0)

Merged, thanks [~atris]!

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Fix For: master (9.0), 8.2
>
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8805) Parameter changes for stringField() in StoredFieldVisitor

2019-05-21 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8805.
--
   Resolution: Fixed
Fix Version/s: master (9.0)

Thanks [~danmuzi], I just merged your change to master. I decided not to 
backport to 8.x since it's a breaking change, but I could easily be convinced to 
if someone else has a different opinion. Thanks again for contributing!

> Parameter changes for stringField() in StoredFieldVisitor
> -
>
> Key: LUCENE-8805
> URL: https://issues.apache.org/jira/browse/LUCENE-8805
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Namgyu Kim
>Priority: Major
> Fix For: master (9.0)
>
> Attachments: LUCENE-8805.patch, LUCENE-8805.patch, LUCENE-8805.patch
>
>
> I wrote this patch after seeing the comments left by [~mikemccand] when 
> SortingStoredFieldsConsumer class was first created.
> {code:java}
> @Override
> public void binaryField(FieldInfo fieldInfo, byte[] value) throws IOException 
> {
>   ...
>   // TODO: can we avoid new BR here?
>   ...
> }
> @Override
> public void stringField(FieldInfo fieldInfo, byte[] value) throws IOException 
> {
>   ...
>   // TODO: can we avoid new String here?
>   ...
> }
> {code}
> I changed two things.
>  -1) change binaryField() parameters from byte[] to BytesRef.-
>  2) change stringField() parameters from byte[] to String.
> I also changed the related contents while doing the work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8807) Change all download URLs in build files to HTTPS

2019-05-21 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845009#comment-16845009
 ] 

Adrien Grand commented on LUCENE-8807:
--

As usual, thanks [~thetaphi] for taking care of these issues!

> Change all download URLs in build files to HTTPS
> 
>
> Key: LUCENE-8807
> URL: https://issues.apache.org/jira/browse/LUCENE-8807
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/build
>Affects Versions: 8.1
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Blocker
> Fix For: 7.7.2, master (9.0), 8.2, 8.1.1
>
> Attachments: LUCENE-8807.patch, LUCENE-8807.patch
>
>
> At least for Lucene this is not a security issue, because we have checksums 
> for all downloaded JAR dependencies:
> {quote}
> [...] Projects like Lucene do checksum whitelists of
> all their build dependencies, and you may wish to consider that as a
> protection against threats beyond just MITM [...]
> {quote}
> This patch changes the URLs for most files referenced in {{\*build.xml}} and 
> {{\*ivy\*.xml}} to HTTPS. There are a few data files in benchmark which use 
> HTTP only, but that's not critical and I added a TODO. Some were broken already.
> I removed the "uk.maven.org" workarounds for Maven, as this does not work 
> with HTTPS. By keeping those inside, we would break the whole chain of trust, as 
> any non-working HTTPS would fall back to the insecure uk.maven.org Maven 
> mirror.
> As the Great Chinese Firewall is changing all the time, we should just wait 
> for somebody to complain.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8727) IndexSearcher#search(Query,int) should operate on a shared priority queue when configured with an executor

2019-05-21 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844817#comment-16844817
 ] 

Adrien Grand commented on LUCENE-8727:
--

I mentioned a shared priority queue in the description, but there might be 
other ways to do this. The main goal is that slices that get collected in 
parallel can benefit from information that is gathered in other slices.
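
There is no patch yet, but as one possible shape, here is a rough sketch of a 
shared accumulator that per-slice collectors could consult, so that a minimum 
competitive score found on one slice also helps the other slices (all names 
hypothetical):

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a thread-safe accumulator for the global minimum
// competitive score. Each slice's collector publishes its local bottom score
// and periodically reads the global value to pass to
// Scorable#setMinCompetitiveScore, so non-competitive hits can be skipped on
// every slice, not only on the slice that found them.
final class GlobalMinScoreAccumulator {
  private final AtomicLong bits =
      new AtomicLong(Double.doubleToLongBits(Double.NEGATIVE_INFINITY));

  void accumulate(float localMinCompetitiveScore) {
    long current = bits.get();
    while (Double.longBitsToDouble(current) < localMinCompetitiveScore
        && bits.compareAndSet(current, Double.doubleToLongBits(localMinCompetitiveScore)) == false) {
      current = bits.get();
    }
  }

  float get() {
    return (float) Double.longBitsToDouble(bits.get());
  }
}
{code}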

> IndexSearcher#search(Query,int) should operate on a shared priority queue 
> when configured with an executor
> --
>
> Key: LUCENE-8727
> URL: https://issues.apache.org/jira/browse/LUCENE-8727
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> If IndexSearcher is configured with an executor, then the top docs for each 
> slice are computed separately before being merged once the top docs for all 
> slices are computed. With block-max WAND this is a bit of a waste of 
> resources: it would be better if an increase of the min competitive score 
> could help skip non-competitive hits on every slice and not just the current 
> one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8788) Order LeafReaderContexts by Estimated Number Of Hits

2019-05-21 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844820#comment-16844820
 ] 

Adrien Grand commented on LUCENE-8788:
--

I think this is definitely worth exploring. It looks like a subset of 
LUCENE-8727, since we are only aiming at using fully collected slices here to 
speed up slices that have not been collected yet.

> Order LeafReaderContexts by Estimated Number Of Hits
> 
>
> Key: LUCENE-8788
> URL: https://issues.apache.org/jira/browse/LUCENE-8788
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
>
> We offer no guarantee on the order in which an IndexSearcher will look at 
> segments during a search operation. This can be improved for use cases where 
> an engine using Lucene invokes early termination and uses the partially 
> collected hits. A better model would be if we sorted segments by the 
> estimated number of hits, thus increasing the probability of the overall 
> relevance of the returned partial results.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8788) Order LeafReaderContexts by Estimated Number Of Hits

2019-05-21 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844796#comment-16844796
 ] 

Adrien Grand commented on LUCENE-8788:
--

Do I understand your idea correctly that your plan is to select multiple slices, 
but to collect them sequentially rather than in parallel, so that collection of 
a slice can leverage information that was gathered in previous slices? For 
instance, if a user wants the top 10 hits sorted by a numeric field foo, and the 
10th best hit has a value of 7 for field foo after collecting the first slice, 
then we could ignore documents whose value for the foo field is greater than 7 
for follow-up slices. And since Lucene has no expectation regarding the order in 
which slices are collected, we can order slices in whatever order best suits us, 
e.g. sort slices by increasing minimum (or maximum, or median) foo value.

This could be especially useful in the worst-case scenario that index order is 
inversely correlated with sort order. For instance lots of users end up pushing 
logs to Lucene indices, and usually more recent logs get higher doc IDs. So 
fetching the most recent logs hits the worst-case scenario I mentioned in my 
previous sentence. Index sorting could help address this problem, but these 
users often have lots of data and care about indexing rate, while index sorting 
adds overhead to indexing.

A related idea that [~jimczi] mentioned to me would be to shuffle segments both 
at merge time and when opening point-in-time views, in order to avoid ever 
having an index order that is inversely correlated with sort order. Similarly 
to how one can avoid running into quicksort's worst-case by shuffling the array 
first.
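
As a rough sketch of the "sort slices by increasing minimum foo value" part, the 
per-segment statistics that points already expose could be used (assuming foo is 
indexed as a LongPoint; names hypothetical):

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.PointValues;

final class SliceOrdering {
  // Sort leaves by the minimum indexed value of the "foo" field, so the slice
  // most likely to contain the best hits is collected first and can raise the
  // bar for the slices collected after it.
  static List<LeafReaderContext> byMinFoo(List<LeafReaderContext> leaves) {
    List<LeafReaderContext> sorted = new ArrayList<>(leaves);
    sorted.sort(Comparator.comparingLong(SliceOrdering::minFoo));
    return sorted;
  }

  private static long minFoo(LeafReaderContext ctx) {
    try {
      PointValues values = ctx.reader().getPointValues("foo");
      if (values == null) {
        return Long.MAX_VALUE; // segment has no values for the field
      }
      return LongPoint.decodeDimension(values.getMinPackedValue(), 0);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
{code}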

 

> Order LeafReaderContexts by Estimated Number Of Hits
> 
>
> Key: LUCENE-8788
> URL: https://issues.apache.org/jira/browse/LUCENE-8788
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Major
>
> We offer no guarantee on the order in which an IndexSearcher will look at 
> segments during a search operation. This can be improved for use cases where 
> an engine using Lucene invokes early termination and uses the partially 
> collected hits. A better model would be if we sorted segments by the 
> estimated number of hits, thus increasing the probability of the overall 
> relevance of the returned partial results.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8770) BlockMaxConjunctionScorer should support two-phase scorers

2019-05-21 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844618#comment-16844618
 ] 

Adrien Grand commented on LUCENE-8770:
--

+1

> BlockMaxConjunctionScorer should support two-phase scorers
> --
>
> Key: LUCENE-8770
> URL: https://issues.apache.org/jira/browse/LUCENE-8770
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
> Attachments: LUCENE-8770.patch, LUCENE-8770.patch
>
>
> The support for two-phase scorers in BlockMaxConjunctionScorer is missing. 
> This can slow down some queries that need to execute costly second phase on 
> more documents.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-21 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844595#comment-16844595
 ] 

Adrien Grand commented on LUCENE-8757:
--

[~atris] I think it is still not correct since the values of the docBase/maxDoc 
can only be seen by the current leaf collector while we need this check across 
all leaf collectors that are created from the same collector.

Looking at the AssertingCollector again, it has a check that doc IDs are 
collected in doc ID order, so I wonder why this assertion didn't trip with the 
earlier version of your patch that sorted leaves by decreasing maxDoc. Maybe we 
just got lucky? Nevertheless I think it's worth adding another assertion that 
leaves are collected in the right order and that their doc ID space doesn't 
intersect as described above, eg. we could record a {{previousLeafMaxDoc}} at 
the same level as {{maxDoc}} in AssertingCollector, and then in 
{{getLeafCollector}} do something like

{code}
assert context.docBase >= previousLeafMaxDoc; // generally equal, but might be greater if some leaves are skipped
previousLeafMaxDoc = context.docBase + context.reader().maxDoc();
{code}

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844198#comment-16844198
 ] 

Adrien Grand edited comment on LUCENE-8757 at 5/20/19 6:30 PM:
---

Thanks [~atris]. I think there is a bug in AssertingCollector as 
previousDocBase is always -1? By the way, we don't need to ensure that 
previousDocBase <= docBase, but that previousDocBase + previousMaxDoc <= 
docBase?


was (Author: jpountz):
Thanks [~atris]. I think there is a bug in AssertingCollector as 
previousDocBase is always 1? By the way, we don't only need to ensure that 
previousDocBase <= docBase, but even that previousDocBase + previousMaxDoc <= 
docBase?

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844198#comment-16844198
 ] 

Adrien Grand commented on LUCENE-8757:
--

Thanks [~atris]. I think there is a bug in AssertingCollector as 
previousDocBase is always 1? By the way, we don't only need to ensure that 
previousDocBase <= docBase, but even that previousDocBase + previousMaxDoc <= 
docBase?

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8805) Parameter changes for stringField() in StoredFieldVisitor

2019-05-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844183#comment-16844183
 ] 

Adrien Grand commented on LUCENE-8805:
--

The patch looks good in general. In SolrDocumentFetcher, you don't need to 
convert the string to a ByteArrayUtf8CharSequence; you can pass the String 
directly and it will do the same, it just needs a CharSequence.

> Parameter changes for stringField() in StoredFieldVisitor
> -
>
> Key: LUCENE-8805
> URL: https://issues.apache.org/jira/browse/LUCENE-8805
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Namgyu Kim
>Priority: Major
> Attachments: LUCENE-8805.patch, LUCENE-8805.patch
>
>
> I wrote this patch after seeing the comments left by [~mikemccand] when 
> SortingStoredFieldsConsumer class was first created.
> {code:java}
> @Override
> public void binaryField(FieldInfo fieldInfo, byte[] value) throws IOException 
> {
>   ...
>   // TODO: can we avoid new BR here?
>   ...
> }
> @Override
> public void stringField(FieldInfo fieldInfo, byte[] value) throws IOException 
> {
>   ...
>   // TODO: can we avoid new String here?
>   ...
> }
> {code}
> I changed two things.
>  -1) change binaryField() parameters from byte[] to BytesRef.-
>  2) change stringField() parameters from byte[] to String.
> I also changed the related contents while doing the work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844029#comment-16844029
 ] 

Adrien Grand commented on LUCENE-8791:
--

bq. Yes, We run CPU intensive ML models to rescore hits, it always runs with an 
executor.

I understand how ML models can be costly to build, but running them against a 
small set of hits should perform reasonably fast?

bq. using LeafReaderContext.ord directly as positional parameter might be risky 
since LeafReaderContext has a constructor where it sets the ord to 0

This constructor is pkg-private and only used for single-segment reader 
contexts, such as the ones that you get from LeafReader#getContext. Reader 
context ords should be reliable.
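
Illustrative check only (not from the patch): for a top-level reader, the ord of 
each leaf context is simply its position in IndexReader#leaves(), so it can be 
used as a positional index.

{code:java}
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;

final class OrdCheck {
  // For a top-level composite reader, leaves are numbered 0..n-1 in order.
  static void assertOrdsArePositions(IndexReader reader) {
    int expected = 0;
    for (LeafReaderContext ctx : reader.leaves()) {
      assert ctx.ord == expected++;
    }
  }
}
{code}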

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844013#comment-16844013
 ] 

Adrien Grand commented on LUCENE-8757:
--

I think we could add an assertion for this in AssertingCollector.

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8770) BlockMaxConjunctionScorer should support two-phase scorers

2019-05-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844001#comment-16844001
 ] 

Adrien Grand commented on LUCENE-8770:
--

This is great. I wonder how much computing the score in both the two-phase view 
and the iterator helps now; can we get rid of it, or would it hurt?

> BlockMaxConjunctionScorer should support two-phase scorers
> --
>
> Key: LUCENE-8770
> URL: https://issues.apache.org/jira/browse/LUCENE-8770
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
> Attachments: LUCENE-8770.patch
>
>
> The support for two-phase scorers in BlockMaxConjunctionScorer is missing. 
> This can slow down some queries that need to execute costly second phase on 
> more documents.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8804) FieldType attribute map should not be modifiable after freeze

2019-05-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843979#comment-16843979
 ] 

Adrien Grand commented on LUCENE-8804:
--

This makes sense to me. [~mikemccand] Any objections?

> FieldType attribute map should not be modifiable after freeze
> -
>
> Key: LUCENE-8804
> URL: https://issues.apache.org/jira/browse/LUCENE-8804
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.0
>Reporter: Vamshi Vijay Nakkirtha
>Priority: Minor
>  Labels: features, patch
> Attachments: LUCENE-8804.patch
>
>
> Today the FieldType attribute map can be modified even after freeze. For all 
> other properties of FieldType, we do "checkIfFrozen()" before making the 
> update to the property, but for the attribute map we do not seem to make such 
> a check.
>  
> [https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.0.0/lucene/core/src/java/org/apache/lucene/document/FieldType.java#L363]
> we may need to add a check at the beginning of the function similar to the 
> other property setters.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8689) Boolean DocValues Codec Implementation

2019-05-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843945#comment-16843945
 ] 

Adrien Grand commented on LUCENE-8689:
--

[~Dmitry Popov] Since your codec only iterates on ones at read time, maybe it 
should only accept ones at index time? I'm afraid that its behavior might be a 
bit surprising to users otherwise? You would also be able to change 
{{longValue()}} to always return 1 instead of consulting the bitset (since it 
is only legal to call this method when the iterator is positioned).

[~erickerickson] Solr's bool fields wouldn't be able to use this codec at the 
moment since they use SORTED doc values, which are unsupported by the proposed 
codec, which only supports NUMERIC doc values.

> Boolean DocValues Codec Implementation
> --
>
> Key: LUCENE-8689
> URL: https://issues.apache.org/jira/browse/LUCENE-8689
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Ivan Mamontov
>Priority: Minor
>  Labels: patch, performance
> Attachments: LUCENE-8689.patch, LUCENE-8689.patch, 
> SynteticDocValuesBench70.java, SynteticDocValuesBench80.java, 
> benchmark_dense.png, boolean_vs_dense_vs_sparse_indexing.png, 
> boolean_vs_dense_vs_sparse_updates.png, dense_vs_sparse_querying.png, 
> results2.png
>
>
> To avoid issues where some products become available/unavailable at some 
> point in time after being out-of-stock, e-commerce search system designers 
> need to embed up-to-date information about inventory availability right into 
> the search engines. Key requirement is to be able to accurately filter out 
> unavailable products and use availability as one of ranking signals. However, 
> keeping availability data up-to-date is a non-trivial task. A straightforward 
> implementation based on partial updates of Lucene documents causes Solr 
> cache thrashing, which negatively affects query performance and resource 
> utilization.
>  As an alternative solution we can use DocValues and built-in in-place 
> updates, where field values can be updated independently without touching the 
> inverted index; while filtering by DocValues is a bit slower, the overall 
> performance gain is better. However, existing long-based docValues are not 
> sufficiently optimized for carrying boolean inventory availability data:
>  * All DocValues queries are internally rewritten into 
> org.apache.lucene.search.DocValuesNumbersQuery which is based on direct 
> iteration over all column values and typically much slower than using 
> TermsQuery.
>  * On every commit/merge codec has to iterate over DocValues a couple times 
> in order to choose the best compression algorithm suitable for given data. As 
> a result for 4K fields and 3M max doc merge takes more than 10 minutes
> This issue is intended to solve these limitations via special bitwise doc 
> values format that uses internal representation of 
> org.apache.lucene.util.FixedBitSet in order to store indexed values and load 
> them at search time as a simple long array without additional decoding. There 
> are several reasons for this:
>  * At index time encoding is super fast without superfluous iterations over 
> all values to choose the best compression algorithm suitable for given data.
>  * At query time decoding is also simple and fast, no GC pressure and extra 
> steps
>  * Internal representation allows to perform random access in constant time
> Limitations are:
>  * Does not support non boolean fields
>  * Boolean fields must be represented as long values 1 for true and 0 for 
> false
>  * Current implementation does not support advanced bit set formats like 
> org.apache.lucene.util.SparseFixedBitSet or 
> org.apache.lucene.util.RoaringDocIdSet
> In order to evaluate the performance gain I wrote a simple JMH-based benchmark 
> [^SynteticDocValuesBench70.java] which allows estimating a relative cost of 
> DF filters. This benchmark creates 2 000 000 documents with 5 boolean columns 
> with different density, where 10, 35, 50, 60 and 90 is the amount of documents 
> with value 1. Each method tries to enumerate over all values in a synthetic 
> store field in all available ways:
>  * baseline – in almost all cases Solr uses FixedBitSet in filter cache to 
> keep store availability. This test just iterates over all bits.
>  * docValuesRaw – iterates over all values of DV column, the same code is 
> used in "post filtering", sorting and faceting.
> * docValuesNumbersQuery – iterates over all values produced by the query/filter 
> store:1; DocValuesNumbersQuery is actually the only query implementation for 
> DV-based fields. This means that Lucene rewrites all term, range and 
> filter queries for non-indexed fields into this fallback implementation.
>  * docValuesBooleanQuery – 

[jira] [Commented] (LUCENE-8805) Parameter changes for binaryField() and stringField() in StoredFieldVisitor

2019-05-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843931#comment-16843931
 ] 

Adrien Grand commented on LUCENE-8805:
--

I like how passing a String to stringField makes the API simpler. I'm less 
enthusiastic about using a BytesRef instead of a byte[] for binary fields. I 
could see the benefit if this allowed reusing the byte[], but otherwise it 
makes the API more complicated with no benefit?

> Parameter changes for binaryField() and stringField() in StoredFieldVisitor
> ---
>
> Key: LUCENE-8805
> URL: https://issues.apache.org/jira/browse/LUCENE-8805
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Namgyu Kim
>Priority: Major
> Attachments: LUCENE-8805.patch
>
>
> I wrote this patch after seeing the comments left by [~mikemccand] when 
> SortingStoredFieldsConsumer class was first created.
> {code:java}
> @Override
> public void binaryField(FieldInfo fieldInfo, byte[] value) throws IOException 
> {
>   ...
>   // TODO: can we avoid new BR here?
>   ...
> }
> @Override
> public void stringField(FieldInfo fieldInfo, byte[] value) throws IOException 
> {
>   ...
>   // TODO: can we avoid new String here?
>   ...
> }
> {code}
> I changed two things.
> 1) change binaryField() parameters from byte[] to BytesRef.
> 2) change stringField() parameters from byte[] to String.
> I also changed the related contents while doing the work.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4012) Make all query classes serializable, and provide a query parser to consume them

2019-05-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843910#comment-16843910
 ] 

Adrien Grand commented on LUCENE-4012:
--

bq. My thought is that each query parser could potentially come with a 
serializer that serializes queries into its language, since not every parser 
can represent every query type

FYI it's not only about the query types that a parser supports. For instance, the 
classic query parser creates synonym queries for terms that occur at the same 
position, but it doesn't have a way to create a synonym query explicitly from 2 
or more terms.

I agree that serialization of queries is important, but in my opinion it makes 
things simpler to add serialization support to another, simpler, query layer 
that would sit on top of Lucene and that only translates into Lucene queries 
for the purpose of query execution. This layer doesn't have to care about the 
details regarding how scores are merged across fields (dismax vs. bool vs. 
BM25FQuery), whether any query terms included synonyms, how the field was 
indexed (range queries on points or terms don't translate to the same Lucene 
query), whether some queries were introduced that only affect execution paths 
but not matched results (IndexOrDocValuesQuery), etc.

> Make all query classes serializable, and provide a query parser to consume 
> them
> ---
>
> Key: LUCENE-4012
> URL: https://issues.apache.org/jira/browse/LUCENE-4012
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/queryparser
>Affects Versions: 4.0-ALPHA
>Reporter: Benson Margulies
>Priority: Major
> Attachments: bq.patch
>
>
> I started off on LUCENE-4004 wanting to use DisjunctionMaxQuery via a parser. 
> However, this wasn't really because I thought that human beans should be 
> improvisationally  composing such thing. My real goal was to concoct a query 
> tree over *here*, and then serialize it to send to Solr over *there*. 
> It occurs to me that if the Xml parser is pretty good for this, JSON would be 
> better. It further occurs to me that the query classes may already all work 
> with Jackson, and, if they don't, the required tweaks will be quite small. By 
> allowing Jackson to write out class names as needed, you get the ability to 
> serialize *any* query, so long as the other side has the classes in class 
> path. A trifle verbose, but not as verbose as XML, and furthermore squishable 
> (though not in a URL) via SMILE or BSON.
> So, the goal of this JIRA is to accumulate tweaks to the query classes to 
> make them more 'bean pattern'. An alternative would be Jackson annotations. 
> However, I suspect that folks would be happier to minimize the level of 
> coupling here; in the extreme, the trivial parser could live in contrib if no 
> one wants a dependency, even optional, on Jackson itself.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-20 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16843876#comment-16843876
 ] 

Adrien Grand commented on LUCENE-8757:
--

[~atris] Your last patch sorts in reverse order of docBase, it should sort by 
the natural order?

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8799) Add Force Commit Method to RandomIndexWriter

2019-05-15 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840620#comment-16840620
 ] 

Adrien Grand commented on LUCENE-8799:
--

[~atris] I don't have a checkout of the code locally so apologies if I'm 
missing something, but it looks like calling RandomIndexWriter.commit(false) 
today would call w.commit(), just like the new method you are proposing?

> Add Force Commit Method to RandomIndexWriter
> 
>
> Key: LUCENE-8799
> URL: https://issues.apache.org/jira/browse/LUCENE-8799
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Minor
> Attachments: LUCENE-8799.patch
>
>
> A test convenience would be to add a force commit method in 
> RandomIndexWriter. This will allow writing tests where segment sizes can be 
> accurately controlled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8799) Add Force Commit Method to RandomIndexWriter

2019-05-14 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839913#comment-16839913
 ] 

Adrien Grand commented on LUCENE-8799:
--

What does it do differently than RandomIndexWriter#commit(false)?

> Add Force Commit Method to RandomIndexWriter
> 
>
> Key: LUCENE-8799
> URL: https://issues.apache.org/jira/browse/LUCENE-8799
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Priority: Minor
> Attachments: LUCENE-8799.patch
>
>
> A test convenience would be to add a force commit method in 
> RandomIndexWriter. This will allow writing tests where segment sizes can be 
> accurately controlled.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7697) IndexSearcher should leverage index sorting

2019-05-13 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838371#comment-16838371
 ] 

Adrien Grand commented on LUCENE-7697:
--

I am not, but [~jtibshirani] recently mentioned to me that she'd like to look 
into LUCENE-7714. There is LUCENE-8727 which is somewhat related too: some of 
the optimizations that we apply based on the minimum competitive score, and to 
a lesser extent index sorting, become less efficient when IndexSearcher is 
configured with an executor. This is due to the fact that these optimizations 
leverage information about already collected documents to more efficiently 
collect new documents. Since IndexSearcher searches each slice independently, a 
given slice can't benefit from information that has been collected in other 
slices.

> IndexSearcher should leverage index sorting
> ---
>
> Key: LUCENE-7697
> URL: https://issues.apache.org/jira/browse/LUCENE-7697
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> We made good efforts in order to make index sorting fast and easy to 
> configure. We should now look into making IndexSearcher aware of it. This 
> will probably require changes of the API as not collecting all matches means 
> that we can no longer know things like the total number of hits or the 
> maximum score.
> I don't plan to work on it anytime soon, I'm just opening this issue to raise 
> awareness. I'd be happy to do reviews however if someone decides to tackle it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-13 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838363#comment-16838363
 ] 

Adrien Grand commented on LUCENE-8757:
--

Yes. Top-docs collectors are expected to tie-break by doc ID in case documents 
compare equal. Things like TopDocs#merge compare doc IDs explicitly for that 
purpose, but Collector#collect implementations just rely on the fact that 
documents are collected in order to ignore documents that compare equal to the 
current k-th best hit. So we need to sort segments within a slice by docBase in 
order to get the same top hits regardless of how slices have been constructed.
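
Concretely, something along these lines when building each slice (sketch only):

{code:java}
import java.util.Comparator;
import java.util.List;
import org.apache.lucene.index.LeafReaderContext;

final class SliceOrder {
  // Within a slice, keep leaves in increasing docBase order so documents are
  // still collected in global doc ID order, and tie-breaking by doc ID yields
  // the same top hits regardless of how the slices were formed.
  static void sortByDocBase(List<LeafReaderContext> sliceLeaves) {
    sliceLeaves.sort(Comparator.comparingInt((LeafReaderContext ctx) -> ctx.docBase));
  }
}
{code}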

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-7697) IndexSearcher should leverage index sorting

2019-05-12 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-7697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-7697.
--
Resolution: Duplicate

Oops, actually I've done what I had in mind in LUCENE-8059; I had probably 
forgotten about this issue. I think this is what your (2) is about.

Regarding (1), we have an open issue about this idea at LUCENE-7714. (It talks 
about range queries, but it is the same idea as what you are describing for 
exact matches.)

> IndexSearcher should leverage index sorting
> ---
>
> Key: LUCENE-7697
> URL: https://issues.apache.org/jira/browse/LUCENE-7697
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> We made good efforts in order to make index sorting fast and easy to 
> configure. We should now look into making IndexSearcher aware of it. This 
> will probably require changes of the API as not collecting all matches means 
> that we can no longer know things like the total number of hits or the 
> maximum score.
> I don't plan to work on it anytime soon, I'm just opening this issue to raise 
> awareness. I'd be happy to do reviews however if someone decides to tackle it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8791) Add CollectorRescorer

2019-05-12 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838136#comment-16838136
 ] 

Adrien Grand commented on LUCENE-8791:
--

I like the idea of a collector-based rescorer so that you can do things that 
you can't do with queries, such as ensuring there is some diversity on the 
first page (we have a DiversifiedTopDocsCollector). However, I'm curious about 
the fact that this rescorer takes an executor: are you running rescorers that 
are very CPU-intensive?

> Add CollectorRescorer
> -
>
> Key: LUCENE-8791
> URL: https://issues.apache.org/jira/browse/LUCENE-8791
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Elbek Kamoliddinov
>Priority: Major
> Attachments: LUCENE-8791.patch, LUCENE-8791.patch
>
>
> This is another implementation of query rescorer api (LUCENE-5489). It adds 
> rescoring functionality based on provided CollectorManager. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8795) ArrayIndexOutOfBoundsException during System.arraycopy in BKDWriter

2019-05-12 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838132#comment-16838132
 ] 

Adrien Grand commented on LUCENE-8795:
--

TestBKD 
(https://github.com/apache/lucene-solr/blob/master/lucene/core/src/test/org/apache/lucene/util/bkd/TestBKD.java)
 is the most likely test to reproduce this issue.

> ArrayIndexOutOfBoundsException during System.arraycopy in BKDWriter
> ---
>
> Key: LUCENE-8795
> URL: https://issues.apache.org/jira/browse/LUCENE-8795
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 6.6
> Environment: h2. Operating system/Platform
> {code:java}
> IBM i - V7R3M0
> {code}
> h2. Java version
> {code:java}
> java version "1.8.0_191"
> Java(TM) SE Runtime Environment (build 8.0.5.25 - 
> pap6480sr5fp25-20181030_01(SR5 FP25))
> IBM J9 VM (build 2.9, JRE 1.8.0 OS/400 ppc64-64-Bit Compressed References 
> 20181029_400846 (JIT enabled, AOT enabled)
> OpenJ9   - c5c78da
> OMR  - 3d5ac33
> IBM  - 8c1bdc2)
> JCL - 20181022_01 based on Oracle jdk8u191-b26
> NOTICE: If no version information is found above, this could indicate a 
> corrupted Java installation!
> Java detected was: /QOpenSys/QIBM/ProdData/JavaVM/jdk80/64bit/bin/java
> -Dmultiarchive.basepath=/home/NEXTOWN/Multi-Support/Next -Xms128m -Xmx2048m
> {code}
>Reporter: Torben Riis
>Priority: Major
>
> Hi,
> I’m a bit stuck here and need a clue or two in order to continue our 
> investigations. Hope that someone can help. :)
> Periodically, around once a month, we get the below 
> {{ArrayIndexOutOfBoundsException}} on our system. We use multiple indexes and 
> the error can originate from any of them, but the error always occurs in line 
> 1217 in {{BKDWriter}} (during a {{System.arraycopy}}).
> We found a couple of issues on the net regarding JIT optimization problems 
> related to J9, but they all look like they have been resolved and cannot be 
> reproduced anymore. Nevertheless, we have added the following {{-Xjit}} 
> flags to exclude JIT optimization for every class / inner class in the 
> {{bkd}} package. Moreover, we have also made a complete copy of the whole 
> installation in production, and added the opposite arguments (enforce JIT 
> optimizations for the specific classes). First we will try with 
> {{optLevel=hot}}, but if this doesn’t show anything we will afterwards try 
> with {{veryHot}} and {{scorching}}. Unfortunately we do not have the result 
> of this yet, but I’ll of course post it when it is known.
> Unfortunately it’s not possible to run OpenJDK on the IBM i platform, so such 
> a test will not be possible, but it is worth mentioning that our product is a 
> standard product, which typically runs on the Windows or Linux platform using 
> AdoptOpenJDK. Currently we have a couple of hundred installations running out 
> there on these platforms, without any problems. But on the IBM i platform 
> with J9 we sometimes see this exception.
> Any good ideas for further investigation? Or has anyone seen such an issue before?
> We are using Lucene 6.6.0 and run on IBM J9 on the IBM i platform.
> h2.  Stacktrace
> {code:java}
> Exception in thread "Lucene Merge Thread #0" 2019-05-01T06:10:07.970 CEST 
> [Lucene Merge Thread #0] org.apache.lucene.index.MergePolicy$MergeException: 
> java.lang.ArrayIndexOutOfBoundsException
> at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:703)
> at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:683)
> Caused by: 2019-05-01T06:10:07.971 CEST [Lucene Merge Thread #0] 
> java.lang.ArrayIndexOutOfBoundsException
> at 
> org.apache.lucene.util.bkd.BKDWriter.recursePackIndex(BKDWriter.java:1217)
> at 
> org.apache.lucene.util.bkd.BKDWriter.recursePackIndex(BKDWriter.java:1197)
> at 
> org.apache.lucene.util.bkd.BKDWriter.packIndex(BKDWriter.java:1078)
> at 
> org.apache.lucene.util.bkd.BKDWriter.writeIndex(BKDWriter.java:1245)
> at 
> org.apache.lucene.util.bkd.BKDWriter.access$600(BKDWriter.java:82)
> at 
> org.apache.lucene.util.bkd.BKDWriter$OneDimensionBKDWriter.finish(BKDWriter.java:648)
> at org.apache.lucene.util.bkd.BKDWriter.merge(BKDWriter.java:560)
> at 
> org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.merge(Lucene60PointsWriter.java:212)
> at 
> org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:173)
> at 
> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
> at 
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4356)
> at 
> 

[jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm

2019-05-12 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838131#comment-16838131
 ] 

Adrien Grand commented on LUCENE-8757:
--

I think we need to sort by docBase before constructing the slices; otherwise we 
might collect doc IDs out of order. By the way, we should probably make the 
LeafSlice constructor check that leaves come in order?
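
A hedged sketch of such a check (illustrative only, field and message names are 
assumptions):

{code:java}
// Inside the LeafSlice constructor: reject slices whose leaves are not ordered
// by docBase, since out-of-order leaves would break tie-breaking by doc ID.
for (int i = 1; i < leaves.length; ++i) {
  if (leaves[i].docBase < leaves[i - 1].docBase) {
    throw new IllegalArgumentException(
        "Leaves must be sorted by docBase, got " + leaves[i - 1].docBase
            + " followed by " + leaves[i].docBase);
  }
}
{code}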

> Better Segment To Thread Mapping Algorithm
> --
>
> Key: LUCENE-8757
> URL: https://issues.apache.org/jira/browse/LUCENE-8757
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Atri Sharma
>Assignee: Simon Willnauer
>Priority: Major
> Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, 
> LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm which is cognizant of size skew would have better 
> performance for realistic scenarios.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8795) ArrayIndexOutOfBoundsException during System.arraycopy in BKDWriter

2019-05-10 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8795.
--
Resolution: Won't Fix

Closing since this looks like a JVM bug.

> ArrayIndexOutOfBoundsException during System.arraycopy in BKDWriter
> ---
>
> Key: LUCENE-8795
> URL: https://issues.apache.org/jira/browse/LUCENE-8795
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 6.6
> Environment: h2. Operating system/Platform
> {code:java}
> IBM i - V7R3M0
> {code}
> h2. Java version
> {code:java}
> java version "1.8.0_191"
> Java(TM) SE Runtime Environment (build 8.0.5.25 - 
> pap6480sr5fp25-20181030_01(SR5 FP25))
> IBM J9 VM (build 2.9, JRE 1.8.0 OS/400 ppc64-64-Bit Compressed References 
> 20181029_400846 (JIT enabled, AOT enabled)
> OpenJ9   - c5c78da
> OMR  - 3d5ac33
> IBM  - 8c1bdc2)
> JCL - 20181022_01 based on Oracle jdk8u191-b26
> NOTICE: If no version information is found above, this could indicate a 
> corrupted Java installation!
> Java detected was: /QOpenSys/QIBM/ProdData/JavaVM/jdk80/64bit/bin/java
> -Dmultiarchive.basepath=/home/NEXTOWN/Multi-Support/Next -Xms128m -Xmx2048m
> {code}
>Reporter: Torben Riis
>Priority: Major
>
> Hi,
> I’m a bit stuck here and need a clue or two in order to continue our 
> investigations. Hope that someone can help. :)
> Periodically, around once a month, we get the below 
> {{ArrayIndexOutOfBoundsException}} on our system. We use multiple indexes and 
> the error can originate from any of them, but the error always occurs in line 
> 1217 in {{BKDWriter}} (during a {{System.arraycopy}}).
> We found a couple of issues on the net regarding JIT optimization problems 
> related to J9, but they all look like they have been resolved and cannot be 
> reproduced anymore. Nevertheless, we have added the following {{-Xjit}} 
> flags to exclude JIT optimization for every class / inner class in the 
> {{bkd}} package. Moreover, we have also made a complete copy of the whole 
> installation in production, and added the opposite arguments (enforce JIT 
> optimizations for the specific classes). First we will try with 
> {{optLevel=hot}}, but if this doesn’t show anything we will afterwards try 
> with {{veryHot}} and {{scorching}}. Unfortunately we do not have the result 
> of this yet, but I’ll of course post it when it is known.
> Unfortunately it’s not possible to run OpenJDK on the IBM i platform, so such 
> a test will not be possible, but it is worth mentioning that our product is a 
> standard product, which typically runs on the Windows or Linux platform using 
> AdoptOpenJDK. Currently we have a couple of hundred installations running out 
> there on these platforms, without any problems. But on the IBM i platform 
> with J9 we sometimes see this exception.
> Any good ideas for further investigation? Or has anyone seen such an issue before?
> We are using Lucene 6.6.0 and run on IBM J9 on the IBM i platform.
> h2.  Stacktrace
> {code:java}
> Exception in thread "Lucene Merge Thread #0" 2019-05-01T06:10:07.970 CEST 
> [Lucene Merge Thread #0] org.apache.lucene.index.MergePolicy$MergeException: 
> java.lang.ArrayIndexOutOfBoundsException
> at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:703)
> at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:683)
> Caused by: 2019-05-01T06:10:07.971 CEST [Lucene Merge Thread #0] 
> java.lang.ArrayIndexOutOfBoundsException
> at 
> org.apache.lucene.util.bkd.BKDWriter.recursePackIndex(BKDWriter.java:1217)
> at 
> org.apache.lucene.util.bkd.BKDWriter.recursePackIndex(BKDWriter.java:1197)
> at 
> org.apache.lucene.util.bkd.BKDWriter.packIndex(BKDWriter.java:1078)
> at 
> org.apache.lucene.util.bkd.BKDWriter.writeIndex(BKDWriter.java:1245)
> at 
> org.apache.lucene.util.bkd.BKDWriter.access$600(BKDWriter.java:82)
> at 
> org.apache.lucene.util.bkd.BKDWriter$OneDimensionBKDWriter.finish(BKDWriter.java:648)
> at org.apache.lucene.util.bkd.BKDWriter.merge(BKDWriter.java:560)
> at 
> org.apache.lucene.codecs.lucene60.Lucene60PointsWriter.merge(Lucene60PointsWriter.java:212)
> at 
> org.apache.lucene.index.SegmentMerger.mergePoints(SegmentMerger.java:173)
> at 
> org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
> at 
> org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4356)
> at 
> org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3931)
> at 
> 

[jira] [Commented] (LUCENE-7840) BooleanQuery.rewriteNoScoring - optimize away any SHOULD clauses if at least 1 MUST/FILTER clause and 0==minShouldMatch

2019-05-07 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834714#comment-16834714
 ] 

Adrien Grand commented on LUCENE-7840:
--

+1


> BooleanQuery.rewriteNoScoring - optimize away any SHOULD clauses if at least 
> 1 MUST/FILTER clause and 0==minShouldMatch
> ---
>
> Key: LUCENE-7840
> URL: https://issues.apache.org/jira/browse/LUCENE-7840
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Hoss Man
>Priority: Major
> Attachments: LUCENE-7840.patch, LUCENE-7840.patch
>
>
> I haven't thought this through completely, let alone write up a patch / test 
> case, but IIUC...
> We should be able to optimize  {{ BooleanQuery rewriteNoScoring() }} so that 
> (after converting MUST clauses to FILTER clauses) we can check for the common 
> case of {{0==getMinimumNumberShouldMatch()}} and throw away any SHOULD 
> clauses as long as there is is at least one FILTER clause.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8795) ArrayIndexOutOfBoundsException during System.arraycopy in BKDWriter

2019-05-07 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834709#comment-16834709
 ] 

Adrien Grand commented on LUCENE-8795:
--

I just did a quick review; here is the code for this particular version of 
Lucene in case you would like to have a look: 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.6.0/lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java#L1217.

{code}
// restore lastSplitValues to what caller originally passed us:
System.arraycopy(savSplitValue, 0, lastSplitValues, splitDim * bytesPerDim + 
prefix, suffix);
{code}

The thing that is confusing is that there is another call to arraycopy a couple 
lines up that does the opposite copy 
(https://github.com/apache/lucene-solr/blob/releases/lucene-solr/6.6.0/lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java#L1182):

{code}
System.arraycopy(lastSplitValues, splitDim * bytesPerDim + prefix, 
savSplitValue, 0, suffix);
{code}

It doesn't make sense to me that the arraycopy on line 1217 would fail with an 
AIOOBE while the one on line 1182 would not; this looks like a JVM bug to me.

> ArrayIndexOutOfBoundsException during System.arraycopy in BKDWriter
> ---
>
> Key: LUCENE-8795
> URL: https://issues.apache.org/jira/browse/LUCENE-8795
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 6.6
> Environment: h2. Operating system/Platform
> {code:java}
> IBM i - V7R3M0
> {code}
> h2. Java version
> {code:java}
> java version "1.8.0_191"
> Java(TM) SE Runtime Environment (build 8.0.5.25 - 
> pap6480sr5fp25-20181030_01(SR5 FP25))
> IBM J9 VM (build 2.9, JRE 1.8.0 OS/400 ppc64-64-Bit Compressed References 
> 20181029_400846 (JIT enabled, AOT enabled)
> OpenJ9   - c5c78da
> OMR  - 3d5ac33
> IBM  - 8c1bdc2)
> JCL - 20181022_01 based on Oracle jdk8u191-b26
> NOTICE: If no version information is found above, this could indicate a 
> corrupted Java installation!
> Java detected was: /QOpenSys/QIBM/ProdData/JavaVM/jdk80/64bit/bin/java
> -Dmultiarchive.basepath=/home/NEXTOWN/Multi-Support/Next -Xms128m -Xmx2048m
> {code}
>Reporter: Torben Riis
>Priority: Major
>
> Hi,
> I’m a bit stuck here and need a clue or two in order to continue our 
> investigations. Hope that someone can help. :)
> Periodically, around once a month, we get the below 
> {{ArrayIndexOutOfBoundsException}} on our system. We use multiple indexes and 
> the error can originate from any of them, but the error always occurs in line 
> 1217 in {{BKDWriter}} (during a {{System.arraycopy}}).
> We found a couple of issues on the net regarding JIT optimization problems 
> related to J9, but they all look like they have been resolved and cannot be 
> reproduced anymore. Nevertheless, we have added the following {{-Xjit}} 
> flags to exclude JIT optimization for every class / inner class in the 
> {{bkd}} package. Moreover, we have also made a complete copy of the whole 
> installation in production, and added the opposite arguments (enforce JIT 
> optimizations for the specific classes). First we will try with 
> {{optLevel=hot}}, but if this doesn’t show anything we will afterwards try 
> with {{veryHot}} and {{scorching}}. Unfortunately we do not have the result 
> of this yet, but I’ll of course post it when it is known.
> Unfortunately it’s not possible to run OpenJDK on the IBM i platform, so such 
> a test will not be possible, but it is worth mentioning that our product is a 
> standard product, which typically runs on the Windows or Linux platform using 
> AdoptOpenJDK. Currently we have a couple of hundred installations running out 
> there on these platforms, without any problems. But on the IBM i platform 
> with J9 we sometimes see this exception.
> Any good ideas for further investigation? Or has anyone seen such an issue before?
> We are using Lucene 6.6.0 and run on IBM J9 on the IBM i platform.
> h2.  Stacktrace
> {code:java}
> Exception in thread "Lucene Merge Thread #0" 2019-05-01T06:10:07.970 CEST 
> [Lucene Merge Thread #0] org.apache.lucene.index.MergePolicy$MergeException: 
> java.lang.ArrayIndexOutOfBoundsException
> at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:703)
> at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:683)
> Caused by: 2019-05-01T06:10:07.971 CEST [Lucene Merge Thread #0] 
> java.lang.ArrayIndexOutOfBoundsException
> at 
> org.apache.lucene.util.bkd.BKDWriter.recursePackIndex(BKDWriter.java:1217)
> at 
> org.apache.lucene.util.bkd.BKDWriter.recursePackIndex(BKDWriter.java:1197)
> at 
> 

[jira] [Commented] (LUCENE-8776) Start offset going backwards has a legitimate purpose

2019-04-26 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16827026#comment-16827026
 ] 

Adrien Grand commented on LUCENE-8776:
--

I am sorry that this change broke your use-case, but rejecting backward offsets 
still sounds like a better trade-off to me.

> Start offset going backwards has a legitimate purpose
> -
>
> Key: LUCENE-8776
> URL: https://issues.apache.org/jira/browse/LUCENE-8776
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.6
>Reporter: Ram Venkat
>Priority: Major
>
> Here is the use case where startOffset can go backwards:
> Say there is a line "Organic light-emitting-diode glows", and I want to run 
> span queries and highlight them properly. 
> During index time, light-emitting-diode is split into three words, which 
> allows me to search for 'light', 'emitting' and 'diode' individually. The 
> three words occupy adjacent positions in the index, as 'light' adjacent to 
> 'emitting' and 'light' at a distance of two words from 'diode' need to match 
> this word. So, the order of words after splitting are: Organic, light, 
> emitting, diode, glows. 
> But, I also want to search for 'organic' being adjacent to 
> 'light-emitting-diode' or 'light-emitting-diode' being adjacent to 'glows'. 
> The way I solved this was to also generate 'light-emitting-diode' at two 
> positions: (a) In the same position as 'light' and (b) in the same position 
> as 'glows', like below:
> ||organic||light||emitting||diode||glows||
> | |light-emitting-diode| |light-emitting-diode| |
> |0|1|2|3|4|
> The positions of the two 'light-emitting-diode' are 1 and 3, but the offsets 
> are obviously the same. This works beautifully in Lucene 5.x in both 
> searching and highlighting with span queries. 
> But when I try this in Lucene 7.6, it hits the condition "Offsets must not go 
> backwards" at DefaultIndexingChain:818. This IllegalArgumentException is 
> being thrown without any comments on why this check is needed. As I explained 
> above, startOffset going backwards is perfectly valid, to deal with word 
> splitting and span operations on these specialized use cases. On the other 
> hand, it is not clear what value is added by this check and which highlighter 
> code is affected by offsets going backwards. This same check is done at 
> BaseTokenStreamTestCase:245. 
> I see others talk about how this check found bugs in WordDelimiter etc. but 
> it also prevents legitimate use cases. Can this check be removed?  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8779) MinHashFilter generates invalid terms

2019-04-26 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-8779:


 Summary: MinHashFilter generates invalid terms
 Key: LUCENE-8779
 URL: https://issues.apache.org/jira/browse/LUCENE-8779
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand


This problem was reported at 
https://github.com/elastic/elasticsearch/issues/41556: MinHashFilter computes a 
hash and then folds its bits into the chars of the term. However this might 
generate invalid terms that e.g. end with a character that is a high surrogate.

This doesn't trigger exceptions at index time because we are lenient with 
unmatched surrogates when converting to a binary term.

{code}
  } else {
// surrogate pair
// confirm valid high surrogate
if (code < 0xDC00 && (i < end-1)) {
  int utf32 = (int) s.charAt(i+1);
  // confirm valid low surrogate and write pair
  if (utf32 >= 0xDC00 && utf32 <= 0xDFFF) { 
utf32 = (code << 10) + utf32 + SURROGATE_OFFSET;
i++;
out[upto++] = (byte)(0xF0 | (utf32 >> 18));
out[upto++] = (byte)(0x80 | ((utf32 >> 12) & 0x3F));
out[upto++] = (byte)(0x80 | ((utf32 >> 6) & 0x3F));
out[upto++] = (byte)(0x80 | (utf32 & 0x3F));
continue;
  }
}
// replace unpaired surrogate or out-of-order low surrogate
// with substitution character
out[upto++] = (byte) 0xEF;
out[upto++] = (byte) 0xBF;
out[upto++] = (byte) 0xBD;
  }
{code}
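
To illustrate the reported problem (a hedged sketch, not the filter's actual code), 
folding raw hash bits into chars can leave an unpaired high surrogate at the end of 
the term:

{code:java}
long hash = 0x1234_5678_9ABC_D812L; // hypothetical 64-bit hash value
char[] chars = new char[4];
for (int i = 0; i < 4; i++) {
  chars[i] = (char) (hash >>> (48 - 16 * i)); // 16 hash bits per char
}
// chars[3] == 0xD812 is a high surrogate with no following low surrogate,
// so the resulting term is not valid UTF-16.
String term = new String(chars);
{code}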



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8776) Start offset going backwards has a legitimate purpose

2019-04-24 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824974#comment-16824974
 ] 

Adrien Grand commented on LUCENE-8776:
--

I'd rather keep this check for the reasons Robert mentioned. Only enabling the 
check when offsets are indexed doesn't sound like a great trade-off to me as 
we'd be accepting a broken token stream that would only not cause trouble 
because offsets are not indexed. Can you either give light, emitting and diode 
the same offsets as "light-emitting-diode" or give the two additional 
"light-emitting-diode" tokens you are introducing the same offsets as "light" 
for the first one and "diode" for the last one?
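
A hedged sketch of the second suggestion, inside the incrementToken() of a 
hypothetical synonym-emitting filter (all names are illustrative; the attributes are 
assumed to come from addAttribute(...)):

{code:java}
// Emit the first extra "light-emitting-diode" token at the position of "light"
// and reuse the offsets of "light", so startOffset never goes backwards.
// The second extra token would use the offsets of "diode" in the same way.
posIncAtt.setPositionIncrement(0);                      // same position as "light"
termAtt.setEmpty().append("light-emitting-diode");
offsetAtt.setOffset(lightStartOffset, lightEndOffset);  // offsets copied from "light"
{code}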

> Start offset going backwards has a legitimate purpose
> -
>
> Key: LUCENE-8776
> URL: https://issues.apache.org/jira/browse/LUCENE-8776
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 7.6
>Reporter: Ram Venkat
>Priority: Major
>
> Here is the use case where startOffset can go backwards:
> Say there is a line "Organic light-emitting-diode glows", and I want to run 
> span queries and highlight them properly. 
> During index time, light-emitting-diode is split into three words, which 
> allows me to search for 'light', 'emitting' and 'diode' individually. The 
> three words occupy adjacent positions in the index, as 'light' adjacent to 
> 'emitting' and 'light' at a distance of two words from 'diode' need to match 
> this word. So, the order of words after splitting are: Organic, light, 
> emitting, diode, glows. 
> But, I also want to search for 'organic' being adjacent to 
> 'light-emitting-diode' or 'light-emitting-diode' being adjacent to 'glows'. 
> The way I solved this was to also generate 'light-emitting-diode' at two 
> positions: (a) In the same position as 'light' and (b) in the same position 
> as 'glows', like below:
> ||organic||light||emitting||diode||glows||
> | |light-emitting-diode| |light-emitting-diode| |
> |0|1|2|3|4|
> The positions of the two 'light-emitting-diode' are 1 and 3, but the offsets 
> are obviously the same. This works beautifully in Lucene 5.x in both 
> searching and highlighting with span queries. 
> But when I try this in Lucene 7.6, it hits the condition "Offsets must not go 
> backwards" at DefaultIndexingChain:818. This IllegalArgumentException is 
> being thrown without any comments on why this check is needed. As I explained 
> above, startOffset going backwards is perfectly valid, to deal with word 
> splitting and span operations on these specialized use cases. On the other 
> hand, it is not clear what value is added by this check and which highlighter 
> code is affected by offsets going backwards. This same check is done at 
> BaseTokenStreamTestCase:245. 
> I see others talk about how this check found bugs in WordDelimiter etc. but 
> it also prevents legitimate use cases. Can this check be removed?  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8768) Javadoc search support

2019-04-19 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821800#comment-16821800
 ] 

Adrien Grand commented on LUCENE-8768:
--

[~thetaphi] Any thoughts about this?

> Javadoc search support
> --
>
> Key: LUCENE-8768
> URL: https://issues.apache.org/jira/browse/LUCENE-8768
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Namgyu Kim
>Priority: Major
> Attachments: javadoc-nightly.png, new-javadoc.png
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Javadoc search is a new feature since Java 9.
>  ([https://openjdk.java.net/jeps/225])
> I think there is no reason not to use it if the current Lucene Java version 
> is 11.
> It can be a great help to developers looking at API documentation.
> (Elasticsearch also supports it now!
>  
> [https://artifacts.elastic.co/javadoc/org/elasticsearch/client/elasticsearch-rest-client/7.0.0/org/elasticsearch/client/package-summary.html])
>  
> ■ Before (Lucene Nightly Core Module Javadoc)
> !javadoc-nightly.png!
> ■ After 
> *!new-javadoc.png!*
>  
> I'll change two lines for this.
> 1) change Javadoc's noindex option from true to false.
> {code:java}
> // common-build.xml line 182
> {code}
> 2) add javadoc argument "--no-module-directories"
> {code:java}
> // common-build.xml line 2100
> <javadoc overview="@{overview}"
> additionalparam="--no-module-directories" // NEW CODE
> packagenames="org.apache.lucene.*,org.apache.solr.*"
> ...
> maxmemory="${javadoc.maxmemory}">
> {code}
> Currently there is an issue like the following link in JDK 11, so we need 
> "--no-module-directories" option.
>  ([https://bugs.openjdk.java.net/browse/JDK-8215291])
>  
> ■ How to test
> I did +"ant javadocs-modules"+ on lucene project and check Javadoc.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8154) TODO List when upgrading to Java 9 as minimum requirement

2019-04-19 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8154.
--
Resolution: Implemented

[~erickerickson] Indeed, this is superseded by LUCENE-8738.

> TODO List when upgrading to Java 9 as minimum requirement
> -
>
> Key: LUCENE-8154
> URL: https://issues.apache.org/jira/browse/LUCENE-8154
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Uwe Schindler
>Priority: Major
>  Labels: Java9
>
> This issue is just a placeholder to record stuff that needs to be done when 
> we upgrade to Java 9 as minimum requirement for running Lucene/Solr:
> - Remove {{FutureArrays}} and {{FutureObjects}} from source tree and change 
> code to use Java 9 native methods. Disable MR-JAR building (maybe only 
> disable so we can reuse at later stages)
> - Remove Java 8 bytebuffer unmapping code
> Final stuff:
> - When upgrading to Java 9, don't delete the Java 9 specific stuff for 
> Multi-Release testing from build files or smoke tester! Keep it alive, maybe 
> migrate to later Java (e.g. LTS-Java)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8765) OneDimensionBKDWriter valueCount validation didn't include leafCount

2019-04-17 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16820188#comment-16820188
 ] 

Adrien Grand commented on LUCENE-8765:
--

Good catch! This patch seems to have been created off an old master checkout, 
but it is still easy to apply. One minor thing I'd like to change is to use 
expectThrows in the test rather than a try/catch block if that works for you.
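
For reference, a hedged sketch of what the expectThrows version could look like (the 
exception type, message and variable names are assumptions, not taken from the patch):

{code:java}
IllegalStateException e = expectThrows(IllegalStateException.class, () -> {
  for (int i = 0; i < totalPointCount + 1; i++) {
    writer.add(packedValue, i); // one point too many should now fail fast
  }
});
assertTrue(e.getMessage().contains("totalPointCount"));
{code}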

> OneDimensionBKDWriter valueCount validation didn't include leafCount
> 
>
> Key: LUCENE-8765
> URL: https://issues.apache.org/jira/browse/LUCENE-8765
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Affects Versions: 7.5, master (9.0)
>Reporter: ZhaoYang
>Priority: Minor
> Attachments: 
> 0001-Fix-OneDimensionBKDWriter-valueCount-validation.patch
>
>
> {{[OneDimensionBKDWriter#add|https://github.com/jasonstack/lucene-solr/blob/branch_7_5/lucene/core/src/java/org/apache/lucene/util/bkd/BKDWriter.java#L612]}}
>  checks if {{valueCount}} exceeds the predefined {{totalPointCount}}, but 
> {{valueCount}} is only updated once every 
> 1024 ({{DEFAULT_MAX_POINTS_IN_LEAF_NODE}}) points. 
> We should include {{leafCount}} for validation.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8759) BlockMaxConjunctionScorer's simplified way of computing max scores hurts performance

2019-04-16 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819305#comment-16819305
 ] 

Adrien Grand commented on LUCENE-8759:
--

Maybe the test could explicitly test both normal and denormal floats all the 
time? Otherwise +1.

I'm curious whether this makes any difference when running luceneutil? 
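
A hedged sketch of what testing both ranges explicitly could look like (bit patterns 
built by hand, LuceneTestCase#random() assumed):

{code:java}
int mantissa = random().nextInt(1 << 23);
// biased exponent == 0: a denormal (subnormal) float
float denormal = Float.intBitsToFloat(mantissa == 0 ? 1 : mantissa);
// biased exponent in [1, 254]: a normal float
float normal = Float.intBitsToFloat(((random().nextInt(254) + 1) << 23) | mantissa);
{code}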

> BlockMaxConjunctionScorer's simplified way of computing max scores hurts 
> performance
> 
>
> Key: LUCENE-8759
> URL: https://issues.apache.org/jira/browse/LUCENE-8759
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8759.patch
>
>
> BlockMaxConjunctionScorer computes the minimum value that the score should 
> have after each scorer in order to be able to interrupt scoring as soon as 
> possible. For instance say scorers A, B and C produce maximum scores that are 
> equal to 4, 2 and 1. If the minimum competitive score is X, then the score 
> after scoring A, B and C must be at least X, the score after scoring A and B 
> must be at least X-1 and the score after scoring A must be at least X-1-2.
> However this is made a bit more complex than that due to floating-point 
> numbers and the fact that intermediate score values are doubles which only 
> get casted to a float after all values have been summed up. In order to keep 
> things simple, BlockMaxConjunctionScore has the following comment and code
> {code}
> // Also compute the minimum required scores for a hit to be competitive
> // A double that is less than 'score' might still be converted to 'score'
> // when casted to a float, so we go to the previous float to avoid this issue
> minScores[minScores.length - 1] = minScore > 0 ? Math.nextDown(minScore) : 0;
> {code}
> It simplifies the problem by calling Math.nextDown(minScore). However this is 
> problematic because it defeats the fact that TopScoreDocCollector calls 
> setMinCompetitiveScore on the float value that is immediately greater than 
> the k-th greatest hit so far.
> nextDown(minScore) is not the value that we need. The value that we need is 
> the smallest double that converts to minScore when casted to a float, which 
> would be half-way between nextDown(minScore) and minScore. In some cases this 
> would help get better performance out of conjunctions, especially if some 
> clauses produce constant scores.
> MaxScoreSumPropagator#setMinCompetitiveScore has the same issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8759) BlockMaxConjunctionScorer's simplified way of computing max scores hurts performance

2019-04-16 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16819154#comment-16819154
 ] 

Adrien Grand commented on LUCENE-8759:
--

I gave a try at simplifying this a bit and have the following version which 
passes your test:

{code}
if (value <= 0) {
  throw new IllegalArgumentException("Value must be > 0, got " + value);
} else if (Float.isFinite(value) == false) { // preserve infinity or NaN
  return value;
}

final int floatBits = Float.floatToIntBits(value);
final int prevFloatBits = floatBits - 1;
final int prevFloatExp = prevFloatBits >>> 23;

// delta between the mantissa of the double representation of `value` and
// the previous float value is 2^shift
int shift = 52 - 23;
if (prevFloatExp == 0x0) {
  // we need to tune `shift` for denormal floats whose mantissa doesn't have
  // an implicit leading bit
  shift += Integer.numberOfLeadingZeros(prevFloatBits) - (31-23);
}

long doubleBits = Double.doubleToLongBits(value);
doubleBits -= (1L << (shift - 1)); // half way between the current float and the previous one
doubleBits += (floatBits & 0x1); // add one if necessary to compensate for the fact that Java rounds to even in case of tie

return Double.longBitsToDouble(doubleBits);
{code}
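
A hedged sanity check for the snippet above, assuming it is wrapped in a method called 
prevDouble (an illustrative name): the result should still cast back to the original 
float, while the next smaller double should not.

{code:java}
float f = 1.5f; // any positive finite float would do
double d = prevDouble(f);
assert (float) d == f;               // smallest double that still casts to f
assert (float) Math.nextDown(d) < f; // anything smaller casts below f
{code}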

> BlockMaxConjunctionScorer's simplified way of computing max scores hurts 
> performance
> 
>
> Key: LUCENE-8759
> URL: https://issues.apache.org/jira/browse/LUCENE-8759
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8759.patch
>
>
> BlockMaxConjunctionScorer computes the minimum value that the score should 
> have after each scorer in order to be able to interrupt scoring as soon as 
> possible. For instance say scorers A, B and C produce maximum scores that are 
> equal to 4, 2 and 1. If the minimum competitive score is X, then the score 
> after scoring A, B and C must be at least X, the score after scoring A and B 
> must be at least X-1 and the score after scoring A must be at least X-1-2.
> However this is made a bit more complex than that due to floating-point 
> numbers and the fact that intermediate score values are doubles which only 
> get casted to a float after all values have been summed up. In order to keep 
> things simple, BlockMaxConjunctionScore has the following comment and code
> {code}
> // Also compute the minimum required scores for a hit to be competitive
> // A double that is less than 'score' might still be converted to 'score'
> // when casted to a float, so we go to the previous float to avoid this issue
> minScores[minScores.length - 1] = minScore > 0 ? Math.nextDown(minScore) : 0;
> {code}
> It simplifies the problem by calling Math.nextDown(minScore). However this is 
> problematic because it defeats the fact that TopScoreDocCollector calls 
> setMinCompetitiveScore on the float value that is immediately greater than 
> the k-th greatest hit so far.
> nextDown(minScore) is not the value that we need. The value that we need is 
> the smallest double that converts to minScore when casted to a float, which 
> would be half-way between nextDown(minScore) and minScore. In some cases this 
> would help get better performance out of conjunctions, especially if some 
> clauses produce constant scores.
> MaxScoreSumPropagator#setMinCompetitiveScore has the same issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8759) BlockMaxConjunctionScorer's simplified way of computing max scores hurts performance

2019-04-16 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818980#comment-16818980
 ] 

Adrien Grand commented on LUCENE-8759:
--

I'm wondering whether the logic could be simplified: the current code takes 
care of both casting to a double and rounding to the smallest double. Maybe it 
could cast the float to a double first and then twiddle the bits of the double 
value (maybe it doesn't help, just wondering).

> BlockMaxConjunctionScorer's simplified way of computing max scores hurts 
> performance
> 
>
> Key: LUCENE-8759
> URL: https://issues.apache.org/jira/browse/LUCENE-8759
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8759.patch
>
>
> BlockMaxConjunctionScorer computes the minimum value that the score should 
> have after each scorer in order to be able to interrupt scoring as soon as 
> possible. For instance say scorers A, B and C produce maximum scores that are 
> equal to 4, 2 and 1. If the minimum competitive score is X, then the score 
> after scoring A, B and C must be at least X, the score after scoring A and B 
> must be at least X-1 and the score after scoring A must be at least X-1-2.
> However this is made a bit more complex than that due to floating-point 
> numbers and the fact that intermediate score values are doubles which only 
> get casted to a float after all values have been summed up. In order to keep 
> things simple, BlockMaxConjunctionScore has the following comment and code
> {code}
> // Also compute the minimum required scores for a hit to be competitive
> // A double that is less than 'score' might still be converted to 'score'
> // when casted to a float, so we go to the previous float to avoid this issue
> minScores[minScores.length - 1] = minScore > 0 ? Math.nextDown(minScore) : 0;
> {code}
> It simplifies the problem by calling Math.nextDown(minScore). However this is 
> problematic because it defeats the fact that TopScoreDocCollector calls 
> setMinCompetitiveScore on the float value that is immediately greater than 
> the k-th greatest hit so far.
> nextDown(minScore) is not the value that we need. The value that we need is 
> the smallest double that converts to minScore when casted to a float, which 
> would be half-way between nextDown(minScore) and minScore. In some cases this 
> would help get better performance out of conjunctions, especially if some 
> clauses produce constant scores.
> MaxScoreSumPropagator#setMinCompetitiveScore has the same issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11

2019-04-15 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818207#comment-16818207
 ] 

Adrien Grand commented on LUCENE-8738:
--

+1 on the suggested short-term workaround

> Bump minimum Java version requirement to 11
> ---
>
> Key: LUCENE-8738
> URL: https://issues.apache.org/jira/browse/LUCENE-8738
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Adrien Grand
>Priority: Minor
>  Labels: Java11
> Fix For: master (9.0)
>
> Attachments: LUCENE-8738-solr-CoreCloseListener.patch
>
>
> See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11

2019-04-15 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817665#comment-16817665
 ] 

Adrien Grand commented on LUCENE-8738:
--

Feel free to send it, +1 for tomorrow CEST.

> Bump minimum Java version requirement to 11
> ---
>
> Key: LUCENE-8738
> URL: https://issues.apache.org/jira/browse/LUCENE-8738
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Adrien Grand
>Priority: Minor
>  Labels: Java11
> Fix For: master (9.0)
>
> Attachments: LUCENE-8738-solr-CoreCloseListener.patch
>
>
> See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11

2019-04-15 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817616#comment-16817616
 ] 

Adrien Grand commented on LUCENE-8738:
--

Thanks Uwe. Sorry for the late reply, +1 to send an email to dev@ to warn about 
this upcoming change.

> Bump minimum Java version requirement to 11
> ---
>
> Key: LUCENE-8738
> URL: https://issues.apache.org/jira/browse/LUCENE-8738
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Adrien Grand
>Priority: Minor
>  Labels: Java11
> Fix For: master (9.0)
>
> Attachments: LUCENE-8738-solr-CoreCloseListener.patch
>
>
> See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11

2019-04-12 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16816072#comment-16816072
 ] 

Adrien Grand commented on LUCENE-8738:
--

Agreed it would be simpler to just call queueCoreClose() from the impl, though 
I might be missing some reasons why it's designed this way today.

> Bump minimum Java version requirement to 11
> ---
>
> Key: LUCENE-8738
> URL: https://issues.apache.org/jira/browse/LUCENE-8738
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Adrien Grand
>Priority: Minor
>  Labels: Java11
> Fix For: master (9.0)
>
> Attachments: LUCENE-8738-solr-CoreCloseListener.patch
>
>
> See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8725) Make TermsQuery.SeekingTermSetTermsEnum public

2019-04-11 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16815143#comment-16815143
 ] 

Adrien Grand commented on LUCENE-8725:
--

+1 to the patch, let's maybe make it internal rather than experimental?

> Make TermsQuery.SeekingTermSetTermsEnum public
> --
>
> Key: LUCENE-8725
> URL: https://issues.apache.org/jira/browse/LUCENE-8725
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Trivial
> Fix For: 8.1
>
> Attachments: LUCENE-8725.patch
>
>
> I have come across use-cases where directly accessing {{TermsQuery}} can 
> help. If there is no objection I would like to make it public



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11

2019-04-10 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814823#comment-16814823
 ] 

Adrien Grand commented on LUCENE-8738:
--

[~thetaphi] I tested Eclipse indeed. I only had an issue with 
MockInitialContextFactory: Eclipse complains that it tries to access classes 
from a module it doesn't have access to.

> Bump minimum Java version requirement to 11
> ---
>
> Key: LUCENE-8738
> URL: https://issues.apache.org/jira/browse/LUCENE-8738
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Adrien Grand
>Priority: Minor
>  Labels: Java11
> Fix For: master (9.0)
>
>
> See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11

2019-04-10 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16814577#comment-16814577
 ] 

Adrien Grand commented on LUCENE-8738:
--

[~thetaphi] Do you know what still needs to be done before merging back to 
master? When we are done, or close to being done, I plan to send an email to 
the list to ask for some more eyes on changes that I did before merging, 
especially the Observable/Observer removal.

> Bump minimum Java version requirement to 11
> ---
>
> Key: LUCENE-8738
> URL: https://issues.apache.org/jira/browse/LUCENE-8738
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Adrien Grand
>Priority: Minor
>  Labels: Java11
> Fix For: master (9.0)
>
>
> See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8762) Lucene50PostingsReader should specialize reading docs+freqs with impacts

2019-04-10 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-8762:


 Summary: Lucene50PostingsReader should specialize reading 
docs+freqs with impacts
 Key: LUCENE-8762
 URL: https://issues.apache.org/jira/browse/LUCENE-8762
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand


Currently if you ask for impacts, we only have one implementation that is able 
to expose everything: docs, freqs, positions and offsets. In contrast, if you 
don't need impacts, we have specialization for docs+freqs, docs+freqs+positions 
and docs+freqs+positions+offsets.

Maybe we should add specialization for the docs+freqs case with impacts, which 
should be the most common case, and remove specialization for 
docs+freqs+positions when impacts are not requested?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8760) Reconsider the best way to encode postings now that we can skip non-competitive hits

2019-04-10 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-8760:


 Summary: Reconsider the best way to encode postings now that we 
can skip non-competitive hits
 Key: LUCENE-8760
 URL: https://issues.apache.org/jira/browse/LUCENE-8760
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand


The fact that we now skip non competitive hits has some implications to our 
postings:
 - we are now more likely to call advance vs. nextDoc
 - we are less likely to read term frequency for a given doc, since we only do 
that if the maximum score reported by impacts is competitive
 - we are less likely to read positions for a given doc, since exact phrase 
queries first check the maximum score that would be obtained with a phrase freq 
equal to the minimum of all term freqs

It might be a good opportunity to re-explore the best way to encode postings.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8759) BlockMaxConjunctionScorer's simplified way of computing max scores hurts performance

2019-04-10 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-8759:


 Summary: BlockMaxConjunctionScorer's simplified way of computing 
max scores hurts performance
 Key: LUCENE-8759
 URL: https://issues.apache.org/jira/browse/LUCENE-8759
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand


BlockMaxConjunctionScorer computes the minimum value that the score should have 
after each scorer in order to be able to interrupt scoring as soon as possible. 
For instance say scorers A, B and C produce maximum scores that are equal to 4, 
2 and 1. If the minimum competitive score is X, then the score after scoring A, 
B and C must be at least X, the score after scoring A and B must be at least 
X-1 and the score after scoring A must be at least X-1-2.

However this is made a bit more complex than that due to floating-point numbers 
and the fact that intermediate score values are doubles which only get casted 
to a float after all values have been summed up. In order to keep things 
simple, BlockMaxConjunctionScore has the following comment and code

{code}
// Also compute the minimum required scores for a hit to be competitive
// A double that is less than 'score' might still be converted to 'score'
// when casted to a float, so we go to the previous float to avoid this issue
minScores[minScores.length - 1] = minScore > 0 ? Math.nextDown(minScore) : 0;
{code}

It simplifies the problem by calling Math.nextDown(minScore). However this is 
problematic because it defeats the fact that TopScoreDocCollector calls 
setMinCompetitiveScore on the float value that is immediately greater than the 
k-th greatest hit so far.

nextDown(minScore) is not the value that we need. The value that we need is the 
smallest double that converts to minScore when casted to a float, which would 
be half-way between nextDown(minScore) and minScore. In some cases this would 
help get better performance out of conjunctions, especially if some clauses 
produce constant scores.

MaxScoreSumPropagator#setMinCompetitiveScore has the same issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8619) Decrease I/O pressure of OfflineSorter

2019-04-10 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8619.
--
Resolution: Not A Problem

This isn't a problem anymore now that Ignacio rewrote the merging of BKD trees 
as a selection problem rather than a sorting problem.

> Decrease I/O pressure of OfflineSorter
> --
>
> Key: LUCENE-8619
> URL: https://issues.apache.org/jira/browse/LUCENE-8619
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>
> OfflineSorter is likely I/O bound, yet it doesn't really try to relieve I/O. 
> For instance it always writes the length on 2 bytes, which is wasteful when 
> used by BKDWriter since all byte[] arrays have exactly the same length. For 
> LatLonPoint, this is a 25% space overhead that we could remove.
> Doing lightweight compression on the fly might also help.
> As a data point, Ignacio told me that after indexing 60M shapes with 
> LatLonShape (1.65B triangles), the index directory was about 265GB and 
> dropped to 57GB when merging was over.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11

2019-04-09 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813565#comment-16813565
 ] 

Adrien Grand commented on LUCENE-8738:
--

Sorry Uwe, I don't understand what you are suggesting.

> Bump minimum Java version requirement to 11
> ---
>
> Key: LUCENE-8738
> URL: https://issues.apache.org/jira/browse/LUCENE-8738
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Adrien Grand
>Priority: Minor
>  Labels: Java11
> Fix For: master (9.0)
>
>
> See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11

2019-04-09 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813445#comment-16813445
 ] 

Adrien Grand commented on LUCENE-8738:
--

Apparently the issue can be worked around by keeping the local file named 
package-list, even though it is supposed to be called element-list since the 
move to modules. I'll push a fix shortly.

> Bump minimum Java version requirement to 11
> ---
>
> Key: LUCENE-8738
> URL: https://issues.apache.org/jira/browse/LUCENE-8738
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Adrien Grand
>Priority: Minor
>  Labels: Java11
> Fix For: master (9.0)
>
>
> See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11

2019-04-09 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813346#comment-16813346
 ] 

Adrien Grand commented on LUCENE-8738:
--

There seem to be issues with links to the standard API. I wonder whether it 
might be related to the move from package-list to element-list.

> Bump minimum Java version requirement to 11
> ---
>
> Key: LUCENE-8738
> URL: https://issues.apache.org/jira/browse/LUCENE-8738
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Adrien Grand
>Priority: Minor
>  Labels: Java11
> Fix For: master (9.0)
>
>
> See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7386) Flatten nested disjunctions

2019-04-09 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813211#comment-16813211
 ] 

Adrien Grand commented on LUCENE-7386:
--

For the record, I had to disable the verification of scores for this run of the 
benchmark since this change removes intermediate casts to float, which triggers 
slight changes in the produced scores.
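
For readers less familiar with this issue, here is a minimal sketch of the kind of query the change targets (the field and terms are made up; only the flattening behaviour is the point):

{code:java}
import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Illustration only: a nested disjunction (a OR (b OR c)). With this change,
// rewriting is expected to flatten it into a single-level disjunction
// (a OR b OR c); the inner disjunction is also where the intermediate float
// casts mentioned above used to happen.
final class NestedDisjunctionExample {
  static Query flattened(IndexSearcher searcher) throws IOException {
    Query inner = new BooleanQuery.Builder()
        .add(new TermQuery(new Term("body", "b")), Occur.SHOULD)
        .add(new TermQuery(new Term("body", "c")), Occur.SHOULD)
        .build();
    Query nested = new BooleanQuery.Builder()
        .add(new TermQuery(new Term("body", "a")), Occur.SHOULD)
        .add(inner, Occur.SHOULD)
        .build();
    return searcher.rewrite(nested);
  }
}
{code}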

> Flatten nested disjunctions
> ---
>
> Key: LUCENE-7386
> URL: https://issues.apache.org/jira/browse/LUCENE-7386
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7386.patch, LUCENE-7386.patch, LUCENE-7386.patch
>
>
> Now that coords are gone it became easier to flatten nested disjunctions. It 
> might sound weird to write nested disjunctions in the first place, but 
> disjunctions can be created implicitly by other queries such as 
> more-like-this, LatLonPoint.newBoxQuery, non-scoring synonym queries, etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8708) Can we simplify conjunctions of range queries automatically?

2019-04-09 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813199#comment-16813199
 ] 

Adrien Grand commented on LUCENE-8708:
--

Thanks Atri for giving it a try! This change is a bit too invasive for my taste 
given that this is only a nice-to-have feature. That said, I don't really have 
ideas on how to make it better...

> Can we simplify conjunctions of range queries automatically?
> 
>
> Key: LUCENE-8708
> URL: https://issues.apache.org/jira/browse/LUCENE-8708
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: interval_range_clauses_merging0704.patch
>
>
> BooleanQuery#rewrite already has some logic to make queries more efficient, 
> such as deduplicating filters or rewriting boolean queries that wrap a single 
> positive clause to that clause.
> It would be nice to also simplify conjunctions of range queries, so that eg. 
> {{foo: [5 TO *] AND foo:[* TO 20]}} would be rewritten to {{foo:[5 TO 20]}}. 
> When constructing queries manually or via the classic query parser, it feels 
> unnecessary as this is something that the user can fix easily. However if you 
> want to implement a query parser that only allows specifying one bound at 
> once, such as Gmail ({{after:2018-12-31}} 
> https://support.google.com/mail/answer/7190?hl=en) or GitHub 
> ({{updated:>=2018-12-31}} 
> https://help.github.com/en/articles/searching-issues-and-pull-requests#search-by-when-an-issue-or-pull-request-was-created-or-last-updated)
>  then you might end up with inefficient queries if the end user specifies 
> both an upper and a lower bound. It would be nice if we optimized those 
> automatically.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8753) New PostingFormat - UniformSplit

2019-04-09 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813164#comment-16813164
 ] 

Adrien Grand commented on LUCENE-8753:
--

bq. BlockTree and UniformSplit had the same QPS for Term and Phrase queries. I 
didn't understand why a different behavior between a small and a large index.

I think this is expected. Query processing needs to look up the term in the 
terms dict and then process documents that contain this term. When the index 
gets larger, postings usually grow more quickly than the terms dictionary, so 
processing postings takes more time relatively compared to looking up the term 
in the terms dictionary. Term dictionary lookup performance only really matters 
for queries that have few matches (which you somehow simulated by running the 
benchmark on wikimedium500k) and updates, which are simulated by the PKLookup 
task.

> New PostingFormat - UniformSplit
> 
>
> Key: LUCENE-8753
> URL: https://issues.apache.org/jira/browse/LUCENE-8753
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 8.0
>Reporter: Bruno Roustant
>Priority: Major
> Attachments: Uniform Split Technique.pdf, luceneutil.benchmark.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is a proposal to add a new PostingsFormat called "UniformSplit" with 4 
> objectives:
>  - Clear design and simple code.
>  - Easily extensible, for both the logic and the index format.
>  - Light memory usage with a very compact FST.
>  - Focus on efficient TermQuery, PhraseQuery and PrefixQuery performance.
> (the attached pdf visually explains the technique in more detail)
>  The principle is to split the list of terms into blocks and use a FST to 
> access the block, but not as a prefix trie, rather with a seek-floor pattern. 
> For the selection of the blocks, there is a target average block size (number 
> of terms), with an allowed delta variation (10%) to compare the terms and 
> select the one with the minimal distinguishing prefix.
>  There are also several optimizations inside the block to make it more 
> compact and speed up the loading/scanning.
> The performance obtained is interesting with the luceneutil benchmark, 
> comparing UniformSplit with BlockTree. Find it in the first comment and also 
> attached for better formatting.
> Although the precise percentages vary between runs, three main points:
>  - TermQuery and PhraseQuery are improved.
>  - PrefixQuery and WildcardQuery are ok.
>  - Fuzzy queries are clearly less performant, because BlockTree is so 
> optimized for them.
> Compared to BlockTree, FST size is reduced by 15%, and segment writing time 
> is reduced by 20%. So this PostingsFormat scales to lots of docs, as 
> BlockTree.
> This initial version passes all Lucene tests. Use “ant test 
> -Dtests.codec=UniformSplitTesting” to test with this PostingsFormat.
> Subjectively, we think we have fulfilled our goal of code simplicity. And we 
> have already exercised this PostingsFormat extensibility to create a 
> different flavor for our own use-case.
> Contributors: Juan Camilo Rodriguez Duran, Bruno Roustant, David Smiley



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8747) Allow access to Weight and submatches from Matches instances

2019-04-05 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811015#comment-16811015
 ] 

Adrien Grand commented on LUCENE-8747:
--

Is getWeight really needed or could you pull an iterator and then call getQuery?



> Allow access to Weight and submatches from Matches instances
> 
>
> Key: LUCENE-8747
> URL: https://issues.apache.org/jira/browse/LUCENE-8747
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Attachments: LUCENE-8747.patch
>
>
> A Matches object currently allows access to all matching terms from a query, 
> but the structure of the matching query is flattened out, so if you want to 
> find which subqueries have matched you need to iterate over all matches, 
> collecting queries as you go.  It should be easier to get this information 
> from the parent Matches object.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8749) Proposal: Pluggable Interface for Slice Allocation Algorithm

2019-04-05 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810987#comment-16810987
 ] 

Adrien Grand commented on LUCENE-8749:
--

I'm wondering whether customizing slices would actually be necessary if we had 
a better default implementation of IndexSearcher#slices? For instance something 
that would create slices of similar sizes?
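
To make the idea concrete, here is a rough sketch (my own illustration, not a patch) of one way "slices of similar sizes" could be computed; an IndexSearcher subclass could wrap each resulting group in an IndexSearcher.LeafSlice from its overridden slices(List<LeafReaderContext>) method (the class and method names below are made up):

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import org.apache.lucene.index.LeafReaderContext;

// Greedily bin segments into a fixed number of groups of roughly equal
// document counts: largest segments first, each assigned to the currently
// lightest group.
final class BalancedSlices {
  static List<List<LeafReaderContext>> group(List<LeafReaderContext> leaves, int numGroups) {
    List<List<LeafReaderContext>> groups = new ArrayList<>();
    long[] totals = new long[numGroups];
    for (int i = 0; i < numGroups; i++) {
      groups.add(new ArrayList<>());
    }
    List<LeafReaderContext> sorted = new ArrayList<>(leaves);
    sorted.sort(Comparator.comparingInt((LeafReaderContext ctx) -> ctx.reader().maxDoc()).reversed());
    for (LeafReaderContext ctx : sorted) {
      int lightest = 0;
      for (int i = 1; i < numGroups; i++) {
        if (totals[i] < totals[lightest]) {
          lightest = i;
        }
      }
      groups.get(lightest).add(ctx);
      totals[lightest] += ctx.reader().maxDoc();
    }
    return groups;
  }
}
{code}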

> Proposal: Pluggable Interface for Slice Allocation Algorithm
> 
>
> Key: LUCENE-8749
> URL: https://issues.apache.org/jira/browse/LUCENE-8749
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Atri Sharma
>Priority: Major
>
> The slice allocation method allocates one thread per segment today. If a user 
> wishes to use a different slice allocation algorithm, there is no way except 
> to make a change in IndexSearcher. This Jira proposes an interface to 
> decouple the slice allocation mechanism from IndexSearcher and allow plugging 
> in the method from an external factory (like Collectors).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8751) Weight#matches should use the scorerSupplier to create scorers

2019-04-05 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810980#comment-16810980
 ] 

Adrien Grand commented on LUCENE-8751:
--

+1

> Weight#matches should use the scorerSupplier to create scorers
> --
>
> Key: LUCENE-8751
> URL: https://issues.apache.org/jira/browse/LUCENE-8751
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
> Attachments: LUCENE-8751.patch
>
>
> The default implementation for Weight#matches creates a scorer to check if 
> the document matches. Since this API is per document it would be more 
> efficient to create a ScorerSupplier and then create the scorer with a 
> leadCost of 1. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8701) Speed up ToParentBlockJoinQuery when total hit count is not needed

2019-04-05 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810659#comment-16810659
 ] 

Adrien Grand commented on LUCENE-8701:
--

+1 it looks good. Maybe we should call searcher#rewrite after wrapping in a 
ConstantScoreQuery to be more future-proof, if we ever add checks in 
ConstantScoreQuery#createWeight that eg. the wrapped query is not a 
ConstantScoreQuery or a BoostQuery.
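
Something along these lines (illustration only; the child query and searcher are placeholders):

{code:java}
import java.io.IOException;
import org.apache.lucene.search.ConstantScoreQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

// Wrap the child query in a ConstantScoreQuery and let the searcher rewrite
// the result, so that any future checks in ConstantScoreQuery#createWeight
// keep holding.
final class WrapAndRewrite {
  static Query constantScore(IndexSearcher searcher, Query childQuery) throws IOException {
    return searcher.rewrite(new ConstantScoreQuery(childQuery));
  }
}
{code}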

> Speed up ToParentBlockJoinQuery when total hit count is not needed
> --
>
> Key: LUCENE-8701
> URL: https://issues.apache.org/jira/browse/LUCENE-8701
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
> Attachments: LUCENE-8701.patch, LUCENE-8701.patch
>
>
> We spotted a regression on nested queries in the Elasticsearch nightly track:
> https://elasticsearch-benchmarks.elastic.co/index.html#tracks/nested/nightly/30d
> It seems related to the fact that we propagate the TOP_SCORES score mode to 
> the child query even though we don't compute a max score in the 
> BlockJoinScorer and don't propagate the minimum score either. Since it is not 
> possible to compute a max score for a document that depends on other 
> documents (the children) we should probably force the score mode to COMPLETE 
> to build the child scorer. This should avoid the overhead of loading and 
> reading the impacts. It should also be possible to early terminate queries 
> that use the ScoreMode.None mode since in this case the score of each parent 
> document is the same.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8688) Forced merges merge more than necessary

2019-04-05 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16810629#comment-16810629
 ] 

Adrien Grand commented on LUCENE-8688:
--

I backported to branch_7_7 as well in case we do a new bugfix release.

> Forced merges merge more than necessary
> ---
>
> Key: LUCENE-8688
> URL: https://issues.apache.org/jira/browse/LUCENE-8688
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 7.7.2, 8.1, master (9.0)
>
> Attachments: LUCENE-8688.patch, LUCENE-8688.patch, LUCENE-8688.patch, 
> LUCENE-8688.patch
>
>
> A user reported some surprise after the upgrade to Lucene 7.5 due to changes 
> to how forced merges are selected when maxSegmentCount is greater than 1.
> Before 7.5 forceMerge used to pick up the least amount of merging that would 
> result in an index that has maxSegmentCount segments at most. Now that we 
> share the same logic as regular merges, we are almost sure to pick a 
> maxMergeAtOnceExplicit-segments merge (30 segments) given that merges that 
> have more segments usually score better. This is due to the fact that natural 
> merges assume that merges that run now save work for later, so the more 
> segments get merged, the better. This assumption doesn't hold for forced 
> merges that should run on read-only indices, so there won't be any future 
> merging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8688) Forced merges merge more than necessary

2019-04-05 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-8688:
-
Fix Version/s: 7.7.2

> Forced merges merge more than necessary
> ---
>
> Key: LUCENE-8688
> URL: https://issues.apache.org/jira/browse/LUCENE-8688
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 7.7.2, 8.1, master (9.0)
>
> Attachments: LUCENE-8688.patch, LUCENE-8688.patch, LUCENE-8688.patch, 
> LUCENE-8688.patch
>
>
> A user reported some surprise after the upgrade to Lucene 7.5 due to changes 
> to how forced merges are selected when maxSegmentCount is greater than 1.
> Before 7.5 forceMerge used to pick up the least amount of merging that would 
> result in an index that has maxSegmentCount segments at most. Now that we 
> share the same logic as regular merges, we are almost sure to pick a 
> maxMergeAtOnceExplicit-segments merge (30 segments) given that merges that 
> have more segments usually score better. This is due to the fact that natural 
> merges assume that merges that run now save work for later, so the more 
> segments get merged, the better. This assumption doesn't hold for forced 
> merges that should run on read-only indices, so there won't be any future 
> merging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8753) New PostingFormat - UniformSplit

2019-04-03 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808813#comment-16808813
 ] 

Adrien Grand commented on LUCENE-8753:
--

I haven't read the pdf or the patch yet. The LowTerm speedup makes me think 
that term lookups are faster, but then I'm surprised PKLookup is slower. Is it 
due to the fact that it doesn't have the ability to fail lookups early like 
BlockTree?

> New PostingFormat - UniformSplit
> 
>
> Key: LUCENE-8753
> URL: https://issues.apache.org/jira/browse/LUCENE-8753
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Affects Versions: 8.0
>Reporter: Bruno Roustant
>Priority: Major
> Attachments: Uniform Split Technique.pdf, luceneutil.benchmark.txt
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This is a proposal to add a new PostingsFormat called "UniformSplit" with 4 
> objectives:
>  - Clear design and simple code.
>  - Easily extensible, for both the logic and the index format.
>  - Light memory usage with a very compact FST.
>  - Focus on efficient TermQuery, PhraseQuery and PrefixQuery performance.
> (the attached pdf visually explains the technique in more detail)
>  The principle is to split the list of terms into blocks and use a FST to 
> access the block, but not as a prefix trie, rather with a seek-floor pattern. 
> For the selection of the blocks, there is a target average block size (number 
> of terms), with an allowed delta variation (10%) to compare the terms and 
> select the one with the minimal distinguishing prefix.
>  There are also several optimizations inside the block to make it more 
> compact and speed up the loading/scanning.
> The performance obtained is interesting with the luceneutil benchmark, 
> comparing UniformSplit with BlockTree. Find it in the first comment and also 
> attached for better formatting.
> Although the precise percentages vary between runs, three main points:
>  - TermQuery and PhraseQuery are improved.
>  - PrefixQuery and WildcardQuery are ok.
>  - Fuzzy queries are clearly less performant, because BlockTree is so 
> optimized for them.
> Compared to BlockTree, FST size is reduced by 15%, and segment writing time 
> is reduced by 20%. So this PostingsFormat scales to lots of docs, as 
> BlockTree.
> This initial version passes all Lucene tests. Use “ant test 
> -Dtests.codec=UniformSplitTesting” to test with this PostingsFormat.
> Subjectively, we think we have fulfilled our goal of code simplicity. And we 
> have already exercised this PostingsFormat extensibility to create a 
> different flavor for our own use-case.
> Contributors: Juan Camilo Rodriguez Duran, Bruno Roustant, David Smiley



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11

2019-04-03 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808610#comment-16808610
 ] 

Adrien Grand commented on LUCENE-8738:
--

+1 The other part that needs review is the replacement of Observable/Observer 
with utility classes from java.beans.

> Bump minimum Java version requirement to 11
> ---
>
> Key: LUCENE-8738
> URL: https://issues.apache.org/jira/browse/LUCENE-8738
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Adrien Grand
>Priority: Minor
>  Labels: Java11
> Fix For: master (9.0)
>
>
> See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11

2019-04-03 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808503#comment-16808503
 ] 

Adrien Grand commented on LUCENE-8738:
--

+1 to disable jtidy for now

> Bump minimum Java version requirement to 11
> ---
>
> Key: LUCENE-8738
> URL: https://issues.apache.org/jira/browse/LUCENE-8738
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Adrien Grand
>Priority: Minor
>  Labels: Java11
> Fix For: master (9.0)
>
>
> See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11

2019-04-03 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808415#comment-16808415
 ] 

Adrien Grand commented on LUCENE-8738:
--

Currently the build prevents tidy from running if the Java version is not 1.8. 
I tried to remove this constraint to see how it would go, and it fails silently. 
I can dig into it, but I was wondering if you knew anything about it.

> Bump minimum Java version requirement to 11
> ---
>
> Key: LUCENE-8738
> URL: https://issues.apache.org/jira/browse/LUCENE-8738
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Adrien Grand
>Priority: Minor
>  Labels: Java11
> Fix For: master (9.0)
>
>
> See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11

2019-04-02 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16807848#comment-16807848
 ] 

Adrien Grand commented on LUCENE-8738:
--

[~thetaphi] I'm curious if you have an idea how to make tidy work with Java 11?

> Bump minimum Java version requirement to 11
> ---
>
> Key: LUCENE-8738
> URL: https://issues.apache.org/jira/browse/LUCENE-8738
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Adrien Grand
>Priority: Minor
>  Labels: Java11
> Fix For: master (9.0)
>
>
> See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8732) Allow ConstantScoreQuery to skip counting hits

2019-03-28 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16804163#comment-16804163
 ] 

Adrien Grand commented on LUCENE-8732:
--

If someone has time to look, I think we have a similar issue with boolean 
queries that only consist of filter clauses. 

> Allow ConstantScoreQuery to skip counting hits
> --
>
> Key: LUCENE-8732
> URL: https://issues.apache.org/jira/browse/LUCENE-8732
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
> Fix For: 8.1, master (9.0)
>
> Attachments: LUCENE-8732.patch
>
>
> We already have a ConstantScoreScorer that knows how to early terminate the 
> collection but the ConstantScoreQuery uses a private scorer that doesn't take 
> advantage of setMinCompetitiveScore. This issue is about reusing the 
> ConstantScoreScorer in the ConstantScoreQuery in order to early terminate 
> queries that don't need to compute the total number of hits.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8739) ZSTD Compressor support in Lucene

2019-03-25 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801064#comment-16801064
 ] 

Adrien Grand commented on LUCENE-8739:
--

Zstd looks great indeed. We'd need a pure Java impl if we wanted to fold it 
into the default codec since lucene-core can't have dependencies. It was easy 
with LZ4, which is pretty straightforward; I suspect it will be a bit harder 
with zstd. Or maybe the JDK will provide bindings for zstd one day, like it does 
with zlib.

> ZSTD Compressor support in Lucene
> -
>
> Key: LUCENE-8739
> URL: https://issues.apache.org/jira/browse/LUCENE-8739
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/codecs
>Reporter: Sean Torres
>Priority: Minor
>  Labels: features
>
> ZStandard has a great speed and compression ratio tradeoff. 
> ZStandard is open source compression from Facebook.
> More about ZSTD
> [https://github.com/facebook/zstd]
> [https://code.facebook.com/posts/1658392934479273/smaller-and-faster-data-compression-with-zstandard/]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11

2019-03-25 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801036#comment-16801036
 ] 

Adrien Grand commented on LUCENE-8738:
--

You argued to keep this code around in the vote thread, which is why I haven't 
removed it. I have a slight preference for removing it, no strong feelings 
though.

> Bump minimum Java version requirement to 11
> ---
>
> Key: LUCENE-8738
> URL: https://issues.apache.org/jira/browse/LUCENE-8738
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Adrien Grand
>Priority: Minor
>  Labels: Java11
> Fix For: master (9.0)
>
>
> See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11

2019-03-25 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800943#comment-16800943
 ] 

Adrien Grand commented on LUCENE-8738:
--

[~thetaphi] Feel free to heavy commit, I don't plan to touch it again until 
tomorrow.

> Bump minimum Java version requirement to 11
> ---
>
> Key: LUCENE-8738
> URL: https://issues.apache.org/jira/browse/LUCENE-8738
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Adrien Grand
>Priority: Minor
>  Labels: Java11
> Fix For: master (9.0)
>
>
> See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8738) Bump minimum Java version requirement to 11

2019-03-25 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16800921#comment-16800921
 ] 

Adrien Grand commented on LUCENE-8738:
--

Hi Uwe, thanks for this checklist. I have started looking into what needs 
fixing to bump the minimum version requirement to Java 11 on the 
jira/LUCENE-8738 branch, if you are interested in having a look or helping. I 
haven't been through all the items that you have listed yet.

> Bump minimum Java version requirement to 11
> ---
>
> Key: LUCENE-8738
> URL: https://issues.apache.org/jira/browse/LUCENE-8738
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Adrien Grand
>Priority: Minor
>  Labels: Java11
> Fix For: master (9.0)
>
>
> See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8738) Bump minimum version requirement to 11

2019-03-25 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-8738:


 Summary: Bump minimum version requirement to 11
 Key: LUCENE-8738
 URL: https://issues.apache.org/jira/browse/LUCENE-8738
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
 Fix For: master (9.0)


See vote thread for reference: https://markmail.org/message/q6ubdycqscpl43aq.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8737) Simon poortman

2019-03-24 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8737.
--
   Resolution: Invalid
Fix Version/s: (was: Positions Branch)

I'm assuming this issue was created by mistake?

> Simon poortman
> --
>
> Key: LUCENE-8737
> URL: https://issues.apache.org/jira/browse/LUCENE-8737
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 8.0
>Reporter: Simon poortman
>Priority: Major
>  Labels: Brown
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8732) Allow ConstantScoreQuery to skip counting hits

2019-03-22 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798867#comment-16798867
 ] 

Adrien Grand commented on LUCENE-8732:
--

+1 to the approach. Should we keep returning the sub scorer via getChildren()?

> Allow ConstantScoreQuery to skip counting hits
> --
>
> Key: LUCENE-8732
> URL: https://issues.apache.org/jira/browse/LUCENE-8732
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
> Attachments: LUCENE-8732.patch
>
>
> We already have a ConstantScoreScorer that knows how to early terminate the 
> collection but the ConstantScoreQuery uses a private scorer that doesn't take 
> advantage of setMinCompetitiveScore. This issue is about reusing the 
> ConstantScoreScorer in the ConstantScoreQuery in order to early terminate 
> queries that don't need to compute the total number of hits.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8733) retrospectively add @since javadocs for 'intervals' classes

2019-03-21 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798313#comment-16798313
 ] 

Adrien Grand commented on LUCENE-8733:
--

I have mixed feelings. I have never found the "@since" tag very helpful, though 
I can understand how some users could find it useful on important APIs, eg. 
maybe things like BooleanQuery, FeatureField, LongPoint. However here we are 
updating javadocs of package-private classes in the sandbox, so it's unclear to 
me to whom it is going to be helpful? I'm not opposed to adding those, but a 
bit surprised. I'd be curious to better understand your motivation?

> retrospectively add @since javadocs for 'intervals' classes
> ---
>
> Key: LUCENE-8733
> URL: https://issues.apache.org/jira/browse/LUCENE-8733
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: LUCENE-8733-branch-7-4.patch
>
>
> LUCENE-8196 started 'intervals' and subsequent tickets extended it.
> This ticket proposes to retrospectively add {{@since X.Y}} javadocs for all 
> the classes (and to then going forward perhaps continue to add them).
> And perhaps we could have an 'intervals' or similar JIRA components choice 
> too?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8733) retrospectively add @since javadocs for 'intervals' classes

2019-03-21 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16798295#comment-16798295
 ] 

Adrien Grand commented on LUCENE-8733:
--

Most of the classes whose documentation is updated are package-private, so this 
will never show up in javadocs?

> retrospectively add @since javadocs for 'intervals' classes
> ---
>
> Key: LUCENE-8733
> URL: https://issues.apache.org/jira/browse/LUCENE-8733
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Attachments: LUCENE-8733-branch-7-4.patch
>
>
> LUCENE-8196 started 'intervals' and subsequent tickets extended it.
> This ticket proposes to retrospectively add {{@since X.Y}} javadocs for all 
> the classes (and to then going forward perhaps continue to add them).
> And perhaps we could have an 'intervals' or similar JIRA components choice 
> too?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8731) mark MultiTermAwareComponent as deprecated (7.x and 7.7 only)

2019-03-19 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796413#comment-16796413
 ] 

Adrien Grand commented on LUCENE-8731:
--

Let's use the "@Deprecated" annotation on the class in addition to the javadoc 
tag?
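
Roughly this shape (sketch only; the real interface lives in the analysis module and has more members, and the exact wording of the deprecation note is up to the patch):

{code:java}
/**
 * @deprecated See LUCENE-8497 for details on the replacement in 8.0 onwards.
 */
@Deprecated
public interface MultiTermAwareComponent {
  // existing members unchanged
}
{code}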

> mark MultiTermAwareComponent as deprecated (7.x and 7.7 only)
> -
>
> Key: LUCENE-8731
> URL: https://issues.apache.org/jira/browse/LUCENE-8731
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Trivial
> Fix For: 7.7.2
>
> Attachments: LUCENE-8731.patch
>
>
> {{MultiTermAwareComponent}} is one of several hundred classes tagged as 
> {{@lucene.experimental}} i.e. it is understood that it might change in 
> incompatible ways in the next release. Since in this case it is specifically 
> known from LUCENE-8497 that the class is removed in 8.0 onwards I think it 
> would be nice to mark it as deprecated, sign-posting readers to replacement 
> details. Proposed patch to follow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-13334) Expose FeatureField

2019-03-19 Thread Adrien Grand (JIRA)
Adrien Grand created SOLR-13334:
---

 Summary: Expose FeatureField
 Key: SOLR-13334
 URL: https://issues.apache.org/jira/browse/SOLR-13334
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Adrien Grand


It'd be nice to expose Lucene's FeatureField. This is especially useful in 
conjunction with SOLR-13289 since FeatureField can skip non-competitive hits, 
which makes it realistic to apply custom scoring over an entire collection 
rather than only via a rescorer.
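
For reference, a minimal sketch of how Lucene's FeatureField is used (the field and feature names are made up):

{code:java}
import org.apache.lucene.document.Document;
import org.apache.lucene.document.FeatureField;
import org.apache.lucene.search.Query;

final class FeatureFieldSketch {
  // At index time: store a per-document feature value.
  static void addPagerank(Document doc, float pagerank) {
    doc.add(new FeatureField("features", "pagerank", pagerank));
  }

  // At query time: scores grow with the feature value but saturate, and
  // non-competitive hits can be skipped when total hit counts are not needed.
  static Query pagerankBoost() {
    return FeatureField.newSaturationQuery("features", "pagerank");
  }
}
{code}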



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13289) Support for BlockMax WAND

2019-03-19 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16796063#comment-16796063
 ] 

Adrien Grand commented on SOLR-13289:
-

+1 to expose these optimizations.

I like option 1 better too. I think it's fine to expose this optimization as an 
opt-in for now, but in the longer term we should consider making it enabled by 
default? Otherwise users might never learn about it even if they are fine with 
the trade-off?

> Support for BlockMax WAND
> -
>
> Key: SOLR-13289
> URL: https://issues.apache.org/jira/browse/SOLR-13289
> Project: Solr
>  Issue Type: New Feature
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: SOLR-13289.patch
>
>
> LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to 
> expose this via Solr. When enabled, the numFound returned will not be exact.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7958) Give TermInSetQuery better advancing capabilities

2019-03-19 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795905#comment-16795905
 ] 

Adrien Grand commented on LUCENE-7958:
--

Thanks for sharing [~hermes]. I should resurrect the above patch when I have 
some time!

> Give TermInSetQuery better advancing capabilities
> -
>
> Key: LUCENE-7958
> URL: https://issues.apache.org/jira/browse/LUCENE-7958
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-7958.patch
>
>
> If a TermInSetQuery has more than 15 matching terms on a given segment, then 
> we consume all postings lists into a bitset and return an iterator over this 
> bitset as a scorer. I would like to change it so that we keep the 15 postings 
> lists that have the largest document frequencies and consume all other 
> (shorter) postings lists into a bitset. In the end we return a disjunction 
> over the N longest postings lists and the bit set. This could help consume 
> fewer doc ids if the TermInSetQuery is intersected with other queries, 
> especially if the document frequencies of the terms it wraps have a zipfian 
> distribution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8150) Remove references to segments.gen.

2019-03-19 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-8150:
-
Attachment: LUCENE-8150.patch

> Remove references to segments.gen.
> --
>
> Key: LUCENE-8150
> URL: https://issues.apache.org/jira/browse/LUCENE-8150
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.1, master (9.0)
>
> Attachments: LUCENE-8150.patch, LUCENE-8150.patch
>
>
> This was the way we wrote pending segment files before we switched to 
> {{pending_segments_N}} in LUCENE-5925.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8150) Remove references to segments.gen.

2019-03-19 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16795886#comment-16795886
 ] 

Adrien Grand commented on LUCENE-8150:
--

Here is a new patch based on the above comments. The "segments.gen" string only 
exists in SegmentInfos now.

> Remove references to segments.gen.
> --
>
> Key: LUCENE-8150
> URL: https://issues.apache.org/jira/browse/LUCENE-8150
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.1, master (9.0)
>
> Attachments: LUCENE-8150.patch, LUCENE-8150.patch
>
>
> This was the way we wrote pending segment files before we switched to 
> {{pending_segments_N}} in LUCENE-5925.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8150) Remove references to segments.gen.

2019-03-15 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793604#comment-16793604
 ] 

Adrien Grand commented on LUCENE-8150:
--

I could work around it now that I know of this trap, but if it could be fixed, 
that would be even better. :)

> Remove references to segments.gen.
> --
>
> Key: LUCENE-8150
> URL: https://issues.apache.org/jira/browse/LUCENE-8150
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.1, master (9.0)
>
> Attachments: LUCENE-8150.patch
>
>
> This was the way we wrote pending segment files before we switched to 
> {{pending_segments_N}} in LUCENE-5925.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8150) Remove references to segments.gen.

2019-03-15 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793597#comment-16793597
 ] 

Adrien Grand commented on LUCENE-8150:
--

Indeed! I think it's due to the fact that I'm always filtering by open issues 
on jirasearch, and it filters out issues that are marked as "patch available". 
I'll bring the patch up-to-date.

> Remove references to segments.gen.
> --
>
> Key: LUCENE-8150
> URL: https://issues.apache.org/jira/browse/LUCENE-8150
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.1, master (9.0)
>
> Attachments: LUCENE-8150.patch
>
>
> This was the way we wrote pending segment files before we switched to 
> {{pending_segments_N}} in LUCENE-5925.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-8688) Forced merges merge more than necessary

2019-03-15 Thread Adrien Grand (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8688.
--
   Resolution: Fixed
Fix Version/s: master (9.0)
   8.1

Merged. Thanks [~original-brownbear]!

> Forced merges merge more than necessary
> ---
>
> Key: LUCENE-8688
> URL: https://issues.apache.org/jira/browse/LUCENE-8688
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: 8.1, master (9.0)
>
> Attachments: LUCENE-8688.patch, LUCENE-8688.patch, LUCENE-8688.patch, 
> LUCENE-8688.patch
>
>
> A user reported some surprise after the upgrade to Lucene 7.5 due to changes 
> to how forced merges are selected when maxSegmentCount is greater than 1.
> Before 7.5 forceMerge used to pick up the least amount of merging that would 
> result in an index that has maxSegmentCount segments at most. Now that we 
> share the same logic as regular merges, we are almost sure to pick a 
> maxMergeAtOnceExplicit-segments merge (30 segments) given that merges that 
> have more segments usually score better. This is due to the fact that natural 
> merges assume that merges that run now save work for later, so the more 
> segments get merged, the better. This assumption doesn't hold for forced 
> merges that should run on read-only indices, so there won't be any future 
> merging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8725) Make TermsQuery public

2019-03-15 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793420#comment-16793420
 ] 

Adrien Grand commented on LUCENE-8725:
--

I think it was done this way to make sure that equals/hashCode remain fast, 
since the query cache calls these methods implicitly quite frequently. If we 
were to do this again today, maybe we'd consider disabling caching instead 
(Weight#isCacheable).

> Make TermsQuery public
> --
>
> Key: LUCENE-8725
> URL: https://issues.apache.org/jira/browse/LUCENE-8725
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Noble Paul
>Priority: Trivial
> Fix For: 8.1
>
>
> I have come across use-cases where directly accessing {{TermsQuery}} can 
> help. If there is no objection I would like to make it public



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-8727) IndexSearcher#search(Query,int) should operate on a shared priority queue when configured with an executor

2019-03-14 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-8727:


 Summary: IndexSearcher#search(Query,int) should operate on a 
shared priority queue when configured with an executor
 Key: LUCENE-8727
 URL: https://issues.apache.org/jira/browse/LUCENE-8727
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand


If IndexSearcher is configured with an executor, then the top docs for each 
slice are computed separately before being merged once the top docs for all 
slices are computed. With block-max WAND this is a bit of a waste of resources: 
it would be better if an increase of the min competitive score could help skip 
non-competitive hits on every slice and not just the current one.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8688) Forced merges merge more than necessary

2019-03-14 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792497#comment-16792497
 ] 

Adrien Grand commented on LUCENE-8688:
--

bq. otherwise the case of merging from a single segments with deletes to a 
single segment becomes a NOOP when before we'd merge here in any case and 
expunge the deletes which I guess we want to keep?

I think so.

> Forced merges merge more than necessary
> ---
>
> Key: LUCENE-8688
> URL: https://issues.apache.org/jira/browse/LUCENE-8688
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8688.patch, LUCENE-8688.patch, LUCENE-8688.patch, 
> LUCENE-8688.patch
>
>
> A user reported some surprise after the upgrade to Lucene 7.5 due to changes 
> to how forced merges are selected when maxSegmentCount is greater than 1.
> Before 7.5 forceMerge used to pick up the least amount of merging that would 
> result in an index that has maxSegmentCount segments at most. Now that we 
> share the same logic as regular merges, we are almost sure to pick a 
> maxMergeAtOnceExplicit-segments merge (30 segments) given that merges that 
> have more segments usually score better. This is due to the fact that natural 
> merges assume that merges that run now save work for later, so the more 
> segments get merged, the better. This assumption doesn't hold for forced 
> merges that should run on read-only indices, so there won't be any future 
> merging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8725) Make TermsQuery public

2019-03-14 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792478#comment-16792478
 ] 

Adrien Grand commented on LUCENE-8725:
--

Is my assumption correct that you are interested in this query because you 
don't want to copy the content of the BytesRefHash and because your query might 
match most of the ids that exist in the index?
This query cheats a bit in its equals/hashCode to be cacheable: instead of 
actually comparing the BytesRefHash, it compares the join query, the join 
field and the index it has been generated on. In my opinion it'd be better to 
fork the query. Maybe the one piece that we can make public somewhere is the 
SeekingTermSetTermsEnum?

 

> Make TermsQuery public
> --
>
> Key: LUCENE-8725
> URL: https://issues.apache.org/jira/browse/LUCENE-8725
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Noble Paul
>Priority: Trivial
> Fix For: 8.1
>
>
> I have come across use-cases where directly accessing {{TermsQuery}} can 
> help. If there is no objection I would like to make it public



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8688) Forced merges merge more than necessary

2019-03-13 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791557#comment-16791557
 ] 

Adrien Grand commented on LUCENE-8688:
--

Thanks for iterating on this, I think there are still some issues wrt not 
running the final merge if there are on-going merges:
 - It feels wrong that the code block under the "This is the special case of 
merging down to one segment" comment runs _before_ we check whether the merge 
is a final merge. (Do we need this special case at all?)
 - If there are fewer than maxMergeAtOnceExplicit segments in the index, we are 
doing the right thing, but if there are eg. maxMergeAtOnceExplicit + 3 segments 
in the index, maxSegmentCount is 2, and a merge is ongoing, then we will run 
one merge of maxMergeAtOnceExplicit segments and another one of 3 segments, 
which feels wrong: if there is an ongoing merge, we should only run merges of 
the maximum size, i.e. ones that either merge maxMergeAtOnceExplicit segments 
together or create a segment that is close to the maximum segment size. (This 
is why the check for the final merge was done after the loop in prior versions 
of TieredMergePolicy.)

> Forced merges merge more than necessary
> ---
>
> Key: LUCENE-8688
> URL: https://issues.apache.org/jira/browse/LUCENE-8688
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-8688.patch, LUCENE-8688.patch, LUCENE-8688.patch
>
>
> A user reported some surprise after the upgrade to Lucene 7.5 due to changes 
> to how forced merges are selected when maxSegmentCount is greater than 1.
> Before 7.5 forceMerge used to pick up the least amount of merging that would 
> result in an index that has maxSegmentCount segments at most. Now that we 
> share the same logic as regular merges, we are almost sure to pick a 
> maxMergeAtOnceExplicit-segments merge (30 segments) given that merges that 
> have more segments usually score better. This is due to the fact that natural 
> merges assume that merges that run now save work for later, so the more 
> segments get merged, the better. This assumption doesn't hold for forced 
> merges that should run on read-only indices, so there won't be any future 
> merging.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


