[jira] [Resolved] (LUCENE-7580) Spans tree scoring

2018-04-02 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot resolved LUCENE-7580.
--
Resolution: Won't Fix

Resolved: not enough interest. I'll keep the github branches available for now.

> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 7.0
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: Elschot20170326Counting.pdf, LUCENE-7580.patch, 
> LUCENE-7580.patch, LUCENE-7580.patch, LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-7613) Update Surround query language

2018-04-02 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot resolved LUCENE-7613.
--
Resolution: Won't Fix

Resolved: not enough interest.

> Update Surround query language
> --
>
> Key: LUCENE-7613
> URL: https://issues.apache.org/jira/browse/LUCENE-7613
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7613-spanstree.patch, LUCENE-7613.patch, 
> LUCENE-7613.patch
>
>







[jira] [Resolved] (LUCENE-7615) SpanSynonymQuery

2018-04-02 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot resolved LUCENE-7615.
--
Resolution: Won't Fix

Resolved: not enough interest.

> SpanSynonymQuery
> 
>
> Key: LUCENE-7615
> URL: https://issues.apache.org/jira/browse/LUCENE-7615
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: 7.0
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7615.patch, LUCENE-7615.patch
>
>
> A SpanQuery that tries to score as SynonymQuery.






[jira] [Closed] (LUCENE-6596) Make width of unordered near spans consistent with ordered

2017-08-20 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot closed LUCENE-6596.

Resolution: Fixed

Closing, not enough interest.

> Make width of unordered near spans consistent with ordered
> --
>
> Key: LUCENE-6596
> URL: https://issues.apache.org/jira/browse/LUCENE-6596
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 6.0
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.0
>
> Attachments: LUCENE-6596.patch, LUCENE-6596.patch
>
>
> Use actual slop for width in NearSpansUnordered.






[jira] [Closed] (LUCENE-6453) Specialize SpanPositionQueue similar to DisiPriorityQueue

2017-08-20 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-6453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot closed LUCENE-6453.

Resolution: Fixed

Closing, not enough interest.

> Specialize SpanPositionQueue similar to DisiPriorityQueue
> -
>
> Key: LUCENE-6453
> URL: https://issues.apache.org/jira/browse/LUCENE-6453
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x, 6.0
>
> Attachments: LUCENE-6453.patch, LUCENE-6453.patch, LUCENE-6453.patch, 
> LUCENE-6453.patch
>
>
> Inline the position comparison function






[jira] [Closed] (LUCENE-7602) Fix compiler warnings for ant clean compile

2017-08-20 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot closed LUCENE-7602.


> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch, LUCENE-7602.patch, 
> LUCENE-7602.patch
>
>







[jira] [Closed] (LUCENE-5687) Split off SinkTokenStream from TeeSinkTokenFilter (was add PrefillTokenStream ...)

2017-04-19 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot closed LUCENE-5687.

Resolution: Won't Fix

Not enough interest

> Split off SinkTokenStream from TeeSinkTokenFilter (was add PrefillTokenStream 
> ...)
> --
>
> Key: LUCENE-5687
> URL: https://issues.apache.org/jira/browse/LUCENE-5687
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Affects Versions: 4.9
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 4.9
>
> Attachments: LUCENE-5687.patch, LUCENE-5687.patch, LUCENE-5687.patch, 
> LUCENE-5687.patch
>
>







[jira] [Closed] (LUCENE-7068) Retrieve ranks

2017-04-19 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot closed LUCENE-7068.

Resolution: Won't Fix

Not enough interest

> Retrieve ranks
> --
>
> Key: LUCENE-7068
> URL: https://issues.apache.org/jira/browse/LUCENE-7068
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/other
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7068.patch, LUCENE-7068.patch, LUCENE-7068.patch
>
>
> Join TopDocs by docs, keep the result ranks.






[jira] [Closed] (LUCENE-7471) Simplify NearSpansOrdered

2017-04-19 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot closed LUCENE-7471.

Resolution: Duplicate

Duplicate of LUCENE-7715, fixed and closed.

> Simplify NearSpansOrdered
> -
>
> Key: LUCENE-7471
> URL: https://issues.apache.org/jira/browse/LUCENE-7471
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7471.patch
>
>
> Extend the span positions priority queue, remove SpansCell.






[jira] [Resolved] (LUCENE-7602) Fix compiler warnings for ant clean compile

2017-04-19 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot resolved LUCENE-7602.
--
Resolution: Won't Fix

Not enough interest

> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch, LUCENE-7602.patch, 
> LUCENE-7602.patch
>
>







[jira] [Comment Edited] (LUCENE-7580) Spans tree scoring

2017-03-26 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942393#comment-15942393
 ] 

Paul Elschot edited comment on LUCENE-7580 at 3/26/17 6:51 PM:
---

I just pushed two branches to github, pullable as:

git pull https://github.com/PaulElschot/lucene-solr lucene7580-20170326

and

git pull https://github.com/PaulElschot/lucene-solr lucene7580report-20170326

The lucene7580-20170326 branch is an update of the previous pull request with a 
few minor improvements. Most notable is putting SpansTreeWeight into its own 
source file.

The lucene7580report-20170326 branch is on top of the lucene7580-20170326 
branch, with the addition of the tex sources for a report on this issue.
I'll attach the pdf shortly here.



was (Author: paul.elsc...@xs4all.nl):
I just pushed to branches to github, pullable as:

git pull https://github.com/PaulElschot/lucene-solr lucene7580-20170326

and

git pull https://github.com/PaulElschot/lucene-solr lucene7580report-20170326

The lucene7580-20170326 branch is an update of the previous pull request with a 
few minor improvements. Most notable is putting SpansTreeWeight into its own 
source file.

The  lucene7580report-20170326 branch is on top of the  lucene7580-20170326 
branch, with the addition of the tex sources for a report on this issue.
I'll attach the pdf shortly here.


> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: Elschot20170326Counting.pdf, LUCENE-7580.patch, 
> LUCENE-7580.patch, LUCENE-7580.patch, LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Comment Edited] (LUCENE-7580) Spans tree scoring

2017-03-26 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942393#comment-15942393
 ] 

Paul Elschot edited comment on LUCENE-7580 at 3/26/17 6:51 PM:
---

I just pushed two branches to github, pullable as:

git pull https://github.com/PaulElschot/lucene-solr lucene7580-20170326

and

git pull https://github.com/PaulElschot/lucene-solr lucene7580report-20170326

The lucene7580-20170326 branch is an update of the previous pull request with a 
few minor improvements. Most notable is putting SpansTreeWeight into its own 
source file.

The  lucene7580report-20170326 branch is on top of the  lucene7580-20170326 
branch, with the addition of the tex sources for a report on this issue.
I'll attach the pdf shortly here.



was (Author: paul.elsc...@xs4all.nl):
I just pushed to branches to github, pullable as:

git pull https://github.com/PaulElschot/lucene-solr lucene7580-20170326

and

git pull https://github.com/PaulElschot/lucene-solr lucene7580report-20170326

The lucene7580-20170326 branch an update of the previous pull request with a 
few minor improvements.
Most notable is putting SpansTreeWeight into its own source file.

The  lucene7580report-20170326 is on top of the  lucene7580-20170326 branch, 
with the addition of the tex sources for a report on this issue.
I'll attach the pdf shortly here.


> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: Elschot20170326Counting.pdf, LUCENE-7580.patch, 
> LUCENE-7580.patch, LUCENE-7580.patch, LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Updated] (LUCENE-7580) Spans tree scoring

2017-03-26 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7580:
-
Attachment: Elschot20170326Counting.pdf

Report of 26 March 2017, generated from the lucene7580report-20170326 branch 
and renamed to include the full date in the name.

> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: Elschot20170326Counting.pdf, LUCENE-7580.patch, 
> LUCENE-7580.patch, LUCENE-7580.patch, LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Commented] (LUCENE-7580) Spans tree scoring

2017-03-26 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15942393#comment-15942393
 ] 

Paul Elschot commented on LUCENE-7580:
--

I just pushed two branches to github, pullable as:

git pull https://github.com/PaulElschot/lucene-solr lucene7580-20170326

and

git pull https://github.com/PaulElschot/lucene-solr lucene7580report-20170326

The lucene7580-20170326 branch is an update of the previous pull request with a 
few minor improvements.
Most notable is putting SpansTreeWeight into its own source file.

The lucene7580report-20170326 branch is on top of the lucene7580-20170326 branch, 
with the addition of the tex sources for a report on this issue.
I'll attach the pdf shortly here.


> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch, LUCENE-7580.patch, LUCENE-7580.patch, 
> LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Comment Edited] (LUCENE-7398) Nested Span Queries are buggy

2017-03-02 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892972#comment-15892972
 ] 

Paul Elschot edited comment on LUCENE-7398 at 3/2/17 8:54 PM:
--

One way to view the problem is that when span end positions are used to 
determine the slop, it becomes impossible to determine an order for moving the 
subspans to a next position.

So one direction out of this could be: use NearSpans that determines the slop 
only by the start positions of the subspans. That leaves only the cases in 
which the subspans can start (and maybe also end) at the same position.
To make sure that all the subspans move forward after a match we could move 
them all forward until after the current match, and while doing that also 
count/collect them for scoring/highlighting as long as they are within the 
match. That should solve the bug reported here, which is about scoring a missed 
matching occurrence.

This limits the required slop to using only the starting positions of the 
subspans. Could this work?
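
As a small illustration of why start positions give a stable basis here, the sketch below compares an end-based gap with a start-based distance for two alternatives that begin at the same position. The positions (coordinate=0, gene=1, mapping=2, research=3) and the class and method names are assumptions made for this example only; this is not code from Lucene or from any patch attached to this issue.

```java
/** Illustrative arithmetic only; the position numbers are assumed, not read from a real index. */
final class SlopByStartSketch {

  /** Gap as seen when the end of the previous span is used (end is exclusive). */
  static int endBasedGap(int prevEnd, int nextStart) {
    return nextStart - prevEnd;
  }

  /** Distance as seen when only start positions are used. */
  static int startBasedDistance(int prevStart, int nextStart) {
    return nextStart - prevStart;
  }

  public static void main(String[] args) {
    int research = 3;
    // Two alternatives of the spanOr, both starting at "gene" (position 1):
    int[] gene = {1, 2};        // "gene" alone, covers [1, 2)
    int[] geneMapping = {1, 3}; // "gene mapping", covers [1, 3)

    // End-based: the two alternatives give different gaps to "research".
    System.out.println(endBasedGap(gene[1], research));        // prints 1
    System.out.println(endBasedGap(geneMapping[1], research)); // prints 0

    // Start-based: both alternatives give the same distance.
    System.out.println(startBasedDistance(gene[0], research));        // prints 2
    System.out.println(startBasedDistance(geneMapping[0], research)); // prints 2
  }
}
```

With start positions only, both alternatives of the spanOr look the same to the enclosing near query, so the order of advancing no longer depends on which alternative is chosen. The price, presumably, is that a slop of 0 no longer expresses direct adjacency after a subspan that covers more than one position.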



was (Author: paul.elsc...@xs4all.nl):
On way to view the problem is that when span end positions are used to 
determine the slop, it becomes impossible to determine an order for moving the 
subspans to a next position.

So one direction out of this could be: use NearSpans that determines the slop 
only by the start positions of the subspans. That leaves only the cases in 
which the subspans can start (and maybe also end) at the same position.
To make sure that all the subspans move forward after a match we could move 
them all forward until after the current match, and while doing that also 
count/collect them for scoring/highlighting as long as they are within the 
match. That should solve the bug reported here, which is about scoring a missed 
matching occurrence.

This limits the required slop to using only the starting positions of the 
subspans. Could this work?


> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Assignee: Alan Woodward
>Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398-20160924.patch, 
> LUCENE-7398-20160925.patch, LUCENE-7398.patch, LUCENE-7398.patch, 
> LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.






[jira] [Commented] (LUCENE-7398) Nested Span Queries are buggy

2017-03-02 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892972#comment-15892972
 ] 

Paul Elschot commented on LUCENE-7398:
--

One way to view the problem is that when span end positions are used to 
determine the slop, it becomes impossible to determine an order for moving the 
subspans to a next position.

So one direction out of this could be: use NearSpans that determines the slop 
only by the start positions of the subspans. That leaves only the cases in 
which the subspans can start (and maybe also end) at the same position.
To make sure that all the subspans move forward after a match we could move 
them all forward until after the current match, and while doing that also 
count/collect them for scoring/highlighting as long as they are within the 
match. That should solve the bug reported here, which is about scoring a missed 
matching occurrence.

This limits the required slop to using only the starting positions of the 
subspans. Could this work?


> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Assignee: Alan Woodward
>Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398-20160924.patch, 
> LUCENE-7398-20160925.patch, LUCENE-7398.patch, LUCENE-7398.patch, 
> LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.






[jira] [Commented] (LUCENE-7715) Simplify NearSpansUnordered

2017-03-02 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892930#comment-15892930
 ] 

Paul Elschot commented on LUCENE-7715:
--

Thanks Adrien.

> Simplify NearSpansUnordered
> ---
>
> Key: LUCENE-7715
> URL: https://issues.apache.org/jira/browse/LUCENE-7715
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: master (7.0), 6.5
>
> Attachments: LUCENE-7715.patch
>
>
> {code}
> git diff --stat master...
>  .../spans/NearSpansUnordered.java   | 211 -
>  1 file changed, 59 insertions(+), 152 deletions(-)
> {code}






[jira] [Commented] (LUCENE-7715) Simplify NearSpansUnordered

2017-02-28 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15888703#comment-15888703
 ] 

Paul Elschot commented on LUCENE-7715:
--

bq. ... how it deals with the initial state that all sub spans have a start 
position of -1.

There is no need for that, the intermediate data structure is a priority queue 
that is not a Spans itself.

If the names of this priority queue (SpanTotalLengthEndPositionWindow) and its 
methods (startDocument/nextPosition) are misleading, they need to be improved.

The core search tests and precommit pass.


> Simplify NearSpansUnordered
> ---
>
> Key: LUCENE-7715
> URL: https://issues.apache.org/jira/browse/LUCENE-7715
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7715.patch
>
>
> {code}
> git diff --stat master...
>  .../spans/NearSpansUnordered.java   | 211 -
>  1 file changed, 59 insertions(+), 152 deletions(-)
> {code}






[jira] [Updated] (LUCENE-7715) Simplify NearSpansUnordered

2017-02-27 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7715:
-
Summary: Simplify NearSpansUnordered  (was: SImplify NearSpansUnordered)

> Simplify NearSpansUnordered
> ---
>
> Key: LUCENE-7715
> URL: https://issues.apache.org/jira/browse/LUCENE-7715
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7715.patch
>
>
> {code}
> git diff --stat master...
>  .../spans/NearSpansUnordered.java   | 211 -
>  1 file changed, 59 insertions(+), 152 deletions(-)
> {code}






[jira] [Updated] (LUCENE-7715) SImplify NearSpansUnordered

2017-02-27 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7715:
-
Attachment: LUCENE-7715.patch

> SImplify NearSpansUnordered
> ---
>
> Key: LUCENE-7715
> URL: https://issues.apache.org/jira/browse/LUCENE-7715
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7715.patch
>
>
> {code}
> git diff --stat master...
>  .../spans/NearSpansUnordered.java   | 211 -
>  1 file changed, 59 insertions(+), 152 deletions(-)
> {code}






[jira] [Created] (LUCENE-7715) SImplify NearSpansUnordered

2017-02-27 Thread Paul Elschot (JIRA)
Paul Elschot created LUCENE-7715:


 Summary: SImplify NearSpansUnordered
 Key: LUCENE-7715
 URL: https://issues.apache.org/jira/browse/LUCENE-7715
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: master (7.0)
Reporter: Paul Elschot
Priority: Minor


{code}
git diff --stat master...
 .../spans/NearSpansUnordered.java   | 211 -
 1 file changed, 59 insertions(+), 152 deletions(-)
{code}







[jira] [Comment Edited] (LUCENE-7580) Spans tree scoring

2017-02-26 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15720077#comment-15720077
 ] 

Paul Elschot edited comment on LUCENE-7580 at 2/26/17 8:08 PM:
---

What SpansTreeQuery does not do, and some rough edges:

The SpansDocScorer objects do the match recording and scoring, and there is one 
for each Spans.
These SpansDocScorer objects might be merged into their Spans to reduce the 
number of objects.
Related: how to deal with the same term occurring in more than one subquery? 
See also LUCENE-7398.

Normally the term frequency score has a diminishing contribution for extra 
occurrences.
In the patch the slop factors for a term are applied in decreasing order on 
these diminished contributions.
This requires sorting of the slop factors.
Sorting the slop factors could be avoided if an actual score of a single term 
occurrence were available.
In that case the given slop factor could be used as a weight on that score.
It might be possible to estimate an actual score for a single term occurrence
from the distances to other occurrences of the same term.
Similarly, the decreasing term frequency contributions can be seen as a 
proximity weighting for the same term (or subquery):
the closer a term occurs to itself, the smaller its contribution.
This might be refined by using the actual distances to the other term 
occurrences (or subquery occurrences)
to provide a weight for each term occurrence. This is unusual because the 
weight decreases for smaller distances.
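
The decreasing-order application of slop factors described above can be sketched roughly as follows. This is a minimal sketch and not the code in the patch: the saturation function freq / (freq + k1), the parameter k1, and all class and method names are assumptions chosen for illustration. The n-th occurrence of a term contributes the increment saturation(n) - saturation(n - 1), and the largest slop factor is paired with the largest increment.

```java
import java.util.Arrays;

/** Hypothetical sketch: pair sorted slop factors with diminishing term frequency increments. */
final class SortedSlopFactorSketch {

  /** Assumed saturating term frequency function; stands in for the tf part of a Similarity. */
  static double saturation(int freq, double k1) {
    return freq / (freq + k1);
  }

  /**
   * Applies the largest slop factor to the first (largest) increment,
   * the next largest slop factor to the second increment, and so on.
   */
  static double score(double[] slopFactors, double k1) {
    double[] sorted = slopFactors.clone();
    Arrays.sort(sorted); // ascending; read from the end for decreasing order
    double total = 0.0;
    for (int n = 1; n <= sorted.length; n++) {
      // contribution of the n-th occurrence of the term
      double increment = saturation(n, k1) - saturation(n - 1, k1);
      total += sorted[sorted.length - n] * increment;
    }
    return total;
  }
}
```

When all slop factors are 1 the increments telescope and the result equals saturation(n, k1), which is the plain term frequency contribution for n occurrences. The sort is the cost mentioned above; it would not be needed if each occurrence had a score of its own that the slop factor could weight directly.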

The slop factor from the Similarity may need to be adapted because of the way 
it is combined here
with diminishing term contributions.

Another use of a score of each term occurrence could be to use the absolute 
term position
to influence the score, possibly in combination with the field length.

There is an assert in TermSpansDocScorer.docScore() that verifies that
the smallest occurring slop factor is at least as large as the non matching 
slop factor.
This condition is necessary for consistency.
Instead of using this assert, this condition might be enforced by somehow
automatically determining the non matching slop factor.

This is a prototype. No profiling has been done, it will take more CPU, but I 
have no idea how much.
Garbage collection might be affected by the reference cycles between the 
SpansDocScorers
and their Spans.

Since this allows weighting of subqueries, it might be possible to implement 
synonym scoring
in SpanOrQuery by providing good subweights, and wrapping the whole thing in 
SpansTreeQuery.
The only thing that might still be needed then is a SpansDocScorer that applies 
the SimScorer.score()
over the total term frequency of the synonyms in a document.

SpansTreeScorer multiplies the slop factor for nested near queries at each 
level.
Alternatively a minimum distance could be passed down.
This would need to change recordMatch(float slopFactor) to recordMatch(int 
minDistance).
Would minDistance make sense, or is there a better distance?
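
For illustration, the multiplication of slop factors over nesting levels could look like the sketch below. The formula 1 / (distance + 1) for a single slop factor is an assumption for this example (the actual factor comes from the Similarity), and the class and method names are hypothetical.

```java
/** Hypothetical sketch of combining slop factors of nested near queries by multiplication. */
final class NestedSlopFactorSketch {

  /** Assumed slop factor for one near query: 1 / (distance + 1). */
  static float slopFactor(int distance) {
    return 1.0f / (distance + 1);
  }

  /** Multiplies the slop factors of all enclosing near queries, outermost first. */
  static float combined(int... distancesPerLevel) {
    float factor = 1.0f;
    for (int distance : distancesPerLevel) {
      factor *= slopFactor(distance);
    }
    return factor;
  }
}
```

Under this assumption a term matched at distance 1 inside a near query that itself matched at distance 1 would get a factor of 0.5 * 0.5 = 0.25, so the factor shrinks with every nesting level.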

What is a good way to test whether the score values from SpansTreeQuery 
actually improve on
the score values from the current SpanScorer?

There are no tests for SpanFirstQuery/SpanContainingQuery/SpanWithinQuery.
These tests are not there because these queries provide FilterSpans and that is 
already supported for SpanNotQuery.

The explain() method is not implemented for SpansTreeQuery.
This should be doable with an explain() method added to SpansTreeScorer to 
provide the explanations.

There is no support for PayloadSpanQuery.
PayloadSpanQuery is not in here because it is not in the core module.
I think it can fit in here because PayloadSpanQuery also scores per matching 
term occurrence.
Then Spans.doStartCurrentDoc() and Spans.doCurrentSpans() could be removed.

In case this is acceptable as a good way to score Spans:
Spans.width() and Scorer.freq() and SpansDocScorer.docMatchFreq() might be 
removed.
Would it make sense to implement child Scorers in the tree of SpansDocScorer 
objects?



was (Author: paul.elsc...@xs4all.nl):
What SpansTreeQuery does not do, and some rough edges:

The SpansDocScorer objects do the match recording and scoring, and there is one 
for each Spans.
These SpansDocScorer objects might be merged into their Spans to reduce the 
number of objects.
Related: how to deal with the same term occurring in more than one subquery? 
See also LUCENE-7398.

Normally the term frequency score has a diminishing contribution for extra 
occurrences.
In the patch the slop factors for a term are applied in decreasing order on 
these diminished contributions.
This requires sorting of the slop factors.
Sorting the slop factors could be avoided when an actual score of a single term 
occurrence was available.
In that case the given slop factor could be used as a weight on that score.
It might be possible to estimate an actual score for a single term 

[jira] [Commented] (LUCENE-7682) UnifiedHighlighter not highlighting all terms relevant in SpanNearQuery

2017-02-26 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15884727#comment-15884727
 ] 

Paul Elschot commented on LUCENE-7682:
--

For queries requiring t1 near t2 with enough slop, t1 t1 t2 matches twice, but 
t1 t2 t2 matches only once. This behaviour was introduced with the lazy 
iteration, see:
https://issues.apache.org/jira/browse/LUCENE-6537?focusedCommentId=14579537=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14579537

This is also a problem for LUCENE-7580 where matching term occurrences are 
scored: there the second occurrence of t2 will not influence the score because 
it is never reported as a match.
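
The asymmetry can be reproduced with a small model of an ordered near over two 
single-term sub-spans. This is a simplified sketch and not the Lucene 
implementation: positions are plain int arrays, every term has width 1, and 
the class and method names are made up for the illustration. After each match 
only the first sub-span is advanced, and the second one is advanced just far 
enough to start after the first.

```java
final class LazyOrderedNearSketch {
  /**
   * Counts the matches reported by a lazily iterating ordered near of t1 and t2.
   * first and second hold the sorted positions of t1 and t2 in one document.
   */
  static int countReportedMatches(int[] first, int[] second, int slop) {
    int matches = 0;
    int j = 0;
    for (int i = 0; i < first.length; i++) {
      // Advance t2 only as far as needed to start after the current t1.
      while (j < second.length && second[j] <= first[i]) {
        j++;
      }
      if (j == second.length) {
        break; // t2 is exhausted, no further matches are possible
      }
      // Number of positions between the end of t1 and the start of t2.
      int gap = second[j] - first[i] - 1;
      if (gap <= slop) {
        matches++;
      }
      // The next iteration advances t1; a later t2 in the same window is only
      // reached when a later t1 forces t2 to move.
    }
    return matches;
  }

  public static void main(String[] args) {
    // t1 t1 t2: both t1 occurrences pair with the single t2.
    System.out.println(countReportedMatches(new int[] {0, 1}, new int[] {2}, 5)); // prints 2
    // t1 t2 t2: the second t2 is never reported.
    System.out.println(countReportedMatches(new int[] {0}, new int[] {1, 2}, 5)); // prints 1
  }
}
```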

LUCENE-7398 is probably also of interest here.

To improve highlighting and scoring, we will probably have to rethink how 
matches of span queries are reported.
One way could be to report all occurrences in the matching window, and then 
advance all the sub-spans past the matching window.
Would that be feasible?


> UnifiedHighlighter not highlighting all terms relevant in SpanNearQuery
> ---
>
> Key: LUCENE-7682
> URL: https://issues.apache.org/jira/browse/LUCENE-7682
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/highlighter
>Reporter: Michael Braun
>
> Original text: "Something for protecting wildlife feed in a feed thing."
> Query is:
>SpanNearQuery with Slop 9 - in order - 
>   1. SpanTermQuery(wildlife)
>   2. SpanTermQuery(feed)
> This should highlight both instances of "feed" since they are both within 
> slop of 9 of "wildlife". However, only the first instance is highlighted. 
> This occurs with unordered SpanNearQuery as well.  Test below replicates. 
> Affects both the current 6.x line and master.
> Test that fits within TestUnifiedHighlighterMTQ:
> {code}
>   public void testOrderedSpanNearQueryWithDupeTerms() throws Exception {
>     RandomIndexWriter iw = new RandomIndexWriter(random(), dir, indexAnalyzer);
>     Document doc = new Document();
>     doc.add(new Field("body",
>         "Something for protecting wildlife feed in a feed thing.", fieldType));
>     doc.add(newTextField("id", "id", Field.Store.YES));
>     iw.addDocument(doc);
>     IndexReader ir = iw.getReader();
>     iw.close();
>     IndexSearcher searcher = newSearcher(ir);
>     UnifiedHighlighter highlighter = new UnifiedHighlighter(searcher, indexAnalyzer);
>     int docID = searcher.search(new TermQuery(new Term("id", "id")), 1).scoreDocs[0].doc;
>     SpanTermQuery termOne = new SpanTermQuery(new Term("body", "wildlife"));
>     SpanTermQuery termTwo = new SpanTermQuery(new Term("body", "feed"));
>     SpanNearQuery topQuery = new SpanNearQuery.Builder("body", true)
>         .setSlop(9)
>         .addClause(termOne)
>         .addClause(termTwo)
>         .build();
>     int[] docIds = new int[] {docID};
>     String snippets[] = highlighter.highlightFields(new String[] {"body"},
>         topQuery, docIds, new int[] {2}).get("body");
>     assertEquals(1, snippets.length);
>     // Both occurrences of "feed" should be highlighted.
>     assertEquals("Something for protecting <b>wildlife</b> <b>feed</b> in a "
>         + "<b>feed</b> thing.", snippets[0]);
>     ir.close();
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-7580) Spans tree scoring

2017-02-01 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15720073#comment-15720073
 ] 

Paul Elschot edited comment on LUCENE-7580 at 2/1/17 1:36 PM:
--

Some related issues, thanks for these discussions:

LUCENE-533
LUCENE-2878
LUCENE-2879
LUCENE-2880
LUCENE-6226
LUCENE-6371
LUCENE-6466
LUCENE-7398


Some related web pages:

http://www.gossamer-threads.com/lists/lucene/java-user/33902 March 2006.

http://www.gossamer-threads.com/lists/lucene/java-user/53027 September 2007, 
suggests to:
"recurse the spans tree to compose a score based on the type of subqueries 
(near, and, or, not) and what matched."

http://www.gossamer-threads.com/lists/lucene/java-user/60103 April 2008.

http://www.flax.co.uk/blog/2016/04/26/can-make-contribution-apache-solr-core-development/
 see point 4.

How to use BM25:
http://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/




was (Author: paul.elsc...@xs4all.nl):

Some related issues, thanks for these discussions:

LUCENE-533
LUCENE-2878
LUCENE-2879
LUCENE-2880
LUCENE-6371
LUCENE-6466
LUCENE-7398


Some related web pages:

http://www.gossamer-threads.com/lists/lucene/java-user/33902 March 2006.

http://www.gossamer-threads.com/lists/lucene/java-user/53027 September 2007, 
suggests to:
"recurse the spans tree to compose a score based on the type of subqueries 
(near, and, or, not) and what matched."

http://www.gossamer-threads.com/lists/lucene/java-user/60103 April 2008.

http://www.flax.co.uk/blog/2016/04/26/can-make-contribution-apache-solr-core-development/
 see point 4.

How to use BM25:
http://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/



> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch, LUCENE-7580.patch, LUCENE-7580.patch, 
> LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Closed] (LUCENE-7633) Rename Terms to IndexedField (was to FieldTerms)

2017-02-01 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot closed LUCENE-7633.

Resolution: Won't Fix

Meanwhile there is a plan for 7.0, and that might be an opportunity here.

I'm still closing this because the disadvantages seem to outweigh advantages.

> Rename Terms to IndexedField (was to FieldTerms)
> 
>
> Key: LUCENE-7633
> URL: https://issues.apache.org/jira/browse/LUCENE-7633
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>







[jira] [Commented] (LUCENE-7633) Rename Terms to IndexedField (was to FieldTerms)

2017-01-28 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15844031#comment-15844031
 ] 

Paul Elschot commented on LUCENE-7633:
--

The problem I have with the name Terms is that it is too general; it should be 
a little more verbose.
I think IndexedField is a good name for what it provides: term enumeration and 
term statistics.
The other renames just follow this.

In doing this I learned quite a bit about flexible indexing, and I really like 
the class structure.
With that behind me, I don't really need this renaming any more...

The renames are straightforward, so adopting them elsewhere should be easy.
In case these names are actually preferred, I'd gladly add backward compatible 
class names.
That would probably boil down to inserting classes with the current names as 
deprecated superclasses.
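
A minimal sketch of that backward compatibility trick, assuming nothing beyond 
plain Java. The nested classes below are stand-ins that only borrow the names 
Terms and IndexedField; they are not the Lucene classes, and ArrayIndexedField 
and legacyCount are hypothetical. The old name stays as a deprecated 
superclass, so code written against the old name keeps working with instances 
of the new one.

```java
final class RenameCompat {
  /** Old name, kept only so that existing callers still compile. */
  @Deprecated
  abstract static class Terms {
    abstract long size();
  }

  /** New name; new code refers to this type only. */
  abstract static class IndexedField extends Terms {
  }

  /** Hypothetical implementation written against the new name. */
  static final class ArrayIndexedField extends IndexedField {
    private final String[] terms;

    ArrayIndexedField(String... terms) {
      this.terms = terms;
    }

    @Override
    long size() {
      return terms.length;
    }
  }

  /** Legacy caller written against the old name; it accepts the new type unchanged. */
  @SuppressWarnings("deprecation")
  static long legacyCount(Terms terms) {
    return terms.size();
  }
}
```

The old names can then be removed in a later major release once callers have 
moved to the new ones.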

> Rename Terms to IndexedField (was to FieldTerms)
> 
>
> Key: LUCENE-7633
> URL: https://issues.apache.org/jira/browse/LUCENE-7633
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-7633) Rename Terms to IndexedField (was to FieldTerms)

2017-01-28 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7633:
-
Summary: Rename Terms to IndexedField (was to FieldTerms)  (was: Rename 
Terms to FieldTerms)

> Rename Terms to IndexedField (was to FieldTerms)
> 
>
> Key: LUCENE-7633
> URL: https://issues.apache.org/jira/browse/LUCENE-7633
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>







[jira] [Commented] (LUCENE-7637) TermInSetQuery should require that all terms come from the same field

2017-01-21 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15833005#comment-15833005
 ] 

Paul Elschot commented on LUCENE-7637:
--

There is a pull request from me at LUCENE-7624 that landed there because that 
issue was mentioned in the commit message.

> TermInSetQuery should require that all terms come from the same field
> -
>
> Key: LUCENE-7637
> URL: https://issues.apache.org/jira/browse/LUCENE-7637
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Fix For: master (7.0), 6.5
>
> Attachments: LUCENE-7637.patch
>
>
> Spin-off from LUCENE-7624. Requiring that all terms are in the same field 
> would make things simpler and more consistent with other queries. It might 
> also make it easier to improve this query in the future since other similar 
> queries like AutomatonQuery also work on the per-field basis. The only 
> downside is that querying terms across multiple fields would be less 
> efficient, but this does not seem to be a common use-case.






[jira] [Commented] (LUCENE-7633) Rename Terms to FieldTerms

2017-01-18 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15828371#comment-15828371
 ] 

Paul Elschot commented on LUCENE-7633:
--

The Terms javadocs:

bq. Access to the terms in a specific field. See also Fields

But Terms does more than access the terms of a field; it also provides index 
statistics for a field.
So in the o.a.l.index package, Field might actually be a better name, but that 
would be confused with o.a.l.document.Field.

Perhaps this would be ideal:
Rename Fields to IndexedFields
Rename Terms to IndexedField

The current patch to rename Terms to FieldTerms changes 860 occurrences in 290 
source code files,
so there is no point in keeping this open for a longer period.
I'll close in two weeks or so, unless there are other opinions.

> Rename Terms to FieldTerms
> --
>
> Key: LUCENE-7633
> URL: https://issues.apache.org/jira/browse/LUCENE-7633
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>







[jira] [Created] (LUCENE-7633) Rename Terms to FieldTerms

2017-01-15 Thread Paul Elschot (JIRA)
Paul Elschot created LUCENE-7633:


 Summary: Rename Terms to FieldTerms
 Key: LUCENE-7633
 URL: https://issues.apache.org/jira/browse/LUCENE-7633
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Paul Elschot









[jira] [Commented] (LUCENE-7624) Consider moving TermsQuery to core

2017-01-15 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15823102#comment-15823102
 ] 

Paul Elschot commented on LUCENE-7624:
--

This one is an interesting target for surround, so I had a look.

Allowing more than one field for the terms also has an advantage in that only 
one doc id set will be built for all the terms.

As to the code: 
There is a small javadoc mistake in line 54 using both "@{" and "{@".
When constructing a Term a deep copy of the given BytesRef is taken, so the 
deep copy in line 154 is superfluous.
(The deep copy in line 222 of the termEnum.term() is needed there.)

> Consider moving TermsQuery to core
> --
>
> Key: LUCENE-7624
> URL: https://issues.apache.org/jira/browse/LUCENE-7624
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
> Fix For: master (7.0), 6.4
>
> Attachments: LUCENE-7624.patch
>
>
> TermsQuery currently sits in the queries module, but it's used in both 
> spatial-extras and in facets, and currently is the only reason that the 
> facets module has a dependency on queries.  I think it's a generally useful 
> query, and would fit in perfectly well in core.
> This would also allow us to explore rewriting BooleanQuery to TermsQuery to 
> avoid the max-clauses limit






[jira] [Commented] (LUCENE-7628) Add a getMatchingChildren() method to DisjunctionScorer

2017-01-13 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822372#comment-15822372
 ] 

Paul Elschot commented on LUCENE-7628:
--

Hopefully this is not getting too far off topic.

In case there is interest in taking DocIdSetIterator out of Spans, please let 
me know, I have a version that passes the lucene tests.
I don't know whether it speeds up Spans.
The split makes a lot of the Spans code more readable.

> Add a getMatchingChildren() method to DisjunctionScorer
> ---
>
> Key: LUCENE-7628
> URL: https://issues.apache.org/jira/browse/LUCENE-7628
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Minor
> Attachments: LUCENE-7628.patch
>
>
> This one is a bit convoluted, so bear with me...
> The luwak highlighter works by rewriting queries into their Span-equivalents, 
> and then running them with a special Collector.  At each matching doc, the 
> highlighter gathers all the Spans objects positioned on the current doc and 
> collects their positions using the SpanCollection API.
> Some queries can't be translated into Spans.  For those queries that generate 
> Scorers with ChildScorers, like BooleanQuery, we can call .getChildren() on 
> the Scorer and see if any of them are SpanScorers, and for those that aren't 
> we can call .getChildren() again and recurse down.  For each child scorer, we 
> check that it's positioned on the current document, so non-matching 
> subscorers can be skipped.
> This all works correctly *except* in the case of a DisjunctionScorer where 
> one of the children is a two-phase iterator that has matched its 
> approximation, but not its refinement query.  A SpanScorer in this situation 
> will be correctly positioned on the current document, but its Spans will be 
> in an undefined state, meaning the highlighter will either collect incorrect 
> hits, or it will throw an Exception and prevent hits being collected from 
> other subspans.
> We've tried various ways around this (including forking SpanNearQuery and 
> adding a bunch of slow position checks to it that are used only by the 
> highlighting code), but it turns out that the simplest fix is to add a new 
> method to DisjunctionScorer that only returns the currently matching child 
> Scorers.  It's a bit of a hack, and it won't be used anywhere else, but it's 
> a fairly small and contained hack.






[jira] [Commented] (LUCENE-7628) Add a getMatchingChildren() method to DisjunctionScorer

2017-01-13 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15822361#comment-15822361
 ] 

Paul Elschot commented on LUCENE-7628:
--

To continue about using Spans directly for this
(earlier posted on github, see 
https://github.com/flaxsearch/luwak/commit/36c91e8bdd3ab0d07578b76359d1f2a87eb53797)

Other than AND and OR in the same field, what is also still needed is dealing 
with multiple fields.
For this we need a Spans that can share its DocIdSetIterator with another Spans.

IIRC that is what LUCENE-2878 is about, so I'm finally beginning to understand 
the real point of that issue, and why it is still open.

Meanwhile we had DocIdSetIterator split off from Scorer (for speed).
How about doing something similar for Spans? I think that would leave Spans 
pretty close to the Positions of LUCENE-2878. The only change in semantics for 
Spans would be that at least one of the Spans that share a DocIdSetIterator 
should provide a real position in a document. Maybe we could have something like 
MultiFieldSpans for that.

> Add a getMatchingChildren() method to DisjunctionScorer
> ---
>
> Key: LUCENE-7628
> URL: https://issues.apache.org/jira/browse/LUCENE-7628
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Minor
> Attachments: LUCENE-7628.patch
>
>






[jira] [Commented] (LUCENE-7628) Add a getMatchingChildren() method to DisjunctionScorer

2017-01-11 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819228#comment-15819228
 ] 

Paul Elschot commented on LUCENE-7628:
--

I'll answer myself.

With AND and OR available, i.e. the Spans parallels of ConjunctionScorer and 
DisjunctionScorer, what would still be needed is the Spans parallel of 
ReqExclScorer, for NOT at document level.

Something like ReqExclSpans for a SpanBooleanNotQuery.
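
At document level the idea is the same as in ReqExclScorer: keep the documents 
of the required part that do not occur in the excluded part. A minimal sketch 
of that document-level step, with sorted int arrays standing in for the two 
doc id iterators; the class and method names are made up for the illustration 
and nothing here is taken from an existing ReqExclSpans.

```java
import java.util.ArrayList;
import java.util.List;

final class ReqExclSketch {
  /** Returns the doc ids in required that are absent from excluded; both arrays are sorted. */
  static List<Integer> reqExcl(int[] required, int[] excluded) {
    List<Integer> result = new ArrayList<>();
    int e = 0;
    for (int doc : required) {
      // Advance the excluded side to the first doc id >= doc.
      while (e < excluded.length && excluded[e] < doc) {
        e++;
      }
      if (e == excluded.length || excluded[e] != doc) {
        result.add(doc); // doc is not excluded; its positions would be reported unchanged
      }
    }
    return result;
  }
}
```

For required docs 1, 3, 5, 8 and excluded docs 3, 4, 8 this leaves docs 1 and 5.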


> Add a getMatchingChildren() method to DisjunctionScorer
> ---
>
> Key: LUCENE-7628
> URL: https://issues.apache.org/jira/browse/LUCENE-7628
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Minor
>






[jira] [Comment Edited] (LUCENE-7628) Add a getMatchingChildren() method to DisjunctionScorer

2017-01-11 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819158#comment-15819158
 ] 

Paul Elschot edited comment on LUCENE-7628 at 1/11/17 9:09 PM:
---

And if there was also something like SpanAndMergeQuery that merges the Spans 
positions when all of them are present in a document?

This could have an AndMergeSpans as a subclass of DisjunctionSpans above.


was (Author: paul.elsc...@xs4all.nl):
And if there was also something like SpanAndMergeQuery that merges the Spans 
positions when all of them are present in a document?

> Add a getMatchingChildren() method to DisjunctionScorer
> ---
>
> Key: LUCENE-7628
> URL: https://issues.apache.org/jira/browse/LUCENE-7628
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Minor
>






[jira] [Commented] (LUCENE-7628) Add a getMatchingChildren() method to DisjunctionScorer

2017-01-11 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819158#comment-15819158
 ] 

Paul Elschot commented on LUCENE-7628:
--

And if there was also something like SpanAndMergeQuery that merges the Spans 
positions when all of them are present in a document?

> Add a getMatchingChildren() method to DisjunctionScorer
> ---
>
> Key: LUCENE-7628
> URL: https://issues.apache.org/jira/browse/LUCENE-7628
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Minor
>






[jira] [Commented] (LUCENE-7628) Add a getMatchingChildren() method to DisjunctionScorer

2017-01-11 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15819020#comment-15819020
 ] 

Paul Elschot commented on LUCENE-7628:
--

bq. This all works correctly except in the case of a DisjunctionScorer where 
one of the children is a two-phase iterator that has matched its approximation, 
but not its refinement query. A SpanScorer in this situation will be correctly 
positioned on the current document, but its Spans will be in an undefined 
state, meaning the highlighter will either collect incorrect hits, or it will 
throw an Exception and prevent hits being collected from other subspans.

Does the highlight code call collect() before nextStartPosition() ?
That should be avoided, see the javadocs of Spans.

For LUCENE-7580 I had a very similar issue: that one computes scores per 
matching (i.e. to be highlighted) term occurrence.
The solution there is to split off DisjunctionSpans from SpanOrQuery and to add 
these methods:
{code}

  public List<Spans> subSpans() {
    return subSpans;
  }

  public void extractSubSpansAtCurrentDoc(List<Spans> spansList) {
    byPositionQueue.extractSpansList(spansList);
  }

  public Spans getCurrentPositionSpans() {
    return byPositionQueue.top();
  }
{code}

With that in place a highlighter could use SpanOrQuery instead of a 
BooleanQuery OR, and then the highlighter should be able to do its work.

(Aside: getCurrentPositionSpans() is wrongly named getFirstPositionSpans() at 
LUCENE-7580, I'll fix that later).
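
To illustrate what getCurrentPositionSpans() gives a highlighter, here is a 
self-contained model of a disjunction over sub-spans within one document. It 
is a sketch under assumptions, not the patch code: SubSpans is a stand-in that 
iterates over an int array of start positions, java.util.PriorityQueue takes 
the place of the position queue, and at least one sub-spans is assumed.

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class DisjunctionSpansSketch {
  static final int NO_MORE_POSITIONS = Integer.MAX_VALUE;

  /** Stand-in for a sub-Spans: a name plus sorted start positions in the current document. */
  static final class SubSpans {
    final String name;
    private final int[] starts;
    private int index = -1;

    SubSpans(String name, int... starts) {
      this.name = name;
      this.starts = starts;
    }

    int nextStartPosition() {
      index++;
      return startPosition();
    }

    int startPosition() {
      if (index < 0) {
        return -1; // not yet positioned
      }
      return index < starts.length ? starts[index] : NO_MORE_POSITIONS;
    }
  }

  private final List<SubSpans> subSpans;
  private final PriorityQueue<SubSpans> byPositionQueue =
      new PriorityQueue<>(Comparator.comparingInt(SubSpans::startPosition));
  private boolean started = false;

  DisjunctionSpansSketch(List<SubSpans> subSpans) {
    this.subSpans = subSpans; // assumed to be non-empty
  }

  /** Moves to the next position over all sub-spans and returns its start. */
  int nextStartPosition() {
    if (!started) {
      started = true;
      for (SubSpans spans : subSpans) {
        spans.nextStartPosition();
        byPositionQueue.add(spans);
      }
    } else {
      // Remove the top before advancing it, so the queue order stays valid.
      SubSpans top = byPositionQueue.poll();
      top.nextStartPosition();
      byPositionQueue.add(top);
    }
    return byPositionQueue.peek().startPosition();
  }

  /** The sub-spans that provides the current position. */
  SubSpans getCurrentPositionSpans() {
    return byPositionQueue.peek();
  }
}
```

After each nextStartPosition() the caller can ask which sub-spans is at the 
current position and collect from that one only, instead of visiting 
sub-spans that are not positioned on a match.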


> Add a getMatchingChildren() method to DisjunctionScorer
> ---
>
> Key: LUCENE-7628
> URL: https://issues.apache.org/jira/browse/LUCENE-7628
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Minor
>






[jira] [Updated] (LUCENE-7613) Update Surround query language

2017-01-07 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7613:
-
Attachment: LUCENE-7613-spanstree.patch

Patch of 7 Jan 2017, combine with LUCENE-7580.

This issue and LUCENE-7580 both depend on LUCENE-7615, and this patch is to use 
that dependency only via LUCENE-7580.

To use this with SpansTreeQuery, apply the patch at LUCENE-7580 first, and then 
apply this patch of 7 Jan 2017.

This contains the changes of this issue to surround/query, updates the surround 
tests to use SpansTreeQuery.wrapAfterRewrite(), and changes a few expected 
document orders in the surround tests.



> Update Surround query language
> --
>
> Key: LUCENE-7613
> URL: https://issues.apache.org/jira/browse/LUCENE-7613
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7613-spanstree.patch, LUCENE-7613.patch, 
> LUCENE-7613.patch
>
>







[jira] [Commented] (LUCENE-7580) Spans tree scoring

2017-01-06 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15805767#comment-15805767
 ] 

Paul Elschot commented on LUCENE-7580:
--

SpanSynonymQuery is unusual here because it uses a single SpansDocScorer per 
segment, independent of the number of synonym terms.

Since the TermSpans for SynonymSpans are Spans without a SpansDocScorer, it 
makes some sense not to merge Spans and SpansDocScorer later.



> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch, LUCENE-7580.patch, LUCENE-7580.patch, 
> LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Updated] (LUCENE-7580) Spans tree scoring

2017-01-06 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7580:
-
Attachment: LUCENE-7580.patch

Patch of 6 Jan 2017.

This contains:

The changes in the patch of 30 Dec 2016.

Support for SpanSynonymQuery, see SynonymSpans and SynonymSpansDocScorer.

Class AsSingleTermSpansDocScorer as common superclass for TermSpansDocScorer 
and SynonymSpansDocScorer. This is the place where matching and non-matching 
term occurrences are scored with a SimScorer from Similarity while taking into 
account the slop factors.

Method SpansTreeQuery.wrapAfterRewrite() to use SpansTreeQuery.wrap() at the 
right moment.


> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch, LUCENE-7580.patch, LUCENE-7580.patch, 
> LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Commented] (LUCENE-7621) Per-document minShouldMatch

2017-01-05 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802710#comment-15802710
 ] 

Paul Elschot commented on LUCENE-7621:
--

Starting from the number of indexed terms in a doc, when more than one of any 
synonym occurs, such extra occurrences would have to be ignored for counting 
the number of present clauses.
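
A minimal sketch of that counting rule, assuming each clause is modelled as a set of synonym terms and the document as a set of indexed terms. The class and method names are hypothetical and this is not Lucene code; it only shows that a clause counts once however many of its synonyms occur.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only, not part of Lucene. */
final class SynonymClauseCountSketch {

  /** Counts the clauses that have at least one of their synonym terms in the document. */
  static int presentClauses(List<Set<String>> clauses, Set<String> docTerms) {
    int present = 0;
    for (Set<String> synonyms : clauses) {
      for (String term : synonyms) {
        if (docTerms.contains(term)) {
          present++; // count the clause once
          break;     // ignore further synonyms of the same clause
        }
      }
    }
    return present;
  }

  public static void main(String[] args) {
    List<Set<String>> clauses = new ArrayList<>();
    clauses.add(new HashSet<>(Arrays.asList("car", "auto")));
    clauses.add(new HashSet<>(Arrays.asList("red")));
    Set<String> docTerms = new HashSet<>(Arrays.asList("car", "auto", "red"));
    System.out.println(presentClauses(clauses, docTerms));
  }
}
```

In this model a document containing both "car" and "auto" has three indexed terms but only two present clauses, which is the difference that would have to be accounted for.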

> Per-document minShouldMatch
> ---
>
> Key: LUCENE-7621
> URL: https://issues.apache.org/jira/browse/LUCENE-7621
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Adrien Grand
>Priority: Minor
>
> I have seen similar requirements a couple times but could not find any 
> related issue so I am opening one now. The idea would be to allow passing a 
> {{LongValuesSource}} rather than an integer as the {{minShouldMatch}} 
> parameter of {{BooleanQuery}} so that the number of required clauses can 
> depend on the document that is being matched. In terms of implementation, it 
> looks like it would be straightforward as we would just have to update the 
> value of {{minShouldMatch}} in {{MinShouldMatchSumScorer.setDocAndFreq}} and 
> things would still be efficient, ie. we would still use advance on the costly 
> clauses.
> This kind of feature would allow to run queries that must match eg. 80% of 
> the terms that a document contains (by indexing the number of terms in a 
> separate field). It would also make it possible for Luwak or ES' percolator 
> to index boolean queries that have a value of {{minShouldMatch}} greater than 
> 1 more efficiently.
> I do not have any plans to work on it soon but I am curious how much interest 
> this feature would drive.






[jira] [Commented] (LUCENE-7621) Per-document minShouldMatch

2017-01-05 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15802649#comment-15802649
 ] 

Paul Elschot commented on LUCENE-7621:
--

Could this also work when the clauses are SynonymQueries?

> Per-document minShouldMatch
> ---
>
> Key: LUCENE-7621
> URL: https://issues.apache.org/jira/browse/LUCENE-7621
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Adrien Grand
>Priority: Minor
>
> I have seen similar requirements a couple times but could not find any 
> related issue so I am opening one now. The idea would be to allow passing a 
> {{LongValuesSource}} rather than an integer as the {{minShouldMatch}} 
> parameter of {{BooleanQuery}} so that the number of required clauses can 
> depend on the document that is being matched. In terms of implementation, it 
> looks like it would be straightforward as we would just have to update the 
> value of {{minShouldMatch}} in {{MinShouldMatchSumScorer.setDocAndFreq}} and 
> things would still be efficient, ie. we would still use advance on the costly 
> clauses.
> This kind of feature would allow to run queries that must match eg. 80% of 
> the terms that a document contains (by indexing the number of terms in a 
> separate field). It would also make it possible for Luwak or ES' percolator 
> to index boolean queries that have a value of {{minShouldMatch}} greater than 
> 1 more efficiently.
> I do not have any plans to work on it soon but I am curious how much interest 
> this feature would drive.






[jira] [Updated] (LUCENE-7613) Update Surround query language

2017-01-05 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7613:
-
Attachment: LUCENE-7613.patch

Patch of 5 Jan 2017

This includes:
- the previous patch for using DisjunctionMaxQuery over fields,
- using (Span)SynonymQuery for truncations and prefixes, i.e. groups of terms.
- the patch of LUCENE-7615 for SpanSynonymQuery.
- Further improvements in the surround query code, mostly:
-- Removal of SimpleTerm implementing Comparable as deprecated in 2011.
-- Move all creation of primitive queries (i.e. rewrite results) into 
BasicQueryFactory.
-- Use BytesRef for visiting index terms.
-- A Test for TooManyBasicQueries.


> Update Surround query language
> --
>
> Key: LUCENE-7613
> URL: https://issues.apache.org/jira/browse/LUCENE-7613
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7613.patch, LUCENE-7613.patch
>
>







[jira] [Updated] (LUCENE-7613) Update Surround query language

2017-01-05 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7613:
-
Lucene Fields: New,Patch Available  (was: New)
  Summary: Update Surround query language  (was: Make Surround use 
DisjunctionMaxQuery for multiple fields)

> Update Surround query language
> --
>
> Key: LUCENE-7613
> URL: https://issues.apache.org/jira/browse/LUCENE-7613
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7613.patch
>
>







[jira] [Commented] (LUCENE-7615) SpanSynonymQuery

2017-01-05 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801532#comment-15801532
 ] 

Paul Elschot commented on LUCENE-7615:
--

In SpanSynonymQuery.java here, this import is not used:
{code}
import org.apache.lucene.search.similarities.Similarity;
{code}


> SpanSynonymQuery
> 
>
> Key: LUCENE-7615
> URL: https://issues.apache.org/jira/browse/LUCENE-7615
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7615.patch, LUCENE-7615.patch
>
>
> A SpanQuery that tries to score as SynonymQuery.






[jira] [Updated] (LUCENE-7615) SpanSynonymQuery

2017-01-03 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7615:
-
Attachment: LUCENE-7615.patch

Patch of 3 Jan 2017.

Compared to yesterday, this adds getTermContexts() in SynonymWeight for use in 
SpanSynonymQuery.

> SpanSynonymQuery
> 
>
> Key: LUCENE-7615
> URL: https://issues.apache.org/jira/browse/LUCENE-7615
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7615.patch, LUCENE-7615.patch
>
>
> A SpanQuery that tries to score as SynonymQuery.






[jira] [Commented] (LUCENE-7615) SpanSynonymQuery

2017-01-03 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15795030#comment-15795030
 ] 

Paul Elschot commented on LUCENE-7615:
--

In the patch of 2 Jan 2017 the term contexts are extracted twice, once in 
SynonymWeight and once to create the SpanSynonymWeight.
I'll post a fix later.

> SpanSynonymQuery
> 
>
> Key: LUCENE-7615
> URL: https://issues.apache.org/jira/browse/LUCENE-7615
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7615.patch
>
>
> A SpanQuery that tries to score as SynonymQuery.






[jira] [Commented] (LUCENE-7615) SpanSynonymQuery

2017-01-02 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15792842#comment-15792842
 ] 

Paul Elschot commented on LUCENE-7615:
--

Some plans for using this:

In LUCENE-7580 to get real synonym scoring behaviour.

In Surround to score truncations.

> SpanSynonymQuery
> 
>
> Key: LUCENE-7615
> URL: https://issues.apache.org/jira/browse/LUCENE-7615
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7615.patch
>
>
> A SpanQuery that tries to score as SynonymQuery.






[jira] [Updated] (LUCENE-7615) SpanSynonymQuery

2017-01-02 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7615:
-
Attachment: LUCENE-7615.patch

Patch of 2 Jan 2017.

This can be used as a proximity subquery wherever SynonymQuery is used now, 
i.e. for synonym terms.

I think this improves span scoring somewhat, see the tests and the test output 
when uncommenting showQueryResults for the test cases with two terms.

Implementation:
SynonymQuery exposes new methods getField() and SynonymWeight.getSimScorer() 
for use in SpanSynonymQuery.

Improved use of o.a.l.index.Terms and TermsEnum in SynonymQuery, at most a 
single TermsEnum will be used.
Aside: how about renaming Terms to FieldTerms?

This takes DisjunctionSpans out of SpanOrQuery.
This adds SynonymSpans as (an almost empty) subclass of DisjunctionSpans, for 
later further scoring improvement.

PHRASE_TO_SPAN_TERM_POSITIONS_COST is used from SpanTermQuery and made package 
private there.


> SpanSynonymQuery
> 
>
> Key: LUCENE-7615
> URL: https://issues.apache.org/jira/browse/LUCENE-7615
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7615.patch
>
>
> A SpanQuery that tries to score as SynonymQuery.






[jira] [Created] (LUCENE-7615) SpanSynonymQuery

2017-01-02 Thread Paul Elschot (JIRA)
Paul Elschot created LUCENE-7615:


 Summary: SpanSynonymQuery
 Key: LUCENE-7615
 URL: https://issues.apache.org/jira/browse/LUCENE-7615
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: master (7.0)
Reporter: Paul Elschot
Priority: Minor


A SpanQuery that tries to score as SynonymQuery.






[jira] [Commented] (LUCENE-7613) Make Surround use DisjunctionMaxQuery for multiple fields

2016-12-30 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15788338#comment-15788338
 ] 

Paul Elschot commented on LUCENE-7613:
--

I would not mind making a similar update for LUCENE-5205, but I am not 
familiar enough with the code there.

> Make Surround use DisjunctionMaxQuery for multiple fields
> -
>
> Key: LUCENE-7613
> URL: https://issues.apache.org/jira/browse/LUCENE-7613
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7613.patch
>
>







[jira] [Comment Edited] (LUCENE-7613) Make Surround use DisjunctionMaxQuery for multiple fields

2016-12-30 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15788332#comment-15788332
 ] 

Paul Elschot edited comment on LUCENE-7613 at 12/30/16 9:01 PM:


Patch of 30 Dec 2016.

This does not affect the syntax of surround, this only adapts the lucene side 
to make better use of lucene facilities that are newer than the initial version 
of surround.

This uses DisjunctionMaxQuery when a query specifies multiple fields.
The method to convert to a lucene query also allows multiple default fields.
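
For reference, a minimal sketch of how a disjunction-max over per-field scores combines them, assuming non-negative scores and the usual "maximum plus tie breaker times the other scores" rule. The class and method names are hypothetical; this models the scoring only and is not the surround or Lucene code.

```java
/** Hypothetical illustration only, not part of Lucene or surround. */
final class DisMaxSketch {

  /**
   * Combines per-field scores: the best field dominates, and the other fields
   * contribute only through the tie breaker. Assumes non-negative scores.
   */
  static float disMaxScore(float[] fieldScores, float tieBreaker) {
    float max = 0f;
    float sum = 0f;
    for (float score : fieldScores) {
      max = Math.max(max, score);
      sum += score;
    }
    return max + tieBreaker * (sum - max);
  }

  public static void main(String[] args) {
    float[] scores = {2f, 1f}; // e.g. one score per field
    System.out.println(disMaxScore(scores, 0f));
    System.out.println(disMaxScore(scores, 0.5f));
  }
}
```

With a tie breaker of 0 only the best matching field counts, so a term matching in several fields is not rewarded as it would be by a plain sum over the fields.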

This adds methods to BasicQueryFactory to create a new SpanNearQuery and to 
create a new DisjunctionMaxQuery.

This uses SpanBoostQuery when proximity (sub)queries are boosted. There is no 
effect on the scores yet, LUCENE-7580 can change that.

This updates the test code to use CheckHits, and one test case is added.
The changes to the test code form the larger part of the patch.



was (Author: paul.elsc...@xs4all.nl):
Patch of 30 Dec 2016.

This does not affect the syntax of surround, this only adapts the lucene side 
to make better use of lucene facilities that are newer than the current version.

This uses DisjunctionMaxQuery when a query specifies multiple fields.
The method to convert to a lucene query also allows multiple default fields.

This adds methods to BasicQueryFactory to create a new SpanNearQuery and to 
create a new DisjunctionMaxQuery.

This uses SpanBoostQuery when proximity (sub)queries are boosted. There is no 
effect on the scores yet, LUCENE-7580 can change that.

This updates the test code to use CheckHits, and one test case is added.
The changes to the test code form the larger part of the patch.


> Make Surround use DisjunctionMaxQuery for multiple fields
> -
>
> Key: LUCENE-7613
> URL: https://issues.apache.org/jira/browse/LUCENE-7613
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7613.patch
>
>







[jira] [Updated] (LUCENE-7613) Make Surround use DisjunctionMaxQuery for multiple fields

2016-12-30 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7613:
-
Attachment: LUCENE-7613.patch

Patch of 30 Dec 2016.

This does not affect the syntax of surround, this only adapts the lucene side 
to make better use of lucene facilities that are newer than the current version.

This uses DisjunctionMaxQuery when a query specifies multiple fields.
The method to convert to a lucene query also allows multiple default fields.

This adds methods to BasicQueryFactory to create a new SpanNearQuery and to 
create a new DisjunctionMaxQuery.

This uses SpanBoostQuery when proximity (sub)queries are boosted. There is no 
effect on the scores yet, LUCENE-7580 can change that.

This updates the test code to use CheckHits, and one test case is added.
The changes to the test code form the larger part of the patch.


> Make Surround use DisjunctionMaxQuery for multiple fields
> -
>
> Key: LUCENE-7613
> URL: https://issues.apache.org/jira/browse/LUCENE-7613
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7613.patch
>
>







[jira] [Created] (LUCENE-7613) Make Surround use DisjunctionMaxQuery for multiple fields

2016-12-30 Thread Paul Elschot (JIRA)
Paul Elschot created LUCENE-7613:


 Summary: Make Surround use DisjunctionMaxQuery for multiple fields
 Key: LUCENE-7613
 URL: https://issues.apache.org/jira/browse/LUCENE-7613
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/queryparser
Reporter: Paul Elschot
Priority: Minor









[jira] [Updated] (LUCENE-7580) Spans tree scoring

2016-12-29 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7580:
-
Attachment: LUCENE-7580.patch

Patch of 29 Dec 2016.

Compared to the previous patch, this adds:

Limiting the max allowed slop to Integer.MAX_VALUE-1 in the SpanNearQuery 
constructor and in TestSpanSearchEquivalence. An actual slop of 
Integer.MAX_VALUE causes an overflow in distance+1 that is used in 
computeSlopFactor. Since the same limitation is already present for indexed 
positions, I would not expect this slop factor miscalculation to actually occur.

The negative slops that occur for overlapping spans are changed to 0 before 
passing them to computeSlopFactor in NearSpansDocScorer in the patch here.
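
A minimal sketch of the two slop points above, assuming the slop factor has the form 1/(distance+1) with the addition done in int arithmetic. The class and method names are hypothetical stand-ins and are not taken from the patch.

```java
/** Hypothetical illustration only, not the code of the patch. */
final class SlopFactorSketch {

  /** Assumed slop factor form; distance + 1 is computed as an int and can overflow. */
  static float slopFactor(int distance) {
    return 1.0f / (distance + 1);
  }

  /** Negative distances, as for overlapping spans, are changed to 0 first. */
  static float clampedSlopFactor(int distance) {
    return slopFactor(Math.max(0, distance));
  }

  public static void main(String[] args) {
    // Largest slop that is still safe: small positive factor.
    System.out.println(slopFactor(Integer.MAX_VALUE - 1));
    // distance + 1 wraps around to Integer.MIN_VALUE: the factor becomes negative.
    System.out.println(slopFactor(Integer.MAX_VALUE));
    // An overlap of -3 is scored as distance 0.
    System.out.println(clampedSlopFactor(-3));
  }
}
```

Under this assumption an unclamped distance of -1 would also divide by zero, which is another reason to change negative slops to 0 before computing the factor.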

The non match distance passed to SpanNearQuery in the patch is verified to be 
at least the given slop.

A wrapper method SpansTreeScorer.wrap() is added that will wrap the span 
(subqueries of a) given query in a SpansTreeQuery. This works for span 
subqueries of BooleanQuery, DisjunctionMaxQuery and BoostQuery.

> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch, LUCENE-7580.patch, LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Commented] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-29 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785476#comment-15785476
 ] 

Paul Elschot commented on LUCENE-7602:
--

After taking a closer look at the other issues:

How about renaming the ContextMap here to ValueSourceContext or to VSContext ?

> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch, LUCENE-7602.patch, 
> LUCENE-7602.patch
>
>







[jira] [Commented] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-29 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785307#comment-15785307
 ] 

Paul Elschot commented on LUCENE-7602:
--

Looking at that code, the copyOf method should be named asCopyOf..., and the 
javadocs should be "as a copy of the given Map".

> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch, LUCENE-7602.patch, 
> LUCENE-7602.patch
>
>







[jira] [Commented] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-29 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785298#comment-15785298
 ] 

Paul Elschot commented on LUCENE-7602:
--

For easy reference, ContextMap now looks like this:
{code}
/** Modifiable {@link Map} of key objects to value objects for {@link ValueSource}. */
public class ContextMap extends HashMap {
  protected ContextMap() {
  }

  protected ContextMap(Map source) {
    super(source);
  }

  /** Create an empty ContextMap */
  public static ContextMap newContext() {
    return new ContextMap();
  }

  /** Create a ContextMap as a copy of the given one. */
  public static ContextMap copyOf(Map source) {
    return new ContextMap(source);
  }
}
{code}
The first protected constructor is used in solr's QueryContext, the other one 
could be private.

> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch, LUCENE-7602.patch, 
> LUCENE-7602.patch
>
>







[jira] [Updated] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-29 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7602:
-
Attachment: LUCENE-7602.patch

Patch of 29 Dec 2016.

This patch tries to separate interface from implementation by making the 
constructors in ContextMap and QueryContext protected and using only a few 
factory methods for ContextMap.

It would be nicer to use private inheritance from HashMap and 
only expose Map, but java does not have private inheritance. The 
alternative of object composition gives a detour to AbstractMap, and that only 
complicates the code.
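
For comparison, a minimal sketch of the composition alternative, assuming Object keys and values. The class name ComposedContextMap is hypothetical and this is not what the patch does; it only shows the detour via AbstractMap to a HashMap.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only: a context map that wraps a HashMap instead of extending it. */
final class ComposedContextMap extends AbstractMap<Object, Object> {
  // The HashMap is an implementation detail and is not exposed as a supertype.
  private final Map<Object, Object> delegate = new HashMap<>();

  private ComposedContextMap() {
  }

  /** Create an empty map. */
  static ComposedContextMap newContext() {
    return new ComposedContextMap();
  }

  /** Create a map as a copy of the given one. */
  static ComposedContextMap copyOf(Map<?, ?> source) {
    ComposedContextMap copy = new ComposedContextMap();
    copy.delegate.putAll(source);
    return copy;
  }

  // AbstractMap.put throws UnsupportedOperationException unless overridden.
  @Override
  public Object put(Object key, Object value) {
    return delegate.put(key, value);
  }

  // Forwarded to keep hash lookups instead of AbstractMap's linear scan.
  @Override
  public Object get(Object key) {
    return delegate.get(key);
  }

  @Override
  public boolean containsKey(Object key) {
    return delegate.containsKey(key);
  }

  // The only method that AbstractMap requires.
  @Override
  public Set<Map.Entry<Object, Object>> entrySet() {
    return delegate.entrySet();
  }
}
```

Every method that should stay efficient has to be forwarded by hand, and a subclass such as solr's QueryContext would need more than this sketch provides.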

So the question is which is preferable: use of a local class ContextMap as in 
the patch, or use of Map ?

> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch, LUCENE-7602.patch, 
> LUCENE-7602.patch
>
>







[jira] [Commented] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-29 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785283#comment-15785283
 ] 

Paul Elschot commented on LUCENE-7602:
--

Thanks for linking to these other issues.

> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch, LUCENE-7602.patch
>
>







[jira] [Comment Edited] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-26 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15779059#comment-15779059
 ] 

Paul Elschot edited comment on LUCENE-7602 at 12/26/16 10:06 PM:
-

bq. can't we just use Map ?

ContextMap implements that interface.
Since this is widely used, I prefer to use a lucene class (ContextMap) over an 
interface that is defined in the java language (Map), because it 
allows a change in a single place.

We could still separate the implementation from the interface, but that would 
be more than fixing the compiler warnings here.




was (Author: paul.elsc...@xs4all.nl):
bq. can't we just use Map ?

ContextMap implements that interface.
Since this is widely used, I prefer use a lucene class class (ContextMap) over 
an interface that is defined in the java language (Map), because 
it allows a change in a single place.

We could still separate the implementation from the interface, but that would 
be more than fixing the compiler warnings here.



> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch, LUCENE-7602.patch
>
>







[jira] [Commented] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-26 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15779059#comment-15779059
 ] 

Paul Elschot commented on LUCENE-7602:
--

bq. can't we just use Map ?

ContextMap implements that interface.
Since this is widely used, I prefer not use a lucene class class (ContextMap) 
over an interface that is defined in the java language (Map), 
because it allows a change in a single place.

We could still separate the implementation from the interface, but that would 
be more than fixing the compiler warnings here.



> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch, LUCENE-7602.patch
>
>







[jira] [Comment Edited] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-26 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15779059#comment-15779059
 ] 

Paul Elschot edited comment on LUCENE-7602 at 12/26/16 10:02 PM:
-

bq. can't we just use Map ?

ContextMap implements that interface.
Since this is widely used, I prefer to use a lucene class (ContextMap) over 
an interface that is defined in the java language (Map), because 
it allows a change in a single place.

We could still separate the implementation from the interface, but that would 
be more than fixing the compiler warnings here.




was (Author: paul.elsc...@xs4all.nl):
bq. can't we just use Map ?

ContextMap implements that interface.
Since this is widely used, I prefer not use a lucene class class (ContextMap) 
over an interface that is defined in the java language (Map), 
because it allows a change in a single place.

We could still separate the implementation from the interface, but that would 
be more than fixing the compiler warnings here.



> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch, LUCENE-7602.patch
>
>







[jira] [Updated] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-26 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7602:
-
Attachment: LUCENE-7602.patch

Patch of 26 Dec 2016.
Mostly as discussed above.

ContextMap extends HashMap. I tried implementing AbstractMap, but that ends up 
in a detour to a HashMap anyway, so I left it at direct extension.
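
The detour mentioned above can be sketched as follows. This is only an illustration and not the class from the patch: DelegatingContextMap is a hypothetical name, and the Object type parameters are an assumption. Extending AbstractMap still needs a backing map for entrySet() and put(), which is why direct extension of HashMap is shorter.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical variant, not the ContextMap from the patch. */
class DelegatingContextMap extends AbstractMap<Object, Object> {
  // The "detour": AbstractMap has no storage of its own.
  private final Map<Object, Object> delegate = new HashMap<>();

  @Override
  public Set<Map.Entry<Object, Object>> entrySet() {
    return delegate.entrySet(); // the only abstract method of AbstractMap
  }

  @Override
  public Object put(Object key, Object value) {
    // Without this override AbstractMap.put throws UnsupportedOperationException.
    return delegate.put(key, value);
  }

  @Override
  public Object get(Object key) {
    return delegate.get(key); // avoids AbstractMap's linear scan over entrySet()
  }
}
```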

Is there a way to quickly check for unused imports at top level?
I used ant precommit for that, but it is quite slow because it stops after the 
first module with an error, and quite a few modules are involved here.

> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch, LUCENE-7602.patch
>
>







[jira] [Commented] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-26 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15778692#comment-15778692
 ] 

Paul Elschot commented on LUCENE-7602:
--

Meanwhile I tried implementing solr's QueryContext by extending ContextMap, and 
no more wrapping of fcontext.qcontext in a SolrContextMap, see 
FuncSlotAcc.setNextReader above.

The solr tests passed, so I think there is no more need for IdentityHashMap, in 
both lucene and solr.

Shall I post a complete patch against master, or just the changes since 
yesterday?

> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch
>
>







[jira] [Commented] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-26 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15778681#comment-15778681
 ] 

Paul Elschot commented on LUCENE-7602:
--

Do you mean like this:
{code}
public class ContextMap extends AbstractMap
{ ... }
{code}


> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch
>
>







[jira] [Comment Edited] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-25 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15776698#comment-15776698
 ] 

Paul Elschot edited comment on LUCENE-7602 at 12/25/16 4:26 PM:


With both patches applied, ant clean compile-test gives a single warning.

Perhaps ContextMap should be a separate issue.

For lucene I think it is worthwhile to use a lucene class without generics as 
the context map instead of the Map variations that are present in the 
current code.

For solr I have no idea. I would hope that using IdentityHashMap is no longer 
needed because queries are immutable.



was (Author: paul.elsc...@xs4all.nl):
With both patches applied, ant clean compile-test gives a single warning.

Perhaps ContextMap should be separate issue.

For lucene I think it is worthwhile to use a lucene class without generics as 
the context map instead of the Map variations that are present in the 
current code.

For solr I have no idea. I would hope that by using IdentityHashMap is no more 
needed because queries are immutable.


> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch
>
>







[jira] [Commented] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-25 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15776698#comment-15776698
 ] 

Paul Elschot commented on LUCENE-7602:
--

With both patches applied, ant clean compile-test gives a single warning.

Perhaps ContextMap should be a separate issue.

For lucene I think it is worthwhile to use a lucene class without generics as 
the context map instead of the Map variations that are present in the 
current code.

For solr I have no idea. I would hope that using IdentityHashMap is no longer 
needed because queries are immutable.


> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch
>
>







[jira] [Commented] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-25 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15776689#comment-15776689
 ] 

Paul Elschot commented on LUCENE-7602:
--

The lucene patch replaces the use of IdentityHashMap in ValueSource.java by a 
ContextMap, because I think it is wrong to use an IdentityHashMap there. 
Strings are used as keys, and Strings are not unique objects in java.

However, this conflicts somewhat with the way ContextMap is used in solr. There 
QueryContext needs an IdentityHashMap, but the solr tests pass even with the 
patch applied.
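
A minimal sketch of the concern with String keys, using plain JDK classes only (IdentityKeyDemo is a hypothetical name). IdentityHashMap compares keys with == instead of equals(), so a lookup with an equal but distinct String object misses. String literals are interned, so code that only uses literals as keys may never notice the difference.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityKeyDemo {
  /** True if the map finds a key that equals() the stored key but is a different object. */
  static boolean findsEqualKey(Map<Object, Object> map) {
    String stored = new String("searcher"); // distinct object, not the interned literal
    map.put(stored, "value");
    String lookup = new String("searcher"); // equal by equals(), different by ==
    return map.containsKey(lookup);
  }

  public static void main(String[] args) {
    System.out.println(findsEqualKey(new HashMap<>()));         // prints true
    System.out.println(findsEqualKey(new IdentityHashMap<>())); // prints false
  }
}
```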

> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch
>
>







[jira] [Commented] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-25 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15776683#comment-15776683
 ] 

Paul Elschot commented on LUCENE-7602:
--

The interesting pieces of code for ContextMap:

The class itself:
{code}
public class ContextMap extends HashMap {
  public ContextMap() { // empty
  }

  public ContextMap(ContextMap source) { // new copy
    super(source);
  }

  public ContextMap(IdentityHashMap source) { // for solr
    super(source);
  }
}
{code}


the way it is used in solr, in SlotAcc.java:
{code}
class SolrContextMap extends ContextMap {
  SolrContextMap(org.apache.solr.search.QueryContext context) {
    super((java.util.IdentityHashMap)context); // CHECKME: copy ok?
  }
}

// TODO: we should really have a decoupled value provider...
// This would enhance reuse and also prevent multiple lookups of same value across diff stats
abstract class FuncSlotAcc extends SlotAcc {
  protected final ValueSource valueSource;
  protected FunctionValues values;

  public FuncSlotAcc(ValueSource values, FacetContext fcontext, int numSlots) {
    super(fcontext);
    this.valueSource = values;
  }

  @Override
  public void setNextReader(LeafReaderContext readerContext) throws IOException {
    values = valueSource.getValues(new SolrContextMap(fcontext.qcontext), readerContext);
  }
}
{code}

and some unchanged code from solr's QueryContext.java:
{code}
/*
 * Bridge between old style context and a real class.
 * This is currently slightly more heavy weight than necessary because of the
 * need to inherit from IdentityHashMap rather than
 * instantiate it on demand (and the need to put "searcher" in the map)
 * @lucene.experimental
 */
public class QueryContext extends IdentityHashMap implements Closeable {
   ...
}
{code}


> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch
>
>







[jira] [Updated] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-25 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7602:
-
Attachment: LUCENE-7602-ContextMap-solr.patch

The ContextMap solr patch needs the ContextMap lucene patch.
It uses ContextMap in solr, also changing public API.
All solr tests pass.

> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, 
> LUCENE-7602-ContextMap-solr.patch, LUCENE-7602.patch
>
>







[jira] [Updated] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-25 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7602:
-
Attachment: LUCENE-7602-ContextMap-lucene.patch

This ContextMap lucene patch includes the previous patch of 24 December 2016.
This also changes all use of Map for function queries to ContextMap, which 
changes the public API, although not much.
All lucene tests pass.

> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602-ContextMap-lucene.patch, LUCENE-7602.patch
>
>







[jira] [Commented] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-24 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15775468#comment-15775468
 ] 

Paul Elschot commented on LUCENE-7602:
--

The patch also contains a few similar fixes for test code, but it is too much 
work to be complete for the test code, so I left it at that.

> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602.patch
>
>







[jira] [Updated] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-24 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7602:
-
Attachment: LUCENE-7602.patch

Patch of 24 Dec 2016.

This consists of:
- replacing Map by Map in a few submodules.
- 7 @SuppressWarnings for unchecked casts (see also below),
- a split off of Const*DocValues classes into their own source files,
- one removal of close() on an AutoCloseable,
- a few minor generics improvements.

I have a question on these cases, there are 7 of them:

{code}
  @SuppressWarnings("unchecked")
  public void createWeight(Map context, IndexSearcher searcher) throws IOException {
    // FIXME: how to use a helper method here to avoid the unchecked cast? See
    // https://docs.oracle.com/javase/tutorial/java/generics/capture.html
    ((Map)context).put("searcher",searcher);
{code}

I tried to implement such a helper function, but I could not get it to compile 
cleanly.
Any suggestions for this?
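
One possible direction, as a sketch only: assuming the context parameter is declared with wildcards such as Map<?, ?>, a capture helper cannot avoid the cast, because the captured key and value types are unknown and the compiler cannot prove that they accept a String key or an IndexSearcher value. What a helper can do is confine the unchecked cast and its @SuppressWarnings to a single place. ContextMaps and putUnchecked are hypothetical names, not part of Lucene.

```java
import java.util.Map;

final class ContextMaps {
  private ContextMaps() {}

  /**
   * Hypothetical helper: the one place that holds the unchecked cast.
   * Callers are responsible for passing a map that accepts Object keys and values.
   */
  @SuppressWarnings("unchecked")
  static void putUnchecked(Map<?, ?> context, Object key, Object value) {
    ((Map<Object, Object>) context).put(key, value);
  }
}
```

With such a helper the seven call sites would become something like ContextMaps.putUnchecked(context, "searcher", searcher), with no annotation on the calling method.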


> Fix compiler warnings for ant clean compile
> ---
>
> Key: LUCENE-7602
> URL: https://issues.apache.org/jira/browse/LUCENE-7602
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Paul Elschot
>Priority: Minor
>  Labels: build
> Fix For: trunk
>
> Attachments: LUCENE-7602.patch
>
>







[jira] [Created] (LUCENE-7602) Fix compiler warnings for ant clean compile

2016-12-24 Thread Paul Elschot (JIRA)
Paul Elschot created LUCENE-7602:


 Summary: Fix compiler warnings for ant clean compile
 Key: LUCENE-7602
 URL: https://issues.apache.org/jira/browse/LUCENE-7602
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Paul Elschot
Priority: Minor
 Fix For: trunk









[jira] [Commented] (LUCENE-7055) Better execution path for costly queries

2016-12-24 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15774874#comment-15774874
 ] 

Paul Elschot commented on LUCENE-7055:
--

bq.  Intersecting such queries with a selective query is very inefficient since 
these queries build a doc id set of matching documents for the entire index.

Just thinking out loud: how about also using a lazy doc id set builder that 
works on the go?
This would use one extra bit per document to indicate whether the document is 
already evaluated.
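
A rough sketch of that idea, not tied to any Lucene API: LazyDocIdSet is a hypothetical class, and the costly per-document check is modelled as a plain IntPredicate. The extra bit per document records whether the check has already run, so each document is evaluated at most once and only when the intersection actually asks for it.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Hypothetical sketch, not a Lucene class. */
final class LazyDocIdSet {
  private final IntPredicate costlyMatch; // expensive per-document check
  private final BitSet evaluated;         // the extra bit: has this doc been checked?
  private final BitSet matching;          // result bit, meaningful only once evaluated
  private int evaluations;                // how often costlyMatch actually ran

  LazyDocIdSet(int maxDoc, IntPredicate costlyMatch) {
    this.costlyMatch = costlyMatch;
    this.evaluated = new BitSet(maxDoc);
    this.matching = new BitSet(maxDoc);
  }

  /** Evaluates the document on first use and answers from the bit sets afterwards. */
  boolean matches(int doc) {
    if (!evaluated.get(doc)) {
      evaluated.set(doc);
      evaluations++;
      if (costlyMatch.test(doc)) {
        matching.set(doc);
      }
    }
    return matching.get(doc);
  }

  int evaluations() {
    return evaluations;
  }
}
```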

> Better execution path for costly queries
> 
>
> Key: LUCENE-7055
> URL: https://issues.apache.org/jira/browse/LUCENE-7055
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Attachments: LUCENE-7055.patch
>
>
> In Lucene 5.0, we improved the execution path for queries that run costly 
> operations on a per-document basis, like phrase queries or doc values 
> queries. But we have another class of costly queries, that return fine 
> iterators, but these iterators are very expensive to build. This is typically 
> the case for queries that leverage DocIdSetBuilder, like TermsQuery, 
> multi-term queries or the new point queries. Intersecting such queries with a 
> selective query is very inefficient since these queries build a doc id set of 
> matching documents for the entire index.
> Is there something we could do to improve the execution path for these 
> queries?
> One idea that comes to mind is that most of these queries could also run on 
> doc values, so maybe we could come up with something that would help decide 
> how to run a query based on other parts of the query? (Just thinking out 
> loud, other ideas are very welcome)






[jira] [Comment Edited] (LUCENE-7580) Spans tree scoring

2016-12-20 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764653#comment-15764653
 ] 

Paul Elschot edited comment on LUCENE-7580 at 12/20/16 4:53 PM:


I have started using this in the tests for the surround query language, I'll 
open an issue for that later.

This found a bug in ConjunctionNearSpansDocScorer.recordMatch().
A -1 slop can occur when the same term is used twice in a SpanNearQuery, and 
this causes a division by zero when the slop factor is computed from 
recordMatch().
This can easily be avoided by using 0 slop in such cases.


was (Author: paul.elsc...@xs4all.nl):
I have started with on using this in the tests for the surround query language, 
I'll open an issue for that later.

This found a bug in ConjunctionNearSpansDocScorer.recordMatch().
A -1 slop can occur when the same term is used twice in a SpanNearQuery, and 
this causes a division by zero in computing the slop factor in from 
recordMatch().
This can be easily avoided by using 0 slop in such cases.

> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch, LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Commented] (LUCENE-7580) Spans tree scoring

2016-12-20 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15764653#comment-15764653
 ] 

Paul Elschot commented on LUCENE-7580:
--

I have started using this in the tests for the surround query language, 
I'll open an issue for that later.

This found a bug in ConjunctionNearSpansDocScorer.recordMatch().
A -1 slop can occur when the same term is used twice in a SpanNearQuery, and 
this causes a division by zero when the slop factor is computed from 
recordMatch().
This can easily be avoided by using 0 slop in such cases.
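
A minimal sketch of that workaround, assuming a slop factor of the form 1 / (1 + slop); the formula in an actual Similarity may differ, and SlopFactors is a hypothetical name. With that form a slop of -1 makes the denominator zero, so the slop is clamped to 0 first.

```java
final class SlopFactors {
  private SlopFactors() {}

  /** Assumed slop factor 1 / (1 + slop), with negative slops treated as 0. */
  static float slopFactor(int slop) {
    int nonNegativeSlop = Math.max(0, slop); // the -1 case becomes an exact match
    return 1.0f / (1 + nonNegativeSlop);
  }
}
```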

> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch, LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Commented] (LUCENE-7580) Spans tree scoring

2016-12-11 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15740406#comment-15740406
 ] 

Paul Elschot commented on LUCENE-7580:
--

Some scientific articles on this subject:

Metzler, Donald, and W. Bruce Croft.
"A Markov random field model for term dependencies."
Proceedings of the 28th annual international ACM SIGIR conference
on Research and development in information retrieval. ACM, 2005.

In section 2.3 they use terms and ordered and unordered phrases.
The ranking function is a weighted linear combination of these.
The optimal weights are about 80/10/10 for simple terms, unordered, and ordered.
Here this led to the use of a weighting factor for non matching occurrences.
They also found that the minimum distance is the best indicator of relevance.


Bendersky, Michael, and W. Bruce Croft.
"Modeling Higher-Order Term Dependencies in Information Retrieval using Query 
Hypergraphs"
SIGIR'12.

The concepts there can be nested, like span queries.
The approach there is much more general. For example:
- Table 2 shows the use of the frequency of a concept in various collections
to determine its weight.
- In section 2.4.2 there is an indication that the slop factor needs attention:
"... the existing term proximity measures usually capture close, sentence-level,
co-occurrences of the query terms ... The dependency range is much longer for
concept dependencies."


Blanco, Roi, and Paolo Boldi.
"Extending BM25 with multiple query operators."
Proceedings of the 35th international ACM SIGIR conference
on Research and development in information retrieval. ACM, 2012.

This scores regions with BM25F.


> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch, LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Commented] (LUCENE-7580) Spans tree scoring

2016-12-11 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15740402#comment-15740402
 ] 

Paul Elschot commented on LUCENE-7580:
--

This adds a nonMatchSlop attribute to SpanNearQuery,
and drops the nonMatchSlopFactor argument from SpansTreeQuery.

nonMatchSlop is the distance for determining a slop factor that is to be used 
for non matching occurrences of a SpanNearQuery.
Smaller values for this distance will increase the score contribution of non 
matching occurrences via
SimScorer.computeSlopFactor()

But smaller values for this distance, i.e. higher score contribution of non 
matching occurrences,
may lead to a scoring inconsistency between two span near queries that only 
differ in the allowed slop.
For example consider query A with a smaller allowed slop and query B with a 
larger one.
For query B there can be more matches, and these should increase the score of B
when compared to the score of A.
So for each extra match at B, the non matching score for query A should be 
lower than
the matching score for query B.
This may not be the case when the non matching score contribution is too high.

To have consistent scoring between two such queries,
choose a non matching slop that is larger than the largest allowed match slop,
and provide that non matching slop to both queries.
In case this consistency is not needed, nonMatchSlop can be chosen to be 
somewhat
larger than the maximum allowed match slop.

This nonMatchSlop is used in SpansTreeWeight to compute a minimal nested slop 
factor
from the maximum possible slops that can occur
in a SpanQuery for the nested SpanNearQueries and for nested SpanOrQueries with 
distance.
Finally, this minimal nested slop factor is used as the weight for scoring non 
matching terms.

The default nonMatchSlop for SpanNearQuery is large, Integer.MAX_VALUE/2.
Therefore by default non matching occurrences have no real score contribution.
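
The consistency condition above can be illustrated with a small sketch. It assumes a slop factor of the form 1 / (1 + slop), which is only one possible SimScorer behaviour, and NonMatchSlopDemo is a hypothetical name. Under that assumption a nonMatchSlop larger than the largest allowed match slop always weighs a non matching occurrence below any matching one, and the default of Integer.MAX_VALUE/2 gives a factor of 2^-30.

```java
final class NonMatchSlopDemo {
  private NonMatchSlopDemo() {}

  /** Assumed slop factor; computed in double so that large slops do not overflow. */
  static double slopFactor(int slop) {
    return 1.0 / (1.0 + slop);
  }

  /** True if a non matching occurrence weighs less than every allowed match. */
  static boolean consistent(int nonMatchSlop, int maxAllowedMatchSlop) {
    return slopFactor(nonMatchSlop) < slopFactor(maxAllowedMatchSlop);
  }

  /** Factor for the default nonMatchSlop described above. */
  static double defaultNonMatchFactor() {
    return slopFactor(Integer.MAX_VALUE / 2);
  }
}
```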


> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch, LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Comment Edited] (LUCENE-7580) Spans tree scoring

2016-12-11 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15740402#comment-15740402
 ] 

Paul Elschot edited comment on LUCENE-7580 at 12/11/16 9:44 PM:


Compared to the previous patch, this adds a nonMatchSlop attribute to 
SpanNearQuery,
and drops the nonMatchSlopFactor argument from SpansTreeQuery.

nonMatchSlop is the distance for determining a slop factor that is to be used 
for non matching occurrences of a SpanNearQuery.
Smaller values for this distance will increase the score contribution of non 
matching occurrences via
SimScorer.computeSlopFactor()

But smaller values for this distance, i.e. higher score contribution of non 
matching occurrences,
may lead to a scoring inconsistency between two span near queries that only 
differ in the allowed slop.
For example consider query A with a smaller allowed slop and query B with a 
larger one.
For query B there can be more matches, and these should increase the score of B
when compared to the score of A.
So for each extra match at B, the non matching score for query A should be 
lower than
the matching score for query B.
This may not be the case when the non matching score contribution is too high.

To have consistent scoring between two such queries,
choose a non matching slop that is larger than the largest allowed match slop,
and provide that non matching slop to both queries.
In case this consistency is not needed, nonMatchSlop can be chosen to be 
somewhat
larger than the maximum allowed match slop.

This nonMatchSlop is used in SpansTreeWeight to compute a minimal nested slop 
factor
from the maximum possible slops that can occur
in a SpanQuery for the nested SpanNearQueries and for nested SpanOrQueries with 
distance.
Finally, this minimal nested slop factor is used as the weight for scoring non 
matching terms.

The default nonMatchSlop for SpanNearQuery is large, Integer.MAX_VALUE/2.
Therefore by default non matching occurrences have no real score contribution.



was (Author: paul.elsc...@xs4all.nl):
This adds a nonMatchSlop attribute to SpanNearQuery,
and drops the nonMatchSlopFactor argument from SpansTreeQuery.

nonMatchSlop is the distance for determining a slop factor that is to be used 
for non matching occurrences of a SpanNearQuery.
Smaller values for this distance will increase the score contribution of non 
matching occurrences via
SimScorer.computeSlopFactor()

But smaller values for this distance, i.e. higher score contribution of non 
matching occurrences,
may lead to a scoring inconsistency between two span near queries that only 
differ in the allowed slop.
For example consider query A with a smaller allowed slop and query B with a 
larger one.
For query B there can be more matches, and these should increase the score of B
when compared to the score of A.
So for each extra match at B, the non matching score for query A should be 
lower than
the matching score for query B.
This may not be the case when the non matching score contribution is too high.

To have consistent scoring between two such queries,
choose a non matching slop that is larger than the largest allowed match slop,
and provide that non matching slop to both queries.
In case this consistency is not needed, nonMatchSlop can be chosen to be 
somewhat
larger than the maximum allowed match slop.

This nonMatchSlop is used in SpansTreeWeight to compute a minimal nested slop 
factor
from the maximum possible slops that can occur
in a SpanQuery for the nested SpanNearQueries and for nested SpanOrQueries with 
distance.
Finally, this minimal nested slop factor is used as the weight for scoring non 
matching terms.

The default nonMatchSlop for SpanNearQuery is large, Integer.MAX_VALUE/2.
Therefore by default non matching occurrences have no real score contribution.


> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch, LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Updated] (LUCENE-7580) Spans tree scoring

2016-12-11 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7580:
-
Attachment: LUCENE-7580.patch

Patch of 11 Dec 2016.

Adds automatic determination of a weight for non matching terms.


> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch, LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Commented] (LUCENE-7580) Spans tree scoring

2016-12-04 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15720077#comment-15720077
 ] 

Paul Elschot commented on LUCENE-7580:
--

What SpansTreeQuery does not do, and some rough edges:

The SpansDocScorer objects do the match recording and scoring, and there is one 
for each Spans.
These SpansDocScorer objects might be merged into their Spans to reduce the 
number of objects.
Related: how to deal with the same term occurring in more than one subquery? 
See also LUCENE-7398.

Normally the term frequency score has a diminishing contribution for extra 
occurrences.
In the patch the slop factors for a term are applied in decreasing order on 
these diminished contributions.
This requires sorting of the slop factors.
Sorting the slop factors could be avoided if an actual score for a single term 
occurrence were available.
In that case the given slop factor could be used as a weight on that score.
It might be possible to estimate an actual score for a single term occurrence
from the distances to other occurrences of the same term.
Similarly, the decreasing term frequency contributions can be seen as a 
proximity weighting for the same term (or subquery):
the closer a term occurs to itself, the smaller its contribution.
This might be refined by using the actual distances to the other term 
occurrences (or subquery occurrences)
to provide a weight for each term occurrence. This is unusual because the 
weight decreases for smaller distances.

The slop factor from the Similarity may need to be adapted because of the way 
it is combined here
with diminishing term contributions.

Another use of a score of each term occurrence could be to use the absolute 
term position
to influence the score, possibly in combination with the field length.

There is an assert in TermSpansDocScorer.docScore() that verifies that
the smallest occurring slop factor is at least as large as the non matching 
slop factor.
This condition is necessary for consistency.
Instead of using this assert, this condition might be enforced by somehow
automatically determining the non matching slop factor.

This is a prototype. No profiling has been done. It will take more CPU, but I 
have no idea how much.
The sorting of the slop factors per matching term occurrence has roughly the 
same
time complexity as the position priority queues used for SpanOr and SpanNear.
Garbage collection might be affected by the reference cycles between the 
SpansDocScorers
and their Spans.

Since this allows weighting of subqueries, it might be possible to implement 
synonym scoring
in SpanOrQuery by providing good subweights, and wrapping the whole thing in 
SpansTreeQuery.
The only thing that might still be needed then is a SpansDocScorer that applies 
the SimScorer.score()
over the total term frequency of the synonyms in a document.

SpansTreeScorer multiplies the slop factor for nested near queries at each 
level.
Alternatively a minimum distance could be passed down.
This would need to change recordMatch(float slopFactor) to recordMatch(int 
minDistance).
Would minDistance make sense, or is there a better distance?

What is a good way to test whether the score values from SpansTreeQuery 
actually improve on
the score values from the current SpanScorer?

There are no tests for SpanFirstQuery/SpanContainingQuery/SpanWithinQuery.
These tests are not there because these queries provide FilterSpans and that is 
already supported for SpanNotQuery.

The explain() method is not implemented for SpansTreeQuery.
This should be doable with an explain() method added to SpansTreeScorer to 
provide the explanations.

There is no support for PayloadSpanQuery.
PayloadSpanQuery is not in here because it is not in the core module.
I think it can fit in here because PayloadSpanQuery also scores per matching 
term occurrence.
Then Spans.doStartCurrentDoc() and Spans.doCurrentSpans() could be removed.

In case this is acceptable as a good way to score Spans:
Spans.width() and Scorer.freq() and SpansDocScorer.docMatchFreq() might be 
removed.
Would it make sense to implement child Scorers in the tree of SpansDocScorer 
objects?


> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched




[jira] [Commented] (LUCENE-7580) Spans tree scoring

2016-12-04 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15720073#comment-15720073
 ] 

Paul Elschot commented on LUCENE-7580:
--


Some related issues, thanks for these discussions:

LUCENE-533
LUCENE-2878
LUCENE-2879
LUCENE-2880
LUCENE-6371
LUCENE-6466
LUCENE-7398


Some related web pages:

http://www.gossamer-threads.com/lists/lucene/java-user/33902 March 2006.

http://www.gossamer-threads.com/lists/lucene/java-user/53027 September 2007, 
suggests to:
"recurse the spans tree to compose a score based on the type of subqueries 
(near, and, or, not) and what matched."

http://www.gossamer-threads.com/lists/lucene/java-user/60103 April 2008.

http://www.flax.co.uk/blog/2016/04/26/can-make-contribution-apache-solr-core-development/
 see point 4.

How to use BM25:
http://opensourceconnections.com/blog/2015/10/16/bm25-the-next-generation-of-lucene-relevation/



> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Commented] (LUCENE-7580) Spans tree scoring

2016-12-04 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15720070#comment-15720070
 ] 

Paul Elschot commented on LUCENE-7580:
--

SpansTreeQuery is implemented as a wrapper in order to change the existing code 
as little as possible.
But it was necessary to take DisjunctionSpans out of SpanOrQuery.
In DisjunctionSpans there are only additions for inspection at a match,
otherwise it is the same as in the current SpanOrQuery.

Changes to the current code are mostly additions to allow inspection of matches:
- For the ordered/unordered nearspans a common superclass ConjunctionNearSpans 
is added that provides the SimScorer and a currentSlop() method.
- DisjunctionSpans allows inspection of all subspans, of the subspans at the 
current doc, and of the subspans with the first and second positions.
  SpanPositionQueue also has some additions for this.
- In the TermSpans constructor the currently unused SimScorer argument is saved 
so it can be used to score() the various term frequencies.
- In Spans a reference to a SpansDocScorer object is added to allow direct 
access by disjunctions.

The only existing state that is changed is the use of needsScores (instead of 
the current false)
for weights of subqueries of SpanOrQuery and SpanNearQuery and for the weight 
of the included subquery of SpanNotQuery.

All core tests pass with the patch applied on the master branch. Ant precommit 
also passes.

There is a correction to the javadocs of Similarity.SimScorer on the use of 
float for term frequencies.

The patch also adds a constructor for SpanOrQuery with an extra parameter 
maxDistance.
When wrapped in a SpansTreeQuery, this SpanOrQuery will provide a slop factor 
at each match
that is determined by the minimum distance between any two subspans where 
possible,
and this distance is capped at the given maxDistance.
The class DisjunctionNearSpans and its SpansDocScorer implement this.
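
A minimal sketch of how such a distance based slop factor could be derived, assuming spans are given as {start, end} pairs with an exclusive end, that overlapping spans have distance 0, and that the slop factor is 1 / (1 + distance). All names are hypothetical, and the brute force loop over all pairs is only for illustration; it is not how DisjunctionNearSpans works.

```java
/** Hypothetical sketch, not code from the patch. */
final class DisjunctionDistanceSketch {

    /** Distance between two spans {start, end}; 0 when they touch or overlap. */
    static int distance(int[] a, int[] b) {
        if (a[1] <= b[0]) {
            return b[0] - a[1];
        }
        if (b[1] <= a[0]) {
            return a[0] - b[1];
        }
        return 0;
    }

    /**
     * Minimum distance between any two of the given spans, limited to at most
     * maxDistance. With fewer than two spans the result is maxDistance.
     */
    static int limitedMinDistance(int[][] spans, int maxDistance) {
        int min = maxDistance;
        for (int i = 0; i < spans.length; i++) {
            for (int j = i + 1; j < spans.length; j++) {
                min = Math.min(min, distance(spans[i], spans[j]));
            }
        }
        return min;
    }

    /** Assumed slop factor: 1 / (1 + distance). */
    static double slopFactor(int distance) {
        return 1.0 / (1.0 + distance);
    }
}
```

For example, subspans at {0, 1}, {3, 4} and {10, 11} with maxDistance 5 give a distance of 2, while a single subspan at {0, 1} falls back to the maxDistance of 5.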

All score calculations are done with doubles.
Most of the additions have public/protected visibility in order to allow easy 
extension.

In case there is interest in back porting this, a patch for branch_6x can be 
made available.
The tests on branch_6x disable the coordination in BooleanQuery and they only 
use the BM25 similarity.



> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Commented] (LUCENE-7580) Spans tree scoring

2016-12-04 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15720066#comment-15720066
 ] 

Paul Elschot commented on LUCENE-7580:
--

In the patch, SpansTreeQuery is a wrapper for SpanQuery that uses basically the 
same scoring as the scoring for other queries.
When all term occurrences match at top level or at 0 distance the score is the 
same as
the score for a boolean OR over the terms, independently of the Similarity that 
is used.
SpansTreeScorer scores each matching occurrence of a query term, and it applies 
discounts for non matching terms
and for distance matches. It also uses weights of subqueries.

The matching occurrences are recorded per document in the spans tree at each 
top level match of a document.
For each match SpansTreeScorer descends the tree down to the leaf level of the 
terms of each match.
SpansDocScorer objects are used as the tree nodes, there is one for each 
supported Spans.

Each matching term occurrence is recorded with a slop factor.
At the top level this slop factor is normally 1, and for each span near nesting 
level
the slop factor at the match is multiplied into this.

The term frequency scoring from the Similarity is used per matching term 
occurrence,
and these term occurrence scores are weighted by the slop factors sorted in 
decreasing order.
The purpose of using the given slop factors in decreasing order is to provide 
scoring consistency
between span near queries that only differ in the maximum allowed slop.
This consistency requires that an extra match with a lower slop increases the 
score of the document.
I would expect scoring to be consistent this way, but I'm not 100% sure.

The non matching term occurrences get a score that is the difference of
the normal document term frequency score and the term frequency score for the 
matching terms.
This non matching score is weighted by the slop factor of a non matching 
distance.
The non matching distance is a parameter that must be provided.
This non matching distance can for example be chosen as a little larger
than the largest distance used in the span near queries that are wrapped.

SpansTreeQuery is implemented for any combination of
SpanNearQuery, SpanOrQuery, SpanTermQuery, SpanBoostQuery,
SpanNotQuery, SpanFirstQuery, SpanContainingQuery and SpanWithinQuery.

See the javadocs and the test code on how to use SpansTreeQuery.
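
The scoring described in this comment can be sketched for a single term as follows. This is a minimal sketch under stated assumptions, not the code of the patch: the term frequency score is a BM25-like saturation freq / (freq + k1) without length normalization, the slop factor is 1 / (1 + distance), and all class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch, not code from the patch. */
final class SpansTreeScoreSketch {

    /** Assumed saturation constant, as in BM25. */
    static final double K1 = 1.2;

    /** Assumed term frequency score with diminishing contributions. */
    static double tfScore(int freq) {
        return freq / (freq + K1);
    }

    /** Assumed slop factor: 1 / (1 + distance). */
    static double slopFactor(int distance) {
        return 1.0 / (1.0 + distance);
    }

    /** Slop factors of nested span near matches are multiplied per level. */
    static double nestedSlopFactor(int... distancesPerLevel) {
        double factor = 1.0;
        for (int distance : distancesPerLevel) {
            factor *= slopFactor(distance);
        }
        return factor;
    }

    /**
     * Score of one term in one document. The slop factors of the matching
     * occurrences are applied in decreasing order to the diminishing term
     * frequency contributions; the remaining, non matching occurrences are
     * weighted by nonMatchSlopFactor.
     */
    static double termScore(double[] matchSlopFactors, int docTermFreq,
                            double nonMatchSlopFactor) {
        double[] sorted = matchSlopFactors.clone();
        Arrays.sort(sorted); // ascending, so iterate from the end
        double score = 0.0;
        int occurrence = 0;
        for (int i = sorted.length - 1; i >= 0; i--) {
            occurrence++;
            score += sorted[i] * (tfScore(occurrence) - tfScore(occurrence - 1));
        }
        score += nonMatchSlopFactor * (tfScore(docTermFreq) - tfScore(occurrence));
        return score;
    }
}
```

When every occurrence matches with slop factor 1 the sum telescopes to tfScore(docTermFreq), which is the plain term score. An extra match whose slop factor is larger than the non matching factor increases the score, which is the consistency property mentioned above.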


> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Updated] (LUCENE-7580) Spans tree scoring

2016-12-04 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7580:
-
Attachment: LUCENE-7580.patch

> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
> Attachments: LUCENE-7580.patch
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Commented] (LUCENE-7580) Spans tree scoring

2016-12-04 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15720059#comment-15720059
 ] 

Paul Elschot commented on LUCENE-7580:
--

"Recurse the spans tree to compose a score based on the type of subqueries ... 
and what matched"
was suggested in September 2007 on the java-user list 
http://www.gossamer-threads.com/lists/lucene/java-user/53027 .

Currently SpanScorer provides score values that have no real meaning when more 
than one SpanTermQuery is used.

Patch follows.

> Spans tree scoring
> --
>
> Key: LUCENE-7580
> URL: https://issues.apache.org/jira/browse/LUCENE-7580
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (7.0)
>Reporter: Paul Elschot
>Priority: Minor
> Fix For: 6.x
>
>
> Recurse the spans tree to compose a score based on the type of subqueries and 
> what matched






[jira] [Created] (LUCENE-7580) Spans tree scoring

2016-12-04 Thread Paul Elschot (JIRA)
Paul Elschot created LUCENE-7580:


 Summary: Spans tree scoring
 Key: LUCENE-7580
 URL: https://issues.apache.org/jira/browse/LUCENE-7580
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Affects Versions: master (7.0)
Reporter: Paul Elschot
Priority: Minor
 Fix For: 6.x


Recurse the spans tree to compose a score based on the type of subqueries and 
what matched






[jira] [Commented] (LUCENE-7398) Nested Span Queries are buggy

2016-11-17 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15674083#comment-15674083
 ] 

Paul Elschot commented on LUCENE-7398:
--

[~mikemccand], the patch is as you stated, and having MatchNear as an enum to 
choose the matching method is easy to extend.
I would not mind having some more opinions on whether the progress is enough 
to actually add the code.

I know this MG4J paper and it could well be that theorem 11 in there proves 
that no lazy algorithm is possible for the general case with more than 2 
subqueries, but for now I cannot really follow their terminology.  In 
particular I'd like to know whether or not these efficient algorithms 
correspond to the current lazy implementations in Lucene. I'm hoping that they 
do not, because then there might be some room for improvement in Lucene without 
losing speed.

As [~gol...@detego-software.de] stated above:
bq.  I want a higher score if the user-query matches for more than one variant
I don't think the ORDERED_LOOKAHEAD of the patch does that, because it only 
matches one variant.
I hope that there is a non backtracking implementation that can do this, but 
I'm not sure.



> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Assignee: Alan Woodward
>Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398-20160924.patch, 
> LUCENE-7398-20160925.patch, LUCENE-7398.patch, LUCENE-7398.patch, 
> LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.






[jira] [Commented] (LUCENE-7398) Nested Span Queries are buggy

2016-11-16 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15671305#comment-15671305
 ] 

Paul Elschot commented on LUCENE-7398:
--

 bq. is this latest patch ready to be committed, or are there still known 
problems?

Both actually, assuming that master has not had a conflicting update since.
To completely solve this, backtracking is needed, and the patch does not provide 
that.

To allow collecting/payloads easily, I'd rather accept the limitations/bugs of 
the current lazy implementation.
As a minimum a reference to this issue could be added to the javadocs of the 
(un)ordered near spans.

AFAIK:
- a complete solution that can be made with lazy iteration is a span near query 
that has two subqueries
and that only checks the span starting positions,
- for subqueries that are terms or that do not vary in length, completeness for 
two subqueries is already there.

In case there is interest in span near queries that only use starting 
positions, well, that should be easy.
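
Why backtracking is needed can be illustrated with the example from the issue description quoted below. The following is a simplified sketch, not Lucene code: spans are {start, end} pairs with an exclusive end, positions are counted from the word "coordinate", each clause lists its spans sorted by start and then by end, and all names are hypothetical. The lazy variant takes the first usable span of each clause and never reconsiders it; the backtracking variant tries the alternatives.

```java
/** Hypothetical sketch, not Lucene code. */
final class LazyVersusBacktrackingSketch {

    /** First span of a clause that starts at or after minStart, or null. */
    static int[] firstStartingAtOrAfter(int[][] clause, int minStart) {
        for (int[] span : clause) {
            if (span[0] >= minStart) {
                return span;
            }
        }
        return null;
    }

    /** Ordered near match that never reconsiders a chosen subspan. */
    static boolean lazyOrderedMatch(int[][][] clauses, int slop) {
        for (int[] first : clauses[0]) {
            int prevEnd = first[1];
            int totalGap = 0;
            boolean complete = true;
            for (int c = 1; c < clauses.length && complete; c++) {
                int[] next = firstStartingAtOrAfter(clauses[c], prevEnd);
                if (next == null) {
                    complete = false;
                } else {
                    totalGap += next[0] - prevEnd;
                    prevEnd = next[1];
                }
            }
            if (complete && totalGap <= slop) {
                return true;
            }
        }
        return false;
    }

    /** Ordered near match that tries every alternative of every clause. */
    static boolean backtrackingOrderedMatch(int[][][] clauses, int slop) {
        for (int[] first : clauses[0]) {
            if (extend(clauses, 1, first[1], slop)) {
                return true;
            }
        }
        return false;
    }

    static boolean extend(int[][][] clauses, int c, int prevEnd, int remainingSlop) {
        if (c == clauses.length) {
            return true;
        }
        for (int[] span : clauses[c]) {
            int gap = span[0] - prevEnd;
            if (gap >= 0 && gap <= remainingSlop
                    && extend(clauses, c + 1, span[1], remainingSlop - gap)) {
                return true;
            }
        }
        return false;
    }

    /** "coordinate gene mapping research" */
    static int[][][] coordinateGeneMappingResearch() {
        return new int[][][] {
            {{0, 1}},         // coordinate
            {{1, 2}, {1, 3}}, // spanOr: "gene", then "gene mapping"
            {{3, 4}}          // research
        };
    }

    /** "coordinate gene research" */
    static int[][][] coordinateGeneResearch() {
        return new int[][][] {
            {{0, 1}}, // coordinate
            {{1, 2}}, // spanOr: only "gene" matches
            {{2, 3}}  // research
        };
    }
}
```

With slop 0 the lazy variant picks "gene" for the spanOr, finds a gap of one position before "research", and gives up on "coordinate gene mapping research". The backtracking variant also tries "gene mapping" and matches. Both variants match "coordinate gene research".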




> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Assignee: Alan Woodward
>Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398-20160924.patch, 
> LUCENE-7398-20160925.patch, LUCENE-7398.patch, LUCENE-7398.patch, 
> LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.






[jira] [Updated] (LUCENE-7398) Nested Span Queries are buggy

2016-10-04 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7398:
-
Attachment: LUCENE-7398.patch

Patch of 4 Oct 2016.

This is the patch of 25 Sep 2016, but without the UNORDERED_STARTPOS case.

In a nutshell this:
- adds ORDERED_LOOKAHEAD, 
- is backward compatible,
- tries to document the limitations of the matching methods for SpanNearQuery.


> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Assignee: Alan Woodward
>Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398-20160924.patch, 
> LUCENE-7398-20160925.patch, LUCENE-7398.patch, LUCENE-7398.patch, 
> LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.






[jira] [Updated] (LUCENE-7471) Simplify NearSpansOrdered

2016-09-29 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7471:
-
Attachment: LUCENE-7471.patch

Patch of 29 Sep 2016.
Builds on the two phase approach.
This has 89 lines less code than master.
All tests pass.

It uses the same approach as UNORDERED_STARTPOS at LUCENE-7398.


> Simplify NearSpansOrdered
> -
>
> Key: LUCENE-7471
> URL: https://issues.apache.org/jira/browse/LUCENE-7471
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Paul Elschot
>Priority: Minor
> Attachments: LUCENE-7471.patch
>
>
> Extend the span positions priority queue, remove SpansCell.






[jira] [Created] (LUCENE-7471) Simplify NearSpansOrdered

2016-09-29 Thread Paul Elschot (JIRA)
Paul Elschot created LUCENE-7471:


 Summary: Simplify NearSpansOrdered
 Key: LUCENE-7471
 URL: https://issues.apache.org/jira/browse/LUCENE-7471
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Paul Elschot
Priority: Minor


Extend the span positions priority queue, remove SpansCell.






[jira] [Comment Edited] (LUCENE-7398) Nested Span Queries are buggy

2016-09-25 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15519320#comment-15519320
 ] 

Paul Elschot edited comment on LUCENE-7398 at 9/25/16 10:00 PM:


Patch of 24 Sep 2016, work in progress. Edit: superseded on 25 Sep, this can be 
ignored.

This introduces SpanNearQuery.MatchNear to choose the matching method.

The ORDERED_LAZY case is still the patch of 14 August, this should be changed 
back to the current implementation, and be used to implement ORDERED_LOOKAHEAD.

This implements MatchNear.UNORDERED_STARTPOS and uses that as the default 
implementation for the unordered case.
The implementation of UNORDERED_STARTPOS is in NearSpansUnorderedStartPos, 
which is simpler than the current NearSpansUnordered, there is no SpansCell.
I'd expect this StartPos implementation to be a little faster, so I also 
implemented it as default for the unordered case.  In only one test case the 
UNORDERED_LAZY method is needed to pass the test.

The question is whether it is ok to change the default unordered implementation 
to only use the span start positions.

The collect() method is moved to the superclass ConjunctionSpans, this 
simplification might be done at another issue.


> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Assignee: Alan Woodward
>Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398-20160924.patch, 
> LUCENE-7398-20160925.patch, LUCENE-7398.patch, LUCENE-7398.patch, 
> TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.






[jira] [Updated] (LUCENE-7398) Nested Span Queries are buggy

2016-09-25 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7398:
-
Attachment: LUCENE-7398-20160925.patch

Patch of 25 Sep 2016.
Compared to the previous patch, this removes the ORDERED_STARTPOS case, because 
I don't know whether that is needed.
Also this restores backward compatibility.

Compared to master, this has:
Four MatchNear methods. Two are the current ones: they are called ORDERED_LAZY 
and UNORDERED_LAZY, and they are used when the current builder and 
constructors use a boolean ordered argument.

The third case is ORDERED_LOOKAHEAD, which is from the patch of 18 August.

The last case is UNORDERED_STARTPOS, which is simpler than UNORDERED_LAZY, 
hopefully a little faster, and with better completeness of the result.

Javadocs for all four cases have been added.

All test cases from here have been added, and where necessary they have been 
modified to use ORDERED_LOOKAHEAD and to not do span collection. These tests 
pass.

For the last case, UNORDERED_STARTPOS, no test cases have been added yet. This 
is still to be done. Does anyone have more difficult cases?

Minor point: the collect() method was moved to the superclass ConjunctionSpans.

Feedback welcome, especially on the javadocs of SpanNearQuery.MatchNear.

Instead of adding backtracking methods, it might be better to do counting of 
input spans in a matching window. I'm hoping that the UNORDERED_STARTPOS case 
can be extended for that. Any ideas there?
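
The four cases and the backward compatible mapping from the boolean ordered argument could look roughly like the sketch below. The enum name, the method names and the comments are hypothetical; the actual SpanNearQuery.MatchNear in the patch may differ.

```java
/** Hypothetical sketch of the four matching methods; not the patch's code. */
enum MatchNearSketch {
    /** Current ordered implementation. */
    ORDERED_LAZY,
    /** Current unordered implementation. */
    UNORDERED_LAZY,
    /** Ordered matching with lookahead. */
    ORDERED_LOOKAHEAD,
    /** Unordered matching that only uses span start positions. */
    UNORDERED_STARTPOS;

    /** Mapping used when a caller still passes the boolean ordered argument. */
    static MatchNearSketch fromInOrder(boolean inOrder) {
        return inOrder ? ORDERED_LAZY : UNORDERED_LAZY;
    }

    /** Whether the subspans must occur in the given order. */
    boolean isOrdered() {
        return this == ORDERED_LAZY || this == ORDERED_LOOKAHEAD;
    }
}
```

Keeping the boolean argument mapped to the two lazy cases is what preserves the existing behaviour for current callers.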

> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Assignee: Alan Woodward
>Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398-20160924.patch, 
> LUCENE-7398-20160925.patch, LUCENE-7398.patch, LUCENE-7398.patch, 
> TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match  "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1, it did however match with Lucene 4.10.4. It 
> probably stopped working with the changes on SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.






[jira] [Updated] (LUCENE-7398) Nested Span Queries are buggy

2016-09-24 Thread Paul Elschot (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Elschot updated LUCENE-7398:
-
Attachment: LUCENE-7398-20160924.patch

Patch of 24 Sep 2016, work in progress.

This introduces SpanNearQuery.MatchNear to choose the matching method.

The ORDERED_LAZY case is still the patch of 14 August, this should be changed 
back to the current implementation, and be used to implement ORDERED_LOOKAHEAD.

This implements MatchNear.UNORDERED_STARTPOS and uses that as the default 
implementation for the unordered case.
The implementation of UNORDERED_STARTPOS is in NearSpansUnorderedStartPos, 
which is simpler than the current NearSpansUnordered, there is no SpansCell.
I'd expect this StartPos implementation to be a little faster, so I also 
implemented it as default for the unordered case.  In only one test case the 
UNORDERED_LAZY method is needed to pass the test.

The question is whether it is ok to change the default unordered implementation 
to only use the span start positions.

The collect() method is moved to the superclass ConjunctionSpans, this 
simplification might be done at another issue.

> Nested Span Queries are buggy
> -
>
> Key: LUCENE-7398
> URL: https://issues.apache.org/jira/browse/LUCENE-7398
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 5.5, 6.x
>Reporter: Christoph Goller
>Assignee: Alan Woodward
>Priority: Critical
> Attachments: LUCENE-7398-20160814.patch, LUCENE-7398-20160924.patch, 
> LUCENE-7398.patch, LUCENE-7398.patch, TestSpanCollection.java
>
>
> Example for a nested SpanQuery that is not working:
> Document: Human Genome Organization , HUGO , is trying to coordinate gene 
> mapping research worldwide.
> Query: spanNear([body:coordinate, spanOr([spanNear([body:gene, body:mapping], 
> 0, true), body:gene]), body:research], 0, true)
> The query should match "coordinate gene mapping research" as well as 
> "coordinate gene research". It does not match "coordinate gene mapping 
> research" with Lucene 5.5 or 6.1; it did, however, match with Lucene 4.10.4. It 
> probably stopped working with the changes to SpanQueries in 5.3. I will 
> attach a unit test that shows the problem.
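
The failure described above can be illustrated with a toy model. This is only a sketch under assumptions: it is not Lucene's NearSpansOrdered code, all function names are hypothetical, and the "lazy" strategy (take the first non-overlapping sub-span and never backtrack) is an assumed simplification of why the longer alternative inside the spanOr might be missed.

```python
from itertools import product

# The relevant part of the example document, one token per position.
doc = "coordinate gene mapping research".split()

def term(word):
    """Spans (start, end) of a single term; end is exclusive."""
    return [(i, i + 1) for i, w in enumerate(doc) if w == word]

def span_or(*clauses):
    """Union of the clause spans, ordered by start, then by end."""
    return sorted(set(s for clause in clauses for s in clause))

def near_exhaustive(clauses, slop):
    """Ordered near that tries every combination of sub-spans."""
    hits = set()
    for combo in product(*clauses):
        gaps = [b[0] - a[1] for a, b in zip(combo, combo[1:])]
        if all(g >= 0 for g in gaps) and sum(gaps) <= slop:
            hits.add((combo[0][0], combo[-1][1]))
    return sorted(hits)

def near_lazy(clauses, slop):
    """Ordered near that, for each span of the first clause, takes only the
    first non-overlapping span of every later clause and never backtracks."""
    hits = []
    for first in clauses[0]:
        prev, total, ok = first, 0, True
        for clause in clauses[1:]:
            nxt = next((s for s in clause if s[0] >= prev[1]), None)
            if nxt is None:
                ok = False
                break
            total += nxt[0] - prev[1]
            prev = nxt
        if ok and total <= slop:
            hits.append((first[0], prev[1]))
    return hits

# spanNear([coordinate, spanOr([spanNear([gene, mapping], 0, true), gene]),
#           research], 0, true)
inner = near_exhaustive([term("gene"), term("mapping")], 0)
middle = span_or(inner, term("gene"))
clauses = [term("coordinate"), middle, term("research")]

exhaustive_hits = near_exhaustive(clauses, 0)
lazy_hits = near_lazy(clauses, 0)
```

In this model the spanOr yields the short span for "gene" before the longer span for "gene mapping". The lazy variant commits to the short one, finds a gap of one position before "research", and gives up, while the exhaustive variant also tries the longer span and matches.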






[jira] [Comment Edited] (LUCENE-7453) Change naming of variables/apis from docid to docnum

2016-09-21 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511145#comment-15511145
 ] 

Paul Elschot edited comment on LUCENE-7453 at 9/21/16 8:56 PM:
---

bq. But the seg examples you have still have docid, just with seg prepended. It 
still has the problem that it uses "id", when id means identifier,

This is meant as an identifier for a document within a segment; in a segment 
this identifier is permanent. There may be another identifier in a document 
field, but that is irrelevant here.

For compound readers there are multiple segments, and also in that case adding 
seg to the name is correct.
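
A minimal sketch of the distinction, assuming a compound reader numbers documents by adding a per-segment base offset to the segment-local identifier. The segment sizes and function names below are hypothetical and only serve the illustration.

```python
from bisect import bisect_right

# Hypothetical index with three segments holding 4, 2 and 3 documents.
segment_sizes = [4, 2, 3]

# doc_bases[i] is the index-wide number of the first document in segment i.
doc_bases = []
total = 0
for size in segment_sizes:
    doc_bases.append(total)
    total += size

def to_index_wide(segment, seg_doc_id):
    """Map a segment-local document id to an index-wide number."""
    return doc_bases[segment] + seg_doc_id

def to_segment_local(index_doc_id):
    """Map an index-wide number back to (segment, segment-local id)."""
    segment = bisect_right(doc_bases, index_doc_id) - 1
    return segment, index_doc_id - doc_bases[segment]
```

The segment-local value stays the same for the lifetime of its segment, while the index-wide number depends on which segments precede it in the reader.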



was (Author: paul.elsc...@xs4all.nl):
bq. But the seg examples you have still have docid, just with seg prepended. It 
still has the problem that it uses "id", when id means identifier,

This is meant as an identifier for a document within a segment; in a segment 
this identifier is permanent, and the only one.

For compound readers there are multiple segments, and also in that case adding 
seg to the name is correct.


> Change naming of variables/apis from docid to docnum
> 
>
> Key: LUCENE-7453
> URL: https://issues.apache.org/jira/browse/LUCENE-7453
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ryan Ernst
>
> In SOLR-9528 a suggestion was made to change {{docid}} to {{docnum}}. The 
> reasoning for this is most notably that {{docid}} has a connotation about a 
> persistent unique identifier (eg like {{_id}} in elasticsearch or {{id}} in 
> solr), while {{docid}} in lucene is currently something local to a segment, 
> and not directly comparable across segments.
> When I first started working on Lucene, I had this same confusion. {{docnum}} 
> is a much better name for this transient, segment local identifier for a doc. 
> Regardless of what solr wants to do in their api (eg keeping _docid_), I 
> think we should switch the lucene apis and variable names to use docnum.






[jira] [Commented] (LUCENE-7453) Change naming of variables/apis from docid to docnum

2016-09-21 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15511145#comment-15511145
 ] 

Paul Elschot commented on LUCENE-7453:
--

bq. But the seg examples you have still have docid, just with seg prepended. It 
still has the problem that it uses "id", when id means identifier,

This is meant as an identifier for a document within a segment; in a segment 
this identifier is permanent, and the only one.

For compound readers there are multiple segments, and also in that case adding 
seg to the name is correct.








[jira] [Commented] (LUCENE-7453) Change naming of variables/apis from docid to docnum

2016-09-21 Thread Paul Elschot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15510965#comment-15510965
 ] 

Paul Elschot commented on LUCENE-7453:
--

I tried an alternative that adds a variation of "segment" wherever docID is 
used in some form.

Here is an overview of renaming possibilities for core/src/java, as 
three-column Python strings.

The first column contains the current name, the second column a segment 
variant, and the third column an index variant.
Please assume an appropriate number of question marks (??) in the second and 
third columns.

{code}
classFileRenames = """

DocIdSet                      SegDocIdSet                      DocIndexSet
DocIdSetIterator              SegDocIdSetIterator              DocIndexIterator
ConjunctionDISI               ConjunctionSegDisi               ConjunctionDixi
DisjunctionDISIApproximation  DisjunctionSegDisiApproximation  DisjunctionDixiApproximation
DisiPriorityQueue             SegDisiPriorityQueue             DixiPriorityQueue
DisiWrapper                   SegDisiWrapper                   DixiWrapper
FilteredDocIdSetIterator      FilteredSegDisi                  FilteredDixi
DocIdSetBuilder               SegDocIdSetBuilder               DocIndexSetBuilder
RoaringDocIdSet               RoaringSegDocIdSet               RoaringDocIndexSet
IntArrayDocIdSet              IntArraySegDocIdSet              IntArrayDocIndexSet
NotDocIdSet                   NotSegDocIdSet                   NotDocIndexSet
BitDocIdSet                   BitSegDocIdSet                   BitDocIndexSet
DocIdsWriter                  SegDocIdsWriter                  DocIndexesWriter
DocIdMerger                   SegDocIdMerger                   DocIndexMerger
"""

identifierRenames = classFileRenames + """

TwoPhaseIteratorAsDocIdSetIterator  TwoPhaseIteratorAsSegDocIdSetIterator  TwoPhaseIteratorAsDocIndexIterator
BitSetConjunctionDISI               BitSetConjunctionDisi                  BitSetConjunctionDisi
IntArrayDocIdSetIterator            IntArraySegDocIdSetIterator            IntArrayDocIndexIterator

asDocIdSetIterator                  asSegDocIdSetIterator                  asDocIndexIterator
getDocId                            getSegDocId                            getDocIndex
docID                               sdocID                                 docIndex

docID                               sdocID                                 docIdx
docId                               sdocId                                 docIdx
docIDs                              sdocIDs                                docIdxs
docIds                              sdocIds                                docIdxs
disi                                sdisi                                  dixi
docIdSet                            sDocIdSet                              docIndexSet

"""
{code}

(The identifiers here are for local classes, methods and variables.)
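
As a sketch of how three-column strings of this shape could be turned into a lookup table, assuming whitespace-separated columns with one rename per line. The helper name parse_renames and the shortened sample string are hypothetical and not part of the original script.

```python
def parse_renames(table):
    """Return {current: (segment_variant, index_variant)} per three-column row."""
    renames = {}
    for line in table.splitlines():
        columns = line.split()
        if len(columns) != 3:
            continue  # skip blank or incomplete rows
        current, seg_variant, index_variant = columns
        # Keep the first row when a current name is listed more than once.
        renames.setdefault(current, (seg_variant, index_variant))
    return renames

# A shortened sample in the same layout as the strings above.
sample = """

DocIdSet          SegDocIdSet          DocIndexSet
DocIdSetIterator  SegDocIdSetIterator  DocIndexIterator
docID             sdocID               docIndex
docID             sdocID               docIdx
"""

renames = parse_renames(sample)

# Only the segment variants (second column).
segment_renames = {old: seg for old, (seg, _) in renames.items()}
```

Keeping the first row for a repeated name is an arbitrary choice here; a real rename script would have to decide which variant wins.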

I don't like overloading "index" for this, especially in the class names, so 
for now I'd prefer the segment variants in the second column.

Anyway, we could use the opportunity to shorten some of the longer names.







