[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-07-09 Thread Alan Woodward (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536633#comment-16536633
 ] 

Alan Woodward commented on LUCENE-8229:
---

Simon added IOSupplier rather than IOConsumer, which would already work for 
this I think - I'll open an issue.

> Add a method to Weight to retrieve matches for a single document
> 
>
> Key: LUCENE-8229
> URL: https://issues.apache.org/jira/browse/LUCENE-8229
> Project: Lucene - Core
>  Issue Type: New Feature
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Major
> Fix For: 7.4
>
> Attachments: LUCENE-8229.patch, LUCENE-8229_small_improvements.patch
>
>  Time Spent: 5h 20m
>  Remaining Estimate: 0h
>
> The ability to find out exactly what a query has matched on is a fairly 
> frequent feature request, and would also make highlighters much easier to 
> implement.  There have been a few attempts at doing this, including adding 
> positions to Scorers, or re-writing queries as Spans, but these all either 
> compromise general performance or involve up-front knowledge of all queries.
> Instead, I propose adding a method to Weight that exposes an iterator over 
> matches in a particular document and field.  It should be used in a similar 
> manner to explain() - ie, just for TopDocs, not as part of the scoring loop, 
> which relieves some of the pressure on performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-07-06 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534937#comment-16534937
 ] 

David Smiley commented on LUCENE-8229:
--

I was just looking at Matches.MatchesIteratorSupplier.  It's a shame to need 
mirror images of existing java.util.function interfaces that differ only in 
that they throw IOException.  See org.apache.lucene.util.IOUtils.IOConsumer 
added by [~simonw] recently.  I propose that we add an IOSupplier here and get 
rid of MatchesIteratorSupplier (in a new issue of course).  WDYT?  We ought to 
have a consistent approach in Lucene to this scenario.  I've wanted an 
IOSupplier in Solr for something recently and saw it hadn't been added yet.
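
A minimal sketch of what such a supplier could look like, assuming it simply 
mirrors java.util.function.Supplier with a checked IOException.  The names 
IOSupplierSketch, unchecked() and IOSupplierDemo are hypothetical and are not 
taken from any Lucene patch:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

/** Hypothetical mirror of java.util.function.Supplier whose get() may throw IOException. */
@FunctionalInterface
interface IOSupplierSketch<T> {
  T get() throws IOException;

  /** Adapts to a plain Supplier by wrapping any IOException in UncheckedIOException. */
  static <R> Supplier<R> unchecked(IOSupplierSketch<R> supplier) {
    return () -> {
      try {
        return supplier.get();
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    };
  }
}

final class IOSupplierDemo {
  /** Stand-in for an index read that can fail with IOException. */
  static String load(boolean fail) throws IOException {
    if (fail) {
      throw new IOException("simulated read failure");
    }
    return "matches";
  }
}
```

With one shared interface of this shape, a lambda such as 
() -> IOSupplierDemo.load(false) can be passed wherever lazily created, 
IO-bound values are needed, without a bespoke supplier type per use.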




[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-04-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435723#comment-16435723
 ] 

ASF subversion and git services commented on LUCENE-8229:
-

Commit dc7f841e361ad9f29dc54a638856d6becc8c99d3 in lucene-solr's branch 
refs/heads/branch_7x from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dc7f841 ]

LUCENE-8229: add lucene.experimental, plus small changes

(cherry picked from commit e6b6515)





[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-04-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435716#comment-16435716
 ] 

ASF subversion and git services commented on LUCENE-8229:
-

Commit e6b65151b6f4aec66376b3d4acc1a057167f62f6 in lucene-solr's branch 
refs/heads/master from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e6b6515 ]

LUCENE-8229: add lucene.experimental, plus small changes





[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-04-12 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435020#comment-16435020
 ] 

Alan Woodward commented on LUCENE-8229:
---

+1




[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-04-11 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434924#comment-16434924
 ] 

David Smiley commented on LUCENE-8229:
--

I'm excited about this too :-)

I made some small tweaks, such as adding lucene.experimental annotations (at 
least until 8.0), and using java 8 streams in one place.  What do you think?




[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-04-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433640#comment-16433640
 ] 

ASF subversion and git services commented on LUCENE-8229:
-

Commit 502fd4bf12b8860b8eea504a96ad1b49dd52938c in lucene-solr's branch 
refs/heads/branch_7x from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=502fd4b ]

LUCENE-8229: Add Weight.matches() to iterate over match positions





[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-04-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433641#comment-16433641
 ] 

ASF subversion and git services commented on LUCENE-8229:
-

Commit 040a9601b1b346391ad37e5a0a4f2f598e72d26e in lucene-solr's branch 
refs/heads/master from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=040a960 ]

LUCENE-8229: Add Weight.matches() to iterate over match positions





[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-04-06 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428112#comment-16428112
 ] 

Alan Woodward commented on LUCENE-8229:
---

OK, the latest iteration moves the notion of a match containing no hits up to 
the Matches object, and has a default implementation on Weight.  This makes the 
patch much smaller - thanks for the suggestion [~dsmiley]!

I think I'd like to keep the name Matches - we might at some point in the 
future want to add the ability to return matches from DocValues fields, for 
example, so we wouldn't necessarily be returning positions.

Re payloads, that's a convincing use-case.  I have some concerns as to how to 
implement them over composite matches, such as Phrases or Spans, which aren't 
dealt with yet, so let's do that in a follow-up issue.




[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-04-05 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427422#comment-16427422
 ] 

Alan Woodward commented on LUCENE-8229:
---

bq. because even a no-match response requires knowledge of the field

Thinking about it, this is unnecessary, isn't it?  We can have a specialised 
Matches object which just means 'a match in this doc, but no term hits', which 
would be returned by default if the scorer matched.  Which would allow a 
default implementation.  I'll work on a new patch.
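
Roughly, the idea could be sketched as follows.  This is a simplified 
stand-in, not the patch itself: MatchesSketch, WeightSketch, matchesDoc() and 
EvenDocsWeight are hypothetical names, and the real method would take a 
LeafReaderContext and consult a scorer rather than a boolean helper.

```java
import java.util.Collections;
import java.util.Iterator;

/** Hypothetical stand-in for Matches; iterating yields the fields that have term hits. */
interface MatchesSketch extends Iterable<String> {
  /** Sentinel meaning: the document matched, but no term hits are available. */
  MatchesSketch MATCH_WITH_NO_TERMS = new MatchesSketch() {
    @Override
    public Iterator<String> iterator() {
      return Collections.emptyIterator();
    }
  };
}

/** Hypothetical stand-in for Weight with a default matches() implementation. */
abstract class WeightSketch {
  /** Whether the query matches the document; stands in for advancing a scorer. */
  abstract boolean matchesDoc(int doc);

  /** Default: null for a non-matching doc, otherwise the no-terms sentinel. */
  MatchesSketch matches(int doc) {
    return matchesDoc(doc) ? MatchesSketch.MATCH_WITH_NO_TERMS : null;
  }
}

/** A weight with no positional information: it only overrides matchesDoc(). */
final class EvenDocsWeight extends WeightSketch {
  @Override
  boolean matchesDoc(int doc) {
    return doc % 2 == 0;
  }
}
```

Weights that can report positions would override matches(); everything else 
inherits the default and needs no knowledge of the field.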




[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-04-05 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427185#comment-16427185
 ] 

David Smiley commented on LUCENE-8229:
--

It's really looking great, Alan.  I looked over your patch a bit more:

* I wonder if "Matches" sounds too generic; perhaps "PositionMatches" to 
emphasize it has position information and not simply matching document IDs?
* It's a shame that every Weight must implement this (no default impl) because 
even a no-match response requires knowledge of the field.  Is the distinction 
important to know the field?  I suppose it might be useful for figuring out 
generically which fields a query references... but no not really because you 
have to execute it on a matching document first to even figure that out with 
this API.
* Matcher.EMPTY (an empty version of MatchesIterator) should perhaps be moved to 
MatchesIterator?  Come to think of it, maybe MatchesIterator could be 
Matches.Iterator (inner class of Matches)?  (avoids polluting the busy .search 
namespace).
* RE payloads: I appreciate you want to keep things simple for now.  I've heard 
of putting OCR document offset information in them, for example, and a 
highlighter might want this.  A highlighter might want whatever metadata is 
being put in a payload, even if it is relevancy oriented -- consider a 
relevancy debugger tool that could show you what's in the payload.  This might 
not even be a "highlighter" per-se.




[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-04-04 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425406#comment-16425406
 ] 

Alan Woodward commented on LUCENE-8229:
---

Patch up to date with master, precommit and tests pass.  I think this is ready?




[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-04-02 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422822#comment-16422822
 ] 

Alan Woodward commented on LUCENE-8229:
---

I've pushed a few more changes - IndexOrDocValuesQuery should use the dvWeight 
to check if it matches, I've added a term() method so that the iterator can 
report which term it's currently positioned on, and I've removed the iteration 
for SpanQueries.  I want to think more about how we iterate over composite 
queries like Span or phrase (or interval, soon), as I can see situations where 
we'd want to iterate over the whole thing, and others where we'd want to 
iterate over the sub parts as well, so I'd like to leave that to a follow-up 
issue.
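
To illustrate why a term() accessor is useful, here is a minimal sketch of an 
iterator that merges the hits of several terms in position order and reports 
which term produced the current hit.  TermHitsSketch and MergedMatchesSketch 
are hypothetical stand-ins, not the classes in the patch:

```java
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical per-term hit list within one document and field. */
final class TermHitsSketch {
  final String term;
  final int[] positions; // ascending positions of the term
  int upto = 0;

  TermHitsSketch(String term, int... positions) {
    this.term = term;
    this.positions = positions;
  }

  boolean exhausted() {
    return upto >= positions.length;
  }

  int current() {
    return positions[upto];
  }
}

/** Merges several term hit lists by position and exposes the current term. */
final class MergedMatchesSketch {
  private final PriorityQueue<TermHitsSketch> queue =
      new PriorityQueue<>(Comparator.comparingInt(TermHitsSketch::current));
  private String term;
  private int position = -1;

  MergedMatchesSketch(List<TermHitsSketch> subs) {
    for (TermHitsSketch sub : subs) {
      if (!sub.exhausted()) {
        queue.add(sub);
      }
    }
  }

  /** Advances to the next hit in position order; false when there are none left. */
  boolean next() {
    TermHitsSketch top = queue.poll();
    if (top == null) {
      return false;
    }
    term = top.term;
    position = top.current();
    top.upto++; // safe to mutate: top is currently outside the queue
    if (!top.exhausted()) {
      queue.add(top);
    }
    return true;
  }

  /** The term the iterator is currently positioned on. */
  String term() {
    return term;
  }

  int position() {
    return position;
  }
}
```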




[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-03-29 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418770#comment-16418770
 ] 

Alan Woodward commented on LUCENE-8229:
---

Having slept on it, I've come round to [~dsmiley]'s suggestion of returning 
matches from all fields.  I've pushed some changes which add an intermediate 
Matches object, which holds iterators for all fields with matches.  So the 
method signature on Weight now looks like this:
{code:java}
public abstract Matches getMatches(LeafReaderContext ctx, int doc){code}

You can then get a MatchesIterator for a given field by calling 
{code}Matches.getFieldMatches(String field){code}, or get the set of all fields 
containing matches by calling {code}Matches.getMatchFields(){code}.  This has 
the nice side-effect of making BooleanWeight.matches() much more efficient.
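
As a sketch of how a caller might consume this, assuming simplified stand-in 
types rather than the real classes (DocMatchesSketch, FieldMatchesSketch and 
MatchesUsageSketch are hypothetical, and positions are plain ints here):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Stand-in iterator over match positions within one field of one document. */
final class FieldMatchesSketch {
  private final int[] positions;
  private int upto = -1;

  FieldMatchesSketch(int... positions) {
    this.positions = positions;
  }

  /** Advances to the next match; returns false when exhausted. */
  boolean next() {
    return ++upto < positions.length;
  }

  /** Position of the current match; only valid after next() returned true. */
  int position() {
    return positions[upto];
  }
}

/** Stand-in for the intermediate object holding per-field iterators. */
final class DocMatchesSketch {
  private final Map<String, int[]> byField = new LinkedHashMap<>();

  DocMatchesSketch add(String field, int... positions) {
    byField.put(field, positions);
    return this;
  }

  /** Fields that contain at least one match, in insertion order. */
  Set<String> getMatchFields() {
    return byField.keySet();
  }

  /** A fresh iterator for the field, or null if the field has no matches. */
  FieldMatchesSketch getFieldMatches(String field) {
    int[] positions = byField.get(field);
    return positions == null ? null : new FieldMatchesSketch(positions);
  }
}

final class MatchesUsageSketch {
  /** Collects "field:position" strings for every match in the document. */
  static List<String> describe(DocMatchesSketch matches) {
    List<String> out = new ArrayList<>();
    for (String field : matches.getMatchFields()) {
      FieldMatchesSketch it = matches.getFieldMatches(field);
      while (it.next()) {
        out.add(field + ":" + it.position());
      }
    }
    return out;
  }
}
```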

Re AutomatonQuery, we have a lot more leeway here because it's only working on 
a single document at a time.  The way I've done things so far is to pull 
postings for all the matching terms, but only create a MatchesIterator if the 
postings can be advanced to the document we're interested in.  Otherwise, the 
PostingsEnum gets re-used.  This should have similar performance 
characteristics to the creation of a scorer over a single segment.
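
A much-simplified sketch of that approach, using in-memory arrays in place of 
real postings (TermPostingsSketch and MultiTermMatchesSketch are hypothetical 
names, and a binary search stands in for advancing a PostingsEnum):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Stand-in for the postings of one term accepted by the automaton. */
final class TermPostingsSketch {
  final String term;
  final int[] docs;        // sorted IDs of the documents containing the term
  final int[][] positions; // positions[i] holds the term's positions in docs[i]

  TermPostingsSketch(String term, int[] docs, int[][] positions) {
    this.term = term;
    this.docs = docs;
    this.positions = positions;
  }

  /** Index of doc within docs, or -1 if the term does not occur in that doc. */
  int advanceTo(int doc) {
    int idx = Arrays.binarySearch(docs, doc);
    return idx >= 0 ? idx : -1;
  }
}

final class MultiTermMatchesSketch {
  /**
   * Returns term -> positions for only those terms whose postings land on the
   * target document, or null if none of them do.
   */
  static Map<String, int[]> matchesFor(List<TermPostingsSketch> terms, int doc) {
    Map<String, int[]> out = new TreeMap<>();
    for (TermPostingsSketch t : terms) {
      int idx = t.advanceTo(doc);
      if (idx >= 0) {
        out.put(t.term, t.positions[idx]); // only now is per-term state created
      }
    }
    return out.isEmpty() ? null : out;
  }
}
```

Terms that do not occur in the target document cost one advance and nothing 
more, which is what keeps the per-document work bounded.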




[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-03-29 Thread Jim Ferenczi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418708#comment-16418708
 ] 

Jim Ferenczi commented on LUCENE-8229:
--

I like the proposal here. For simple queries it makes the extraction of matched 
positions trivial. Though I wonder how the complex queries would handle this, 
for instance the AutomatonQuery cannot just return an enum over all matching 
terms, we have a special handling of this query in highlighters to avoid the 
explosion for instance. What is your current plan to handle this query? Should 
it return null for simplicity or should it try to expand the automaton with a 
limit on the number of terms? I prefer the former which is safe and if users 
want to check the matching of a complex automaton they can use a 
MemoryIndex for each TopDocument and change the query to use the rewrite method 
that builds a boolean query.




[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-03-28 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418007#comment-16418007
 ] 

Alan Woodward commented on LUCENE-8229:
---

{quote}A caller might want all fields, or perhaps just some
{quote}
I've done it this way to keep the API as simple as possible.  If we start 
iterating over multiple fields then MatchesIterator becomes a lot more 
complicated, and I don't think it gains us anything?  If consumers want to get 
the matches on multiple fields, then they can call Weight.matches() multiple 
times.

Re payloads, I think of them as a search-time feature, and not really relevant 
here.  Let's keep this API focussed.

I have tried putting something similar to the MatchesIterator on Scorer, but it 
doesn't really fit.  Scorers are designed to iterate over matching documents 
very efficiently, and lots of them have optimizations which mean that positions 
and/or offsets aren't actually available - for example, things like TermInSet 
or AutomatonQuery get rewritten to bitsets, or disjunctions can use bulk 
scorers, or the query cache can intercept things.  Whereas Weight already has 
explain(), which has similar semantics to this - useful information that you 
might sometimes want for your TopDocs, but not something you want to be running 
against every matching document.  And if anything, there are more Scorer 
implementations than Weights, so it would be more invasive a change.




[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-03-28 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417909#comment-16417909
 ] 

Dawid Weiss commented on LUCENE-8229:
-

bq. The ability to find out exactly what a query has matched on is a fairly 
frequent feature request

I confirm the need for this -- we have custom highlighters and they're way too 
complex because of the need to decompose each and every query into match ranges.




[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-03-28 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417849#comment-16417849
 ] 

David Smiley commented on LUCENE-8229:
--

This is really interesting [~romseygeek]!

Here's your proposed signature: {{public MatchesIterator 
matches(LeafReaderContext context, int doc, String field) throws IOException}}

* I'm unsure about this new matches method requiring a field reference, thus 
insisting all fields in the query match the field in this argument.  A caller 
might want all fields, or perhaps just some.  This could easily be converted to 
a Predicate to match the field.
* Add payloads to {{MatchesIterator}}
* Perhaps {{matches}} should take an int for the PostingsEnum flags.  This way 
it could choose to ask for offsets and/or payloads.  Or maybe just always get 
both to keep the API simpler, assuming the perf difference is negligible for 
practical uses of this feature (which sounds plausible to me).  The flags could 
be added later if desired.  Yeah, let's not do that now, then.
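
To make the Predicate suggestion in the first bullet concrete, here is a minimal sketch of selecting fields with a java.util.function.Predicate instead of naming a single field.  The class and method names are hypothetical and exist only for illustration; this is not the signature proposed in the patch, and it assumes the caller already knows which fields the query touches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

final class FieldPredicateSketch {

  /**
   * Hypothetical helper: given the fields a query touches, keep only those
   * accepted by the caller's filter. The result could then drive one
   * single-field matches() call per selected field.
   */
  static List<String> selectFields(Set<String> queryFields, Predicate<String> fieldFilter) {
    List<String> selected = new ArrayList<>();
    for (String field : new TreeSet<>(queryFields)) { // sorted for stable output
      if (fieldFilter.test(field)) {
        selected.add(field);
      }
    }
    return selected;
  }
}
```

A filter of {{f -> true}} would cover the "all fields" case, and something like {{f -> f.startsWith("title")}} the "just some" case.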

Have you considered a very different approach of modifying Scorer to expose 
more information about the matches in a document?  I'm just thinking out loud 
here; might be a bad idea ;-).  Maybe I'm saying the same thing as "adding 
positions to Scorers" as you reference in the description, but maybe it could 
hang off indirectly using the {{MatchesIterator}} you developed here.  Your 
proposed {{Weight.matches(...)}} is a visitor-like thing and we already have 
Scorer doing that.  There are lots of Weight classes to be modified; I wonder 
if it's less invasive at the Scorer level?  Hmm.





[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document

2018-03-28 Thread Alan Woodward (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417443#comment-16417443
 ] 

Alan Woodward commented on LUCENE-8229:
---

The PR linked above illustrates the idea.  There are still some TODOs (I 
haven't added anything to PhraseWeight yet, for example).  Comments welcome!
