[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16536633#comment-16536633 ]

Alan Woodward commented on LUCENE-8229:

Simon added IOSupplier rather than IOConsumer, which would already work for this I think - I'll open an issue.

> Add a method to Weight to retrieve matches for a single document
>
> Key: LUCENE-8229
> URL: https://issues.apache.org/jira/browse/LUCENE-8229
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Alan Woodward
> Assignee: Alan Woodward
> Priority: Major
> Fix For: 7.4
> Attachments: LUCENE-8229.patch, LUCENE-8229_small_improvements.patch
> Time Spent: 5h 20m
> Remaining Estimate: 0h
>
> The ability to find out exactly what a query has matched on is a fairly
> frequent feature request, and would also make highlighters much easier to
> implement. There have been a few attempts at doing this, including adding
> positions to Scorers, or re-writing queries as Spans, but these all either
> compromise general performance or involve up-front knowledge of all queries.
> Instead, I propose adding a method to Weight that exposes an iterator over
> matches in a particular document and field. It should be used in a similar
> manner to explain() - ie, just for TopDocs, not as part of the scoring loop,
> which relieves some of the pressure on performance.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534937#comment-16534937 ]

David Smiley commented on LUCENE-8229:

I was just looking at Matches.MatchesIteratorSupplier. It's a shame to need mirror images of existing java.util.function interfaces that differ only in that they throw IOException. See org.apache.lucene.util.IOUtils.IOConsumer, added by [~simonw] recently. I propose that we add an IOSupplier here and get rid of MatchesIteratorSupplier (in a new issue, of course). WDYT? We ought to have a consistent approach in Lucene to this scenario. I've wanted an IOSupplier in Solr for something recently and saw it hadn't been added yet.
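For readers following along, here is a minimal sketch of the kind of interface being proposed: a look-alike of java.util.function.Supplier whose get() is allowed to throw IOException. This is not Lucene source; the class name IOSupplierSketch and the helpers resolve, firstLine and demo are hypothetical and exist only for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical demo class; nothing here is copied from the Lucene code base.
class IOSupplierSketch {

  /** Mirror of java.util.function.Supplier whose get() may throw IOException. */
  @FunctionalInterface
  interface IOSupplier<T> {
    T get() throws IOException;
  }

  /** Evaluates the supplier only when called, propagating any IOException. */
  static <T> T resolve(IOSupplier<T> supplier) throws IOException {
    return supplier.get();
  }

  /** Reads the first line of the text; BufferedReader.readLine() declares IOException. */
  static String firstLine(String text) throws IOException {
    IOSupplier<String> reader = () -> {
      try (BufferedReader in = new BufferedReader(new StringReader(text))) {
        return in.readLine();
      }
    };
    return resolve(reader);
  }

  /** Convenience wrapper for callers that cannot declare IOException. */
  static String demo() {
    try {
      return firstLine("alpha\nbeta");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

With a plain Supplier, the lambda in firstLine would not compile without catching IOException inside it, which is the duplication the comment above is objecting to.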
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435723#comment-16435723 ]

ASF subversion and git services commented on LUCENE-8229:

Commit dc7f841e361ad9f29dc54a638856d6becc8c99d3 in lucene-solr's branch refs/heads/branch_7x from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=dc7f841 ]

LUCENE-8229: add lucene.experimental, plus small changes

(cherry picked from commit e6b6515)
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435716#comment-16435716 ]

ASF subversion and git services commented on LUCENE-8229:

Commit e6b65151b6f4aec66376b3d4acc1a057167f62f6 in lucene-solr's branch refs/heads/master from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=e6b6515 ]

LUCENE-8229: add lucene.experimental, plus small changes
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16435020#comment-16435020 ]

Alan Woodward commented on LUCENE-8229:

+1
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434924#comment-16434924 ]

David Smiley commented on LUCENE-8229:

I'm excited about this too :-) I made some small tweaks, such as adding lucene.experimental annotations (at least until 8.0), and using Java 8 streams in one place. What do you think?
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433640#comment-16433640 ]

ASF subversion and git services commented on LUCENE-8229:

Commit 502fd4bf12b8860b8eea504a96ad1b49dd52938c in lucene-solr's branch refs/heads/branch_7x from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=502fd4b ]

LUCENE-8229: Add Weight.matches() to iterate over match positions
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16433641#comment-16433641 ]

ASF subversion and git services commented on LUCENE-8229:

Commit 040a9601b1b346391ad37e5a0a4f2f598e72d26e in lucene-solr's branch refs/heads/master from [~romseygeek]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=040a960 ]

LUCENE-8229: Add Weight.matches() to iterate over match positions
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16428112#comment-16428112 ]

Alan Woodward commented on LUCENE-8229:

OK, the latest iteration moves the notion of a match containing no hits up to the Matches object, and has a default implementation on Weight. This makes the patch much smaller - thanks for the suggestion [~dsmiley]!

I think I'd like to keep the name Matches - we might at some point in the future want to add the ability to return matches from DocValues fields, for example, so we wouldn't necessarily be returning positions.

Re payloads, that's a convincing use-case. I have some concerns as to how to implement them over composite matches, such as Phrases or Spans, which aren't dealt with yet, so let's do that in a follow-up issue.
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427422#comment-16427422 ]

Alan Woodward commented on LUCENE-8229:

bq. because even a no-match response requires knowledge of the field

Thinking about it, this is unnecessary, isn't it? We can have a specialised Matches object which just means 'a match in this doc, but no term hits', which would be returned by default if the scorer matched. That would allow a default implementation. I'll work on a new patch.
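The idea in this comment can be sketched roughly as follows. This is a simplified, self-contained model rather than the patch itself: the types Matches and SketchWeight, the constant MATCH_WITH_NO_TERMS, and the hook scorerMatches are hypothetical stand-ins, and returning null for a non-matching document is an assumption made for the sketch.

```java
import java.util.Collections;
import java.util.Iterator;

// Hypothetical model of a default matches() implementation; not Lucene source.
class DefaultMatchesSketch {

  /** Simplified stand-in for Matches: iterates over the names of fields with hits. */
  interface Matches extends Iterable<String> {}

  /** Sentinel meaning "the document matched, but there are no term hits to report". */
  static final Matches MATCH_WITH_NO_TERMS = new Matches() {
    @Override
    public Iterator<String> iterator() {
      return Collections.emptyIterator();
    }
  };

  abstract static class SketchWeight {
    /** Hypothetical hook standing in for "does the scorer match this document?". */
    abstract boolean scorerMatches(int doc);

    /** Default behaviour: null when the doc does not match (assumed), else the sentinel. */
    Matches matches(int doc) {
      return scorerMatches(doc) ? MATCH_WITH_NO_TERMS : null;
    }
  }

  /** A toy weight that matches even document ids only. */
  static SketchWeight evenDocsOnly() {
    return new SketchWeight() {
      @Override
      boolean scorerMatches(int doc) {
        return doc % 2 == 0;
      }
    };
  }

  public static void main(String[] args) {
    SketchWeight weight = evenDocsOnly();
    System.out.println(weight.matches(2) == MATCH_WITH_NO_TERMS);
    System.out.println(weight.matches(3) == null);
  }
}
```

Because the sentinel carries no field information, a subclass only needs to override matches() when it can actually report term hits.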
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427185#comment-16427185 ]

David Smiley commented on LUCENE-8229:

It's really looking great Alan. I looked over your patch a bit more:

* I wonder if "Matches" sounds too generic; perhaps "PositionMatches" to emphasize it has position information and not simply matching document IDs?
* It's a shame that every Weight must implement this (no default impl) because even a no-match response requires knowledge of the field. Is the distinction important to know the field? I suppose it might be useful for figuring out generically which fields a query references... but no, not really, because you have to execute it on a matching document first to even figure that out with this API.
* Matcher.EMPTY (an empty version of MatchesIterator) should perhaps be moved to MatchesIterator? Come to think of it, maybe MatchesIterator could be Matches.Iterator (inner class of Matches)? (avoids polluting the busy .search namespace).
* RE payloads: I appreciate you want to keep things simple for now. I've heard of putting OCR document offset information in them, for example, and a highlighter might want this. A highlighter might want whatever metadata is being put in a payload, even if it is relevancy oriented -- consider a relevancy debugger tool that could show you what's in the payload. This might not even be a "highlighter" per se.
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425406#comment-16425406 ]

Alan Woodward commented on LUCENE-8229:

Patch up to date with master, precommit and tests pass. I think this is ready?
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422822#comment-16422822 ]

Alan Woodward commented on LUCENE-8229:

I've pushed a few more changes - IndexOrDocValuesQuery should use the dvWeight to check if it matches, I've added a term() method so that the iterator can report which term it's currently positioned on, and I've removed the iteration for SpanQueries. I want to think more about how we iterate over composite queries like Span or phrase (or interval, soon), as I can see situations where we'd want to iterate over the whole thing, and others where we'd want to iterate over the sub parts as well, and I'd like to leave that to a follow-up issue.
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418770#comment-16418770 ]

Alan Woodward commented on LUCENE-8229:

Having slept on it, I've come round to [~dsmiley]'s suggestion of returning matches from all fields. I've pushed some changes which add an intermediate Matches object, which holds iterators for all fields with matches. So the method signature on Weight now looks like this:

{code:java}
public abstract Matches getMatches(LeafReaderContext ctx, int doc)
{code}

You can then get a MatchesIterator for a given field by calling {code}Matches.getFieldMatches(String field){code}, or get the set of all fields containing matches by calling {code}Matches.getMatchFields(){code}. This has the nice side-effect of making BooleanWeight.matches() much more efficient.

Re AutomatonQuery, we have a lot more leeway here because it's only working on a single document at a time. The way I've done things so far is to pull postings for all the matching terms, but only create a MatchesIterator if the postings can be advanced to the document we're interested in. Otherwise, the PostingsEnum gets re-used. This should have similar performance characteristics to the creation of a scorer over a single segment.
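To make the shape of this API easier to picture, here is a toy, self-contained model of it. Only the method names getMatchFields() and getFieldMatches(String) come from the comment above; the MatchesIterator methods next(), startPosition() and endPosition(), the in-memory factory of(), and the class name MatchesApiSketch are assumptions made for the sketch, and no Lucene classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of a per-document Matches holder; not Lucene source.
class MatchesApiSketch {

  /** Assumed iterator shape: advance with next(), then read the current position. */
  interface MatchesIterator {
    boolean next();
    int startPosition();
    int endPosition();
  }

  /** Holder for one document, using the two method names quoted in the comment. */
  interface Matches {
    Set<String> getMatchFields();
    MatchesIterator getFieldMatches(String field); // null if the field has no hits
  }

  /** Toy in-memory Matches built from field name -> list of {start, end} positions. */
  static Matches of(Map<String, List<int[]>> hits) {
    return new Matches() {
      @Override
      public Set<String> getMatchFields() {
        return hits.keySet();
      }

      @Override
      public MatchesIterator getFieldMatches(String field) {
        List<int[]> positions = hits.get(field);
        if (positions == null) {
          return null;
        }
        return new MatchesIterator() {
          private int i = -1;
          @Override public boolean next() { return ++i < positions.size(); }
          @Override public int startPosition() { return positions.get(i)[0]; }
          @Override public int endPosition() { return positions.get(i)[1]; }
        };
      }
    };
  }

  /** Walks every field with hits and renders each hit as "field:start-end". */
  static List<String> describe(Matches matches) {
    List<String> out = new ArrayList<>();
    for (String field : matches.getMatchFields()) {
      MatchesIterator it = matches.getFieldMatches(field);
      while (it.next()) {
        out.add(field + ":" + it.startPosition() + "-" + it.endPosition());
      }
    }
    return out;
  }

  static List<String> demo() {
    Map<String, List<int[]>> hits = new LinkedHashMap<>();
    hits.put("title", Collections.singletonList(new int[] {0, 0}));
    hits.put("body", Arrays.asList(new int[] {3, 3}, new int[] {7, 8}));
    return describe(of(hits));
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

A highlighter-style consumer would follow the same loop as describe(): ask for the fields with hits, then pull one iterator per field, once per top document rather than inside the scoring loop.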
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418708#comment-16418708 ]

Jim Ferenczi commented on LUCENE-8229:

I like the proposal here. For simple queries it makes the extraction of matched positions trivial. Though I wonder how complex queries would handle this. For instance, AutomatonQuery cannot just return an enum over all matching terms; we have special handling of this query in highlighters to avoid the explosion. What is your current plan to handle this query? Should it return null for simplicity, or should it try to expand the automaton with a limit on the number of terms? I prefer the former, which is safe; if users want to check the matching of a complex automaton, they can use a MemoryIndex for each TopDocument and change the query to use the rewrite method that builds a boolean query.
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418007#comment-16418007 ]

Alan Woodward commented on LUCENE-8229:

{quote}A caller might want all fields, or perhaps just some{quote}

I've done it this way to keep the API as simple as possible. If we start iterating over multiple fields then MatchesIterator becomes a lot more complicated, and I don't think it gains us anything? If consumers want to get the matches on multiple fields, then they can call Weight.matches() multiple times.

Re payloads, I think of them as a search-time feature, and not really relevant here. Let's keep this API focussed.

I have tried putting something similar to the MatchesIterator on Scorer, but it doesn't really fit. Scorers are designed to iterate over matching documents very efficiently, and lots of them have optimizations which mean that positions and/or offsets aren't actually available - for example, things like TermInSet or AutomatonQuery get rewritten to bitsets, or disjunctions can use bulk scorers, or the query cache can intercept things. Whereas Weight already has explain(), which has similar semantics to this - useful information that you might sometimes want for your TopDocs, but not something you want to be running against every matching document. And if anything, there are more Scorer implementations than Weights, so it would be more invasive a change.
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417909#comment-16417909 ]

Dawid Weiss commented on LUCENE-8229:

bq. The ability to find out exactly what a query has matched on is a fairly frequent feature request

I confirm the need for this -- we have custom highlighters and they're way too complex because of the need to decompose each and every query into match ranges.
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417849#comment-16417849 ]

David Smiley commented on LUCENE-8229:

This is really interesting [~romseygeek]! Here's your proposed signature:
{{public MatchesIterator matches(LeafReaderContext context, int doc, String field) throws IOException}}

* I'm unsure about this new matches method requiring a field reference, thus insisting all fields in the query match the field in this argument. A caller might want all fields, or perhaps just some. This could easily be converted to a Predicate to match the field.
* Add payloads to {{MatchesIterator}}
* Perhaps {{matches}} should take an int for the PostingsEnum flags. This way it could choose to ask for offsets and/or payloads. Or maybe just always get both to keep the API simpler, assuming the perf difference is negligible for practical uses of this feature (which sounds plausible to me). It could be added later if desired. Yeah, let's not now then.

Have you considered a very different approach of modifying Scorer to expose more information about the matches in a document? I'm just thinking out loud here; might be a bad idea ;-). Maybe I'm saying the same thing as "adding positions to Scorers" as you reference in the description, but maybe it could hang off indirectly using the {{MatchesIterator}} you developed here. Your proposed {{Weight.matches(...)}} is a visitor-like thing and we already have Scorer doing that. Lots of Weight classes to be modified; I wonder if it's less invasive at the Scorer? Hmm.
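To make the field-predicate suggestion above concrete, here is a minimal sketch of how a {{matches}} method taking a Predicate might look next to the single-field form. Everything in it is an assumption for illustration: {{SketchWeight}}, {{SketchMatchesIterator}}, {{SketchMatch}}, {{ListBackedWeight}} and {{MatchesSketch}} are hypothetical stand-ins, not Lucene classes, the {{LeafReaderContext}} argument is left out, and matches come from a fixed in-memory list rather than from postings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for the proposed MatchesIterator (not the Lucene class). */
interface SketchMatchesIterator {
    /** Advances to the next match; returns false once matches are exhausted. */
    boolean next() throws IOException;
    String field();
    int startOffset();
    int endOffset();
}

/** One precomputed match: document id, field name and a character-offset range. */
final class SketchMatch {
    final int doc;
    final String field;
    final int startOffset;
    final int endOffset;

    SketchMatch(int doc, String field, int startOffset, int endOffset) {
        this.doc = doc;
        this.field = field;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

/** Hypothetical stand-in for Weight, reduced to the method under discussion. */
interface SketchWeight {
    /** Predicate variant: the caller decides which fields it wants matches for. */
    SketchMatchesIterator matches(int doc, Predicate<String> fieldFilter) throws IOException;

    /** Single-field form, expressed through the predicate variant. */
    default SketchMatchesIterator matches(int doc, String field) throws IOException {
        return matches(doc, field::equals);
    }
}

/** Toy weight backed by a fixed list, so the example runs without an index. */
final class ListBackedWeight implements SketchWeight {
    private final List<SketchMatch> all;

    ListBackedWeight(List<SketchMatch> all) {
        this.all = all;
    }

    @Override
    public SketchMatchesIterator matches(int doc, Predicate<String> fieldFilter) {
        // Keep only matches for this document whose field passes the filter.
        final List<SketchMatch> selected = new ArrayList<>();
        for (SketchMatch m : all) {
            if (m.doc == doc && fieldFilter.test(m.field)) {
                selected.add(m);
            }
        }
        return new SketchMatchesIterator() {
            private int index = -1; // positioned before the first match

            @Override public boolean next() { index++; return index < selected.size(); }
            @Override public String field() { return selected.get(index).field; }
            @Override public int startOffset() { return selected.get(index).startOffset; }
            @Override public int endOffset() { return selected.get(index).endOffset; }
        };
    }
}

final class MatchesSketch {
    /** Sample data: document 7 has one match in "title" and two in "body". */
    static SketchWeight demoWeight() {
        return new ListBackedWeight(Arrays.asList(
            new SketchMatch(7, "title", 0, 6),
            new SketchMatch(7, "body", 10, 16),
            new SketchMatch(7, "body", 40, 46)));
    }

    /** Drains the iterator into strings of the form field[start,end). */
    static List<String> collect(SketchWeight weight, int doc, Predicate<String> fieldFilter) {
        List<String> out = new ArrayList<>();
        try {
            SketchMatchesIterator it = weight.matches(doc, fieldFilter);
            while (it.next()) {
                out.add(it.field() + "[" + it.startOffset() + "," + it.endOffset() + ")");
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(demoWeight(), 7, "body"::equals)); // one field
        System.out.println(collect(demoWeight(), 7, f -> true));      // all fields
    }
}
```

In this sketch a caller that wants a single field passes {{"body"::equals}}, and one that wants every field passes {{f -> true}}, so the String overload becomes a convenience rather than a restriction. Whether that trade-off suits the real API is a question for the patch itself.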
[jira] [Commented] (LUCENE-8229) Add a method to Weight to retrieve matches for a single document
[ https://issues.apache.org/jira/browse/LUCENE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417443#comment-16417443 ]

Alan Woodward commented on LUCENE-8229:

The PR linked above illustrates the idea. There are still some TODOs (I haven't added anything to PhraseWeight yet, for example). Comments welcome!