[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents
[ https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303720#comment-17303720 ]

Uwe Schindler commented on LUCENE-9640:
---------------------------------------

Elasticsearch has this functionality using the named query. When building the query tree you tag all "interesting queries" with a name, and for each hit you get back all the tag names for which the result was a hit. See docs: https://www.elastic.co/guide/en/elasticsearch/reference/7.11/query-dsl-bool-query.html#named-queries

They implement this outside and without wrapping: https://github.com/elastic/elasticsearch/blob/a92a647b9f17d1bddf5c707490a19482c273eda3/server/src/main/java/org/elasticsearch/search/fetch/subphase/MatchedQueriesPhase.java

The idea is to create a separate weight for each tagged query somewhere in the tree. For each hit, you just advance the scorer of each tagged query to check if it's a hit. This is completely outside and works quite well. I have used it quite often in Elasticsearch.

> Add TrackingQuery to track matching documents
> ---------------------------------------------
>
>                 Key: LUCENE-9640
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9640
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/search
>            Reporter: Elbek Kamoliddinov
>            Priority: Major
>              Labels: query
>
> Some users benefit having {{TrackingQuery}} functionality. This query would
> wrap another query and should be able to provide the matched DocIds for the
> wrapped query after search is run. For example a user running a boolean
> query {{A or B}} could wrap query {{A}} into {{TrackingQuery}} and run the
> boolean query and check if documents that matched the boolean query matches
> the query {{A}}.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
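The per-hit check described in the comment above can be sketched in isolation. This is a toy model under stated assumptions, not Lucene or Elasticsearch code: {{PostingsIterator}}, {{MatchedTags}} and {{tagsPerHit}} are hypothetical names, and a sorted int array stands in for the scorer that a real implementation would obtain from each tagged query's weight. Hits have to be visited in ascending doc ID order because the iterators only move forward.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a scorer's doc ID iterator over a sorted postings array (hypothetical, not Lucene's API). */
final class PostingsIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs; // sorted ascending
    private int pos = 0;

    PostingsIterator(int... sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Moves forward to the first doc >= target and returns it, or NO_MORE_DOCS if exhausted. */
    int advance(int target) {
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}

final class MatchedTags {
    /** For each hit (ascending doc IDs), lists the tags whose iterator lands exactly on that hit. */
    static Map<Integer, List<String>> tagsPerHit(int[] sortedHits, Map<String, int[]> taggedPostings) {
        // One independent iterator per tagged query, created outside the main query execution.
        Map<String, PostingsIterator> iterators = new LinkedHashMap<>();
        taggedPostings.forEach((tag, docs) -> iterators.put(tag, new PostingsIterator(docs)));

        Map<Integer, List<String>> result = new LinkedHashMap<>();
        for (int hit : sortedHits) {
            List<String> tags = new ArrayList<>();
            for (Map.Entry<String, PostingsIterator> e : iterators.entrySet()) {
                // The tagged query matched this hit only if advancing stops exactly on it.
                if (e.getValue().advance(hit) == hit) {
                    tags.add(e.getKey());
                }
            }
            result.put(hit, tags);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, int[]> tagged = new LinkedHashMap<>();
        tagged.put("A", new int[] {1, 4, 7});
        tagged.put("B", new int[] {4, 9});
        System.out.println(tagsPerHit(new int[] {1, 4, 9}, tagged));
    }
}
```

Because the check runs only over the collected hits, the main query keeps whatever execution strategy it would normally use; the cost is one extra iterator per tagged query.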
[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents
[ https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286757#comment-17286757 ]

Michael Sokolov commented on LUCENE-9640:
-----------------------------------------

Top N would cover most of the use cases we've been discussing, yes, so maybe Explain is a viable approach. I was worrying about performance, maybe too much. We can try that and see.
[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents
[ https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286725#comment-17286725 ]

Robert Muir commented on LUCENE-9640:
-------------------------------------

Mike, it's still a bit confusing to me (sorry, that kind of day):

Is it true that you don't really need this on every matching document (e.g. no need for relying on full DAAT traversal / large bitsets / slowing down query execution / preventing WAND)... you just care about the top-N?

How different is what you want from invoking explain() on some of the top-N to try to record some details about how they were ranked?
[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents
[ https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286712#comment-17286712 ]

Alan Woodward commented on LUCENE-9640:
---------------------------------------

Would the Matches API help here? You can create a marker query that returns a particular Matches implementation from its corresponding Weight, and then you can descend through the Matches tree via `Matches#getSubMatches()` to find those marker queries that appear in the match.
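The descent through the match tree suggested above might look roughly like the following. This is a self-contained toy model, not the Lucene Matches API: {{MatchNode}}, {{PlainMatch}}, {{MarkerMatch}} and {{MarkerFinder}} are hypothetical types, and only the sub-match accessor is modelled. In a real implementation the marker node would be the Matches instance returned by the marker query's Weight for a matching document.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Toy stand-in for a per-document match tree node (hypothetical, not Lucene's Matches). */
interface MatchNode {
    Collection<MatchNode> getSubMatches();
}

/** An ordinary node, e.g. the match of a boolean query over its clauses. */
final class PlainMatch implements MatchNode {
    private final List<MatchNode> children;

    PlainMatch(List<MatchNode> children) {
        this.children = children;
    }

    @Override
    public Collection<MatchNode> getSubMatches() {
        return children;
    }
}

/** The node a marker query would contribute when its wrapped query matches. */
final class MarkerMatch implements MatchNode {
    final String name;
    private final List<MatchNode> children;

    MarkerMatch(String name, List<MatchNode> children) {
        this.name = name;
        this.children = children;
    }

    @Override
    public Collection<MatchNode> getSubMatches() {
        return children;
    }
}

final class MarkerFinder {
    /** Depth-first walk that collects the names of all marker nodes in the tree. */
    static List<String> findMarkers(MatchNode root) {
        List<String> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    private static void collect(MatchNode node, List<String> out) {
        if (node instanceof MarkerMatch) {
            out.add(((MarkerMatch) node).name);
        }
        for (MatchNode child : node.getSubMatches()) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        MatchNode markerA = new MarkerMatch("A", List.of(new PlainMatch(List.of())));
        MatchNode root = new PlainMatch(List.of(markerA, new PlainMatch(List.of())));
        System.out.println(findMarkers(root));
    }
}
```

A clause that did not match the document contributes no node to the tree, so the absence of a marker name is what signals "did not match".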
[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents
[ https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286698#comment-17286698 ]

Michael Sokolov commented on LUCENE-9640:
-----------------------------------------

Yes, I don't like the mutable Query idea either, but we're struggling to find alternatives. Highlighter sounds interesting.

Here's a problem statement: for a given query, I'd like to be able to specify certain subqueries (like all the SHOULD clauses for example, or just pick some based on knowledge of how the query was constructed) and once I have a hit, find out which of those clauses matched. Then I want to use this information in a variety of ways, but at a minimum, log it or add it to the search response for later analysis.

One example use case: I am experimenting with adding a new source of matches, like a bunch of garbage text I mined from somewhere that an oracle tells me is supposed to be relevant to a document, but I'm not sure. Then I run an A/B test to see if adding this source of matches is helpful, and I want to control this at the query level rather than deploying parallel indexes.
[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents
[ https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286635#comment-17286635 ]

Robert Muir commented on LUCENE-9640:
-------------------------------------

I really can't believe we'd take the hit to make something like Query mutable for a corner case, esp. when we don't even have a high-level use case described. This issue immediately jumped to implementation, reinventing filters or highlighting or Scorer.getChildren for some unclear reason.

I don't mean this as an attack, but it's similar to Scorer.getChildren (which I equally hate): we need to take a step back and think about the use case and what other solutions might be appropriate. Maybe it should be a highlighter functionality and not a Query, for example.
[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents
[ https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286515#comment-17286515 ]

Michael Sokolov commented on LUCENE-9640:
-----------------------------------------

I worked up a version of this which, in createWeight, caches an array of Scorers, one for each leaf, and then supports a {{matched(int leaf, int docid)}} method. It is intended to be called from {{LeafCollector.collect()}} for each document and for each tracked query, and it checks whether the leaf's scorer has advanced to the given docid. This is lightweight and works nicely, but relies on the TrackingQuery maintaining per-execution state, so it can only be used once, is not thread-safe, etc.

To avoid that we could instead walk the tree of Scorers, but then we need {{Scorable.getChildren}}, which I guess is undesirable too. So there's a lesser-of-two-evils situation. Maybe having a mutable Query is not so bad?
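A minimal, self-contained model of the "has the leaf's scorer advanced to this docid" check, and of why the cached state makes the query single-use. Everything here is an assumption made for illustration: {{Cursor}}, {{TrackedClause}} and {{DisjunctionDemo}} are hypothetical names, sorted int arrays stand in for Scorers, and a hand-written doc-at-a-time OR loop stands in for the boolean query and the collector. It is not the patch described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy forward-only cursor over a sorted postings array (stands in for a Scorer; hypothetical). */
final class Cursor {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs; // sorted ascending
    private int pos = -1;

    Cursor(int... sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Current doc: -1 before the first nextDoc(), NO_MORE_DOCS once exhausted. */
    int docID() {
        if (pos < 0) return -1;
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    int nextDoc() {
        pos++;
        return docID();
    }
}

/** Holds one cached cursor per leaf; this is the per-execution state the comment warns about. */
final class TrackedClause {
    private final Cursor[] perLeaf;

    TrackedClause(Cursor... perLeaf) {
        this.perLeaf = perLeaf;
    }

    /** True if the tracked clause's cursor for this leaf currently sits on docId. */
    boolean matched(int leaf, int docId) {
        return perLeaf[leaf].docID() == docId;
    }
}

final class DisjunctionDemo {
    /** Doc-at-a-time OR of two cursors in one leaf; records, per collected doc, whether the tracked clause matched. */
    static Map<Integer, Boolean> run(Cursor a, Cursor b, TrackedClause tracked, int leaf) {
        Map<Integer, Boolean> collected = new LinkedHashMap<>();
        a.nextDoc();
        b.nextDoc();
        while (true) {
            int doc = Math.min(a.docID(), b.docID());
            if (doc == Cursor.NO_MORE_DOCS) break;
            collected.put(doc, tracked.matched(leaf, doc)); // what collect(doc) would do
            if (a.docID() == doc) a.nextDoc();
            if (b.docID() == doc) b.nextDoc();
        }
        return collected;
    }

    public static void main(String[] args) {
        Cursor a = new Cursor(1, 4, 7);
        Cursor b = new Cursor(4, 9);
        TrackedClause trackedA = new TrackedClause(a); // tracks clause A in leaf 0
        System.out.println(run(a, b, trackedA, 0));
        // A second run over the same objects sees exhausted cursors and collects nothing.
        System.out.println(run(a, b, trackedA, 0));
    }
}
```

The check itself is a single comparison per tracked clause, which is why the approach is cheap; the cost is that the cursors live inside the tracked object, so two searches cannot share it.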
[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents
[ https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259708#comment-17259708 ]

Michael McCandless commented on LUCENE-9640:
--------------------------------------------

Right, the idea here is that the user can carefully pick which clauses (of possibly many) they care to "track", and maybe pay a small performance penalty due to that choice if e.g. {{BooleanQuery}} had to choose a less efficient pure "doc at a time" {{Scorer}} as a result.

And, maybe, we could remove/deprecate {{Scorable.getChildren}} at the same time.
[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents
[ https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258638#comment-17258638 ]

Elbek Kamoliddinov commented on LUCENE-9640:
--------------------------------------------

Thanks Mike.

{{QueryCache}} seems to apply to the whole searcher instance, whereas this would work on a per-query basis. A user might have a complex boolean query and only be interested in tracking a small part of it.

I will check out {{Scorable.getChildren}}.
[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents
[ https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252538#comment-17252538 ]

Michael McCandless commented on LUCENE-9640:
--------------------------------------------

Disclaimer: [~elbek@gmail.com] and I both work at Amazon, on customer facing product search. I suggested that [~elbek@gmail.com] open this issue, but the idea is not very far along yet.

The idea here is to provide a better / more contained solution than {{Scorable.getChildren}} for tracking which query clause matched which hits.

I.e. when Lucene users want to know if a given query clause matched each hit, they could wrap those clauses in {{TrackingQuery}}, but other clauses that they don't care about can remain as normal clauses. They would also need a custom {{Collector}} to record details from their {{TrackingQuery}}.

And then we could deprecate/remove {{Scorable.getChildren}}, maybe? So this way it would only be {{TrackingQuery}} that suffers from the performance hit of truly doing "doc at a time scoring", rather than any {{BooleanQuery}}.

Or maybe it's a bad idea. Or maybe there is some other way to enable users to track which query clause(s) matched which hits ...
[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents
[ https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251059#comment-17251059 ]

Robert Muir commented on LUCENE-9640:
-------------------------------------

This sounds like yet another filter with all the complexity/traps/problems associated there. Does something about the query cache not work for you?
[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents
[ https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250097#comment-17250097 ]

Elbek Kamoliddinov commented on LUCENE-9640:
--------------------------------------------

I have a naive implementation where {{TrackingQuery}} creates a sparse bitset per segment and sets a bit for each matching doc as the query runs. I will put up a PR later this week. I wanted to start a discussion and get opinions from the community.

Thanks,
Elbek.
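The bitset-per-segment bookkeeping can be illustrated with a small sketch. It assumes nothing about the actual patch: {{SegmentMatchTracker}} and its methods are hypothetical names, and {{java.util.BitSet}} stands in for whichever sparse bitset the real implementation uses. Doc IDs are segment-local, so each segment gets its own bitset, allocated lazily on its first match.

```java
import java.util.BitSet;

/** Records, per segment, which segment-local doc IDs the wrapped query matched (hypothetical sketch). */
final class SegmentMatchTracker {
    private final BitSet[] perSegment;

    SegmentMatchTracker(int segmentCount) {
        this.perSegment = new BitSet[segmentCount];
    }

    /** Called while the query runs, whenever the wrapped query matches a doc in the given segment. */
    void recordMatch(int segment, int doc) {
        if (perSegment[segment] == null) {
            perSegment[segment] = new BitSet(); // allocate only for segments that have matches
        }
        perSegment[segment].set(doc);
    }

    /** Called after the search to ask whether the wrapped query matched a given hit. */
    boolean matched(int segment, int doc) {
        BitSet bits = perSegment[segment];
        return bits != null && bits.get(doc);
    }

    public static void main(String[] args) {
        SegmentMatchTracker tracker = new SegmentMatchTracker(2);
        tracker.recordMatch(0, 3);
        tracker.recordMatch(1, 3);
        tracker.recordMatch(1, 42);
        System.out.println(tracker.matched(0, 3) + " " + tracker.matched(0, 42) + " " + tracker.matched(1, 42));
    }
}
```

This variant records a bit for every document the wrapped query matches, not only for the hits that are eventually returned.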