[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents

2021-03-17 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17303720#comment-17303720
 ] 

Uwe Schindler commented on LUCENE-9640:
---

Elasticsearch has this functionality in the form of named queries. When building 
the query tree you tag all "interesting" queries with a name, and for each hit 
you get back the names of all tagged queries that matched it. See the docs: 
https://www.elastic.co/guide/en/elasticsearch/reference/7.11/query-dsl-bool-query.html#named-queries

They implement this outside the main query and without wrapping any clauses: 
https://github.com/elastic/elasticsearch/blob/a92a647b9f17d1bddf5c707490a19482c273eda3/server/src/main/java/org/elasticsearch/search/fetch/subphase/MatchedQueriesPhase.java

The idea is to create a separate weight for each tagged query somewhere in the 
tree. For each hit, you advance the scorer of each tagged query to that 
document and check whether it lands exactly on it.

This happens completely outside the main query and works quite well. I have 
used it quite often in Elasticsearch.
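
The per-hit check described above can be sketched in plain Java. This is a toy model and not Lucene's or Elasticsearch's API: sorted int arrays stand in for each tagged query's postings, and the names TaggedQueryMatcher, DocIterator and matchedTags are hypothetical. It assumes hits are visited in ascending doc id order, which is what a forward-only iterator requires.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model: each tagged query is a sorted array of matching doc ids. */
final class TaggedQueryMatcher {

  /** Forward-only iterator over sorted doc ids, loosely modelled on a scorer's iterator. */
  static final class DocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int pos = -1; // -1 means "not positioned yet"

    DocIterator(int[] sortedDocs) {
      this.docs = sortedDocs;
    }

    int docID() {
      if (pos < 0) return -1;
      return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    /** Moves to the first doc id >= target and returns it. */
    int advance(int target) {
      if (pos < 0) pos = 0;
      while (pos < docs.length && docs[pos] < target) pos++;
      return docID();
    }
  }

  /** For each hit (ascending doc ids), lists the tags whose iterator lands exactly on it. */
  static Map<Integer, List<String>> matchedTags(int[] sortedHits, Map<String, int[]> tagged) {
    // One independent iterator per tagged query, separate from the main query.
    Map<String, DocIterator> iterators = new LinkedHashMap<>();
    tagged.forEach((tag, docs) -> iterators.put(tag, new DocIterator(docs)));

    Map<Integer, List<String>> result = new LinkedHashMap<>();
    for (int hit : sortedHits) {
      List<String> tags = new ArrayList<>();
      for (Map.Entry<String, DocIterator> e : iterators.entrySet()) {
        DocIterator it = e.getValue();
        // Only advance when the iterator is still behind the hit.
        int doc = it.docID() < hit ? it.advance(hit) : it.docID();
        if (doc == hit) tags.add(e.getKey());
      }
      result.put(hit, tags);
    }
    return result;
  }
}
```

Because the tagged iterators are separate from the main query, the main query can keep whatever execution strategy it prefers; the cost is one extra advance per tagged query per hit.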

> Add TrackingQuery to track matching documents
> -
>
> Key: LUCENE-9640
> URL: https://issues.apache.org/jira/browse/LUCENE-9640
> Project: Lucene - Core
> Issue Type: New Feature
> Components: core/search
> Reporter: Elbek Kamoliddinov
> Priority: Major
> Labels: query
>
> Some users would benefit from having {{TrackingQuery}} functionality. This 
> query would wrap another query and should be able to provide the matched 
> DocIds for the wrapped query after the search is run. For example, a user 
> running a boolean query {{A or B}} could wrap query {{A}} in a 
> {{TrackingQuery}}, run the boolean query, and check whether the documents 
> that matched the boolean query also match query {{A}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents

2021-02-18 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286757#comment-17286757
 ] 

Michael Sokolov commented on LUCENE-9640:
-

Top-N would cover most of the use cases we've been discussing, yes, so maybe 
Explain is a viable approach. I was worrying about performance, maybe too much. 
We can try that and see. 




[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents

2021-02-18 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286725#comment-17286725
 ] 

Robert Muir commented on LUCENE-9640:
-

Mike, it's still a bit confusing to me (sorry, that kind of day): is it true 
that you don't really need this on every matching document (e.g. no need to 
rely on full DAAT traversal / large bitsets / slowing down query execution / 
preventing WAND) ... you just care about the top-N? 

How different is what you want from invoking explain() on some of the top-N to 
record some details about how they were ranked?






[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents

2021-02-18 Thread Alan Woodward (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286712#comment-17286712
 ] 

Alan Woodward commented on LUCENE-9640:
---

Would the Matches API help here?  You can create a marker query that returns a 
particular Matches implementation from its corresponding Weight, and then you 
can descend through the Matches tree via `Matches#getSubMatches()` to find 
those marker queries that appear in the match.
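
The descent through the match tree can be illustrated with a small plain-Java model. This is only a sketch and does not use Lucene's classes: MatchNode is a hypothetical stand-in for a node exposing getSubMatches(), and MarkerNode is a hypothetical stand-in for the particular implementation a marker query's Weight would return. A null root models a document that did not match at all.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class MarkerMatchWalker {

  /** Hypothetical stand-in for a node in a per-document match tree. */
  static class MatchNode {
    private final List<MatchNode> subMatches;

    MatchNode(List<MatchNode> subMatches) {
      this.subMatches = subMatches;
    }

    Collection<MatchNode> getSubMatches() {
      return subMatches;
    }
  }

  /** Hypothetical marker node carrying the name of the marker query that produced it. */
  static final class MarkerNode extends MatchNode {
    final String name;

    MarkerNode(String name, List<MatchNode> subMatches) {
      super(subMatches);
      this.name = name;
    }
  }

  /** Returns the names of all marker nodes found anywhere in the tree, in depth-first order. */
  static List<String> findMarkers(MatchNode root) {
    List<String> found = new ArrayList<>();
    collect(root, found);
    return found;
  }

  private static void collect(MatchNode node, List<String> found) {
    if (node == null) return; // the document did not match
    if (node instanceof MarkerNode) found.add(((MarkerNode) node).name);
    for (MatchNode child : node.getSubMatches()) collect(child, found);
  }
}
```

Only clauses that actually matched the document appear in the tree, so the markers found are exactly the marked clauses that contributed to the hit.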




[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents

2021-02-18 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286698#comment-17286698
 ] 

Michael Sokolov commented on LUCENE-9640:
-

Yes, I don't like the mutable Query idea either, but we're struggling to find 
alternatives. Highlighter sounds interesting. 

Here's a problem statement: for a given query, I'd like to be able to specify 
certain subqueries (like all the SHOULD clauses for example, or just pick some 
based on knowledge of how the query was constructed) and once I have a hit, 
find out which of those clauses matched. Then I want to use this information in 
a variety of ways, but at a minimum, log it or add it to the search response 
for later analysis. One example use case: I am experimenting with adding a new 
source of matches, like a bunch of garbage text I mined from somewhere that an 
oracle tells me is supposed to be relevant to a document, but I'm not sure. 
Then I run an A/B test to see if adding this source of matches is helpful, and 
I want to control this at the query level rather than deploying parallel 
indexes.




[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents

2021-02-18 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286635#comment-17286635
 ] 

Robert Muir commented on LUCENE-9640:
-

I really can't believe we'd take the hit of making something like Query mutable 
for a corner case, especially when we don't even have a high-level use case 
described. This issue immediately jumped to implementation, reinventing filters 
or highlighting or Scorer.getChildren for some unclear reason.

I don't mean this as an attack, but it's similar to Scorer.getChildren 
(which I equally hate): we need to take a step back and think about the 
use case and what other solutions might be appropriate. Maybe it should be 
highlighter functionality and not a Query, for example.




[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents

2021-02-18 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17286515#comment-17286515
 ] 

Michael Sokolov commented on LUCENE-9640:
-

I worked up a version of this which caches an array of Scorers in createWeight, 
one for each leaf, and then supports a {{matched(int leaf, int docid)}} method 
intended to be called from {{LeafCollector.collect()}} for each document and 
for each such tracked query; it checks whether the leaf's scorer has advanced 
to the given docid. This is lightweight and works nicely, but relies on the 
TrackingQuery maintaining per-execution state, so it can only be used once, is 
not thread-safe, etc.

To avoid that we could instead walk the tree of Scorers, but then we need 
{{Scorable.getChildren}}, which I guess is undesirable too. So there's a 
lesser-of-two-evils situation. Maybe having a mutable Query is not so bad?
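
The first approach above (a cached per-leaf scorer plus a matched(leaf, docid) check at collect time) can be modelled in plain Java. This is a toy sketch and not the actual patch or Lucene's API: Cursor, Tracker and collectTrackedHits are hypothetical names, sorted int arrays stand in for postings, and the loop is a simplified doc-at-a-time disjunction over a single leaf.

```java
import java.util.ArrayList;
import java.util.List;

final class TrackedDisjunction {

  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Forward-only cursor over one clause's sorted doc ids within a single leaf. */
  static final class Cursor {
    private final int[] docs;
    private int pos = 0; // simplification: starts positioned on the first doc

    Cursor(int[] sortedDocs) {
      this.docs = sortedDocs;
    }

    int docID() {
      return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }

    void next() {
      if (pos < docs.length) pos++;
    }
  }

  /** Per-execution state: one cached cursor per leaf for the tracked clause. */
  static final class Tracker {
    private final Cursor[] perLeaf;

    Tracker(Cursor[] perLeaf) {
      this.perLeaf = perLeaf;
    }

    /** True if the tracked clause's cursor in this leaf currently sits on docid. */
    boolean matched(int leaf, int docid) {
      return perLeaf[leaf].docID() == docid;
    }
  }

  /** Runs "tracked OR other" over one leaf; returns the hits on which the tracked clause matched. */
  static List<Integer> collectTrackedHits(int leaf, Cursor tracked, Cursor other, Tracker tracker) {
    List<Integer> trackedHits = new ArrayList<>();
    while (true) {
      int doc = Math.min(tracked.docID(), other.docID());
      if (doc == NO_MORE_DOCS) break;
      // This is the point where a collector's collect(doc) would run.
      if (tracker.matched(leaf, doc)) trackedHits.add(doc);
      // Move every clause positioned on this hit to its next doc.
      if (tracked.docID() == doc) tracked.next();
      if (other.docID() == doc) other.next();
    }
    return trackedHits;
  }
}
```

The check is a single comparison per hit, which is why it is lightweight. The model also shows the drawback: the Tracker holds cursors that are consumed by the search, so running a second search with the same Tracker finds nothing, and sharing it between threads would be unsafe.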




[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents

2021-01-06 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259708#comment-17259708
 ] 

Michael McCandless commented on LUCENE-9640:


Right, the idea here is that the user can carefully pick which clauses (of 
possibly many) they care to "track", and maybe pay a small performance penalty 
due to that choice if e.g. {{BooleanQuery}} had to choose a less efficient pure 
"doc at a time" {{Scorer}} as a result.

And, maybe, we could remove/deprecate {{Scorable.getChildren}} at the same time.




[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents

2021-01-04 Thread Elbek Kamoliddinov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258638#comment-17258638
 ] 

Elbek Kamoliddinov commented on LUCENE-9640:


Thanks Mike. {{QueryCache}} seems to apply to the whole searcher instance, 
whereas this would work on a per-query basis. A user might have a complex 
boolean query and only be interested in tracking a small part of it. I will 
check out {{Scorable.getChildren}}.




[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents

2020-12-20 Thread Michael McCandless (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17252538#comment-17252538
 ] 

Michael McCandless commented on LUCENE-9640:


Disclaimer: [~elbek@gmail.com] and I both work at Amazon, on customer-facing 
product search.  I suggested that [~elbek@gmail.com] open this issue, 
but the idea is not very far along yet.

The idea here is to provide a better / more contained solution than 
{{Scorable.getChildren}} for tracking which query clause matched which hits. 
I.e. when Lucene users want to know if a given query clause matched each hit, 
they could wrap those clauses in {{TrackingQuery}}, but other clauses that they 
don't care about can remain as normal clauses.  They would also need a custom 
{{Collector}} to record details from their {{TrackingQuery}}.

And then we could deprecate/remove {{Scorable.getChildren}}, maybe?

So this way it would only be {{TrackingQuery}} that pays the performance hit 
of truly doing "doc at a time" scoring, rather than any 
{{BooleanQuery}}.

Or maybe it's a bad idea.  Or maybe there is some other way to enable users to 
track which query clause(s) matched which hits ...




[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents

2020-12-17 Thread Robert Muir (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17251059#comment-17251059
 ] 

Robert Muir commented on LUCENE-9640:
-

This sounds like yet another filter with all the complexity/traps/problems 
associated there.

Does something about the query cache not work for you?




[jira] [Commented] (LUCENE-9640) Add TrackingQuery to track matching documents

2020-12-15 Thread Elbek Kamoliddinov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250097#comment-17250097
 ] 

Elbek Kamoliddinov commented on LUCENE-9640:


I have a naive implementation where {{TrackingQuery}} creates a sparse bitset 
per segment and sets a bit for each matching doc as the query runs. I will put 
up a PR later this week. I wanted to start a discussion and get opinions from 
the community.

Thanks, 
 Elbek.
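
The per-segment bitset idea can be sketched as follows. This is a minimal plain-Java model and not the implementation from the PR: PerSegmentMatchRecorder and its methods are hypothetical names, and java.util.BitSet is used as a stand-in for a sparse bitset.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Records, per segment, which segment-local doc ids the tracked query matched. */
final class PerSegmentMatchRecorder {

  // Segment ordinal -> bits set for matching docs; created lazily per segment.
  private final Map<Integer, BitSet> bitsBySegment = new HashMap<>();

  /** Called while the query runs, once per document the wrapped query matches. */
  void recordMatch(int segmentOrd, int docInSegment) {
    bitsBySegment.computeIfAbsent(segmentOrd, ord -> new BitSet()).set(docInSegment);
  }

  /** Called after the search to ask whether the wrapped query matched a given hit. */
  boolean matched(int segmentOrd, int docInSegment) {
    BitSet bits = bitsBySegment.get(segmentOrd);
    return bits != null && bits.get(docInSegment);
  }
}
```

A recorder like this keeps one bit per matching document for the whole search, so its memory grows with the number of matches rather than with the number of hits returned.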
