I think it's a matter of tradeoffs. For example, faceting requires complete
evaluation, and since this field matching is a kind of aggregation, I think
it's OK if that's how it works. Users can choose which technique they want
to apply based on their use case.
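To make the tradeoff concrete, here is a small library-free sketch. It is a
toy model, not Lucene code: documents are plain maps from field name to text,
a "match" is a substring test, and "top-k" simply means the first k matching
documents in list order. The class name FieldMatchTradeoff and both helper
methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FieldMatchTradeoff {

    /** Names of the fields whose text contains the term (toy substring match, no analysis). */
    public static List<String> matchedFields(Map<String, String> doc, String term) {
        List<String> out = new ArrayList<>();
        // TreeMap gives a stable, sorted field order for the result.
        for (Map.Entry<String, String> e : new TreeMap<>(doc).entrySet()) {
            if (e.getValue().contains(term)) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    /**
     * Counts, per field, how many of the first `limit` matching documents matched
     * in that field. Integer.MAX_VALUE as the limit means complete evaluation.
     */
    public static Map<String, Integer> countByField(
            List<Map<String, String>> docs, String term, int limit) {
        Map<String, Integer> counts = new TreeMap<>();
        int visited = 0;
        for (Map<String, String> doc : docs) {
            List<String> fields = matchedFields(doc, term);
            if (fields.isEmpty()) {
                continue; // document does not match at all
            }
            if (visited == limit) {
                break; // stop once the first `limit` hits have been inspected
            }
            visited++;
            for (String field : fields) {
                counts.merge(field, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = List.of(
                Map.of("title", "lucene search", "body", "fast search"),
                Map.of("title", "hello", "body", "lucene rocks"),
                Map.of("title", "lucene", "body", "nothing"));

        // Complete evaluation: every matching document contributes.
        System.out.println(countByField(docs, "lucene", Integer.MAX_VALUE));
        // Top-k style: only the first matching document is inspected.
        System.out.println(countByField(docs, "lucene", 1));
    }
}
```

With a limit of Integer.MAX_VALUE every matching document is visited, which
is the faceting-style complete evaluation. A small limit only inspects the
first few hits, which is cheaper but yields partial counts.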
Anyway I don't think
I think it depends on what information we actually want to get here. If it’s
just finding which fields matched in which document, then running Matches over
the top-k results is fine. If you want to get some kind of aggregate data, as
in you want to get a list of fields that matched in *any*
For a quick hack, you can use highlighting. That does more than you want,
showing which words match, but it does have the info.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Jun 27, 2022, at 3:23 AM, Shai Erera wrote:
>
> Thanks Uwe, I didn't
Hi Adrien,
Maybe it changed a bit, but last time I looked into it, it was somehow
wrapping all queries using a wrapper "NamedQuery" or similar. When it
collected hits it was able to figure out by a wrapper somewhere around
weight/scorer/DISI and set a flag that the query was a hit. It could
Uwe,
Elasticsearch's named queries are not using a collector actually. After top
hits have been evaluated for the whole query, they are evaluated
independently on each of the top hits. It's probably faster than the
collector approach since it doesn't add per-document overhead to
collection, but
Thanks Alan, yeah I guess I was thinking about the use case I described,
which involves (usually) simple term queries, but you're definitely right
about complex boolean clauses as well as non-term queries.
I think the case for highlighter is different though? I mean you usually
generate highlights
A side note - I've been using a highlighter based on the Matches API for
quite some time now and it's been fantastic. Very precise and handles
non-trivial queries (interval queries) very well.
Your approach is almost certainly more efficient, but it might give you false
matches in some cases - for example, if you have a complex query with many
nested MUST and SHOULD clauses, you can have a leaf TermScorer that is
positioned on the correct document, but which is part of a clause that
Thanks Uwe, I didn't know about named queries, but it seems useful. Is
there interest in getting similar functionality in Lucene, or perhaps just
the FieldMatching collector? I'd be happy to open a PR for it.
As for the use case, I was thinking of using something similar to this collector
for some kind of
I think the collector approach is perfectly fine for mass-processing of
queries.
By the way: Elasticsearch/OpenSearch already have a built-in feature that
works on top of the collector API, similar to what you mentioned
(as far as I remember). It is a bit different as you can tag any
Out of curiosity and for education purposes, is the Collector approach I
proposed wrong/inefficient? Or less efficient than the matches() API?
I'm thinking, if you want to both match/rank documents and as a side effect
know which fields matched, the Collector will perform better than
The matches API is awesome. Use it. You can also get a rough glimpse
into a superset of fields potentially matching the query via:
query.visit(
    new QueryVisitor() {
      @Override
      public boolean acceptField(String field) {
        affectedFields.add(field);
The Matches API will give you this information - it’s still likely to be fairly
slow, but it’s a lot easier to use than trying to parse Explain output.
Query q = ….;
Weight w = searcher.createWeight(searcher.rewrite(q),
ScoreMode.COMPLETE_NO_SCORES, 1.0f);
Matches m = w.matches(context,
What is the reason you need the matched fields? Maybe your use case can be
solved using something completely different from knowing which fields were matched.
> On 25.06.2022 at 06:58, Yichen Sun wrote:
>
> Hello!
>
> I’m a MSCS student from BU and learning to use Lucene. Recently I try to
>
Hi Yichen,
I think you can implement a custom Collector which tracks the fields that
were matched for each Scorer. I implemented an example of such a Collector below:
public class FieldMatchingCollector implements Collector {
/** Holds the number of matching documents for each field. */
public
Hello!
I’m an MSCS student at BU learning to use Lucene. Recently I have been trying
to output the fields matched by a query. For example, one document has
10 fields and 2 of them match the query. I want to get the names of
these fields.
I have tried using the explain() method and getting