Jeff,
The explain() algorithm is definitely too slow to be used at search time. There
is one approach I'm aware of: watch the scorers during the search. If a scorer
matches some doc, then _at some moment_ scorer.docID()==docNum.
My team successfully implemented such a Match Spotting algorithm; it performs
quite well and provides info like http://goo.gl/7vgrB
The problem with this algorithm is that it's tightly coupled to low-level
scorer behavior. Scorers are sometimes intended to behave counter-intuitively,
and that behavior changes as performance optimizations land in Lucene core.
https://issues.apache.org/jira/browse/LUCENE-1999 sounds like almost the same
idea, but I never looked into the source.
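
To make the docID() check concrete, here is a toy sketch in plain Java. It is
not our implementation and it does not use the Lucene API: ClauseScorer,
advanceTo() and spotMatches() are hypothetical names, and each "scorer" is
just a sorted array of doc IDs for one field. It also assumes the code drives
the scorers itself, whereas in Lucene the parent query advances them, which is
where the coupling problem comes from.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MatchSpotting {

    /** Hypothetical stand-in for a per-clause scorer over one field (not a Lucene class). */
    static final class ClauseScorer {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        final String field;
        private final int[] docs; // doc IDs matched by this clause, ascending
        private int pos = -1;     // -1 means "not positioned yet"

        ClauseScorer(String field, int... sortedDocs) {
            this.field = field;
            this.docs = sortedDocs;
        }

        /** Doc the scorer currently sits on (-1 before the first advance). */
        int docID() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Moves forward to the first matching doc >= target; never moves backwards. */
        int advanceTo(int target) {
            while (docID() < target) pos++;
            return docID();
        }
    }

    /** For each hit (ascending doc IDs), records the fields whose scorer sits exactly on it. */
    static Map<Integer, Set<String>> spotMatches(List<ClauseScorer> clauses, int[] hits) {
        Map<Integer, Set<String>> matched = new LinkedHashMap<>();
        for (int docNum : hits) {
            Set<String> fields = new LinkedHashSet<>();
            for (ClauseScorer scorer : clauses) {
                scorer.advanceTo(docNum);
                // The "spotting" check: this clause matched iff its scorer is on docNum.
                if (scorer.docID() == docNum) fields.add(scorer.field);
            }
            matched.put(docNum, fields);
        }
        return matched;
    }

    public static void main(String[] args) {
        // Made-up postings shaped like the example in the original question.
        List<ClauseScorer> clauses = List.of(
            new ClauseScorer("simple_fieldtype", 1, 2),
            new ClauseScorer("complex_fieldtype", 1),
            new ClauseScorer("another_simple_fieldtype", 3),
            new ClauseScorer("another_complex_fieldtype", 1, 2));
        System.out.println(spotMatches(clauses, new int[] {1, 2}));
    }
}
```

With these made-up postings, main() prints
{1=[simple_fieldtype, complex_fieldtype, another_complex_fieldtype], 2=[simple_fieldtype, another_complex_fieldtype]}.
In real Lucene a sub-scorer may lag behind or run ahead of the collected doc,
so the equality check alone is not always enough.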


On Fri, Dec 7, 2012 at 11:00 PM, Jeff Wartes <jwar...@whitepages.com> wrote:

> Thanks, I did start to dig into how DebugComponent does its thing a
> little, and I'm not all the way down the rabbit hole yet, but the lucene
> indexSearcher's explain() method has this comment:
>
> "This is intended to be used in developing Similarity implementations,
> and, for good performance, should not be displayed with every hit.
> Computing an explanation is as expensive as executing the query over the
> entire index."
>
> Which makes me wonder if I'd get almost all of the debugQuery=true
> performance penalty anyway if I try to do as you suggest.
>
>
> -----Original Message-----
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Friday, December 07, 2012 10:47 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Which fields matched?
>
> The debugQuery "explain" is simply a text display of what Lucene has
> already calculated. As such, you could do a custom search component that
> gets the non-text Lucene "Explanation" object for the query and then
> traverse it to get your matched field list without all the text. No parsing
> would be required, but the Explanation structure could get messy.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Jeff Wartes
> Sent: Friday, December 07, 2012 11:59 AM
> To: solr-user@lucene.apache.org
> Subject: Which fields matched?
>
>
> If I have an arbitrarily complex query that uses ORs, something like:
> q=(simple_fieldtype:foo OR complex_fieldtype:foo) AND
> (another_simple_fieldtype:bar OR another_complex_fieldtype:bar)
>
> I want to know which fields actually contributed to the match for each
> document returned. Something like:
> docID=1,
> fields_matched=simple_fieldtype,complex_fieldtype,another_complex_fieldtype
> docID=2, fields_matched=simple_fieldtype,another_complex_fieldtype
>
>
> My basic use case is that I have several copyField'ed variations on the
> same data (using different complex FieldTypes), and I want to know which
> variations contributed to the document so I can conclude things like "Well,
> this document matched the field with the SynonymFilterFactory, but not the
> one without, so this particular document must've been a synonym match."
>
> I know you could probably lift this from debugQuery output, but that's a
> non-starter due to parsing complexity and query performance impact.
> I think you could edge into some of this using the HighlightComponent
> output, but that's a non-starter because it requires fields be stored=true.
> Most of my fieldTypes are intended solely for indexing/search, and make no
> sense from a stored/retrieval standpoint. And to be clear, I really don't
> care about which terms matched anyway, only which fields.
>
> If there's an easy way to get this, I'd love to hear it. Otherwise, I'm
> mostly looking for a head start on where to go looking for this data so I
> can add my own Component or something - assuming the data is even available
> in the solr layer?
>
> Thanks.
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
