Jeff,

The explain() algorithm is definitely too slow to be used at search time.

There is one approach I'm aware of: watch the scorers during the search itself. If a scorer matches some doc, then _at some moment_ scorer.docID()==docNum. My team successfully implemented such a Match Spotting algorithm. It performs quite well and provides info like http://goo.gl/7vgrB

The problem with this algorithm is that it's tightly coupled to the behavior of the low-level scorers. They sometimes behave counter-intuitively, and that behavior changes as the Lucene core is optimized for performance.

https://issues.apache.org/jira/browse/LUCENE-1999 sounds like almost the same thing, but I never looked into its source.
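A toy model of the scorer-watching idea above. It assumes a hypothetical SpotScorer class in place of Lucene's Scorer: once the top-level scorer sits on a document, walk the scorer tree and record the field of every leaf scorer whose docID() equals that document. A real version would presumably get the scorer via Collector.setScorer() and walk sub-scorers with Scorer.getChildren() (Lucene 4.x, as far as I know). That wiring is not shown here, and it is exactly where the coupling to scorer internals comes in.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model of match spotting. SpotScorer is a hypothetical stand-in for
 * Lucene's Scorer: it only tracks its current doc and its sub-scorers.
 */
public class MatchSpotter {

    static final class SpotScorer {
        final String field;                      // leaf: field it scores; compound node: null
        final List<SpotScorer> children = new ArrayList<>();
        int docID = -1;                          // current position, like Scorer.docID()

        SpotScorer(String field) {
            this.field = field;
        }

        SpotScorer add(SpotScorer child) {
            children.add(child);
            return this;
        }
    }

    /** Fields of all leaf scorers positioned on doc, in tree order. */
    static Set<String> matchedFields(SpotScorer root, int doc) {
        Set<String> fields = new LinkedHashSet<>();
        collect(root, doc, fields);
        return fields;
    }

    private static void collect(SpotScorer s, int doc, Set<String> out) {
        if (s.docID != doc) {
            return; // not positioned on doc, so treated as not matching it
        }
        if (s.field != null) {
            out.add(s.field);
        }
        for (SpotScorer child : s.children) {
            collect(child, doc, out);
        }
    }

    /**
     * Models (simple_fieldtype:foo OR complex_fieldtype:foo)
     *   AND (another_simple_fieldtype:bar OR another_complex_fieldtype:bar)
     * with positions frozen as if the search had just landed on doc 1.
     */
    static Set<String> exampleDoc1() {
        SpotScorer a = new SpotScorer("simple_fieldtype");
        SpotScorer b = new SpotScorer("complex_fieldtype");
        SpotScorer c = new SpotScorer("another_simple_fieldtype");
        SpotScorer d = new SpotScorer("another_complex_fieldtype");
        SpotScorer left = new SpotScorer(null).add(a).add(b);
        SpotScorer right = new SpotScorer(null).add(c).add(d);
        SpotScorer root = new SpotScorer(null).add(left).add(right);

        root.docID = 1;
        left.docID = 1;
        right.docID = 1;
        a.docID = 1;
        b.docID = 1;
        d.docID = 1;
        c.docID = 5; // hypothetical: this term's next match is doc 5

        return matchedFields(root, 1);
    }

    public static void main(String[] args) {
        System.out.println("docID=1, fields_matched=" + String.join(",", exampleDoc1()));
    }
}
```

Running main prints docID=1, fields_matched=simple_fieldtype,complex_fieldtype,another_complex_fieldtype. In the model the another_simple_fieldtype scorer has already moved past doc 1, so it is left out. In real Lucene a sub-scorer that is advanced lazily could also sit behind a doc it actually matches, and this check would miss it. That is the fragility mentioned above.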
On Fri, Dec 7, 2012 at 11:00 PM, Jeff Wartes <jwar...@whitepages.com> wrote:
> Thanks, I did start to dig into how DebugComponent does its thing a
> little, and I'm not all the way down the rabbit hole yet, but the lucene
> indexSearcher's explain() method has this comment:
>
> "This is intended to be used in developing Similarity implementations,
> and, for good performance, should not be displayed with every hit.
> Computing an explanation is as expensive as executing the query over the
> entire index."
>
> Which makes me wonder if I'd get almost all of the debugQuery=true
> performance penalty anyway if I try to do as you suggest.
>
>
> -----Original Message-----
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Friday, December 07, 2012 10:47 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Which fields matched?
>
> The debugQuery "explain" is simply a text display of what Lucene has
> already calculated. As such, you could do a custom search component that
> gets the non-text Lucene "Explanation" object for the query and then
> traverse it to get your matched field list without all the text. No parsing
> would be required, but the Explanation structure could get messy.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Jeff Wartes
> Sent: Friday, December 07, 2012 11:59 AM
> To: solr-user@lucene.apache.org
> Subject: Which fields matched?
>
>
> If I have an arbitrarily complex query that uses ORs, something like:
> q=(simple_fieldtype:foo OR complex_fieldtype:foo) AND
> (another_simple_fieldtype:bar OR another_complex_fieldtype:bar)
>
> I want to know which fields actually contributed to the match for each
> document returned.
> Something like:
> docID=1,
> fields_matched=simple_fieldtype,complex_fieldtype,another_complex_fieldtype
> docID=2, fields_matched=simple_fieldtype,another_complex_fieldtype
>
>
> My basic use case is that I have several copyField'ed variations on the
> same data (using different complex FieldTypes), and I want to know which
> variations contributed to the document so I can conclude things like "Well,
> this document matched the field with the SynonymFilterFactory, but not the
> one without, so this particular document must've been a synonym match."
>
> I know you could probably lift this from debugQuery output, but that's a
> non-starter due to parsing complexity and query performance impact.
> I think you could edge into some of this using the HighlightComponent
> output, but that's a non-starter because it requires fields be stored=true.
> Most of my fieldTypes are intended solely for indexing/search, and make no
> sense from a stored/retrieval standpoint. And to be clear, I really don't
> care about which terms matched anyway, only which fields.
>
> If there's an easy way to get this, I'd love to hear it. Otherwise, I'm
> mostly looking for a head start on where to go looking for this data so I
> can add my own Component or something - assuming the data is even available
> in the solr layer?
>
> Thanks.
>
>

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>
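Jack's suggestion earlier in the thread (walk the Explanation object instead of parsing the debugQuery text) might look roughly like the sketch below. It assumes a hypothetical Expl class shaped like Lucene's Explanation (a value, a description and nested details, with "value > 0" meaning a match). It also assumes per-term leaves are described as weight(field:term in doc) ..., which is what Lucene 4.x's default similarity appears to emit. That wording is version-dependent, so a little string matching is still needed. It keeps the per-hit cost Jeff quoted from the explain() Javadoc, so it only fits a small page of hits.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Walks an explanation tree and collects the fields of matching leaf
 * clauses. Expl is a hypothetical stand-in for Lucene's Explanation.
 */
public class ExplanationFields {

    static final class Expl {
        final float value;
        final String description;
        final List<Expl> details;

        Expl(float value, String description, Expl... details) {
            this.value = value;
            this.description = description;
            this.details = Arrays.asList(details);
        }

        // Mirrors the "positive value means match" convention of Explanation.isMatch()
        boolean isMatch() {
            return value > 0.0f;
        }
    }

    // Assumed description format of a per-term leaf: "weight(<field>:<term> in <doc>) ..."
    private static final Pattern WEIGHT = Pattern.compile("^weight\\(([^:()\\s]+):");

    static Set<String> matchedFields(Expl root) {
        Set<String> fields = new LinkedHashSet<>();
        walk(root, fields);
        return fields;
    }

    private static void walk(Expl e, Set<String> out) {
        if (!e.isMatch()) {
            return; // non-matching clause: skip its whole subtree
        }
        Matcher m = WEIGHT.matcher(e.description);
        if (m.find()) {
            out.add(m.group(1));
            return; // tf/idf/norm details below carry no extra field info
        }
        for (Expl detail : e.details) {
            walk(detail, out);
        }
    }

    /** Made-up explanation for doc 2 of the query in the thread; values are illustrative. */
    static Expl exampleDoc2() {
        return new Expl(1.6f, "sum of:",
            new Expl(0.9f, "sum of:",
                new Expl(0.9f, "weight(simple_fieldtype:foo in 2) [DefaultSimilarity], result of:"),
                new Expl(0.0f, "no matching term")),
            new Expl(0.7f, "sum of:",
                new Expl(0.7f, "weight(another_complex_fieldtype:bar in 2) [DefaultSimilarity], result of:")));
    }

    public static void main(String[] args) {
        System.out.println("docID=2, fields_matched=" + String.join(",", matchedFields(exampleDoc2())));
    }
}
```

Running main prints docID=2, fields_matched=simple_fieldtype,another_complex_fieldtype. Stopping at the first weight(...) node keeps the walk shallow, since everything below it describes scoring factors rather than which field matched.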