Thanks, this is good stuff. I hadn't seen LUCENE-1999, and it even 
references some methods that are now in core.

I think I'm still stuck for the moment though. I'm pretty fixed on Solr 3.5 for 
the next few development cycles, and I've been trying really hard to avoid 
compiling my own Solr - as it appears most of these approaches would require. 
(I'm not above inserting my own jars into the stock Solr WAR, but I have to 
draw the line someplace.)

I'm going to spend some time with the highlight component and see if I can get 
something working with that. The basic String-version of my fields (which I 
copyField into many indexed-only field types) is stored, so I'm thinking I 
might be able to use hl.q to at least answer a basic "did this document match 
the original query unaltered" question.
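
As a rough sketch of that hl.q idea (untested against a real Solr 3.5 
instance): build a request that highlights only the stored string field 
against a separate probe query, then treat any document that comes back with 
a snippet as a match. The base URL, the field name name_s, and both helper 
names are assumptions made up for illustration; hl, hl.fl, hl.q and 
hl.requireFieldMatch are standard highlighting parameters.

```python
from urllib.parse import urlencode


def build_highlight_probe_url(base_url, query, probe_query, stored_field):
    """Build a select URL that highlights stored_field against probe_query.

    base_url and stored_field are hypothetical; substitute your own values.
    """
    params = [
        ("q", query),                      # the real (possibly complex) query
        ("fl", "id"),                      # only the document key is needed
        ("wt", "json"),
        ("hl", "true"),
        ("hl.fl", stored_field),           # the stored string copy of the data
        ("hl.q", probe_query),             # the unaltered query to test for
        ("hl.requireFieldMatch", "true"),  # only count hits on this field
    ]
    return base_url + "?" + urlencode(params)


def docs_matching_probe(response):
    """Return ids of documents that have at least one highlight snippet."""
    highlighting = response.get("highlighting", {})
    return {
        doc_id
        for doc_id, fields in highlighting.items()
        if any(fields.values())            # an empty dict or list means no match
    }
```

This only answers the yes/no question for the one stored field; it says 
nothing about which of the indexed-only variations matched.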



-----Original Message-----
From: Paul Libbrecht [mailto:p...@hoplahup.net] 
Sent: Saturday, December 08, 2012 7:36 PM
To: solr-user@lucene.apache.org
Subject: Re: Which fields matched?

We've used LUCENE-1999 with some success in ActiveMath to find the language 
that was matched.

paul


On Dec 8, 2012, at 10:09, Mikhail Khludnev wrote:

> Jeff,
> the explain() algorithm is definitely too slow to be used at search time. 
> There is an approach that I'm aware of: watch the scorers during the 
> search. If a scorer matches some doc, then _at some moment_ 
> scorer.docID()==docNum.
> My team successfully implemented such a Match Spotting algorithm; it 
> performs quite well and provides info like http://goo.gl/7vgrB The 
> problem with this algorithm is that it's tightly coupled to low-level 
> scorer behavior, which is sometimes counterintuitive and which 
> changes as performance optimizations are made in 
> Lucene core.
> https://issues.apache.org/jira/browse/LUCENE-1999 sounds almost the 
> same, but I never looked into the source.
> 
> 
> On Fri, Dec 7, 2012 at 11:00 PM, Jeff Wartes <jwar...@whitepages.com> wrote:
> 
>> Thanks, I did start to dig into how DebugComponent does its thing a 
>> little, and I'm not all the way down the rabbit hole yet, but the 
>> Lucene IndexSearcher's explain() method has this comment:
>> 
>> "This is intended to be used in developing Similarity 
>> implementations, and, for good performance, should not be displayed with 
>> every hit.
>> Computing an explanation is as expensive as executing the query over 
>> the entire index."
>> 
>> Which makes me wonder if I'd get almost all of the debugQuery=true 
>> performance penalty anyway if I try to do as you suggest.
>> 
>> 
>> -----Original Message-----
>> From: Jack Krupansky [mailto:j...@basetechnology.com]
>> Sent: Friday, December 07, 2012 10:47 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Which fields matched?
>> 
>> The debugQuery "explain" is simply a text display of what Lucene has 
>> already calculated. As such, you could do a custom search component 
>> that gets the non-text Lucene "Explanation" object for the query and 
>> then traverse it to get your matched field list without all the text. 
>> No parsing would be required, but the Explanation structure could get messy.
>> 
>> -- Jack Krupansky
>> 
>> -----Original Message-----
>> From: Jeff Wartes
>> Sent: Friday, December 07, 2012 11:59 AM
>> To: solr-user@lucene.apache.org
>> Subject: Which fields matched?
>> 
>> 
>> If I have an arbitrarily complex query that uses ORs, something like:
>> q=(simple_fieldtype:foo OR complex_fieldtype:foo) AND 
>> (another_simple_fieldtype:bar OR another_complex_fieldtype:bar)
>> 
>> I want to know which fields actually contributed to the match for 
>> each document returned. Something like:
>> docID=1,
>> fields_matched=simple_fieldtype,complex_fieldtype,another_complex_fieldtype
>> docID=2,
>> fields_matched=simple_fieldtype,another_complex_fieldtype
>> 
>> 
>> My basic use case is that I have several copyField'ed variations on 
>> the same data (using different complex FieldTypes), and I want to 
>> know which variations contributed to the document so I can conclude 
>> things like "Well, this document matched the field with the 
>> SynonymFilterFactory, but not the one without, so this particular 
>> document must've been a synonym match."
>> 
>> I know you could probably lift this from debugQuery output, but 
>> that's a non-starter due to parsing complexity and query performance impact.
>> I think you could edge into some of this using the HighlightComponent 
>> output, but that's a non-starter because it requires fields be stored=true.
>> Most of my fieldTypes are intended solely for indexing/search, and 
>> make no sense from a stored/retrieval standpoint. And to be clear, I 
>> really don't care about which terms matched anyway, only which fields.
>> 
>> If there's an easy way to get this, I'd love to hear it. Otherwise, 
>> I'm mostly looking for a head start on where to go looking for this 
>> data so I can add my own Component or something - assuming the data 
>> is even available in the Solr layer?
>> 
>> Thanks.
>> 
>> 
> 
> 
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
> 
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>
