RE: Which fields matched?
Thanks, this is good stuff. I hadn't seen LUCENE-1999, and it even has a reference to some methods now in core.

I think I'm still stuck for the moment, though. I'm pretty fixed on Solr 3.5 for the next few development cycles, and I've been trying really hard to avoid compiling my own Solr, which most of these approaches appear to require. (I'm not above inserting my own jars into the stock Solr WAR, but I have to draw the line someplace.)

I'm going to spend some time with the highlight component and see if I can get something working with that. The basic String version of my fields (which I copyField into many indexed-only field types) is stored, so I'm thinking I might be able to use hl.q to at least answer a basic "did this document match the original query unaltered?" question.

-----Original Message-----
From: Paul Libbrecht [mailto:p...@hoplahup.net]
Sent: Saturday, December 08, 2012 7:36 PM
To: solr-user@lucene.apache.org
Subject: Re: Which fields matched?

We've used lucene-1999 with some success in ActiveMath to find the language that was matched.

paul
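The hl.q idea from this message could be tried from the client side roughly as follows. This is only a sketch: the helper names build_params and fields_matched and the sample field names are made up for illustration, and it assumes the standard JSON response layout in which the "highlighting" section is keyed by document id and then by field. As noted in the thread, the highlighter only works on stored fields, so the probe fields have to be the stored ones.

```python
from urllib.parse import urlencode


def build_params(user_query, probe_fields, rows=10):
    """Build Solr query-string parameters that ask the highlighter which
    of `probe_fields` produced snippets for each returned document."""
    return urlencode({
        "q": user_query,
        "hl": "true",
        "hl.q": user_query,               # highlight against the unaltered query
        "hl.fl": ",".join(probe_fields),  # stored fields to probe
        "hl.requireFieldMatch": "true",   # only highlight a field the query hit
        "fl": "id",
        "rows": rows,
        "wt": "json",
    })


def fields_matched(response):
    """Map each document key to the sorted list of fields that came back
    with at least one highlight snippet."""
    highlighting = response.get("highlighting", {})
    return {
        doc_id: sorted(f for f, snippets in per_field.items() if snippets)
        for doc_id, per_field in highlighting.items()
    }
```

A field that comes back with no snippets is treated as "did not match", which is only as reliable as the highlighter's own analysis of the stored value.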
Re: Which fields matched?
Jeff,

The explain() algorithm is definitely too slow to be used at search time. There is an approach I'm aware of: watch the scorers during search time. If a scorer matches some doc, then _at some moment_ scorer.docID()==docNum. My team successfully implemented such a Match Spotting algorithm; it performs quite well and provides info like http://goo.gl/7vgrB

The problem with this algorithm is that it's tightly coupled to low-level scorer behavior. The scorers tend to behave counter-intuitively at times, and that behavior changes as performance optimizations land in Lucene core.

https://issues.apache.org/jira/browse/LUCENE-1999 sounds almost the same, but I never looked into the source.

On Fri, Dec 7, 2012 at 11:00 PM, Jeff Wartes jwar...@whitepages.com wrote:

Thanks, I did start to dig into how DebugComponent does its thing a little, and I'm not all the way down the rabbit hole yet, but the Lucene IndexSearcher's explain() method has this comment: "This is intended to be used in developing Similarity implementations, and, for good performance, should not be displayed with every hit. Computing an explanation is as expensive as executing the query over the entire index." Which makes me wonder if I'd get almost all of the debugQuery=true performance penalty anyway if I try to do as you suggest.

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
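The scorer-watching idea described in this message can be illustrated with a toy model. Nothing below is Lucene API: TermScorer and spot_matches are hypothetical stand-ins, and the postings are invented. The sketch evaluates an AND of OR groups and, for each matching document, records the fields whose leaf "scorer" is positioned on that document at collect time.

```python
NO_MORE_DOCS = float("inf")


class TermScorer:
    """Toy stand-in for a leaf scorer: iterates one field's sorted postings."""

    def __init__(self, field, postings):
        self.field = field
        self._postings = sorted(postings)
        self._pos = -1

    def doc_id(self):
        if self._pos < 0:
            return -1
        if self._pos >= len(self._postings):
            return NO_MORE_DOCS
        return self._postings[self._pos]

    def advance(self, target):
        """Move to the first posting >= target and return it."""
        while self.doc_id() < target:
            self._pos += 1
        return self.doc_id()


def spot_matches(groups, max_doc):
    """Evaluate AND-of-ORs over `groups` (a list of lists of TermScorer) and
    record, per matching doc, the fields whose scorer sits on that doc."""
    matched = {}
    for doc in range(max_doc):
        hit_fields = []
        all_groups_hit = True
        for group in groups:
            # Every leaf is advanced explicitly, so doc_id() == doc is reliable.
            on_doc = [s.field for s in group if s.advance(doc) == doc]
            if not on_doc:
                all_groups_hit = False
            hit_fields.extend(on_doc)
        if all_groups_hit:
            matched[doc] = hit_fields
    return matched
```

The toy advances every leaf to every candidate document, which is what makes the docID comparison trustworthy here. The caveat in the message is that real scorers are not guaranteed to be positioned that way, so a real implementation depends on scorer internals.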
Re: Which fields matched?
We've used lucene-1999 with some success in ActiveMath to find the language that was matched.

paul

On 8 Dec 2012, at 10:09, Mikhail Khludnev wrote:

Jeff,

The explain() algorithm is definitely too slow to be used at search time. There is an approach I'm aware of: watch the scorers during search time. If a scorer matches some doc, then _at some moment_ scorer.docID()==docNum. My team successfully implemented such a Match Spotting algorithm; it performs quite well and provides info like http://goo.gl/7vgrB

The problem with this algorithm is that it's tightly coupled to low-level scorer behavior. The scorers tend to behave counter-intuitively at times, and that behavior changes as performance optimizations land in Lucene core.

https://issues.apache.org/jira/browse/LUCENE-1999 sounds almost the same, but I never looked into the source.
Which fields matched?
If I have an arbitrarily complex query that uses ORs, something like:

    q=(simple_fieldtype:foo OR complex_fieldtype:foo) AND (another_simple_fieldtype:bar OR another_complex_fieldtype:bar)

I want to know which fields actually contributed to the match for each document returned. Something like:

    docID=1, fields_matched=simple_fieldtype,complex_fieldtype,another_complex_fieldtype
    docID=2, fields_matched=simple_fieldtype,another_complex_fieldtype

My basic use case is that I have several copyField'ed variations on the same data (using different complex FieldTypes), and I want to know which variations contributed to the document so I can conclude things like "Well, this document matched the field with the SynonymFilterFactory, but not the one without, so this particular document must've been a synonym match."

I know you could probably lift this from debugQuery output, but that's a non-starter due to parsing complexity and query performance impact.

I think you could edge into some of this using the HighlightComponent output, but that's a non-starter because it requires fields be stored=true. Most of my fieldTypes are intended solely for indexing/search, and make no sense from a stored/retrieval standpoint. And to be clear, I really don't care about which terms matched anyway, only which fields.

If there's an easy way to get this, I'd love to hear it. Otherwise, I'm mostly looking for a head start on where to go looking for this data so I can add my own Component or something - assuming the data is even available in the solr layer?

Thanks.
Re: Which fields matched?
The debugQuery explain is simply a text display of what Lucene has already calculated. As such, you could do a custom search component that gets the non-text Lucene Explanation object for the query and then traverse it to get your matched field list without all the text. No parsing would be required, but the Explanation structure could get messy.

-- Jack Krupansky

-----Original Message-----
From: Jeff Wartes
Sent: Friday, December 07, 2012 11:59 AM
To: solr-user@lucene.apache.org
Subject: Which fields matched?

If I have an arbitrarily complex query that uses ORs, something like:

    q=(simple_fieldtype:foo OR complex_fieldtype:foo) AND (another_simple_fieldtype:bar OR another_complex_fieldtype:bar)

I want to know which fields actually contributed to the match for each document returned. Something like:

    docID=1, fields_matched=simple_fieldtype,complex_fieldtype,another_complex_fieldtype
    docID=2, fields_matched=simple_fieldtype,another_complex_fieldtype

My basic use case is that I have several copyField'ed variations on the same data (using different complex FieldTypes), and I want to know which variations contributed to the document so I can conclude things like "Well, this document matched the field with the SynonymFilterFactory, but not the one without, so this particular document must've been a synonym match."

I know you could probably lift this from debugQuery output, but that's a non-starter due to parsing complexity and query performance impact.

I think you could edge into some of this using the HighlightComponent output, but that's a non-starter because it requires fields be stored=true. Most of my fieldTypes are intended solely for indexing/search, and make no sense from a stored/retrieval standpoint. And to be clear, I really don't care about which terms matched anyway, only which fields.

If there's an easy way to get this, I'd love to hear it. Otherwise, I'm mostly looking for a head start on where to go looking for this data so I can add my own Component or something - assuming the data is even available in the solr layer?

Thanks.
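The traversal suggested in this reply can be sketched against a structured form of the explanation rather than the Java Explanation object itself. The assumptions are loud ones: that the explanation is available as nested maps with "match", "description" and optional "details" keys (as Solr's debug.explain.structured=true is understood to produce, if the version in use supports it), and that term-level descriptions look roughly like "weight(field:term in doc), product of:". The function name matched_fields and the regex are illustrative, and the description format may differ between Lucene versions.

```python
import re

# Heuristic: term-level explanations are assumed to be described roughly as
# "weight(field:term in doc), ..." or "fieldWeight(field:term in doc), ...".
_FIELD_RE = re.compile(r"[Ww]eight\(([^:()\s]+):")


def matched_fields(node, found=None):
    """Collect field names from matching nodes of a structured explanation
    (nested dicts with 'match', 'description' and optional 'details')."""
    if found is None:
        found = set()
    if not node.get("match"):
        return found  # a non-matching subtree contributed nothing
    m = _FIELD_RE.search(node.get("description", ""))
    if m:
        found.add(m.group(1))
    for child in node.get("details") or []:
        matched_fields(child, found)
    return found
```

This avoids parsing the explain text as a whole, but it still requires the explanation to be computed, so it does not remove the cost of explain() itself.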
RE: Which fields matched?
Thanks, I did start to dig into how DebugComponent does its thing a little, and I'm not all the way down the rabbit hole yet, but the Lucene IndexSearcher's explain() method has this comment:

"This is intended to be used in developing Similarity implementations, and, for good performance, should not be displayed with every hit. Computing an explanation is as expensive as executing the query over the entire index."

Which makes me wonder if I'd get almost all of the debugQuery=true performance penalty anyway if I try to do as you suggest.

-----Original Message-----
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, December 07, 2012 10:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Which fields matched?

The debugQuery explain is simply a text display of what Lucene has already calculated. As such, you could do a custom search component that gets the non-text Lucene Explanation object for the query and then traverse it to get your matched field list without all the text. No parsing would be required, but the Explanation structure could get messy.

-- Jack Krupansky
Re: Knowing which fields matched a search
Paul,

I would think debugQuery would make it slower too, wouldn't it? Where is the thread you are referring to? Is there a Lucene JIRA ticket for this?

On Mar 11, 2012, at 9:38 AM, Paul Libbrecht wrote:

Russell, there's been a thread on that in the Lucene world... it's not really perfect yet. The debugQuery suggestion gives only, in my experience, the explain monster, which is good for developers (only).

paul
Re: Knowing which fields matched a search
Russell,

there's been a thread on that in the Lucene world... it's not really perfect yet. The debugQuery suggestion gives only, in my experience, the explain monster, which is good for developers (only).

paul

On 11 March 2012, at 08:40, William Bell wrote:

debugQuery tells you.
Re: Knowing which fields matched a search
debugQuery tells you.

On Fri, Mar 9, 2012 at 1:05 PM, Russell Black rbl...@fold3.com wrote:

When searching across multiple fields, is there a way to identify which field(s) resulted in a match without using highlighting or stored fields?

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076
Knowing which fields matched a search
When searching across multiple fields, is there a way to identify which field(s) resulted in a match without using highlighting or stored fields?