RE: Which fields matched?

2012-12-11 Thread Jeff Wartes
Thanks, this is good stuff. I hadn't seen LUCENE-1999, and it even has a 
reference to some methods now in core.

I think I'm still stuck for the moment though. I'm pretty fixed on Solr 3.5 for 
the next few development cycles, and I've been trying really hard to avoid 
compiling my own Solr - as it appears most of these approaches would require. 
(I'm not above inserting my own jars into the stock Solr WAR, but I have to 
draw the line someplace.)

I'm going to spend some time with the highlight component and see if I can get 
something working with that. The basic String version of my fields (which I 
copyField into many indexed-only field types) is stored, so I'm thinking I 
might be able to use hl.q to at least answer a basic "did this document match 
the original query unaltered?" question.
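
A rough sketch of that idea, not a tested recipe: the request keeps the real 
query in q, points the highlighter at the stored string field with hl.fl, and 
passes the unaltered query through hl.q. The endpoint and the field name 
name_str are placeholders made up for illustration, and whether the 
highlighter emits a snippet for a string field depends on the schema and 
highlighter settings.

```python
from urllib.parse import urlencode

# Placeholder values: adjust to the real deployment and schema.
SOLR_SELECT = "http://localhost:8983/solr/select"  # hypothetical endpoint
STORED_FIELD = "name_str"                           # hypothetical stored string field


def build_request(expanded_query: str, original_query: str) -> str:
    """Build a select URL that searches with one query and highlights with another."""
    params = {
        "q": expanded_query,     # the query actually used for matching
        "hl": "true",            # turn on the highlight component
        "hl.fl": STORED_FIELD,   # highlight only the stored string field
        "hl.q": original_query,  # highlight against the unaltered query instead of q
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


def docs_matching_original(response: dict) -> set:
    """Return the ids whose highlighting entry has a snippet for STORED_FIELD."""
    highlighting = response.get("highlighting", {})
    return {doc_id for doc_id, fields in highlighting.items()
            if fields.get(STORED_FIELD)}
```

A document id that comes back from docs_matching_original() would then be 
read as "matched the original query on the stored field"; anything else 
matched only through one of the copyField variants.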






Re: Which fields matched?

2012-12-08 Thread Mikhail Khludnev
Jeff,
The explain() algorithm is definitely too slow to be used at search time. There
is an approach I'm aware of: watch the scorers during the search. A scorer
matches some doc if, _at some moment_, scorer.docID()==docNum.
My team successfully implemented such a Match Spotting algorithm; it performs
quite well and provides info like http://goo.gl/7vgrB
The problem with this algorithm is that it's tightly coupled to low-level
scorer behavior. Scorers tend to behave counter-intuitively sometimes, and
that behavior changes due to performance optimizations in Lucene core.
https://issues.apache.org/jira/browse/LUCENE-1999 sounds almost the same,
but I never looked into the source.
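
To make the scorer-watching idea concrete, here is a toy model in plain
Python. It is not Lucene code and not the implementation described above:
ToyTermScorer, spot_matches, and the posting lists are all invented for
illustration. Unlike real Lucene scorers, the toy advances every leaf to
every candidate doc, which is exactly the kind of behavior one cannot rely
on in the real thing.

```python
from typing import Dict, List, Set

NO_MORE_DOCS = 2**31 - 1  # sentinel: the iterator is exhausted


class ToyTermScorer:
    """Toy stand-in for a leaf scorer over one field's sorted posting list."""

    def __init__(self, field: str, postings: List[int]):
        self.field = field
        self._postings = sorted(postings)
        self._pos = -1  # not positioned on any doc yet

    def doc_id(self) -> int:
        if self._pos < 0:
            return -1
        if self._pos >= len(self._postings):
            return NO_MORE_DOCS
        return self._postings[self._pos]

    def advance(self, target: int) -> int:
        """Move to the first doc >= target and return it."""
        while self.doc_id() < target:
            self._pos += 1
        return self.doc_id()


def spot_matches(groups: List[List[ToyTermScorer]]) -> Dict[int, Set[str]]:
    """Evaluate an AND of OR-groups and record which fields sat on each hit."""
    leaves = [s for group in groups for s in group]
    matched: Dict[int, Set[str]] = {}
    doc = min(s.advance(0) for s in leaves)
    while doc != NO_MORE_DOCS:
        for s in leaves:
            s.advance(doc)
        # A leaf "matched" this doc if it is positioned exactly on it.
        on_doc = [[s for s in group if s.doc_id() == doc] for group in groups]
        if all(on_doc):  # every OR-group needs at least one leaf on the doc
            matched[doc] = {s.field for hits in on_doc for s in hits}
        doc = min(s.advance(doc + 1) for s in leaves)
    return matched
```

With posting lists chosen to mirror the example in the original question
(simple_fieldtype on docs 1 and 2, complex_fieldtype on doc 1,
another_complex_fieldtype on docs 1, 2 and 3, another_simple_fieldtype on
none), spot_matches() reports three fields for doc 1, two for doc 2, and
skips doc 3.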


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


Re: Which fields matched?

2012-12-08 Thread Paul Libbrecht
We've used LUCENE-1999 with some success in ActiveMath to find the language 
that was matched.

paul





Which fields matched?

2012-12-07 Thread Jeff Wartes

If I have an arbitrarily complex query that uses ORs, something like:
q=(simple_fieldtype:foo OR complex_fieldtype:foo) AND 
(another_simple_fieldtype:bar OR another_complex_fieldtype:bar)

I want to know which fields actually contributed to the match for each document 
returned. Something like:
docID=1, fields_matched=simple_fieldtype,complex_fieldtype,another_complex_fieldtype
docID=2, fields_matched=simple_fieldtype,another_complex_fieldtype


My basic use case is that I have several copyField'ed variations on the same 
data (using different complex FieldTypes), and I want to know which variations 
contributed to the match so I can conclude things like "Well, this document 
matched the field with the SynonymFilterFactory, but not the one without, so 
this particular document must've been a synonym match."

I know you could probably lift this from debugQuery output, but that's a 
non-starter due to parsing complexity and query performance impact.
I think you could edge into some of this using the HighlightComponent output, 
but that's a non-starter because it requires fields be stored=true. Most of my 
fieldTypes are intended solely for indexing/search, and make no sense from a 
stored/retrieval standpoint. And to be clear, I really don't care about which 
terms matched anyway, only which fields.

If there's an easy way to get this, I'd love to hear it. Otherwise, I'm mostly 
looking for a head start on where to go looking for this data so I can add my 
own Component or something - assuming the data is even available in the solr 
layer?

Thanks.


Re: Which fields matched?

2012-12-07 Thread Jack Krupansky
The debugQuery explain is simply a text display of what Lucene has already 
calculated. As such, you could do a custom search component that gets the 
non-text Lucene Explanation object for the query and then traverses it to 
get your matched field list without all the text. No parsing would be 
required, but the Explanation structure could get messy.
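
As an illustration of that traversal, here is a small sketch in Python rather 
than as a Solr component. It walks a nested explanation tree and collects 
field names from the node descriptions. The match/value/description/details 
keys are assumed to follow the shape of Solr's structured explain output, the 
"weight(field:term in N)" description format is an assumption that varies 
between Lucene versions, and the sample tree is hand-written, not real output. 
It also does nothing about the cost of computing the explanation in the first 
place.

```python
import re
from typing import Set

# Matches descriptions such as "weight(simple_fieldtype:foo in 0), product of:".
# The exact wording differs between Lucene versions; this pattern is an assumption.
FIELD_IN_DESCRIPTION = re.compile(r"\b(?:weight|fieldWeight)\((\w+):")


def matched_fields(node: dict) -> Set[str]:
    """Collect field names from every matching node of an explanation tree."""
    if not node.get("match", False):
        return set()  # skip sub-trees that did not contribute to the match
    fields = set(FIELD_IN_DESCRIPTION.findall(node.get("description", "")))
    for child in node.get("details", []):
        fields |= matched_fields(child)
    return fields


# Hand-written stand-in for one document's explanation tree (not real Solr output).
example = {
    "match": True, "value": 1.5, "description": "sum of:",
    "details": [
        {"match": True, "value": 1.0, "description": "sum of:", "details": [
            {"match": True, "value": 0.5,
             "description": "weight(simple_fieldtype:foo in 0), product of:"},
            {"match": True, "value": 0.5,
             "description": "weight(complex_fieldtype:foo in 0), product of:"},
        ]},
        {"match": True, "value": 0.5, "description": "sum of:", "details": [
            {"match": False, "value": 0.0,
             "description": "weight(another_simple_fieldtype:bar in 0), product of:"},
            {"match": True, "value": 0.5,
             "description": "weight(another_complex_fieldtype:bar in 0), product of:"},
        ]},
    ],
}
```

Calling matched_fields(example) gives the three fields whose nodes are marked 
as matching and leaves out another_simple_fieldtype.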


-- Jack Krupansky




RE: Which fields matched?

2012-12-07 Thread Jeff Wartes
Thanks, I did start to dig into how DebugComponent does its thing a little, and 
I'm not all the way down the rabbit hole yet, but the Lucene IndexSearcher's 
explain() method has this comment:

"This is intended to be used in developing Similarity implementations, and, for 
good performance, should not be displayed with every hit. Computing an 
explanation is as expensive as executing the query over the entire index."

Which makes me wonder if I'd get almost all of the debugQuery=true performance 
penalty anyway if I try to do as you suggest.





Re: Knowing which fields matched a search

2012-03-12 Thread Russell Black
Paul,

I would think debugQuery would make it slower too, wouldn't it?  Where is the 
thread you are referring to?  Is there a lucene jira ticket for this?




Re: Knowing which fields matched a search

2012-03-11 Thread Paul Libbrecht
Russell,

There's been a thread on that in the Lucene world... it's not really perfect 
yet.
The debugQuery suggestion gives, in my experience, only the explain monster, 
which is good for developers (only).

paul





Re: Knowing which fields matched a search

2012-03-10 Thread William Bell
debugQuery tells you.

On Fri, Mar 9, 2012 at 1:05 PM, Russell Black rbl...@fold3.com wrote:
 When searching across multiple fields, is there a way to identify which 
 field(s) resulted in a match without using highlighting or stored fields?



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Knowing which fields matched a search

2012-03-09 Thread Russell Black
When searching across multiple fields, is there a way to identify which 
field(s) resulted in a match without using highlighting or stored fields?