Re: Lucene Explanation

2021-04-23 Thread Puneeth Bikkumanla
Thank you, this was very helpful!

On Mon, Apr 12, 2021 at 9:07 AM Michael Sokolov wrote:

> You might want to check out
> https://issues.apache.org/jira/browse/LUCENE-8019 where I tried to
> implement some debugging utilities on top of Explain. It never got
> committed, but it does explore some of the challenges around
> introducing a more structured explain response.
>
> On Fri, Apr 9, 2021 at 6:40 PM Puneeth Bikkumanla wrote:
> >
> > Hello,
> > I am currently working on a project that would like to implement Document
> > Explain where we can see how a document was scored internally in lucene
> > given a query.
> >
> > I see that the IndexSearcher has an explain
> > <https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/IndexSearcher.html#explain-org.apache.lucene.search.Query-int->
> > method available that returns an Explanation
> > <https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/Explanation.html>
> > object. An Explanation object only contains a description field (string)
> > but there is no way to know what part of a score that Explanation object is
> > for without parsing the description field itself. We wanted to implement
> > Document Explain in a more safe way where we could know what part of the
> > score an Explanation object is associated with and not parse the
> > description string field to find out. Here are a few of the options I have
> > thought of:
> >
> > 1. I was thinking about extending the similarity class (BM25Similarity) and
> > then overriding the particular methods that dealt with the different
> > subcomponents of explain but saw that the explainTF
> > <https://github.com/apache/lucene/blob/e510ef11c2a4307dd6ecc8c8974eef2c04e3e4d6/lucene/core/src/java/org/apache/lucene/search/similarities/BM25Similarity.java#L268>
> > method is private. Is there a reason why this is? It would be very useful if it
> > could be public so that I can override it and store the knowledge that the
> > returned Explanation is for the TF component of the document score.
> >
> > 2. I also thought about extending the IndexSearcher and overriding the
> > createWeight method to store the weight structure and then use that to
> > understand the resulting Explanation structure from the IndexSearcher's
> > explain method.
> >
> > Please let me know if any of that didn't make sense. Also, if anyone has
> > any other ideas on how I could approach this problem suggestions would be
> > greatly appreciated. Lastly, I would be happy to submit a PR to modify
> > Lucene's Explanation to be more aware of where it is created.
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>


Re: Lucene Explanation

2021-04-12 Thread Michael Sokolov
You might want to check out
https://issues.apache.org/jira/browse/LUCENE-8019 where I tried to
implement some debugging utilities on top of Explain. It never got
committed, but it does explore some of the challenges around
introducing a more structured explain response.





Lucene Explanation

2021-04-09 Thread Puneeth Bikkumanla
Hello,
I am currently working on a project in which we would like to implement a
Document Explain feature that shows how Lucene scored a document internally
for a given query.

I see that the IndexSearcher has an explain
<https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/IndexSearcher.html#explain-org.apache.lucene.search.Query-int->
method that returns an Explanation
<https://lucene.apache.org/core/8_0_0/core/org/apache/lucene/search/Explanation.html>
object. Besides its value and nested details, an Explanation object only
carries a description field (a string), so there is no way to know which part
of the score an Explanation object represents without parsing the description
itself. We want to implement Document Explain in a safer way, where we know
which part of the score an Explanation object is associated with without
parsing the description string. Here are a few of the options I have
thought of:

1. I was thinking about extending the similarity class (BM25Similarity) and
then overriding the methods that deal with the different subcomponents of
explain, but saw that the explainTF
<https://github.com/apache/lucene/blob/e510ef11c2a4307dd6ecc8c8974eef2c04e3e4d6/lucene/core/src/java/org/apache/lucene/search/similarities/BM25Similarity.java#L268>
method is private. Is there a reason for this? It would be very useful if it
could be public so that I can override it and record that the returned
Explanation is for the TF component of the document score.

2. I also thought about extending the IndexSearcher and overriding the
createWeight method to store the weight structure and then use that to
understand the resulting Explanation structure from the IndexSearcher's
explain method.

Please let me know if any of that didn't make sense. Also, if anyone has
other ideas on how I could approach this problem, suggestions would be
greatly appreciated. Lastly, I would be happy to submit a PR to modify
Lucene's Explanation to be more aware of where it is created.
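
To make the goal above concrete, here is a minimal sketch of an explanation
tree whose nodes carry an explicit tag, so a caller can look up the TF
component without parsing any description string. Everything in it is an
assumption for illustration: ScorePart, TaggedExplanation and
TaggedExplainSketch are hypothetical stand-in types, not Lucene classes, and
the numbers are made up. It only mirrors the general shape of an explanation
(value, description, nested details).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class TaggedExplainSketch {
    // Builds a small tree by hand; the values are invented for illustration.
    static TaggedExplanation sample() {
        return new TaggedExplanation(ScorePart.TOTAL, 1.5, "score(doc), product of:",
            new TaggedExplanation(ScorePart.IDF, 2.0, "idf"),
            new TaggedExplanation(ScorePart.TF, 0.75, "tf"));
    }

    public static void main(String[] args) {
        // Look up the TF component by tag instead of by description text.
        for (TaggedExplanation tf : sample().find(ScorePart.TF)) {
            System.out.println(tf.part + " = " + tf.value);
        }
    }
}

// Hypothetical stand-in (NOT a Lucene class): one node of an explanation tree
// that records which part of the score it describes.
final class TaggedExplanation {
    final ScorePart part;
    final double value;
    final String description;
    final List<TaggedExplanation> details;

    TaggedExplanation(ScorePart part, double value, String description,
                      TaggedExplanation... details) {
        this.part = part;
        this.value = value;
        this.description = description;
        this.details = Collections.unmodifiableList(Arrays.asList(details));
    }

    // Depth-first search for every node carrying the requested tag.
    List<TaggedExplanation> find(ScorePart wanted) {
        List<TaggedExplanation> found = new ArrayList<>();
        collect(wanted, found);
        return found;
    }

    private void collect(ScorePart wanted, List<TaggedExplanation> out) {
        if (part == wanted) out.add(this);
        for (TaggedExplanation child : details) child.collect(wanted, out);
    }
}

// Hypothetical tag set; a real one would depend on the similarity in use.
enum ScorePart { TOTAL, IDF, TF, OTHER }
```

With a tag on each node, the description stays purely human-readable and
can change wording without breaking the code that consumes the tree.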


Re: lucene explanation

2008-12-23 Thread Chris Salem
That worked perfectly.
Thanks a lot!
Sincerely,
Chris Salem 





lucene explanation

2008-12-22 Thread Chris Salem
Hello,
I'm wondering what the best way to accomplish this is.
When a user enters search text, it is normally searched against three fields,
resume_text, profile_text, and summary_text, so a standard query would be
something like:
(resume_text:(query) OR profile_text:(query) OR summary_text:(query))
For each hit (up to 50) I'd like to find out which part of the query matched
the document. Right now I use the Explanation object; here's the code:
int len = hits.length();
if (len > 50) len = 50;  // only inspect the first 50 hits
for (int i = 0; i < len; i++) {
    // Explain each single-field query against this hit to see which field matched.
    Explanation ex = searcher.explain(Query.parse("resume_text:(query)"), hits.id(i));
    if (ex.isMatch()) ...
    ex = searcher.explain(Query.parse("profile_text:(query)"), hits.id(i));
    if (ex.isMatch()) ...
    ex = searcher.explain(Query.parse("summary_text:(query)"), hits.id(i));
    if (ex.isMatch()) ...
}
This works fine with regular queries, but if someone runs a query with a
wildcard, search times increase to more than 30 seconds. Is there a better way
to do this?
Thanks
Sincerely,
Chris Salem 


Re: lucene explanation

2008-12-22 Thread Erick Erickson
Warning! I'm really reaching on this...

But it seems you could use TermDocs/TermEnum to
good effect here. Basically, for a given term, you
should be able to use them to determine pretty
efficiently whether doc N had a hit in one of your
fields. There's even a WildcardTermEnum that will
iterate over the terms matching a wildcard.

Filters are surprisingly fast to construct, so you could
use the above to construct a filter on each term for
each field. Then determining whether the doc is a hit
for a particular field is just a matter of seeing if
that bit is on in the relevant filter.

Either one should be way under 30 seconds,
although I don't know how big your index is
or how encompassing your wildcard searches
are...

FWIW
Erick
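
A rough sketch of the per-field filter idea above, under loud assumptions: it
does not use Lucene's TermDocs/TermEnum or Filter classes at all. The
FieldMatchSketch class and its in-memory postings map are hypothetical
stand-ins, and the wildcard is limited to a trailing one (a prefix such as
"jav*"). It only shows the shape of the technique: enumerate the matching
terms of one field once, set a bit per matching doc, then answer "did doc N
match in this field?" with a bit test.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for an index: field -> sorted term -> doc ids.
// It imitates the idea only; it is NOT Lucene's API.
class FieldMatchSketch {
    private final Map<String, TreeMap<String, int[]>> postings = new HashMap<>();

    void add(String field, String term, int... docIds) {
        postings.computeIfAbsent(field, f -> new TreeMap<>()).put(term, docIds);
    }

    // Builds the bit set for one field: bit N is set when doc N contains
    // at least one term in that field starting with the given prefix.
    BitSet prefixFilter(String field, String prefix) {
        BitSet bits = new BitSet();
        TreeMap<String, int[]> terms = postings.get(field);
        if (terms == null) return bits;
        // Terms sharing a prefix are contiguous in sorted order, so start at
        // the first term >= prefix and stop at the first non-matching term.
        for (Map.Entry<String, int[]> e : terms.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) break;
            for (int doc : e.getValue()) bits.set(doc);
        }
        return bits;
    }

    // Tiny made-up index for illustration.
    static FieldMatchSketch sample() {
        FieldMatchSketch index = new FieldMatchSketch();
        index.add("resume_text", "java", 0, 2);
        index.add("resume_text", "javascript", 3);
        index.add("profile_text", "java", 2);
        index.add("summary_text", "python", 1);
        return index;
    }

    public static void main(String[] args) {
        FieldMatchSketch index = sample();
        String[] fields = {"resume_text", "profile_text", "summary_text"};
        // Build each field's filter once, then reuse it for every hit.
        Map<String, BitSet> filters = new HashMap<>();
        for (String field : fields) filters.put(field, index.prefixFilter(field, "jav"));
        for (int doc = 0; doc < 4; doc++) {
            for (String field : fields) {
                if (filters.get(field).get(doc)) {
                    System.out.println("doc " + doc + " matched in " + field);
                }
            }
        }
    }
}
```

The point of the design is that the wildcard expansion happens once per field
rather than once per hit per field, which is where the explain-based loop
spends its time.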
