[jira] Commented: (SOLR-69) PATCH:MoreLikeThis support

Hoss Man (JIRA) Fri, 04 May 2007 11:38:37 -0700

    [ https://issues.apache.org/jira/browse/SOLR-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493770 ]


Hoss Man commented on SOLR-69:
------------------------------

Looking back at the two main use cases Yonik described in his comment from 
06/Feb/07...

At the most basic level, a request for MLT results for a single doc by 
uniqueKey (case #1) is just a special case of asking for MLT results for an 
arbitrary query (case #2): that arbitrary query just happens to be on the 
uniqueKey field, so it returns only one result.

Where things get more complicated is when you start returning other "tier 2" 
type information about the request, which raises the question: what is the 
"tier 1" data? If the MLT results are added as "tier 2" data to the 
StandardRequestHandler response, then all of the other "tier 2" data blocks 
(highlighting, faceting, debugQuery score explanation, etc.) still refer to 
the main result from the original query. That may be what you want in use 
case #2, but it doesn't really make sense for use case #1, where the "tier 1" 
main result contains only the single document you asked for by id; the score 
explanation and facet counts aren't very interesting in that case.

For case #1, what you really want is for the MLT data to be treated as the 
primary ("tier 1") result set, with all of the "tier 2" data about those 
results: highlighting is done on the MLT docs, facet counts are for the MLT 
docs, the debugQuery score explanation tells you *why* the MLT docs are like 
your original docs, etc.

Case #1 and case #2 are both useful. To address Brian's 02/May/07 comment:

 > I've personally never understood the "more documents 
 > that don't match this query but are like the documents 
 > in this query" ... I'm confused as to how querying by 
 > query would work -- if a query for 'apache' returned 10 
 > docs, would MLT work on each one and generate n more 
 > docs per doc? And would the original query results get 
 > returned? What's the ordering? 

In your example, yes: the user's main search on "apache" would return 10 
results sorted by whatever sort they specified. For each of those 10 results, 
N similar results might be listed to the side (in a smaller font, or as a 
pop-up widget), most likely sorted by how similar they are. Even if you don't 
want to surface those similar docs right there on the main result page, you 
still need to execute the MLT logic as part of the initial request to know 
whether there are *any* similar docs (so you can surface the link/button for 
displaying them to the user).
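A similarly hypothetical sketch of case #2: the main results stay "tier 1", and each one gets its own "tier 2" MLT list computed in the same request, which is what lets the UI decide whether to show a "similar items" link at all. The index, names, and shared-term similarity are illustrative assumptions, not the patch's code or Lucene's algorithm.

```java
import java.util.*;

// Hypothetical toy model of case #2: the main results stay "tier 1" and each one
// gets its own "tier 2" MLT list. Shared-term counting stands in for Lucene's MoreLikeThis.
public class PerResultMltSketch {

    // doc id -> terms from the MLT fields
    static final Map<String, Set<String>> INDEX = new LinkedHashMap<>();
    static {
        INDEX.put("A", Set.of("apache", "search"));
        INDEX.put("B", Set.of("apache", "http"));
        INDEX.put("C", Set.of("search", "index"));
        INDEX.put("D", Set.of("http", "proxy"));
        INDEX.put("E", Set.of("cooking"));
    }

    // Number of terms the two docs have in common.
    static int overlap(Set<String> a, Set<String> b) {
        int n = 0;
        for (String t : a) if (b.contains(t)) n++;
        return n;
    }

    // Up to maxSimilar docs that share terms with docId, most similar first.
    public static List<String> similarTo(String docId, int maxSimilar) {
        Set<String> terms = INDEX.get(docId);
        List<String> hits = new ArrayList<>();
        for (String id : INDEX.keySet()) {
            if (!id.equals(docId) && overlap(terms, INDEX.get(id)) > 0) hits.add(id);
        }
        hits.sort(Comparator.comparingInt((String id) -> overlap(terms, INDEX.get(id))).reversed());
        return new ArrayList<>(hits.subList(0, Math.min(maxSimilar, hits.size())));
    }

    // Main results for a one-term query, each paired with its own MLT list.
    public static Map<String, List<String>> searchWithMlt(String term, int maxSimilar) {
        Map<String, List<String>> response = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : INDEX.entrySet()) {
            if (e.getValue().contains(term)) {
                // Computing MLT in the same request tells the UI whether any similar docs exist.
                response.put(e.getKey(), similarTo(e.getKey(), maxSimilar));
            }
        }
        return response;
    }

    public static void main(String[] args) {
        searchWithMlt("apache", 2).forEach((id, mlt) ->
            System.out.println(id + " -> " + mlt + (mlt.isEmpty() ? " (hide link)" : " (show link)")));
    }
}
```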

I would even argue that there is actually a third use case...

--
Case 3)
  The GUI queries the standard request handler to display a list of documents, 
with a single subsequent list of similar "mlt" documents that have things in 
common with all of the docs in the current page of results displayed elsewhere 
on the page.
--

Whereas case #2 is about having separate MLT lists for each of the matching 
results, this case is about having a single "if you are interested in *all* of 
these items, you might also be interested in these other items" list.

Case #1 and case #3 can both easily be satisfied by a single 
"MoreLikeThisHandler" that takes a generic query as its input (i.e. 
"q=id:12345" for case #1, and "q=apache" for case #3) and then generates a 
single "tier 1" block of MLT results related to all of the docs matching that 
query (the simple case of one doc for case #1). All other "tier 2" data would 
then be about this main MLT result set.
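A rough sketch of how one handler could serve both case #1 and case #3 with the same code path, assuming a toy query syntax where "id:X" matches the uniqueKey and a bare word matches any indexed term. The MLT scores are computed against the terms of all matching docs together, so a one-doc query (case #1) is just the degenerate case. Everything here (class, methods, data, scoring) is hypothetical and far simpler than Solr's query parsing or Lucene's MoreLikeThis.

```java
import java.util.*;

// Hypothetical sketch: one handler, generic query in, ONE "tier 1" MLT list out.
// Query syntax and shared-term scoring are simplifications, not Solr/Lucene behavior.
public class MltHandlerSketch {

    // doc id (the uniqueKey) -> terms from the MLT fields
    static final Map<String, Set<String>> INDEX = new LinkedHashMap<>();
    static {
        INDEX.put("12345", Set.of("apache", "search"));
        INDEX.put("200", Set.of("apache", "http"));
        INDEX.put("300", Set.of("search", "index"));
        INDEX.put("400", Set.of("http", "proxy"));
        INDEX.put("500", Set.of("cooking"));
    }

    // Step 1: run the generic query. "id:X" hits the uniqueKey; anything else is a term.
    static List<String> query(String q) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : INDEX.entrySet()) {
            boolean match = q.startsWith("id:") ? e.getKey().equals(q.substring(3))
                                                : e.getValue().contains(q);
            if (match) hits.add(e.getKey());
        }
        return hits;
    }

    // Step 2: score every other doc against the combined terms of ALL matching docs.
    public static List<String> handle(String q) {
        List<String> seeds = query(q);
        Map<String, Integer> termWeights = new HashMap<>();
        for (String id : seeds) {
            for (String t : INDEX.get(id)) termWeights.merge(t, 1, Integer::sum);
        }
        Map<String, Integer> scores = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : INDEX.entrySet()) {
            if (seeds.contains(e.getKey())) continue;  // don't recommend the input docs
            int score = 0;
            for (String t : e.getValue()) score += termWeights.getOrDefault(t, 0);
            if (score > 0) scores.put(e.getKey(), score);
        }
        List<String> result = new ArrayList<>(scores.keySet());
        result.sort(Comparator.comparingInt((String id) -> scores.get(id)).reversed());
        return result;
    }

    public static void main(String[] args) {
        System.out.println("case #1, q=id:12345 -> " + handle("id:12345"));
        System.out.println("case #3, q=apache   -> " + handle("apache"));
    }
}
```

In this sketch, handle("id:12345") returns the MLT list for that one document, while handle("apache") returns one combined list for everything matching "apache", excluding the matching docs themselves; any "tier 2" blocks would then be computed over that returned list.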

Case #2 would still be easy to handle by adding some new "tier 2" MLT data to 
the StandardRequestHandler.



> PATCH:MoreLikeThis support
> --------------------------
>
>                 Key: SOLR-69
>                 URL: https://issues.apache.org/jira/browse/SOLR-69
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Bertrand Delacretaz
>            Priority: Minor
>         Attachments: lucene-queries-2.0.0.jar, lucene-queries-2.1.1-dev.jar, 
> SOLR-69-MoreLikeThisRequestHandler.patch, SOLR-69.patch, SOLR-69.patch, 
> SOLR-69.patch, SOLR-69.patch
>
>
> Here's a patch that implements simple support of Lucene's MoreLikeThis class.
> The MoreLikeThisHelper code is heavily based on (hmm..."lifted from" might be 
> more appropriate ;-) Erik Hatcher's example mentioned in 
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg00878.html
> To use it, add at least the following parameters to a standard or dismax 
> query:
>   mlt=true
>   mlt.fl=list,of,fields,which,define,similarity
> See the MoreLikeThisHelper source code for more parameters.
> Here are two URLs that work with the example config, after loading all 
> documents found in exampledocs in the index (just to show that it seems to 
> work - of course you need a larger corpus to make it interesting):
> http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=standard&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
> http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=dismax&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
> Results are added to the output like this:
> <response>
>   ...
>   <lst name="moreLikeThis">
>     <result name="UTF8TEST" numFound="1" start="0" maxScore="1.5293242">
>       <doc>
>         <float name="score">1.5293242</float>
>         <str name="id">SOLR1000</str>
>       </doc>
>     </result>
>     <result name="SOLR1000" numFound="1" start="0" maxScore="1.5293242">
>       <doc>
>         <float name="score">1.5293242</float>
>         <str name="id">UTF8TEST</str>
>       </doc>
>     </result>
>   </lst>
> I haven't tested this extensively yet, will do in the next few days. But 
> comments are welcome of course.
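For anyone trying the patch from client code, a minimal Java sketch (Java 10 or later, for URLEncoder.encode(String, Charset)) of both ends of the exchange described above: building a select URL with the mlt, mlt.fl and mlt.mindf parameters, and pulling the per-document id lists out of the moreLikeThis block. It assumes the XML shape shown in the sample output (a <lst name="moreLikeThis"> holding one <result> per document); the class and method names are made up, and a real response also carries the header and the main result, which this sketch ignores.

```java
import java.io.StringReader;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.*;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical client-side helper, not part of the SOLR-69 patch.
public class MltRequestSketch {

    // Trimmed copy of the sample output above (header and main result omitted).
    static final String SAMPLE =
        "<response><lst name=\"moreLikeThis\">"
        + "<result name=\"UTF8TEST\" numFound=\"1\" start=\"0\" maxScore=\"1.5293242\">"
        + "<doc><float name=\"score\">1.5293242</float><str name=\"id\">SOLR1000</str></doc></result>"
        + "<result name=\"SOLR1000\" numFound=\"1\" start=\"0\" maxScore=\"1.5293242\">"
        + "<doc><float name=\"score\">1.5293242</float><str name=\"id\">UTF8TEST</str></doc></result>"
        + "</lst></response>";

    // Appends URL-encoded query parameters (in insertion order) to a base select URL.
    public static String buildUrl(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        boolean first = true;
        for (Map.Entry<String, String> p : params.entrySet()) {
            sb.append(first ? '?' : '&');
            first = false;
            sb.append(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Maps each <result name="..."> inside <lst name="moreLikeThis"> to the ids it lists.
    public static Map<String, List<String>> parseMoreLikeThis(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, List<String>> out = new LinkedHashMap<>();
            NodeList lists = doc.getElementsByTagName("lst");
            for (int i = 0; i < lists.getLength(); i++) {
                Element lst = (Element) lists.item(i);
                if (!"moreLikeThis".equals(lst.getAttribute("name"))) continue;
                NodeList results = lst.getElementsByTagName("result");
                for (int j = 0; j < results.getLength(); j++) {
                    Element result = (Element) results.item(j);
                    List<String> ids = new ArrayList<>();
                    NodeList strs = result.getElementsByTagName("str");
                    for (int k = 0; k < strs.getLength(); k++) {
                        Element s = (Element) strs.item(k);
                        if ("id".equals(s.getAttribute("name"))) ids.add(s.getTextContent());
                    }
                    out.put(result.getAttribute("name"), ids);
                }
            }
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse response XML", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "apache");
        params.put("qt", "standard");
        params.put("mlt", "true");
        params.put("mlt.fl", "manu,cat");
        params.put("mlt.mindf", "1");
        params.put("fl", "id,score");
        System.out.println(buildUrl("http://localhost:8983/solr/select/", params));
        System.out.println(parseMoreLikeThis(SAMPLE));
    }
}
```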

