[ https://issues.apache.org/jira/browse/SOLR-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493770 ]
Hoss Man commented on SOLR-69:
------------------------------

Looking back at the two main use cases Yonik described in his comment from 06/Feb/07...

At the most basic level, a request for MLT results for a single doc by uniqueKey (case #1) is just a simplistic example of asking for MLT results for an arbitrary query (case #2) ... that arbitrary query just happens to be on a uniqueKey field, and only returns one result.

Where things get more complicated is when you start returning other "tier 2" type information about the request -- which raises the question "what is tier 1 data"?

If the MLT results are added as "tier 2" data to the StandardRequestHandler response, then all of the other "tier 2" data blocks (highlighting, faceting, debugQuery score explanation, etc.) still refer to the main result from the original query ... this may be what you want in use case #2, but doesn't really make sense for use case #1, where the "tier 1" main result only contains the single document you asked for by id ... the score explanation and facet count numbers aren't very interesting in that case.

For case #1, what you really want is for the MLT data to be treated as the primary ("tier 1") result set, and all of the "tier 2" data is about those results ... highlighting is done on the MLT docs, facet counts are for the MLT docs, debugQuery score explanation tells you *why* the MLT docs are like your original doc, etc.

Case #1 and case #2 are both useful. To address Brian's 02/May/07 comment...

> I've personally never understood the "more documents
> that don't match this query but are like the documents
> in this query" ... I'm confused as to how querying by
> query would work -- if a query for 'apache' returned 10
> docs, would MLT work on each one and generate n more
> docs per doc? And would the original query results get
> returned? What's the ordering?

In your example, yes ... the user's main search on "apache" would return 10 results sorted by whatever sort they specified.
For each of those 10 results, N similar results might be listed to the side (in a smaller font, or as a pop-up widget), sorted most likely by how similar they are. Even if you don't want to surface those similar docs right there on the main result page, you still need to execute the MLT logic as part of the initial request to know if there are *any* similar docs (so you can surface the link/button for displaying them to the user).

I would even argue there is actually a third use case ...

--
Case 3) The GUI queries the standard request handler to display a list of documents, with a single subsequent list of similar "mlt" documents that have things in common with all of the docs in the current page of results displayed elsewhere on the page.
--

...where case #2 is about having separate MLT lists for each of the matching results, this case is about having a single "if you are interested in *all* of these items, you might also be interested in these other items" list.

Case #1 and case #3 can both easily be satisfied with a single "MoreLikeThisHandler" which takes as its input a generic query (ie: "q=id:12345" for case #1, and "q=apache" for case #3) and then generates a single "tier 1" result block of MLT results that relate to all of the docs matching that query (simple case of 1 doc for case #1) ... all other "tier 2" data would be in regards to this main MLT result set.

Case #2 would still easily be handled by having some new "tier 2" MLT data added to the StandardRequestHandler.

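To make the split concrete, here is a rough sketch of what the requests for the three cases might look like from a client. The base URL and the mlt / mlt.fl parameters come from the issue description; the handler name "mlt" is purely an assumption (no MoreLikeThisHandler is registered under any name yet), as is the idea that it would accept the same mlt.fl parameter.

```python
from urllib.parse import urlencode

# Base URL taken from the example URLs in the issue description.
SOLR_SELECT = "http://localhost:8983/solr/select/"

# Hypothetical handler name: the proposed MoreLikeThisHandler has no
# agreed-upon name in this thread, "mlt" is only a placeholder.
MLT_HANDLER = "mlt"


def mlt_handler_url(query, fields):
    """Cases #1 and #3: the MLT docs are the primary ("tier 1") result set."""
    params = {
        "q": query,
        "qt": MLT_HANDLER,            # assumed handler name
        "mlt.fl": ",".join(fields),   # assumed to be honoured by the handler
    }
    return SOLR_SELECT + "?" + urlencode(params)


def standard_with_mlt_url(query, fields):
    """Case #2: normal results, with per-doc MLT lists as "tier 2" data."""
    params = {
        "q": query,
        "qt": "standard",
        "mlt": "true",
        "mlt.fl": ",".join(fields),
    }
    return SOLR_SELECT + "?" + urlencode(params)


case1 = mlt_handler_url("id:12345", ["manu", "cat"])     # one doc by uniqueKey
case2 = standard_with_mlt_url("apache", ["manu", "cat"])  # MLT list per result
case3 = mlt_handler_url("apache", ["manu", "cat"])       # one MLT list for all results
```

Note that case #1 and case #3 differ only in the query string, which is the point of handling them with one handler.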
> PATCH:MoreLikeThis support
> --------------------------
>
>                 Key: SOLR-69
>                 URL: https://issues.apache.org/jira/browse/SOLR-69
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>            Reporter: Bertrand Delacretaz
>            Priority: Minor
>         Attachments: lucene-queries-2.0.0.jar, lucene-queries-2.1.1-dev.jar,
> SOLR-69-MoreLikeThisRequestHandler.patch, SOLR-69.patch, SOLR-69.patch,
> SOLR-69.patch, SOLR-69.patch
>
>
> Here's a patch that implements simple support of Lucene's MoreLikeThis class.
> The MoreLikeThisHelper code is heavily based on (hmm..."lifted from" might be
> more appropriate ;-) Erik Hatcher's example mentioned in
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg00878.html
> To use it, add at least the following parameters to a standard or dismax
> query:
>   mlt=true
>   mlt.fl=list,of,fields,which,define,similarity
> See the MoreLikeThisHelper source code for more parameters.
> Here are two URLs that work with the example config, after loading all
> documents found in exampledocs in the index (just to show that it seems to
> work - of course you need a larger corpus to make it interesting):
> http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=standard&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
> http://localhost:8983/solr/select/?stylesheet=&q=apache&qt=dismax&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mindf=1&fl=id,score
> Results are added to the output like this:
> <response>
>   ...
>   <lst name="moreLikeThis">
>     <result name="UTF8TEST" numFound="1" start="0" maxScore="1.5293242">
>       <doc>
>         <float name="score">1.5293242</float>
>         <str name="id">SOLR1000</str>
>       </doc>
>     </result>
>     <result name="SOLR1000" numFound="1" start="0" maxScore="1.5293242">
>       <doc>
>         <float name="score">1.5293242</float>
>         <str name="id">UTF8TEST</str>
>       </doc>
>     </result>
>   </lst>
> I haven't tested this extensively yet, will do in the next few days. But
> comments are welcome of course.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.