Re: Need more info on MLT (More Like This) feature

Chee Yee Lim Fri, 13 Sep 2019 23:55:12 -0700

By default, MLT uses the top 25 terms from the target document to do
similarity searches. A quick look at the source code (
https://github.com/apache/lucene-solr/blob/master/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java
) and Lucene documentation (
https://lucene.apache.org/core/8_1_0/queries/org/apache/lucene/queries/mlt/MoreLikeThis.html
) suggests that MLT's similarity score is defined as a simple TF x IDF for
the top 25 terms. (Others who know more about MLT, please correct me if I
am wrong.)
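
To make the TF x IDF idea concrete, here is a minimal Python sketch of how
the query terms might be picked. It is a simplification based on my reading
of MoreLikeThis.java, not the actual implementation. The function name
select_mlt_terms and the sample numbers are made up for illustration, the
idf formula is the classic 1 + ln((numDocs + 1) / (docFreq + 1)), and the
defaults (25 query terms, minimum term frequency 2, minimum document
frequency 5) are what I believe Lucene uses.

```python
import math
from collections import Counter


def select_mlt_terms(doc_tokens, doc_freq, num_docs,
                     max_query_terms=25, min_tf=2, min_df=5):
    """Rank a document's terms by tf * idf and keep the highest-scoring ones.

    doc_tokens: list of analyzed tokens from the target document
    doc_freq:   dict mapping term -> number of documents containing it
    num_docs:   total number of documents in the index
    """
    term_freq = Counter(doc_tokens)
    scored = []
    for term, tf in term_freq.items():
        df = doc_freq.get(term, 0)
        # Skip terms that are too rare in the document or in the index.
        if tf < min_tf or df < min_df:
            continue
        # Classic TF-IDF style inverse document frequency (assumed formula).
        idf = 1.0 + math.log((num_docs + 1) / (df + 1))
        scored.append((term, tf * idf))
    # Highest score first; only the top terms go into the MLT query.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:max_query_terms]


# Hypothetical toy data: "the" is frequent in the document but appears in
# every indexed document, so it ends up ranked below rarer terms.
tokens = ["solr"] * 3 + ["the"] * 5 + ["java"] * 2
doc_freq = {"solr": 10, "the": 1000, "java": 50}
for term, score in select_mlt_terms(tokens, doc_freq, num_docs=1000):
    print(term, round(score, 2))
```

Under this reading, a very common word can still enter the query if it
repeats often enough in the target document, which is one way unrelated
documents can end up with high scores.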


An easy way to improve your results is to tune the mindf, maxdf, minwl and
maxwl parameters for knnSearch (
https://lucene.apache.org/solr/guide/8_1/stream-source-reference.html#knnsearch
).
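
As a starting point for that tuning, here is a small Python sketch that
builds a knnSearch expression with those four parameters and posts it to
the Stream handler. The helper names build_knn_expr and run_stream, the
collection name "jobs", the localhost URL and the threshold values are all
placeholders of mine, not recommended settings; adjust them to your index.

```python
import urllib.parse
import urllib.request


def build_knn_expr(collection, doc_id, qf, k=100, **params):
    """Assemble a knnSearch streaming expression as a string.

    String values are wrapped in double quotes; numbers are written bare.
    """
    parts = [collection, f'id="{doc_id}"', f'qf="{qf}"', f"k={k}"]
    for name, value in params.items():
        if isinstance(value, str):
            parts.append(f'{name}="{value}"')
        else:
            parts.append(f"{name}={value}")
    return "knnSearch(" + ", ".join(parts) + ")"


def run_stream(collection_url, expr):
    """POST the expression to <collection_url>/stream and return the raw body."""
    data = urllib.parse.urlencode({"expr": expr}).encode("utf-8")
    with urllib.request.urlopen(collection_url + "/stream", data=data) as resp:
        return resp.read().decode("utf-8")


# Hypothetical values: ignore terms found in fewer than 5 or more than
# 10000 documents, and words shorter than 4 or longer than 20 characters.
expr = build_knn_expr(
    "jobs", "1414462-25600-5258", "jobdescription", k=100,
    fl="jobtitle,jobdescription,score",
    fq="siteid:5258",
    mindf=5, maxdf=10000, minwl=4, maxwl=20,
)
print(expr)
# To send it (requires a running Solr; the URL is an assumption):
# print(run_stream("http://localhost:8983/solr/jobs", expr))
```

Raising mindf and minwl, or lowering maxdf, should drop very rare, very
short or very common terms from the query, so it is worth changing one
value at a time and comparing the results.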

Best wishes,
Chee Yee

On Sat, 14 Sep 2019 at 04:09, Dave <hastings.recurs...@gmail.com> wrote:

> As a side note, if you use shingles with the MLT handler, I believe you
> will get better scores and more relevant results. So “to be free” is
> indexed as “to_be”, “to_be_free” and “be_free”, but also as each individual
> word. It makes the index significantly larger but creates better “unique
> terms” in my opinion, and it improved the results for me at least.
>
> On Sep 13, 2019, at 2:51 PM, Srisatya Pyla <srisp...@in.ibm.com> wrote:
>
> Thank you very much for the quick response. This is very helpful to us.
> While analyzing the results for some jobs, we found that it returns a high
> score for a document which is not very relevant to the base document.
> Is there any way we can improve the results and scoring?
> How exactly does it compute the score for a matching document based on a
> matching field? Knowing this would help us understand why it gives the
> highest matching score to specific documents.
>
>
> Regards,
> ------------------------------
> *SST  Narasimha Rao Pyla*
>
> *IBM Talent Management SolutionsMobile :*+91 9849315546
> *E-mail :**srisp...@in.ibm.com* <hemanth.kadamb...@in.ibm.com>
> [image: IBM]
>
> IBM Visakha Hills
> Visakhapatnam, AP 530045
> India
>
>
>
>
>
> From:        Chee Yee Lim <cheeyee....@gmail.com>
> To:        Srisatya Pyla <srisp...@in.ibm.com>
> Cc:        solr-user@lucene.apache.org, Rajeev Kasarabada1 <
> kasar...@in.ibm.com>, Archana Gavini1 <agavi...@in.ibm.com>
> Date:        13/09/2019 04:32 PM
> Subject:        [EXTERNAL] Re: Need more info on MLT (More Like This)
> feature
> ------------------------------
>
>
>
> To use knnSearch, you need to submit a POST request to the Stream request
> handler.
>
> Using your example query, you would need to rewrite it from this:
>
> http://[SOLR URL]/mlt?q=sjkey:1414462-25600-5258&wt=json&indent=true&mlt=true&rows=100&mlt.fl=jobdescription&mlt.mindf=1&mlt.mintf=1&fl=jobtitle,jobdescription&fq=siteid:5258
>
> to this (using curl as an example to send the POST request):
>
> curl --data-urlencode 'expr=knnSearch([collection_name],
> id="1414462-25600-5258",
> qf="jobdescription",
> k=100,
> fl="jobtitle,jobdescription,score",
> sort="score desc",
> fq="siteid:5258",
> mintf=1,
> mindf=1)' http://[SOLRURL]/stream
>
> Note that this assumes your document ID field is sjkey.
>
> More detailed documentation on how the Stream handler works can be found
> here: https://lucene.apache.org/solr/guide/8_1/streaming-expressions.html
>
> Best wishes,
> Chee Yee
>
> On Fri, 13 Sep 2019 at 17:57, Srisatya Pyla <srisp...@in.ibm.com> wrote:
> Hi Chee Yee Lim,
>
>
> Thank you for your quick response.
> We could not find much documentation on knnSearch or on how to use it.
> Could you please guide us with more information on how it can be used?
>
> Can we use it the way we use Solr, by querying a Solr URL like
> http://[SOLR URL]/mlt.... ? Or is there another way?
> Also, please share any more detailed documentation if you have
> any.
>
>
> Regards,
> ------------------------------
> *SST  Narasimha Rao Pyla*
>
> ----- Original message -----
> From: Chee Yee Lim <cheeyee....@gmail.com>
> To: solr-user@lucene.apache.org
> Cc: Archana Gavini1 <agavi...@in.ibm.com>, Rajeev
> Kasarabada1 <kasar...@in.ibm.com>
> Subject: [EXTERNAL] Re: Need more info on MLT (More Like This) feature
> Date: Thu, Sep 12, 2019 6:43 PM
>
> I've been working with the MLT handler (Solr 8.1.1) by calling it the same
> way you did, http://[SOLR URL]/mlt. But the response is very unreliable,
> with 90% of the same queries resulting in a Java null pointer exception and
> only 10% returning the expected response. I do not know the cause of this.
>
> I overcame this problem by using knnSearch via the Stream handler (
> https://lucene.apache.org/solr/guide/8_1/stream-source-reference.html#knnsearch
> ). It is just a wrapper on MLT, and it works brilliantly. It is worth
> checking out if you are running Solr in cloud mode.
>
> If you pass fl="score" and sort="score desc" to knnSearch, you will be
> able to get the results sorted by matching score.
>
> Best wishes,
> Chee Yee
>
> On Thu, 12 Sep 2019 at 19:44, Srisatya Pyla <srisp...@in.ibm.com> wrote:
> Hi Solr Search Team,
>
> I am a developer from IBM Kenexa Brassring. We are using the Solr search
> engine for searching jobs in our applications.
> We are planning to use the MLT feature to get similar matching documents
> (jobs) based on one document (job).
>
> While exploring this option, we used the job's JobDescription as the
> matching field, and we are getting some unexpected, unrelated documents in
> the MLT results.
>
> The query looks like this:
>
> http://[SOLR URL]/mlt?q=sjkey:1414462-25600-5258&wt=json&indent=true&mlt=true&rows=100&mlt.fl=jobdescription&mlt.mindf=1&mlt.mintf=1&fl=jobtitle,jobdescription&fq=siteid:5258
>
>
> *We have a few questions*:
> 1) Is there any way we can get the matching score for each of the matching
> documents in the MLT results, so that we can sort on the score and have
> the best-matching document at the top of the results?
>
> 2) Are there any best practices for using the MLT handler?
>
>
> Regards,
> ------------------------------
> *SST  Narasimha Rao Pyla*
