Re: More Like This on numeric fields - BF accepted by MLT handler

Alessandro Benedetti Mon, 28 Sep 2015 01:54:28 -0700

Hi Upaya,
thanks for the explanation. I had already done some investigation
(my starting point was
http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ ) and
then I took a look at the code.
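
To make the discussion concrete, here is a minimal sketch of the three
phases Upayavira describes in the quoted message further down. It is plain
Python, not the Lucene code: the tokenizer, the smoothed idf formula, the
toy corpus and the function names (tokenize, interesting_terms,
build_query) are all assumptions made for illustration, and it ignores
thresholds such as minimum document frequency.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical analyzer: lowercase, split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())


def interesting_terms(seed_text, corpus, max_terms=25):
    # Phase 1: parse the source document into terms with their frequencies.
    tf = Counter(tokenize(seed_text))

    # Phase 2: score each term with a TF/IDF-like weight and keep the top n.
    # The idf variant below is an assumption, not necessarily what Lucene uses.
    n_docs = len(corpus)
    doc_tokens = [set(tokenize(doc)) for doc in corpus]
    scored = []
    for term, freq in tf.items():
        df = sum(1 for tokens in doc_tokens if term in tokens)
        idf = math.log((n_docs + 1) / (df + 1)) + 1.0
        scored.append((freq * idf, term))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:max_terms]]


def build_query(field, terms):
    # Phase 3: build a boolean (OR) query string from the selected terms.
    return " OR ".join(f"{field}:{term}" for term in terms)


corpus = [
    "quiet hotel near the beach",
    "hotel with pool near the station",
    "budget hostel near the station",
]
seed = "quiet beach hotel with quiet garden"
terms = interesting_terms(seed, corpus, max_terms=3)
query = build_query("description", terms)
print(query)
```

Nothing in phase 2 knows how to weigh a numeric value, which is why
numeric fields do not fit naturally into this scheme.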


I was just wondering what the community thinks about
including/providing numerical similarity (approaches, ideas, possible
existing solutions).
Customisation should be the last resort if something is already available.
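
As one possible approach, here is a minimal sketch of a numeric
similarity factor in the spirit of the div(1,dist(2,...)) idea from my
first mail (quoted below). The helper name numeric_similarity and the
sample hotel values are hypothetical. It adds 1 to the distance, which is
my assumption to avoid a division by zero when the candidate has exactly
the same values as the seed document.

```python
import math


def numeric_similarity(seed, candidate, fields):
    # Euclidean distance between seed and candidate over the numeric fields.
    dist = math.sqrt(sum((seed[f] - candidate[f]) ** 2 for f in fields))
    # Map the distance to (0, 1]: identical values give 1.0, and the
    # "+ 1.0" offset (an assumption) keeps the result finite at dist == 0.
    return 1.0 / (1.0 + dist)


fields = ["stars", "trip_advisor_rating"]
seed_hotel = {"stars": 4, "trip_advisor_rating": 4.5}
same_hotel = {"stars": 4, "trip_advisor_rating": 4.5}
close_hotel = {"stars": 4, "trip_advisor_rating": 4.0}
far_hotel = {"stars": 2, "trip_advisor_rating": 3.0}

for hotel in (same_hotel, close_hotel, far_hotel):
    print(hotel, numeric_similarity(seed_hotel, hotel, fields))
```

In function-query terms this would probably correspond to something like
recip(dist(2,stars,trip_advisor_rating,4,4.5),1,1,1), with the seed
values substituted by the client; I have not tested that against the MLT
handler. The fields would likely also need to be scaled to comparable
ranges before mixing them.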

Thanks for the support anyway!

Cheers

2015-09-25 12:47 GMT+01:00 Upayavira <u...@odoko.co.uk>:

> Alessandro,
>
> I'd suggest you review the code of the MoreLikeThisHandler. It is a
> little knotty, but it would be worth your while understanding what is
> going on there.
>
> Basically, there are three phases:
>
> phase #1: parse the source document into a list of terms (avoided if
> term vectors enabled and source doc is in index)
> phase #2: calculate a score for each of these terms and select the n
> highest scoring ones (default 25)
> phase #3: build and execute a boolean query using these 25 terms
>
> Phase #2 uses a TF/IDF-like approach to calculate the scores for those
> "interesting terms".
>
> Once you understand what MLT is doing, you will probably not find it so
> hard to create your own version which is better suited to your own
> use-case.
>
> Of course, this would probably be better constructed as a QueryParser
> rather than a request handler, but that's a detail.
>
> Upayavira
>
> On Fri, Sep 25, 2015, at 11:08 AM, Alessandro Benedetti wrote:
> > Hi guys,
> > I was just investigating how to include numeric fields in the
> > MLT calculations.
> >
> > As we know, we currently build a smart Lucene query from the
> > input document (the one to find similar documents for) and run this
> > query to obtain the similar docs.
> > Because MLT is currently built on TF/IDF, it is mainly intended for
> > textual fields.
> > What if we want to include a numeric factor in the similarity
> > calculation?
> >
> > e.g.
> > Solr Document ( Hotel)
> > mlt.fl=description,stars,trip_advisor_rating
> >
> > The goal is to find similarity based not only on the description, but
> > also on the numeric fields (stars and rating).
> >
> > My first thought is to add support for boost functions.
> > This way we are more flexible and can add as many functions as we
> > want.
> >
> > For example, adding:
> > bf=div(1,dist(2,seedDocumentRatingA,seedDocumentRatingB,ratingA,ratingB))
> >
> > Other kinds of functions can also be applied.
> > What do you think? Do you have any alternative ideas?
> >
> > Cheers
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>



-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
