Relevancy and non-matching words

2010-07-06 Thread dbashford

Is there some sort of threshold that I can tweak which sets how many letters
in non-matching words makes a result more or less relevant?

Searching on title, q=fantasy football, and I get this:

{"title":"The Fantasy Football Guys",
"score":2.8387074},
{"title":"Fantasy Football Bums",
"score":2.8387074},
{"title":"Fantasy Football Xtreme",
"score":2.7019854},
{"title":"Fantasy Football Fools",
"score":2.7019634},
{"title":"Fantasy Football Brothers",
"score":2.5917912}

(I have some other scoring things in there that account for the difference
between Xtreme and Fools.)

The behavior I'm noticing is that there is some threshold for the length of
non-matching words that, when tripped, kicks the score down a notch.  Going
from 4 to 5 letters seems to trip one, and going from 6 to 7 another.

I would really like something like Bums to score the same as Xtreme and
Brothers and let my other criterion determine which document should come
out on top.  Is there something that can be tweaked to get this to happen?

Or is my assumption a bit off base?


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Relevancy-and-non-matching-words-tp946799p946799.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Relevancy and non-matching words

2010-07-06 Thread Erick Erickson

Solr is built on Lucene. Here's a description of
Lucene's scoring algorithm (follow the Similarity link):
http://lucene.apache.org/java/2_4_0/scoring.html#Understanding%20the%20Scoring%20Formula

The number of letters in non-matching words isn't relevant.
What matters is the relationship between the number of search
terms found and the number of tokens (think of them as words)
in the field.
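
To make the token-count relationship concrete, here is a minimal Python
sketch of the length normalization described on that scoring page. It
assumes Lucene's DefaultSimilarity from the 2.x line, where the length norm
is 1/sqrt(number of tokens in the field) and is stored in a single byte, so
nearby field lengths can collapse to the same value. The function names are
illustrative, not Lucene's API, and which tokens get counted depends on the
field's analyzer.

```python
import math
import struct


def float_to_byte315(f):
    """Encode a float into one byte, mirroring the logic of Lucene's
    SmallFloat.floatToByte315 (an assumption about the Lucene version)."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21                  # keep sign, exponent, top mantissa bits
    zero_point = (63 - 15) << 3
    if small <= zero_point:
        return 0 if bits <= 0 else 1    # underflow: zero or smallest positive
    if small >= zero_point + 0x100:
        return 255                      # overflow: largest representable value
    return small - zero_point


def byte315_to_float(b):
    """Decode a byte produced by float_to_byte315 back into a float."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def length_norm(num_tokens):
    """Unquantized length norm: shorter fields get a larger norm."""
    return 1.0 / math.sqrt(num_tokens)


def stored_norm(num_tokens):
    """Length norm after the lossy one-byte round trip."""
    return byte315_to_float(float_to_byte315(length_norm(num_tokens)))


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, round(length_norm(n), 4), stored_norm(n))
```

Because the stored norm is quantized, the score moves in steps rather than
smoothly as the field gets longer, which may be what looks like a threshold.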

I'm also assuming you've either set the default operator to
AND or that your default field is title.

Using debugQuery=on will show you a lot. You can also
access that information from the admin pages (Full Interface
link or something like that).
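
A request with debug output enabled can be sketched as below. The host,
port and path are assumptions about a default local install; q, df, fl and
debugQuery are standard Solr request parameters. The sketch only builds the
URL, it does not contact a server.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, query, field):
    """Build a Solr select URL that asks for score explanations."""
    params = {
        "q": query,             # the user's search terms
        "df": field,            # default field to search
        "fl": "title,score",    # return the title and its score
        "debugQuery": "on",     # include the per-document explanation
    }
    return base_url + "?" + urlencode(params)


# Hypothetical local Solr endpoint; adjust to your own install.
url = build_debug_url("http://localhost:8983/solr/select",
                      "fantasy football", "title")
print(url)
```

The response then carries an explanation for each document, showing the
tf, idf and fieldNorm factors behind its score.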

HTH
Erick
