Re: 'Down' boosting shorter docs

2009-10-15 Thread Walter Underwood

Another approach is to change the document length normalization formula.

See Similarity.lengthNorm() in Lucene.

wunder
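
To make the length normalization idea concrete: the classic default in Lucene is 1/sqrt(number of terms in the field), so a one-word field gets the largest possible norm. The sketch below is plain Python, not Lucene code. It compares that default with one possible variant that treats every field shorter than a floor as if it had the floor length. The function names and the min_terms parameter are hypothetical, chosen only for illustration.

```python
import math


def default_length_norm(num_terms: int) -> float:
    """Classic Lucene-style norm: 1 / sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(max(num_terms, 1))


def flattened_length_norm(num_terms: int, min_terms: int = 10) -> float:
    """Hypothetical variant: fields shorter than min_terms are scored as if
    they had min_terms terms, so very short fields gain no extra advantage."""
    return 1.0 / math.sqrt(max(num_terms, min_terms))


if __name__ == "__main__":
    # Compare both norms for a one-word field up to a long field.
    for n in (1, 2, 10, 100):
        print(n, round(default_length_norm(n), 3), round(flattened_length_norm(n), 3))
```

In Lucene itself this would mean subclassing the Similarity and overriding lengthNorm(). Norms are computed at index time, so a change like this only takes effect after reindexing.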

On Oct 15, 2009, at 12:45 AM, Andrea D'Ippolito wrote:


I've read (correct me if I'm wrong)
that one way to achieve that is to overboost all the other fields,
but I guess this only works easily if you have few fields indexed ;)

bye

2009/10/15 Simon Wistow si...@thegestalt.org


Our index has some items in it which basically contain a title and a
single word body.

If the user searches for a word in the title (especially if title is of
itself only one word) then that doc will get scored quite highly,
despite the fact that, in this case, it's not really relevant.

I've tried something like

qf=title^2.0 content^0.5
bf=num_pages

but that disproportionally boosts long documents to the detriment of
relevancy

bf=product(num_pages,0.05)

has no effect but

bf=product(num_pages,0.06)


has a bunch of long documents which don't seem to return any highlighted
fields plus the short document with only the query in the title which is
progress in that it's almost exactly the opposite of what I want.

Any suggestions? Am I going to need to reindex and add the length in
bytes or characters of the document?

Simon









'Down' boosting shorter docs

2009-10-14 Thread Simon Wistow
Our index has some items in it which basically contain a title and a 
single word body.

If the user searches for a word in the title (especially if title is of 
itself only one word) then that doc will get scored quite highly,
despite the fact that, in this case, it's not really relevant.

I've tried something like

qf=title^2.0 content^0.5
bf=num_pages

but that disproportionally boosts long documents to the detriment of 
relevancy

bf=product(num_pages,0.05)

has no effect but 

bf=product(num_pages,0.06)


has a bunch of long documents which don't seem to return any highlighted 
fields plus the short document with only the query in the title which is 
progress in that it's almost exactly the opposite of what I want.

Any suggestions? Am I going to need to reindex and add the length in 
bytes or characters of the document?

Simon






Re: 'Down' boosting shorter docs

2009-10-14 Thread Yonik Seeley
A multiplicative boost may work better than one added in:
http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html

-Yonik
http://www.lucidimagination.com
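
A rough sketch of why a multiplicative boost can behave better than an additive one. This is plain Python arithmetic, not Solr's actual scoring: the relevance scores and page counts are invented, and the function names are hypothetical. The additive form mimics bf=product(num_pages,0.06) being added to the score; the multiplicative form scales the relevance score by a damped function of num_pages.

```python
import math

# Hypothetical (relevance score, num_pages) pairs, invented for illustration.
DOCS = {
    "short_title_only": (2.0, 1),
    "long_relevant": (1.5, 40),
    "long_irrelevant": (0.1, 400),
}


def additive(relevance: float, num_pages: int, weight: float = 0.06) -> float:
    """Boost added to the score: large num_pages can swamp relevance."""
    return relevance + weight * num_pages


def multiplicative(relevance: float, num_pages: int) -> float:
    """Boost multiplied into the score: an irrelevant doc stays low."""
    return relevance * math.log(1 + num_pages)


def ranking(score_fn) -> list:
    """Return document names sorted by descending boosted score."""
    return sorted(DOCS, key=lambda name: score_fn(*DOCS[name]), reverse=True)


if __name__ == "__main__":
    print("additive:      ", ranking(additive))
    print("multiplicative:", ranking(multiplicative))
```

With these made-up numbers the additive boost puts the long but irrelevant document first, while the multiplicative boost ranks the long relevant document first and the irrelevant one last. In Solr the equivalent would presumably be a query along the lines of {!boost b=log(num_pages)} wrapped around the user query; treat that as an untested assumption and check the linked javadoc for the exact syntax.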



On Wed, Oct 14, 2009 at 7:21 PM, Simon Wistow si...@thegestalt.org wrote:
 Our index has some items in it which basically contain a title and a
 single word body.

 If the user searches for a word in the title (especially if title is of
 itself only one word) then that doc will get scored quite highly,
 despite the fact that, in this case, it's not really relevant.

 I've tried something like

 qf=title^2.0 content^0.5
 bf=num_pages

 but that disproportionally boosts long documents to the detriment of
 relevancy

 bf=product(num_pages,0.05)

 has no effect but

 bf=product(num_pages,0.06)


 has a bunch of long documents which don't seem to return any highlighted
 fields plus the short document with only the query in the title which is
 progress in that it's almost exactly the opposite of what I want.

 Any suggestions? Am I going to need to reindex and add the length in
 bytes or characters of the document?

 Simon