Re: Index-time Boosting

Yonik Seeley Tue, 05 Dec 2006 18:25:16 -0800

Yep, sounds like you got it.  Query-time boosting is what you want.

> however, the above document *will* have a higher
> score, in general, because the "title" portion was nearly
> half of the "text" field.


Well, if you boost *all* of the "title" fields by 100, it also has the
net effect of boosting *all* the "text" fields by 100... it's going to
be a wash when searching on the text field.
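A quick way to see the "wash": if every document's "text" field picks up the same factor, the ordering cannot change. This is only a toy sketch; the raw scores below are made up, and real Lucene scoring has more factors than a single number per document.

```python
# Made-up raw scores for three documents (illustrative only).
raw_scores = {"docA": 0.42, "docB": 0.17, "docC": 0.30}


def ranking(scores):
    """Return document ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Assumption: every document has a boosted title, so every "text"
# field inherits the same x100 factor.
boosted = {doc: score * 100 for doc, score in raw_scores.items()}

print(ranking(raw_scores))
print(ranking(boosted))
```

Both calls print the same order, since multiplying every score by one positive constant preserves the sort.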

FWIW, I don't recall any Solr collections in CNET using index
boosts... query-time boosts are far more flexible.
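For illustration, a query-time boost might be expressed by searching the source fields directly with the Lucene `field:term^boost` syntax instead of the catch-all "text" field. The sketch below only builds the query string. The helper name `boosted_query`, the field list, and the `/select` path are assumptions for the example, it assumes the default operator is OR, and it does no escaping, so it only suits simple single-word terms.

```python
from urllib.parse import urlencode


def boosted_query(term, field_boosts):
    """Build one clause per field, e.g. title:commute^100."""
    clauses = []
    for field, boost in field_boosts.items():
        clause = f"{field}:{term}"
        if boost != 1:
            clause += f"^{boost}"  # Lucene query-time boost syntax
        clauses.append(clause)
    # Space-separated clauses are OR'ed when the default operator is OR.
    return " ".join(clauses)


q = boosted_query("commute", {"title": 100, "description": 1})
print(q)
print("/select?" + urlencode({"q": q}))
```

Changing the boost is then just a matter of changing the query, with no reindexing.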

Some Lucene users have used index-time boosts to boost more recent
documents in the index, but with Solr's function query, that can be
done at query time too.
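As a rough picture of what a query-time recency boost can look like, here is a reciprocal curve of the form a / (m*x + b), which is the shape of Solr's `recip` function as far as I know. Using document age in days as the input, and the constants chosen, are assumptions for this sketch, not anything taken from a real configuration.

```python
def recip(x, m, a, b):
    """Reciprocal curve a / (m*x + b)."""
    return a / (m * x + b)


def recency_boost(age_days):
    # Hypothetical constants: 1.0 for a brand-new document,
    # decaying to 0.5 once the document is 1000 days old.
    return recip(age_days, 1, 1000, 1000)


for age in (0, 1000, 3000):
    print(age, recency_boost(age))
```

Because the factor is computed at query time, the curve can be retuned without touching the index.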

-Yonik

On 12/5/06, Tracey Jaquith <[EMAIL PROTECTED]> wrote:

ahh, after rereading this about 20 times today 8-)
i think i finally "get it" (your final question below).

if i do index-time boosts, and search only "text" (default field)
the boosts will propagate into "text", but only insofar as the
document will weight higher when a phrase is found in the "text"
field (regardless of whether that "hit" really was due to something
copyField-ed in with boost 1, boost 100, etc.)

so that solution would have the effect of making certain documents
have higher scores in the "text" field, not the effect we'd like.

[example documentA]
  [description] i like to commute
   [title] commuting thoughts
copyField text to:
  [text] i like to commute commuting thoughts

we, the Archive, want query hits in title to boost ^100.
if we do q=commute (which searches "text")
with index-time boosting, solr/lucene won't know
the hit due to "title" should effect a much higher ranking
compared to documents with commute in "text" but
not in "title".   however, the above document *will* have a higher
score, in general, because the "title" portion was nearly
half of the "text" field.  Yet A will have a
higher ranking even for matches like "q=like"
compared to documentB like:
  [description] i like bread
  [text] i like bread
(when in reality, we'd like them to have near equal weighting).
So index boosts won't do for us.  I'm learning!

--tracey
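The documentA / documentB comparison above can be played out with a toy scorer. This is a deliberately simplified model, not Lucene's actual formula: the score is just (boost x matching tokens / field length), and `stem` is a crude stand-in for a real analyzer so that "commute" matches "commuting". All function names here are hypothetical.

```python
def stem(token):
    """Crude suffix stripper standing in for a real analyzer."""
    for suffix in ("ing", "e", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def tf_norm(term, text):
    """Fraction of the field's tokens that match the term."""
    tokens = [stem(t) for t in text.split()]
    if not tokens:
        return 0.0
    return tokens.count(stem(term)) / len(tokens)


def index_time_score(doc, term):
    # One boost for the whole "text" field, whichever part matched.
    return doc["text_boost"] * tf_norm(term, doc["text"])


def query_time_score(doc, term, boosts):
    # Each source field is searched separately with its own boost.
    return sum(b * tf_norm(term, doc.get(f, "")) for f, b in boosts.items())


doc_a = {"description": "i like to commute", "title": "commuting thoughts",
         "text": "i like to commute commuting thoughts", "text_boost": 100}
doc_b = {"description": "i like bread", "text": "i like bread", "text_boost": 1}
boosts = {"title": 100, "description": 1}

for term in ("like", "commute"):
    print(term, "index-time:", index_time_score(doc_a, term),
          index_time_score(doc_b, term))
    print(term, "query-time:", query_time_score(doc_a, term, boosts),
          query_time_score(doc_b, term, boosts))
```

In this toy model, index-time boosting puts A far ahead of B for q=like even though the title never matched, while query-time boosting keeps the two close for q=like and lifts A only for q=commute.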

>> I've tried some experiments, adjusting the boosts at index time and
>> running the std handler to see the ordering of the results change for
>> "fieldless queries" (eg: "q=tracey+pooh").  I have 33 fields using
>> <copyField dest="text" source="..."/> (where "text" is our default
>> field to query) to allow for checking across most of our std XML
>> fields.  I gather that a boost applied to "title" on indexing a
>> document must somehow "propagate" to the "text" field?
>
> Background: for an indexed field name there is a single boost value
> per document.  This is true even if the field is multi-valued... all
> values for that document "share" the same boost.  This is a Lucene
> restriction so we can't fix it in Solr in any way.
>
> Solr *does* propagate the index-time boost when doing copyField, but
> this just ends up being multiplied into all the other boosts for
> values for that document.   Matches on the resulting text field will
> *always* score higher, regardless of which "part" matched.  Does that
> make sense?
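
To make the "single boost per field per document" point concrete, here is a minimal sketch of boosts from copied sources collapsing into one number. The boost values are hypothetical, and the product is a simplified reading of "multiplied into all the other boosts" rather than a reproduction of Lucene's internals.

```python
from math import prod

# Hypothetical index-time boosts on the source fields of one document.
source_boosts = {"title": 100.0, "description": 1.0}

# copyField sends both sources into "text"; their boosts multiply into
# the single boost kept for "text" on this document.
text_boost = prod(source_boosts.values())


def boosted(raw_score):
    """Apply the shared boost; it does not depend on which value matched."""
    return text_boost * raw_score


print(text_boost)
print(boosted(0.5))
```

Once the values are merged, there is no record of which part of "text" came from the boosted title.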
>
*ith - http://www.archive.org/~tracey --*
