Re: Question About Boosting.

2007-03-12 Thread shai deljo

Buckets it is :)
Thx

On 3/12/07, Chris Hostetter [EMAIL PROTECTED] wrote:


: I thought about this option but it doesn't sound scalable. What
: happens if i have 100 words with 100 different boost factors?

then you've got a problem :)

typically it's not this severe ... i'll frequently have half a dozen
fields that i divide text into so each can be boosted by a different
amount, but i'm having a hard time understanding why you would need 100
unique boost factors for 100 unique words ... putting things into
buckets tends to be effective.
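
A rough sketch of the bucket idea described above, for illustration only. The field names (text_high, text_mid, text_low) and the boost values are made up for this example and are not anything Solr ships with; the assumption is that the text was split across those fields at index time, so one boost per field is applied at query time:

```python
# Hypothetical bucket fields and boosts; pick names and values to suit your schema.
BUCKET_BOOSTS = {"text_high": 4.0, "text_mid": 2.0, "text_low": 1.0}


def bucketed_query(term, boosts=BUCKET_BOOSTS):
    """Build a Lucene-syntax query that looks for one term in every bucket
    field, weighting each field by its boost (field:term^boost)."""
    clauses = []
    for field, boost in boosts.items():
        if boost == 1.0:
            # A boost of 1 is the default, so it is left out of the query.
            clauses.append(f"{field}:{term}")
        else:
            clauses.append(f"{field}:{term}^{boost:g}")
    return " ".join(clauses)


print(bucketed_query("solr"))  # text_high:solr^4 text_mid:solr^2 text_low:solr
```

With three buckets there are three boosts to tune, no matter how many distinct words end up in each bucket. The sketch assumes the term is a single plain token with no query-syntax characters that would need escaping.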



-Hoss




Re: Question About Boosting.

2007-03-11 Thread Walter Underwood
Back up another step. What are the documents and what do you
want to show to the users? Have you tried the default configuration
with real user queries?

After you've tested it with user queries, then look at the
results where the ranking isn't performing well.

Lucene and Solr already automatically boost rare terms over
common terms, using tf.idf weighting.
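
To make the rare-versus-common point concrete, here is a small sketch assuming the classic Lucene similarity factors (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))). The corpus size and document frequencies are invented, and this covers only the per-term factors, not the full scoring formula:

```python
import math


def tf(freq):
    # Classic Lucene similarity: the term-frequency factor grows with the square root.
    return math.sqrt(freq)


def idf(doc_freq, num_docs):
    # Classic Lucene similarity: terms found in fewer documents get a larger factor.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 10_000  # made-up corpus size

# Both terms occur once in the document; only their document frequency differs.
rare = tf(1) * idf(doc_freq=10, num_docs=NUM_DOCS)
common = tf(1) * idf(doc_freq=5_000, num_docs=NUM_DOCS)

print(f"rare term weight:   {rare:.2f}")
print(f"common term weight: {common:.2f}")
```

The rare term comes out with the larger weight without any manual boost being configured.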

I posted more detail on this in my blog last summer:

http://wunderwood.org/most_casual_observer/2006/06/good_to_great_search.html

wunder

On 3/10/07 8:04 PM, shai deljo [EMAIL PROTECTED] wrote:

 I have elements within a field that have different importance.
 I thought boosting would be an elegant way to take this into account.
 Please advise,
 
 
 On 3/10/07, Walter Underwood [EMAIL PROTECTED] wrote:
 What are you trying to achieve? Let's start with the problem
 instead of picking one solution which Solr doesn't support. --wunder
 
 On 3/10/07 5:08 PM, shai deljo [EMAIL PROTECTED] wrote:
 
 How can i boost some tokens over others in the same field (at Index
 time) ? If this is not supported directly, what's the best way around
 this problem (what's the hack to solve this :) ).
 Thanks,
 Shai
 
 



Re: Question About Boosting.

2007-03-11 Thread shai deljo

Thanks,
The only way i found to do this
(http://www.mail-archive.com/solr-user@lucene.apache.org/msg02456.html)
is to hack and repeat the word several times in the field, but
doesn't this screw up the norms?
Also, how do i boost words in a query? e.g. q=key1 key2 and i know
key2 is twice as important as key1 ? (searching 1 field).
Thanks,
S.





Re: Question About Boosting.

2007-03-11 Thread Mike Klaas

On 3/11/07, shai deljo [EMAIL PROTECTED] wrote:

> Thanks,
> The only way i found to do this
> (http://www.mail-archive.com/solr-user@lucene.apache.org/msg02456.html)
> is to hack and repeat the word several times in the field, but
> doesn't this screw up the norms?


Yes, it can influence the norms.
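
A sketch of why, assuming the classic Lucene length normalization (lengthNorm = 1 / sqrt(number of tokens in the field)) and ignoring idf and the lossy encoding of stored norms. The sample field content is invented:

```python
import math


def length_norm(num_tokens):
    # Classic Lucene length normalization: longer fields get a smaller norm.
    return 1.0 / math.sqrt(num_tokens)


def field_factor(tokens, term):
    """tf * lengthNorm for one term in one field (idf and other factors omitted)."""
    freq = tokens.count(term)
    return math.sqrt(freq) * length_norm(len(tokens))


original = ["red", "leather", "office", "chair"]
# The repetition hack: "chair" is written three extra times so it counts more.
hacked = original + ["chair"] * 3

for term in ("chair", "leather"):
    before = field_factor(original, term)
    after = field_factor(hacked, term)
    print(f"{term}: {before:.3f} -> {after:.3f}")
```

Under these assumptions the repeated term's factor goes up, but every other term in the same field goes down, because the field now looks longer.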


> Also, how do i boost words in a query? e.g. q=key1 key2 and i know
> key2 is twice as important as key1 ? (searching 1 field).


q=key1 key2^2

If the keywords that have more importance are the same for every
document, query-time boosting is by far the preferable route.
You have much more flexibility and it isn't less performant.
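
A minimal sketch of building such a query string from a table of per-term boosts and encoding it as a q parameter. The helper name boosted_query is hypothetical; the ^ syntax is the one shown above:

```python
from urllib.parse import urlencode


def boosted_query(term_boosts):
    """Join terms into Lucene query syntax, appending ^boost where boost != 1."""
    parts = []
    for term, boost in term_boosts.items():
        parts.append(term if boost == 1 else f"{term}^{boost:g}")
    return " ".join(parts)


q = boosted_query({"key1": 1, "key2": 2})
print(q)                    # key1 key2^2
print(urlencode({"q": q}))  # q=key1+key2%5E2
```

Because the boosts live in the query rather than in the index, changing them means editing the table and nothing needs to be reindexed.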

There are some things which are elegantly solved using index-time
boosting, and so it is likely that lucene will support it one day.

-Mike


Question About Boosting.

2007-03-10 Thread shai deljo

How can i boost some tokens over others in the same field (at Index
time) ? If this is not supported directly, what's the best way around
this problem (what's the hack to solve this :) ).
Thanks,
Shai


Re: Question About Boosting.

2007-03-10 Thread Walter Underwood
What are you trying to achieve? Let's start with the problem
instead of picking one solution which Solr doesn't support. --wunder

On 3/10/07 5:08 PM, shai deljo [EMAIL PROTECTED] wrote:

 How can i boost some tokens over others in the same field (at Index
 time) ? If this is not supported directly, what's the best way around
 this problem (what's the hack to solve this :) ).
 Thanks,
 Shai