You should be able to do that by simply changing the query boosts in the
Nutch properties file.
Raise your title boost to 3 or 4 and bring all the other boosts down to
something less than 1.

Re-indexing is not necessary. You only need to re-index if you want to
change the boost in the norm field (NOTE: this boost is DIFFERENT from the
query boost), which is encoded into the field at index time and multiplied
into the score -- the query boost is then multiplied on top of that.
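
As a rough illustration of why only the norm needs a re-index, here is a
minimal sketch in Python. It is a simplified model, not Lucene's actual
scoring formula: the function name and the numbers are made up for the
example, and it ignores every other factor that goes into a real score.

```python
# Simplified model only: a field's contribution is treated as
# raw match score * index-time norm * query-time boost.
# The function name and all values here are hypothetical.

def field_score(raw_match: float, field_norm: float, query_boost: float) -> float:
    """Contribution of one field to a hit's score under the simplified model."""
    return raw_match * field_norm * query_boost

# The norm is written into the index, so changing it means re-indexing.
stored_norm = 0.5

# The query boost is read at search time, so it can change freely.
before = field_score(2.0, stored_norm, query_boost=1.5)
after = field_score(2.0, stored_norm, query_boost=4.0)

print(before, after)
```

The stored norm is the same in both calls; only the search-time boost
differs, which is the part you can change without touching the index.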

The only problem I see is that you don't want to match anything by content
-- for that you will need to change the query so it does not look in that
field, or give that field a very low boost as well (anything between 0 and 1
lowers the score rather than raising it). AFAIK, to change the content part
you will need to modify the query code.
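
To make the difference between "very low boost" and "not searched at all"
concrete, here is a small Python sketch. The field names follow the ones
discussed in this thread, but the boost values, the dictionary layout and
the function are assumptions for illustration only, not Nutch code.

```python
# Hypothetical per-field query boosts: title high, everything else low.
BOOSTS = {"title": 4.0, "anchor": 0.05, "content": 0.05, "url": 0.05}

# Hypothetical "title only" setup: other fields are not searched at all.
TITLE_ONLY = {"title": 4.0}

def hit_score(field_matches: dict, boosts: dict) -> float:
    """Sum boosted per-field match scores; fields missing from boosts are skipped."""
    return sum(score * boosts[name]
               for name, score in field_matches.items()
               if name in boosts)

title_hit = {"title": 1.0}      # page matches in the title
content_hit = {"content": 1.0}  # page matches only in the body text

print(hit_score(title_hit, BOOSTS), hit_score(content_hit, BOOSTS))
print(hit_score(content_hit, TITLE_ONLY))
```

With low boosts a content-only page still scores above zero, so it can still
show up at the bottom of the results; it only disappears when the field is
left out of the query, which is the part that needs a code change.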


 

-----Original Message-----
From: Fredrik Andersson [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 03, 2005 2:23 PM
To: [email protected]
Subject: Re: Strange search results

While on the topic guys, if you require a different weighting scheme from
the default one, will a re-indexing really be necessary? I'm currently trying
to search just some of the fields. For instance, I'd like to base the hits
entirely on the page title, not on anchor text, contents or other factors. I
thought this would be a matter of hacking the searcher part of Nutch, not
the index, but I haven't figured it out yet. Any wise words on this problem?

Fredrik

On 8/3/05, Howie Wang <[EMAIL PROTECTED]> wrote:
> Thanks for the tips, Andy and Chirag! It saves me a lot of trouble.
> I'll tweak the boosting for anchors and re-index and see where it gets 
> me.
> 
> Thanks,
> Howie
> 
> 
> >Concur with Andy on both points -- Unfortunately, there is no way to 
> >"go back" and remove either of these values without reindexing, so 
> >let me save you the trouble if you were thinking of changing the 
> >similarity class as a workaround.
> >
> >IMO, the problem with anchors is that you either need to get them 
> >all, or not get them at all -- getting just a few anchors can give 
> >you really bad results as stuff like "click here" will give pages a 
> >high score that don't contain either of these terms.  Another 
> >approach is to go in the properties file and change the boost of 
> >anchors to 0.05, thus giving them a very very low boost
> >
> >Regarding the norm -- this is done at index time for each field. 
> >We've changed the indexing code so that it's always 1
> >
> >HTH,
> >CC
> >
> >
> >-----Original Message-----
> >From: Andy Liu [mailto:[EMAIL PROTECTED]
> >Sent: Wednesday, August 03, 2005 8:00 AM
> >To: [email protected]
> >Subject: Re: Strange search results
> >
> >The fieldNorm is lengthNorm * document boost.  The final value is
> >"rounded", so that's why you're getting such clean numbers for your
> >fieldNorm.
> >If you're finding that these pages have too high of a boost, you can 
> >lower indexer.score.power in your conf file.
> >
> >As for your problem in #2, look at the explain page to see how your 
> >search result got there.  Maybe there's a high score for an anchor 
> >match.  The anchor text doesn't show up in the text of the page, so
> >maybe that's it.
> >
> >Andy
> >
> >On 8/3/05, Howie Wang <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > I've been noticing some strange search results recently. I seem to 
> > > be getting two issues.
> > >
> > > 1. The fieldNorm for certain terms is unusually high for certain 
> > > sites for anchors and titles. And they are usually just whole 
> > > numbers (4.0, 5.0, etc).
> > > I find this strange since the lengthNorm used to calculate this is 
> > > very unlikely to result in an integer. It's either 
> > > 1/sqrt(numTokens) or 1/log(e+numTokens). Where is 5.0 coming from?
> > >
> > > 2. I'm getting hits for sites that don't contain ANY of the terms 
> > > in my search. This is exacerbated by issue #1 since the fieldNorm 
> > > boosts this page to the top of the results. I thought it might be 
> > > because of my changes for stemming, but this happens for search 
> > > terms that are not changed by stemming at all.
> > >
> > > Anyone run into something like this? Any ideas on how to start
> > > debugging?
> > >
> > > Thanks,
> > > Howie
> > >
> > >
> > >
> >
> >
> 
> 
>




_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
