Christophe:
 
I think the issue here is as follows:
 
These two forum pages score higher than the home page. Let's look at why, more closely.
 
- www.cetic.be: I bet it does not have many inlinks, and certainly not as many outlinks as the forum pages. Also, if the word "cetic" appears more often on the forum pages, they will score higher.
 
- From a query-match perspective, the three pages would score pretty much the same (please confirm this by looking at the "explain" page).
 
- So, when the final scoring is done, the forum pages get a higher score than the index page because of their inlinks/outlinks.
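To make the point above concrete, here is a rough Python sketch of how a link-based boost can reorder pages whose query match is nearly identical. It is a simplified, hypothetical model, not Nutch's actual scoring code: the `Page` class, `final_score`, the log-of-inlinks boost and all the numbers are assumptions for illustration. Passing `use_link_boost=False` stands in for turning link analysis and link boosting off.

```python
import math
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    query_score: float       # hypothetical text-match score for the query
    inlinks: int             # hypothetical count of incoming links
    link_score: float = 1.0  # hypothetical link-analysis score; 1.0 if analysis is not run


def final_score(page: Page, use_link_boost: bool = True) -> float:
    """Simplified model: query match multiplied by a link-based document boost."""
    if not use_link_boost:
        # Link analysis and link boosting turned off: only the query match counts.
        return page.query_score
    # Boost grows with the log of the inlink count, so heavily linked pages win.
    boost = page.link_score * math.log(math.e + page.inlinks)
    return page.query_score * boost


def rank(pages: list[Page], use_link_boost: bool = True) -> list[Page]:
    """Return pages sorted by final score, best first."""
    return sorted(pages, key=lambda p: final_score(p, use_link_boost), reverse=True)


# Hypothetical numbers: near-identical query scores, very different inlink counts.
pages = [
    Page("http://www.cetic.be/", query_score=0.52, inlinks=3),
    Page("http://www.cetic.be/forum", query_score=0.50, inlinks=40),
    Page("http://www.cetic.be/forum/viewforum.php?f=1", query_score=0.48, inlinks=25),
]

for use_boost in (True, False):
    print("link boost on" if use_boost else "link boost off")
    for p in rank(pages, use_boost):
        print(f"  {final_score(p, use_boost):.3f}  {p.url}")
```

In this toy model the two forum pages outrank the home page while the boost is on; with it off, the home page's slightly better query match puts it first. The explain page is where you would check which factor actually drives the real scores.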
 
 
Try turning off the link analysis (i.e., don't run that process), turn off adding links from the same site, and turn off the link-boost properties (in the crawl properties and/or nutch-default), then try again. The scores should be much closer now (the explain page is your friend here).
 
In theory, you're right: the main page should rank higher. But for that to hold, you'll need a lot of external inlinks, or you'll need to incorporate scoring where a shorter matching URL gets a higher score.
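One way to express the shorter-URL idea is sketched below. The `url_depth` and `url_match_boost` functions and the 1 + 1/(1 + depth) formula are hypothetical choices for illustration, not an existing Nutch feature or configuration setting, and the query scores are made-up numbers.

```python
from urllib.parse import urlparse


def url_depth(url: str) -> int:
    """Count non-empty path segments, plus one if the URL has a query string."""
    parsed = urlparse(url)
    segments = [s for s in parsed.path.split("/") if s]
    return len(segments) + (1 if parsed.query else 0)


def url_match_boost(url: str, term: str) -> float:
    """Hypothetical boost: a URL containing the term gets 1 + 1/(1 + depth)."""
    if term.lower() not in url.lower():
        return 1.0  # no URL match, no boost
    return 1.0 + 1.0 / (1 + url_depth(url))


# Hypothetical, roughly equal query-match scores.
query_scores = {
    "http://www.cetic.be/": 0.52,
    "http://www.cetic.be/forum": 0.50,
    "http://www.cetic.be/forum/viewforum.php?f=1": 0.48,
}

reranked = sorted(
    query_scores,
    key=lambda u: query_scores[u] * url_match_boost(u, "cetic"),
    reverse=True,
)
for url in reranked:
    print(f"{query_scores[url] * url_match_boost(url, 'cetic'):.3f}  {url}")
```

Because "cetic" appears in the hostname, all three URLs match, so depth alone decides the boost and the bare home page gets the largest one.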
 
CC-
 
 


From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Christophe Noel
Sent: Monday, January 17, 2005 10:22 AM
To: [EMAIL PROTECTED]
Subject: [Nutch-dev] searching problem illustrated

I would just like to show you a small problem with Nutch and get your comments on it:

After crawling a set of domains (cetic.be, ...)

I submit:

Search: cetic

Nutch gives back these results:
===========
Hits 1-3 (out of about 1,342 total matching pages):

WWW.CETIC.BE :: Index
... WWW.CETIC.BE :: Index WWW.CETIC.BE Forum consacré aux technologies ... 2005 3:10 pm WWW.CETIC.BE Index du Forum Voir ...
http://www.cetic.be/forum (cached) (explain) (anchors) (more from www.cetic.be)

WWW.CETIC.BE :: Voir le Forum - CETIC
... WWW.CETIC.BE :: Voir le Forum - CETIC ...
http://www.cetic.be/forum/viewforum.php?f=1&sid=e804a54c2328e69119cfdba18b1cff76 (cached) (explain) (anchors) (more from www.cetic.be)
===========


God! Nutch should at least return "www.cetic.be" as the first result!

Chris.
_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
