Nutch determines a page's score from the number of inbound links and the authority value of those links, using a HITS-ish algorithm. So yes: if a sub-level page has more inbound links and/or more important ones, it will outscore the front page, even though the front page usually has a high score. A nice solution would be to modify the weighting step, where internal and external links are weighted into a score, and take the depth of the page into account as well... that should be relatively painless. You could also hack the segment manually after you've created it (SegmentReader/SegmentWriter), either giving fake inbound links to certain pages or just modifying their scores directly.
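To make the depth idea a bit more concrete, here is a rough sketch in plain Java. It does not touch any Nutch classes; the class name `DepthWeighting`, the `adjust` method, and the `penalty` factor are all made up for illustration, and the damping formula is just one possible choice, not what Nutch actually does.

```java
import java.net.URI;

public class DepthWeighting {

    // Counts non-empty path segments of a URL:
    // http://abc.com has depth 0, http://abc.com/something.html has depth 1.
    public static int depth(String url) {
        String path = URI.create(url).getPath();
        if (path == null || path.isEmpty()) {
            return 0;
        }
        int count = 0;
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Hypothetical damping step: divides the link-based score by
    // (1 + penalty * depth), so deeper pages lose more of their score.
    public static double adjust(double linkScore, String url, double penalty) {
        return linkScore / (1.0 + penalty * depth(url));
    }

    public static void main(String[] args) {
        double penalty = 0.5; // assumed value, would need tuning
        System.out.println(adjust(34.432, "http://abc.com", penalty));
        System.out.println(adjust(50.0, "http://abc.com/something.html", penalty));
    }
}
```

With the scores from the question and a penalty of 0.5, the root page keeps its 34.432 while the subpage drops from 50 to roughly 33.3, so the root ranks first again. A smaller penalty would leave the subpage ahead, so the factor decides how strongly root pages are favoured.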
Anywho, maybe that gives you an idea or two. I have pretty poor knowledge of the actual ranking algorithm, so perhaps someone will come up with better suggestions...

Fredrik

On 7/28/05, EM <[EMAIL PROTECTED]> wrote:
> Is there a chance that the ranking algorithm in Analyze would give higher
> value to a subpage than the root domain page?
>
> For example:
> http://abc.com <- 34.432
> http://abc.com/something.html <- 50
>
> Is the above scenario possible, or does nutch always rank root pages
> highest?
>
> Regards,
> EM

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
