Nutch scores a page from the number of inbound links it has and the
authority of those links, roughly a HITS-style algorithm. So yes: if a
sub-level page has more inbound links, or more important ones, than
the front page, it will outscore the front page, even though the front
page usually has a high score. A nice solution would be to modify the
weighting step, where internal and external links are turned into a
score, so that it also takes the depth of the page into account. That
should be relatively painless. You could also hack the segment
manually after you have created it (SegmentReader/SegmentWriter),
either giving certain pages fake inbound links or just modifying their
scores directly.
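
To make the depth idea concrete, here is a rough sketch of what such an
adjustment could look like. It is not Nutch code and does not touch any
Nutch API: the class name DepthWeighting, the methods depth() and adjust(),
and the damping factor of 0.85 are all made up for illustration. It only
shows the arithmetic of scaling a link-based score down by how deep the URL
path is.

```java
import java.net.URI;

// Illustrative only: a standalone sketch, not part of Nutch.
public class DepthWeighting {
    // Hypothetical per-level damping factor; an assumption, not a Nutch setting.
    static final double DEPTH_DAMPING = 0.85;

    // Number of non-empty path segments: "http://abc.com" -> 0,
    // "http://abc.com/something.html" -> 1.
    static int depth(String url) {
        String path = URI.create(url).getPath();
        if (path == null || path.isEmpty()) {
            return 0;
        }
        int d = 0;
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                d++;
            }
        }
        return d;
    }

    // Scale the link-based score down once per level of path depth.
    static double adjust(double linkScore, String url) {
        return linkScore * Math.pow(DEPTH_DAMPING, depth(url));
    }

    public static void main(String[] args) {
        // Scores taken from the example in the original question.
        System.out.println(adjust(34.432, "http://abc.com"));
        System.out.println(adjust(50.0, "http://abc.com/something.html"));
    }
}
```

With a mild factor like this a strongly linked subpage can still end up
above the root page, so the factor would need tuning depending on how
strongly you want root pages favoured.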

Anywho, maybe that gives you an idea or two. I have pretty poor
knowledge of the actual ranking algorithm, so perhaps someone will
come up with better suggestions...

Fredrik

On 7/28/05, EM <[EMAIL PROTECTED]> wrote:
> Is there a chance that the ranking algorithm in Analyze would give higher
> value to a subpage than the root domain page?
> 
> For example:
> http://abc.com  <- 34.432
> http://abc.com/something.html <- 50
> 
> 
> Is the above scenario possible, or does nutch always rank root pages
> highest?
> 
> Regards,
> EM
> 
>


_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
