Byron Miller wrote:
Here is what the great Doug said:
"
Are you using link analysis? Perhaps it is doing you a disservice by
prioritizing one site above the others. Try, in place of the analyze
command, setting both fetchlist.score.by.link.count and
indexer.boost.by.link.count to true. Please tell us how that works for you.
Doug"
I did this and haven't run analyze since then, and you can see the
results on mozdex.com looking pretty good!
Both methods boost well-connected pages and penalize poorly-connected
ones. However, if I understand this correctly, the implications of
using this method instead of DB analysis are the following:
* DB analysis builds a web graph to discover how many incoming links
point to a given page, and calculates the score based on that (which is
essentially what Google's PageRank is about)
* scoring by outlink count also promotes well-connected pages, but this
time the ones with a lot of _outgoing_ links.
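
To make the difference concrete, here is a minimal sketch on a toy
graph. The graph, the page names and the helper functions are all made
up for illustration; this is not Nutch code, and it assumes the link
count in question really is the outlink count, as described above.

```python
# Hypothetical toy web graph: page -> list of pages it links to.
links = {
    "directory": ["a", "b", "c", "d"],  # a pure link directory
    "a": ["popular"],
    "b": ["popular"],
    "c": ["popular"],
    "d": [],
    "popular": ["a"],
}


def outlink_counts(graph):
    """Number of distinct pages each page links to."""
    return {page: len(set(targets)) for page, targets in graph.items()}


def inlink_counts(graph):
    """Number of distinct pages linking to each page."""
    counts = {page: 0 for page in graph}
    for targets in graph.values():
        for target in set(targets):
            counts[target] = counts.get(target, 0) + 1
    return counts


out_counts = outlink_counts(links)
in_counts = inlink_counts(links)

# The page that each scheme would rank first on link count alone.
top_by_out = max(out_counts, key=out_counts.get)
top_by_in = max(in_counts, key=in_counts.get)

print("top by outlinks:", top_by_out)
print("top by inlinks: ", top_by_in)
```

On this graph the two schemes disagree: the outlink count favours the
"directory" page, which nobody links to, while the inlink count
favours "popular".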
PageRank is based on an assumption about social behaviour: people
will link to pages they find interesting and relevant, so a well-linked
page must therefore be important. Such a page will get a higher score
(it will be considered more relevant to the query, all other factors
being equal).
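
As a rough illustration of that idea, a simplified PageRank can be
sketched as a power iteration over the link graph. This is a textbook
simplification, not the DB analysis code; the toy graph, the damping
factor of 0.85 and the iteration count are assumptions chosen for the
example.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    graph maps each page to the list of pages it links to. Every link
    target must also appear as a key. Pages without outlinks spread
    their rank evenly over all pages.
    """
    pages = list(graph)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for source, targets in graph.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[source] / n
                for page in pages:
                    new_rank[page] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: one link directory, one widely linked page.
toy_graph = {
    "directory": ["a", "b", "c", "d"],
    "a": ["popular"],
    "b": ["popular"],
    "c": ["popular"],
    "d": [],
    "popular": ["a"],
}

ranks = pagerank(toy_graph)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:10s} {score:.3f}")
```

Here "popular" ends up with the highest score because several pages
link to it, while "directory" ends up with the lowest, since it has
many outgoing links but no incoming ones.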
The method that scores by outlink count seems to promote pages that are
just link directories. However, in reality such pages don't have to be
more relevant to the query than pages with few outlinks - because they
may point to many very disparate areas. But they will still get a higher
score just by virtue of having a lot of outgoing links. So, in this
case the relationship between the social behaviour of linking to
interesting pages, and page relevance, doesn't apply, because the links
don't reflect someone's judgment that this page is interesting.
Having said that, I'm a practical person, too - if it works well enough,
then so much the better for us. :-) And PageRank is not an oracle either.
--
Best regards,
Andrzej Bialecki
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com