Piotr Kosiorowski wrote:
I do not know which method of computing the score is really better, but I would like to clarify one issue: all methods (dbanalyze, fetchlist.score.by.link.count, indexer.boost.by.link.count) use inlinks, as far as I can tell from the code itself.
That's correct. The difference is that, with link analysis, there are higher-order effects. It matters not just how many pages link to a page, but how many pages link to the pages that link to a page, and so on.
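To make the higher-order effect concrete, here is a minimal sketch in Python. It is not Nutch's code: the function names, the toy link graph, and the damping factor of 0.85 are all assumptions chosen for illustration, and the iteration is a textbook PageRank-style computation rather than what db analyze actually implements. Pages "a" and "b" each have exactly one inlink, but the page linking to "a" is itself well linked.

```python
from collections import defaultdict


def inlink_counts(links):
    """First-order signal: count the direct inlinks of each page."""
    counts = defaultdict(int)
    for _src, dst in links:
        counts[dst] += 1
    return dict(counts)


def pagerank(links, damping=0.85, iterations=50):
    """PageRank-style power iteration over a list of (src, dst) links.

    Hypothetical helper for illustration only. Rank held by pages
    without outlinks is spread evenly over all pages each round.
    """
    pages = sorted({page for edge in links for page in edge})
    outlinks = defaultdict(list)
    for src, dst in links:
        outlinks[src].append(dst)

    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not outlinks[p])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}
        for src in pages:
            for dst in outlinks[src]:
                # Each page passes its rank on in equal shares.
                new_rank[dst] += damping * rank[src] / len(outlinks[src])
        rank = new_rank
    return rank


# Toy graph (assumed): "hub" has three inlinks and links to "a";
# "lone" has no inlinks and links to "b".
links = [
    ("x1", "hub"), ("x2", "hub"), ("x3", "hub"),
    ("hub", "a"),
    ("lone", "b"),
]

counts = inlink_counts(links)
ranks = pagerank(links)

print("inlinks:", counts["a"], counts["b"])
print("rank a > rank b:", ranks["a"] > ranks["b"])
```

A plain inlink count cannot tell "a" and "b" apart, while the iterative score ranks "a" higher because the rank of "hub" flows through to it.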
So if both the fetchlist.score.by.link.count and indexer.boost.by.link.count properties are set, the number of inlinks would in fact be used twice in the score computation.
No, the score referred to by fetchlist.score.by.link.count is only the score used to prioritize fetching, and is not reflected in the score when searching.
In my opinion the main difference between simply using the number of inlinks, as the indexer.boost.by.link.count and fetchlist.score.by.link.count methods do, and db analyze (PageRank computation) is that the latter takes the quality of inlinks into account. For fetchlist.score.by.link.count and indexer.boost.by.link.count all inlinks are treated equally, but PageRank takes into account the score of the page each inlink originates from. So I suppose it should provide better results, but because of link spam etc. I would not dare to claim so. I am doing some tests on my collections right now, but it is difficult to judge whether the results are really better with PageRank.
A PageRank-like link analysis is indeed harder to spam, but, as we all know, it can still be spammed.
Inlink anchor text generally affects search results more than link analysis does. Anchor text is not a higher-order signal and is easy to spam. Thus the higher-order effects of PageRank provide only very limited advantages in spam fighting.
Doug
