Hi - you can safely forget about OPIC; it is useless in continuous crawls. 
LinkRank, however, only works well on very large crawls with many hosts. It 
can work for single hosts (if you do not ignore internal links), but the graph 
will become very dense; that's where the IO and CPU time come from. We don't 
use the LinkRank score in Solr at all because results are already very relevant 
due to other (less costly) measures.
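
To illustrate why a dense graph costs so much, here is a minimal sketch of a 
plain PageRank-style power iteration in Python. It is not Nutch's LinkRank 
code (which runs as jobs over the webgraph); the function name, the damping 
value and the iteration count are assumptions for illustration only. The point 
is that every iteration touches every link once, so the work grows with the 
number of edges rather than the number of pages.

```python
def link_rank_sketch(links, damping=0.85, iterations=10):
    """links: dict mapping a page to the list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        dangling = 0.0
        for page in pages:
            targets = links.get(page, [])
            if not targets:
                # Pages without outlinks spread their score evenly below.
                dangling += scores[page]
                continue
            share = damping * scores[page] / len(targets)
            for target in targets:
                # One update per link: this inner loop is the expensive part.
                new_scores[target] += share
        for page in pages:
            new_scores[page] += damping * dangling / n
        scores = new_scores
    return scores
```

With internal links kept, a single host can easily have many links per page, 
so the inner loop dominates the run time.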

You can do it, but you will need some serious hardware. There is also the 
problem of frequently changing scores, but if you are not frequently updating 
all documents in Solr, using ExternalFileField may help.
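
A minimal sketch of how such a score file could be produced, in Python. 
ExternalFileField reads a plain-text file named external_<fieldname> from the 
index data directory, with one key=value line per document. The function name, 
the field name "linkrank" and the temporary file name are assumptions for 
illustration; the field itself still has to be declared in schema.xml, and 
whether your document keys (e.g. URLs containing "=") parse correctly should be 
checked against your Solr version.

```python
import os


def write_external_scores(scores, data_dir, field_name="linkrank"):
    """Write {document key: score} as key=value lines for ExternalFileField."""
    file_name = "external_" + field_name
    final_path = os.path.join(data_dir, file_name)
    # The temporary name does not start with "external_", so a half-written
    # file should not be mistaken for the real one.
    tmp_path = os.path.join(data_dir, "tmp_" + file_name)
    with open(tmp_path, "w", encoding="utf-8") as handle:
        # Sorting by key is cheap here and keeps the file easy to diff.
        for key in sorted(scores):
            handle.write("%s=%s\n" % (key, float(scores[key])))
    # Replace the old file in one step once the new one is complete.
    os.replace(tmp_path, final_path)
    return final_path
```

The scores can then be refreshed after each LinkRank run without reindexing 
the documents themselves.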

-----Original message-----
From: Tobias Marx<[email protected]>
Sent: Friday 21st February 2014 16:29
To: [email protected]
Subject: PageRank or Opic?

Hi!

We're using nutch (1.7) and solr 3.6 for indexing about 80k pages on several 
hundred different hosts.

This works quite well, but there is still room for improvement to search result 
ranking and "relevancy".

When using nutch and solr there are basically two values that influence the 
score of a query result (correct me if I'm wrong): the score from nutch, which 
becomes the "boost" value in solr, and the boost value from solr, which is e.g. 
calculated at query time.

The score in nutch is either calculated by the "scoring-opic" plugin or with 
the "webgraph" toolchain described here: 
http://wiki.apache.org/nutch/NewScoringIndexingExample which gives the 
PageRank/LinkRank (btw. what about the "scoring-link" plugin? Does it do 
anything at all? What is its role in this?).

We've been playing around with PageRank lately and its scores look a little 
better than with opic, but on the downside, the calculation takes very long 
and is very CPU intensive.

Well, to cut a long story short, what is your opinion on this? Which ranking do 
you use? Is PageRank worth the trouble? How do you boost solr queries (if you 
use solr at all)?

BR,

--
Tobias Marx

Zentrum für Informations- und Medienverarbeitung - ZIM

Bergische Universität Wuppertal

Büro: T.11.08
+49 202 439 2237
[email protected]

