We are a fairly big Solr shop for most things except intranet web crawling,
where we use commercial software. In general, the PageRank algorithm does a
good job of finding the top pages (which tend to be the home pages of sites
and subsites).
A simple Solr/Lucene index does not yield great results because many pages
have similar content, so we are looking into whether we can use Nutch to
crawl the intranet.

Does Nutch 1.1 support a PageRank/LinkRank type of model (I understand that
would be the OPIC algorithm)?
Can we use NewScoring with 1.1?
http://wiki.apache.org/nutch/NewScoring
