Hi Markus,

On Wed, Sep 10, 2014 at 2:00 AM, <[email protected]> wrote:
> Hey Lewis,
>
> We didn't use it in the end, but did run the LinkRank on large amounts of
> data. We then used the scores generated by it for biasing a deduplication
> algorithm. We tested it thoroughly and never stumbled on issues that could
> have been resolved using the Loops algorithm.

Thanks for the reply, Markus. OK, so here is the deal: we are currently running exhaustive vertical crawls on around 20-30 domains. We are not following links out to domains other than those target domains at the moment, so I've adjusted the <linkrank> properties in nutch-site.xml accordingly, along with other related properties and config, to restrict the crawl as such. I am going to experiment with both options in an attempt to move towards tackling this documentation and building on my own understanding.

Thanks for your reply.

Lewis
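For anyone following along, here is a minimal sketch of the kind of nutch-site.xml overrides this describes. It assumes Nutch 1.x property names; the exact defaults should be checked against nutch-default.xml for your version, and the 20-30 target domains are assumed to be the seed hosts:

```xml
<!-- Sketch of nutch-site.xml overrides for a vertical crawl
     restricted to the seed domains (property names per Nutch 1.x;
     verify against your version's nutch-default.xml). -->
<configuration>

  <!-- Do not follow outlinks to hosts outside the seed domains. -->
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>

  <!-- LinkRank normally ignores internal links; since this crawl has
       almost no external links, count internal ones so the webgraph
       is not empty. -->
  <property>
    <name>link.ignore.internal.host</name>
    <value>false</value>
  </property>
  <property>
    <name>link.ignore.internal.domain</name>
    <value>false</value>
  </property>

</configuration>
```

With `db.ignore.external.links` set, the crawl frontier stays inside the target domains; the two `link.ignore.internal.*` flags are the judgment call here, since leaving them at their defaults on a purely internal crawl would leave LinkRank with few or no edges to score.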

