Hi Markus,

On Wed, Sep 10, 2014 at 2:00 AM, <[email protected]> wrote:

> Hey Lewis,
>
> We didn't use it in the end, but did run the LinkRank on large amounts of
> data. We then used the scores generated by it for biasing a deduplication
> algorithm. We tested it thoroughly and never stumbled on issues that could
> have been resolved using the Loops algorithm.
>

Thanks for the reply, Markus.

OK, so here is the deal: we are currently exhausting vertical crawls on
around 20-30 domains. We are not obtaining external links at the moment to
domains outside of those target domains, so I've adjusted the <linkrank>
properties in nutch-site.xml accordingly along with other related
properties and config to restrict the crawl as such.
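
As a rough sketch only, overrides of this kind in nutch-site.xml could look
like the fragment below. The values are illustrative assumptions, not the
exact settings used here, and the property names and their defaults should
be checked against the nutch-default.xml of the Nutch version in use.

```xml
<configuration>
  <!-- Assumed value: keep the crawl inside the target sites by
       dropping outlinks that point to external hosts. -->
  <property>
    <name>db.ignore.external.links</name>
    <value>true</value>
  </property>

  <!-- Assumed values: with only internal links in the webgraph, let
       LinkRank count links within the same host and domain rather
       than ignoring them. -->
  <property>
    <name>link.ignore.internal.host</name>
    <value>false</value>
  </property>
  <property>
    <name>link.ignore.internal.domain</name>
    <value>false</value>
  </property>
</configuration>
```
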
I am going to experiment with both options, with the aim of improving
this documentation and building on my own understanding.
Thanks for your reply.
Lewis
