Hi, I'm planning to use Nutch to crawl between 1 and 2 million domains.
From the documentation, I guess intranet crawling would be the right
method.
Are there known problems with intranet crawling and a domain list of this size?
Regards, Hermann