stopping the crawl because of irrelevant domains

Patricio Galeas Tue, 25 May 2010 03:31:45 -0700

Hello,
Twenty days ago I started a web crawl, and over the last few days a lot of 
pages from some irrelevant domains have been crawled. 
For this reason I would like to stop the crawl, redefine my URL filter, and 
start the crawl again.
Is there a way to stop the crawl without losing all the data of the current 
segment?
If a graceful stop is not possible and I simply kill the crawl process, can 
I use the solution proposed in: 
https://issues.apache.org/jira/browse/NUTCH-451 ?


Thank you for your comments
Patricio
