Hi,

I want to validate a Nutch crawl to make sure that all links (URLs) have been 
crawled. For example, if a page has 500 URLs, I want to make sure all 500 were 
crawled.
One way is to manually identify all the links on the page and then check that 
each URL is present in the crawled URLs.
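
To make that manual check concrete, here is a rough sketch in Python. It is not Nutch code and does not use any Nutch API. It assumes the page HTML is already available as a string and that the crawled URLs have been exported into a plain collection of strings; the names `LinkCollector` and `missing_links` are made up for this illustration. It also compares URLs as exact strings after dropping fragments, so any URL normalization done by the crawler could make a URL look missing when it is not.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects absolute http(s) URLs from <a href="..."> tags on one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and drop "#fragment".
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute.startswith(("http://", "https://")):
                    self.links.add(absolute)


def missing_links(page_html, base_url, crawled_urls):
    """Return the links found on the page that are absent from crawled_urls."""
    collector = LinkCollector(base_url)
    collector.feed(page_html)
    return sorted(collector.links - set(crawled_urls))
```

Calling `missing_links(html, page_url, crawled)` returns a sorted list of the links on that page that do not appear in the crawled set, which is the list I would then have to investigate.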

Also, is there any way to check which URLs could not be crawled, for example 
because of a filter, because the website did not allow some page to be crawled, 
or for some other reason?

Thanks

