Hi, I want to validate a Nutch crawl to make sure that all links (URLs) on a page have been crawled. For example, if a page has 500 URLs, I want to confirm that all 500 were crawled. One way is to manually identify all the links on the page and then check that each URL is present in the crawled URLs.
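
To make the manual check concrete, here is a rough Python sketch of what I have in mind. It is not anything Nutch provides: `LinkCollector` and `missing_links` are names I made up, and it assumes I already have the page HTML and a plain list of crawled URLs exported from the crawl by some means. It also compares URLs as exact strings, so any normalization Nutch applies could show up as false misses.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                # Drop the #fragment so the same page is not counted twice.
                self.links.add(urldefrag(absolute).url)


def missing_links(page_html, base_url, crawled_urls):
    """Return the links found on the page that are absent from crawled_urls."""
    collector = LinkCollector(base_url)
    collector.feed(page_html)
    return sorted(collector.links - set(crawled_urls))
```

Calling `missing_links(html, "http://example.com/", crawled)` would then give the list of links on that page that do not appear in the crawled set.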
Also, is there any way to check which URLs could not be crawled, for example because of a filter, because the website did not allow some page to be crawled, or for some other reason? Thanks

