> Is this a problem of Nutch, which fails to fetch a huge list of URLs?
Probably not; Nutch is able to fetch millions of URLs in a single fetch list. Is a time limit set (property fetcher.timelimit.mins)? That could explain why a large list isn't fully fetched. There should be a message in the log files giving the reason.

Best,
Sebastian

On 04/12/2017 06:55 AM, shubham.gupta wrote:
> Hey,
>
> I have around 5000 URLs in my seed URL list. If I inject the whole list, then it fails to fetch and parse all documents. The depth is set to 1.
>
> But when the list is divided into batches of 1000 URLs, it is able to fetch and parse all documents successfully.
>
> In the former case, 5141 URLs are injected, out of which 5127 URLs are generated and only 1300 URLs get fetched with status 2. Of the rest, 1342 do not have status 2 and the remainder are unfetched.
>
> When the list is small, the total count of documents is 3220, of which 1298 documents have status 2, 1922 have a status code other than 2, and 1 document has not been fetched yet.
>
> Is this a problem of Nutch, which fails to fetch a huge list of URLs? Or do some changes need to be made in the configuration files?
>
> Please reply soon.
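If a fetcher time limit turns out to be the cause, it can be checked or overridden in conf/nutch-site.xml. A minimal sketch of such an override (fetcher.timelimit.mins is the standard Nutch property; -1, the default, disables the limit):

```xml
<!-- conf/nutch-site.xml: override the fetcher time limit.
     A positive value stops the fetch job after that many minutes,
     leaving the remaining URLs unfetched; -1 disables the limit. -->
<configuration>
  <property>
    <name>fetcher.timelimit.mins</name>
    <value>-1</value>
    <description>Maximum number of minutes a fetch job may run;
    -1 means no time limit.</description>
  </property>
</configuration>
```

Values in nutch-site.xml take precedence over nutch-default.xml, so this is the place to check whether a limit was set locally.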

