Hey

I have around 5000 URLs in my seed URL list. If I inject the whole list at once, Nutch fails to fetch and parse all of the documents. The depth is set to 1.

But when the list is divided into batches of 1000 URLs, Nutch fetches and parses all of the documents successfully.
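
For anyone who wants to reproduce the batching, here is a rough sketch of how a seed list could be split into batches of 1000 URLs. It is only an illustration: the function names, the "batch_NNN/seed.txt" layout and the skipping of blank and "#" lines are assumptions, not part of Nutch or of my setup.

```python
from pathlib import Path
from typing import Iterator, List


def read_seed_urls(seed_file: Path) -> List[str]:
    """Return the non-empty lines of a seed file, skipping '#' comment lines."""
    urls = []
    for line in seed_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def chunk(urls: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def write_batches(seed_file: Path, out_dir: Path, size: int = 1000) -> List[Path]:
    """Write each batch to its own directory (hypothetical layout) and return the files."""
    written = []
    for index, batch in enumerate(chunk(read_seed_urls(seed_file), size)):
        batch_dir = out_dir / f"batch_{index:03d}"
        batch_dir.mkdir(parents=True, exist_ok=True)
        target = batch_dir / "seed.txt"
        target.write_text("\n".join(batch) + "\n", encoding="utf-8")
        written.append(target)
    return written
```

Each resulting directory could then be injected and crawled separately.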

In the former case, 5141 URLs are injected, of which 5127 are generated and only 1300 are fetched with status 2. Of the rest, 1342 have a status other than 2, and the remainder are unfetched.
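
To make the gap concrete, the counts for the full-list run can be tallied as below. This assumes that "the rest" is measured against the 5127 generated URLs; the variable names are only for illustration.

```python
# Counts reported for the run where the whole seed list was injected.
injected = 5141
generated = 5127
fetched_status_2 = 1300
other_status = 1342

# Injected URLs that never made it into the generated set.
not_generated = injected - generated

# Assumption: unfetched = generated minus everything that has some status.
unfetched = generated - fetched_status_2 - other_status

print(f"not generated: {not_generated}")
print(f"unfetched:     {unfetched}")
```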

When the list is small, on the other hand, the total count of documents is 3220: 1298 documents have status 2, 1922 have a status other than 2, and 1 has not been fetched yet.

Is this a problem with Nutch itself, in that it fails to fetch a large list of URLs? Or do some changes need to be made in the configuration files?

Please reply soon.

--
Thanks and Regards,
Shubham Gupta
