I am new to Nutch.

My goal is to extract content (local listings) from a certain website. I have
already obtained the URLs of all the listings (only ~20K), and I have written
a parser to pull the fields I need (like address and phone). All that is left
is to download the pages at those URLs.

But when I used a download tool to batch-download the URLs, I very quickly
started getting 404 responses for the downloaded pages, presumably because
the site started blocking me.

Is there a way to do this with Nutch? And what's the risk of getting blocked
again? I just want the pages at those URLs: no crawling, no indexing, just a
plain fetch that leaves the content intact.
