I am new to Nutch. My goal is to extract content (local listings) from a certain website. I have already obtained the URLs of all the listings (only ~20K), and I have written a parser to pull out the content I need (address, phone, etc.). All that remains is to download the pages behind those URLs.
But when I used a download tool to batch-download the URLs, I very quickly started getting 404 responses for the downloaded pages. Is there a way to do this with Nutch? What is the risk of being blocked again? I just want to fetch the URLs: no crawling, no indexing, just a plain fetch that leaves the pages intact.
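To make the "plain fetch" part concrete, here is a minimal sketch of the kind of throttled download I have in mind (Python, standard library only; the URL list, delay, user agent, and output path are placeholders, not my actual setup):

```python
import os
import time
import urllib.request
import urllib.error

# Placeholder list; in reality I have ~20K listing URLs collected already.
urls = [
    "https://example.com/listing/1",
    "https://example.com/listing/2",
]

DELAY_SECONDS = 5  # assumed polite delay between requests to the same host
os.makedirs("pages", exist_ok=True)

for i, url in enumerate(urls):
    try:
        req = urllib.request.Request(
            url, headers={"User-Agent": "my-listing-fetcher/0.1"}
        )
        with urllib.request.urlopen(req, timeout=30) as resp:
            # Save the raw page untouched; parsing happens later, offline.
            with open(f"pages/{i}.html", "wb") as f:
                f.write(resp.read())
    except urllib.error.HTTPError as e:
        # This is where my current batch download starts returning 404s.
        print(f"{url}: HTTP {e.code}")
    time.sleep(DELAY_SECONDS)
```

Essentially I just want this kind of polite, one-pass fetch, but done properly so the site does not start rejecting my requests partway through.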
