Hi, I am new to Nutch. I would like to crawl an internal website completely and retrieve all of its HTML content. I don’t intend to do any further processing with Nutch. The site is rather large. By "crawl", I mean: go to a page, download/archive its HTML, extract the links from that page, and then download/archive those pages in turn. I would keep doing this until there are no new links left.
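To make the loop I have in mind concrete, here is a rough Python sketch. It is not Nutch code, only an illustration of the behaviour I am after. The names `crawl`, `fetch`, and `LinkExtractor` are my own, and I am assuming that `fetch` is some function that returns a page's HTML as a string, or `None` if the page cannot be retrieved.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl restricted to the start URL's host.

    `fetch` is an assumed callable: url -> HTML string, or None on failure.
    Returns a dict mapping each visited URL to its HTML.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}          # every URL ever queued, to avoid revisits
    queue = deque([start_url])  # URLs still waiting to be downloaded
    archive = {}

    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be downloaded
        archive[url] = html

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)

    return archive
```

In practice `fetch` would wrap an HTTP request and the archive would be written to disk rather than held in memory; politeness delays, robots.txt handling, and non-HTML content are all left out of this sketch.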
Is this possible with Nutch? Is it the right tool for this job, or are there other tools that would be better suited to my purpose? Thanks, O.O.