Using Nutch for only retrieving HTML

O. Olson Thu, 24 Sep 2009 11:54:59 -0700

Hi,
I am new to Nutch. I would like to crawl an internal website completely and
retrieve all of its HTML content. I don’t intend to do any further processing
with Nutch.

The website is rather large. By "crawl" I mean: go to a page, download and
archive its HTML, extract the links from that page, and then download and
archive the linked pages, repeating until there are no new links.
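
To make the loop described above concrete, here is a minimal Python sketch of that kind of crawl. It is not Nutch code and says nothing about how Nutch works internally. The names `LinkExtractor`, `fetch_html` and `crawl` are made up for illustration, and the sketch assumes that only links on the same host as the start page should be followed and that URL fragments can be ignored.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download one page and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def crawl(start_url, fetch=fetch_html):
    """Breadth-first crawl restricted to the host of start_url.

    Returns a dict mapping each successfully fetched URL to its HTML.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}  # URLs already queued, so nothing is fetched twice
    pages = {}
    while queue:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any "#fragment" part.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

The loop stops when the queue is empty, that is, when no page has produced a link that was not already seen. For a very large site, the `pages` dict would have to be replaced by writing each page to disk as it is fetched, and a real crawl would also need politeness delays and robots.txt handling, which this sketch leaves out.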


Is this possible? Is this the right tool for this job, or are there other tools 
out there that would be more suited for my purpose?

Thanks,
O.O.
