Hi All,

        Just wanted to follow up on my question with a polite request: could 
the documentation for Nutch be updated? I'm trying to follow the Nutch Tutorial 
(http://wiki.apache.org/nutch/NutchTutorial) to see if I can crawl a site 
without indexing it, but the commands and examples shown are out of date: 
directories are named differently than in the examples (or don't exist at all), 
and even some of the commands appear to be different. 

        Nutch being open source, I'd gladly volunteer to do the updating, 
assuming someone can give me the pertinent information...

Thanks,
Branden Makana

On Jul 21, 2010, at 11:52 AM, Branden Makana wrote:

> Hello,
> 
> 
>       We're trying to crawl a very large site, but we really just want all 
> the HTML/image URLs on the site - we don't care to search it. Therefore, 
> what's the best way to have Nutch crawl the site, but NOT index/store pages 
> locally? Is it even possible? 
> 
> 
> 
> Thanks,
> Branden Makana
