Re: Best way to crawl, but not index?

Branden Makana Wed, 21 Jul 2010 16:28:50 -0700

Hi All,


        Just wanted to follow up on my question with a polite request that perhaps 
the documentation for Nutch be updated? I'm trying to follow the Nutch Tutorial 
(http://wiki.apache.org/nutch/NutchTutorial) to see if I can crawl a site 
without indexing it, but the commands and examples shown are out of date: 
directories are named differently than in the examples (or don't exist at all), 
and even some of the commands appear to be different. 

        Nutch being open source, I'd gladly volunteer to do the updating, 
assuming someone can give me the pertinent information...

Thanks,
Branden Makana

On Jul 21, 2010, at 11:52 AM, Branden Makana wrote:

> Hello,
> 
> 
>       We're trying to crawl a very large site, but we really just want all 
> the html/image URLs on the site - we don't care to search it. Therefore, 
> what's the best way to have Nutch crawl the site, but NOT index/store pages 
> locally? Is it even possible? 
> 
> 
> 
> Thanks,
> Branden Makana
