Hi,

It's possible to set
 fetcher.store.content = false
in combination with
 fetcher.parse = true

If disk space is scarce or disks are slow, this combination may make sense.
But there are serious reasons why the parser is run as a separate job
by default and why, as a precondition, raw content is kept: see NUTCH-872,
and http://wiki.apache.org/nutch/FAQ#Can_I_parse_during_the_fetching_process.3F
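
For reference, a minimal sketch of how this combination might look in
conf/nutch-site.xml, assuming the usual Nutch 1.x setup in which that file
overrides nutch-default.xml. Only the two property names and values come
from the settings above; the rest is the standard configuration wrapper:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Do not write the raw fetched content into the segment -->
  <property>
    <name>fetcher.store.content</name>
    <value>false</value>
  </property>
  <!-- Parse each page inside the fetcher instead of in a separate parse job -->
  <property>
    <name>fetcher.parse</name>
    <value>true</value>
  </property>
</configuration>
```

Keep in mind that without stored raw content a segment cannot be re-parsed
later; the pages would have to be fetched again.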

Sebastian

On 03/19/2014 11:13 PM, S.L wrote:
> Hi All,
> 
> I am not using the Nutch indexer; instead I index with my own utility method
> after every page is fetched, and I need to bypass any additional steps that
> Nutch executes in a crawl. Along those lines I have identified the following
> steps to implement.
> 
> 
>    1. Disable LinkDB creation by commenting out the LinkDB.invert() call.
>    2. Do not store the fetch_content in a segment, which is used to create an
>    index, by setting the property fetcher.store.content to false.
> 
> 
> I am clear about #1 from the discussion I had with Sebastian earlier.
> 
> About #2, I need to know whether setting fetcher.store.content to false would
> be a good idea.
> 
> 
> Thanks.
> 
