Hi,

It's possible to set fetcher.store.content = false in combination with fetcher.parse = true.
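
For illustration, the two properties could be overridden roughly like this. This is only a sketch: it assumes the usual setup where conf/nutch-site.xml overrides nutch-default.xml, and the comments are mine, not taken from the Nutch documentation.

```xml
<!-- conf/nutch-site.xml (assumed location; overrides nutch-default.xml) -->
<configuration>
  <!-- Parse each page inside the fetcher instead of in a separate parse job -->
  <property>
    <name>fetcher.parse</name>
    <value>true</value>
  </property>
  <!-- Do not write the raw fetched content into the segment -->
  <property>
    <name>fetcher.store.content</name>
    <value>false</value>
  </property>
</configuration>
```

With raw content not stored, a segment cannot be re-parsed later, so parsing has to happen during the fetch.
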
If disk space is scarce or disks are slow, this combination may make sense. But there are serious reasons why the parser is run as a separate job by default and why, as a precondition, raw content is kept: see NUTCH-872 and
http://wiki.apache.org/nutch/FAQ#Can_I_parse_during_the_fetching_process.3F

Sebastian

On 03/19/2014 11:13 PM, S.L wrote:
> Hi All,
>
> I am not using the Nutch indexer but indexing using my own utility method
> after every page is fetched and I need to bypass any additional steps that
> Nutch executes in a crawl. Along those lines I have identified the following
> steps to implement.
>
> 1. Disable LinkDB creation by commenting out LinkDB.invert() method.
> 2. Not store the fetch_content in a segment which is used to create an
>    index by setting the property fetcher.store.content to false.
>
> I am clear about #1 from discussion I have had with Sebastian earlier.
>
> About #2 I need to know if having fetcher.store.content set to false would
> be a good idea?
>
> Thanks.

