AFAIK, the fetch job can be made to parse each document as it is fetched, but updatedb won't run for that document immediately. You will have to wait for the fetch job to finish. Here is the property:
<property>
  <name>fetcher.parse</name>
  <value>true</value>
  <description>If true, fetcher will parse content. NOTE: previous releases would default to true. Since 2.0 this is set to false as a safer default.</description>
</property>

On Thu, Jun 13, 2013 at 9:51 AM, Nishant shah <[email protected]> wrote:

> Hi everyone,
>
> We have a nutch running on local mode but the crawl is taking a long time.
> So we wanted to know if its possible to add this facility wherein when the
> url is fetched, we immediately try to parse, updatedb and store it's
> contents and other information in hbase. I am using nutch 2.1 with hbase
> 0.90.6 .
>
> Thanks.
>
> Nishant.
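
In case it helps, here is a rough sketch of where that override would usually go. I'm assuming the standard layout in which conf/nutch-site.xml overrides the defaults shipped in conf/nutch-default.xml; the file path is my assumption, so adjust it if your setup differs.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml (assumed location): values set here override nutch-default.xml -->
<configuration>
  <property>
    <name>fetcher.parse</name>
    <!-- true = the fetcher parses content during the fetch job -->
    <value>true</value>
  </property>
</configuration>
```

With this set, parsing happens inside the fetch job, but updatedb still runs as a separate step once the fetch job has finished.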

