Re: Continuous crawl

Tejas Patil Fri, 14 Jun 2013 00:53:05 -0700

AFAIK, the fetch job can be made to parse each document as it is fetched,
but updatedb won't run immediately after that. You will have to wait for
the fetch job to finish. Here is the property:

<property>
  <name>fetcher.parse</name>
  <value>true</value>
  <description>If true, fetcher will parse content. NOTE: previous releases
  would default to true. Since 2.0 this is set to false as a safer default.
  </description>
</property>
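
As a rough sketch of where the override would go, assuming it is placed in
conf/nutch-site.xml (the usual file for local overrides of nutch-default.xml)
and that no other overrides are present:

<?xml version="1.0"?>
<configuration>
  <!-- Local override: parse content inside the fetch job. -->
  <property>
    <name>fetcher.parse</name>
    <value>true</value>
  </property>
</configuration>

With this set, a separate parse step should not be needed for that batch,
but updatedb still has to be run once the fetch job has finished.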

On Thu, Jun 13, 2013 at 9:51 AM, Nishant shah <[email protected]> wrote:

> Hi everyone,
>
> We have Nutch running in local mode, but the crawl is taking a long time.
> So we wanted to know if it's possible to add a facility whereby, as soon
> as a URL is fetched, we immediately parse it, run updatedb and store its
> contents and other information in HBase. I am using Nutch 2.1 with HBase
> 0.90.6.
>
> Thanks.
>
> Nishant.
