Re: Incremental crawling with nutch

Ali Nazemian Mon, 02 Jun 2014 01:53:25 -0700

Hi,
Could you please explain in more detail?
Which parameter do you mean, and how should I set it?
Regards.



On Mon, Jun 2, 2014 at 3:42 AM, S.L <[email protected]> wrote:

> Hi Ali
>
> Please see the parameters in nutch-site.xml; one of them does that.
>
> Sent from my HTC
>
> ----- Reply message -----
> From: "Ali Nazemian" <[email protected]>
> To: <[email protected]>
> Subject: Incremental crawling with nutch
> Date: Sun, Jun 1, 2014 10:46 AM
>
> Hi everybody,
> I am going to use Nutch to crawl some news websites. These websites
> are updated regularly, so I need to recrawl them at least every 2
> hours. The problem is that I want the recrawl to be incremental: Nutch
> should fetch only URLs that are new and have not been fetched before
> (except the main page of each site, which is needed to extract new URLs).
> In each recrawl, only the new URLs should be fetched and sent to Solr for
> indexing. Could somebody guide me through this scenario with Nutch 1.8?
> Best regards.
>
> --
> A.Nazemian
>



-- 
A.Nazemian
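
The selection rule described in the original question (always refetch each site's main page, fetch every other URL only once) can be sketched roughly as below. This is only an illustration of the logic, not Nutch code: Nutch keeps this state in its crawldb, and the thread does not name the nutch-site.xml parameter that controls it. The function name select_urls_to_fetch and the example.com URLs are hypothetical.

```python
from typing import Iterable, List, Set


def select_urls_to_fetch(seeds: Iterable[str],
                         discovered: Iterable[str],
                         already_fetched: Set[str]) -> List[str]:
    """Return the seed pages plus every discovered URL not fetched before.

    Hypothetical helper for illustration; it is not part of Nutch.
    """
    to_fetch: List[str] = []
    queued: Set[str] = set()

    # Main (seed) pages are always refetched so new links can be extracted.
    for url in seeds:
        if url not in queued:
            queued.add(url)
            to_fetch.append(url)

    # Every other URL is fetched only if it has never been fetched.
    for url in discovered:
        if url not in queued and url not in already_fetched:
            queued.add(url)
            to_fetch.append(url)

    return to_fetch


if __name__ == "__main__":
    seeds = ["http://news.example.com/"]
    discovered = [
        "http://news.example.com/story-1",
        "http://news.example.com/story-2",
    ]
    already_fetched = {
        "http://news.example.com/",
        "http://news.example.com/story-1",
    }
    for url in select_urls_to_fetch(seeds, discovered, already_fetched):
        print(url)
```

In this sketch only the URLs returned in one round would be fetched and sent on for indexing; after the round, they would be added to already_fetched before the next recrawl.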
