Dear Bayu,
I have already read that post about re-crawling. My problem is that Nutch does
not work the way the post describes.
Regards.


On Fri, Jun 6, 2014 at 2:14 PM, Bayu Widyasanyata <[email protected]>
wrote:

> Hi Ali,
>
> This blog post [0] may help.
>
> [0] http://pascaldimassimo.com/2010/06/11/how-to-re-crawl-with-nutch/
>
>
> On Thu, Jun 5, 2014 at 12:32 AM, Ali Nazemian <[email protected]>
> wrote:
>
> > Thank you very much. But it is just a parameter for specifying the
> > interval between re-crawls. The problem is that Nutch re-crawl does not
> > work with the default crawl script.
> >
> >
> > On Wed, Jun 4, 2014 at 6:49 PM, S.L <[email protected]> wrote:
> >
> > > Ali,
> > >
> > > If you have not found this out yet, I was referring to
> > > db.fetch.interval.max.
> > >
> > > Sent from my HTC
> > >
> > > ----- Reply message -----
> > > From: "Ali Nazemian" <[email protected]>
> > > To: <[email protected]>
> > > Subject: Incremental crawling with nutch
> > > Date: Mon, Jun 2, 2014 4:52 AM
> > >
> > > Hi,
> > > Could you please explain more?
> > > What parameter? How can I do that?!
> > > Regards.
> > >
> > >
> > > On Mon, Jun 2, 2014 at 3:42 AM, S.L <[email protected]> wrote:
> > >
> > > > Hi Ali
> > > >
> > > > Please see the nutch-site.xml parameters; one of them does that.
> > > >
> > > > Sent from my HTC
> > > >
> > > > ----- Reply message -----
> > > > From: "Ali Nazemian" <[email protected]>
> > > > To: <[email protected]>
> > > > Subject: Incremental crawling with nutch
> > > > Date: Sun, Jun 1, 2014 10:46 AM
> > > >
> > > > Hi everybody,
> > > > I am going to use Nutch to crawl some news websites. These websites
> > > > are updated regularly, so I need to re-crawl them at least every 2
> > > > hours. The problem is that I want an incremental re-crawl, meaning
> > > > Nutch should crawl only the URLs that are new and have not been
> > > > fetched before (except the main page of each site, for extracting
> > > > new URLs). In each re-crawl, I want only the new URLs to be fetched
> > > > and sent to Solr for indexing. Would somebody guide me through this
> > > > scenario with Nutch 1.8?
> > > > Best regards.
> > > >
> > > > --
> > > > A.Nazemian
> > > >
> > >
> > >
> > >
> > > --
> > > A.Nazemian
> > >
> >
> >
> >
> > --
> > A.Nazemian
> >
>
>
>
> --
> wassalam,
> [bayu]
>



-- 
A.Nazemian
