So you mean the only difference (besides some parameters that should be set in
nutch-site.xml) is using nutch generate -adddays instead of nutch generate?
What about the other parts? Could you please provide a step-by-step guide?
Regards.


On Sat, Jun 7, 2014 at 4:20 PM, Bayu Widyasanyata <bwidyasany...@gmail.com>
wrote:

> Hi Ali,
>
> OK, I will share using my current script.
> I sometimes use the "-adddays" parameter in the "nutch generate" step to
> force recrawling.
>
> Thanks.
>
>
> On Fri, Jun 6, 2014 at 11:02 PM, Ali Nazemian <alinazem...@gmail.com>
> wrote:
>
> > Dear Bayu,
> > Would you please also share the procedure you are going to use for
> > testing the recrawl? Maybe I am doing some steps wrong.
> > Regards.
> >
> >
> > On Fri, Jun 6, 2014 at 7:01 PM, Bayu Widyasanyata <bwidyasany...@gmail.com>
> > wrote:
> >
> > > Just curious, I will go back to the lab and verify it...
> > >
> > > ---
> > > wassalam,
> > > [bayu]
> > >
> > > /sent from Android phone/
> > > On Jun 6, 2014 5:37 PM, "Ali Nazemian" <alinazem...@gmail.com> wrote:
> > >
> > > > Dear Bayu,
> > > > Hi,
> > > > > I already read that post about recrawling. My problem is that nutch
> > > > > does not work the way this post describes.
> > > > Regards.
> > > >
> > > >
> > > > On Fri, Jun 6, 2014 at 2:14 PM, Bayu Widyasanyata <bwidyasany...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Ali,
> > > > >
> > > > > > This blog post [0] may help.
> > > > >
> > > > > [0]
> > http://pascaldimassimo.com/2010/06/11/how-to-re-crawl-with-nutch/
> > > > >
> > > > >
> > > > > > On Thu, Jun 5, 2014 at 12:32 AM, Ali Nazemian <alinazem...@gmail.com>
> > > > > > wrote:
> > > > >
> > > > > > > Thank you very much. But it is just a parameter for specifying the
> > > > > > > interval between re-crawls. The problem is that nutch re-crawl does
> > > > > > > not work with the default crawl script.
> > > > > >
> > > > > >
> > > > > > > On Wed, Jun 4, 2014 at 6:49 PM, S.L <simpleliving...@gmail.com>
> > > > > > > wrote:
> > > > > >
> > > > > > > Ali,
> > > > > > >
> > > > > > > If you have not found this out yet, I was referring to
> > > > > > > db.fetch.interval.max.
> > > > > > >
> > > > > > > Sent from my HTC
> > > > > > >
> > > > > > > ----- Reply message -----
> > > > > > > From: "Ali Nazemian" <alinazem...@gmail.com>
> > > > > > > To: <user@nutch.apache.org>
> > > > > > > Subject: Incremental crawling with nutch
> > > > > > > Date: Mon, Jun 2, 2014 4:52 AM
> > > > > > >
> > > > > > > Hi,
> > > > > > > Could you please explain more?
> > > > > > > What parameter? How can I do that?!
> > > > > > > Regards.
> > > > > > >
> > > > > > >
> > > > > > > > On Mon, Jun 2, 2014 at 3:42 AM, S.L <simpleliving...@gmail.com>
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Ali
> > > > > > > >
> > > > > > > > > Please see the nutch-site.xml parameters; one of them does that.
> > > > > > > >
> > > > > > > > Sent from my HTC
> > > > > > > >
> > > > > > > > ----- Reply message -----
> > > > > > > > From: "Ali Nazemian" <alinazem...@gmail.com>
> > > > > > > > To: <user@nutch.apache.org>
> > > > > > > > Subject: Incremental crawling with nutch
> > > > > > > > Date: Sun, Jun 1, 2014 10:46 AM
> > > > > > > >
> > > > > > > > Hi everybody,
> > > > > > > > > I am going to use nutch for crawling some news websites. These
> > > > > > > > > websites are updated regularly, so I should recrawl them at
> > > > > > > > > least every 2 hours. The problem is that I want an incremental
> > > > > > > > > re-crawl, meaning nutch should crawl only the URLs that are new
> > > > > > > > > and have not been fetched before (except the main page of each
> > > > > > > > > site, which is needed for extracting new URLs). In each
> > > > > > > > > re-crawl, I want only the new URLs to be fetched and sent to
> > > > > > > > > solr for indexing. Would somebody guide me through this
> > > > > > > > > scenario with nutch 1.8?
> > > > > > > > Best regards.
> > > > > > > >
> > > > > > > > --
> > > > > > > > A.Nazemian
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > A.Nazemian
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > A.Nazemian
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > wassalam,
> > > > > [bayu]
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > A.Nazemian
> > > >
> > >
> >
> >
> >
> > --
> > A.Nazemian
> >
>
>
>
> --
> wassalam,
> [bayu]
>



-- 
A.Nazemian
