Thanks Markus. For recrawling, I see there are options that do not use bin/nutch crawl, like the "Can it recrawl" section in http://wiki.apache.org/nutch/Crawl

However, would there be a difference? Is there anything else to keep in mind if we set up a cron job on Linux to crawl every day, each day triggering something like:

bin/nutch crawl urls -dir arndme -depth 4 -topN 3

so that cron calls this again and again?

Regards | Vikas

On Tue, May 15, 2012 at 6:07 PM, Markus Jelsma <[email protected]> wrote:
> On Tuesday 15 May 2012 17:39:31 Vikas Hazrati wrote:
> > So once the crawl (which abstracts iterative crawls till the depth is
> > reached) is finished, is there a way to trigger a recrawl as part of
> > some command line option so that Nutch continues to run as a daemon,
> > or is a shell script the way out?
>
> Shell scripting is the way to go. Nutch will automatically recrawl pages
> that are due to be refetched.
>
> > Regards | Vikas
> >
> > On Fri, May 11, 2012 at 8:26 PM, Lewis John Mcgibbney <
> > [email protected]> wrote:
> > > If you would like, I could add you to the moderators group and you
> > > can word it how you wish.
> > >
> > > Please sign up to Jira, give me your Jira username on this page, and
> > > I will happily add you to the group.
> > >
> > > On the other hand, if you don't wish to do this, then please reply
> > > here with your suggestion and I'll make sure something gets changed
> > > to accommodate your suggestions.
> > >
> > > Thanks
> > >
> > > On Fri, May 11, 2012 at 2:52 PM, Matthias Paul <[email protected]>
> > > wrote:
> > > > I was confused by this tutorial:
> > > > http://wiki.apache.org/nutch/NutchTutorial
> > > >
> > > > Reading this page, one might conclude that the crawl tool can't do
> > > > iterative crawling, because under "3.2 Using Individual Commands
> > > > for Whole-Web Crawling" there's the sentence "This also permits
> > > > ... incremental crawling", as if the crawl command described
> > > > before (3.1 Using the Crawl Command) couldn't do that.
> > > >
> > > > Could someone perhaps improve this part of the tutorial?
> > > >
> > > > Matthias
> > > >
> > > > On Thu, May 10, 2012 at 8:39 PM, Markus Jelsma
> > > > <[email protected]> wrote:
> > > > > By default each crawl is iterative. The crawl command is nothing
> > > > > more than a wrapper around the individual crawl cycle commands.
> > > > > The depth parameter is nothing more than executing a single
> > > > > crawl cycle multiple times. This is, if I am not mistaken, also
> > > > > true for older releases, certainly 1.2 and above.
> > > > >
> > > > > On Thu, 10 May 2012 19:31:27 +0100, Lewis John Mcgibbney <
> > > > > [email protected]> wrote:
> > > > > > For the record, there is a patch pending review for Nutchgora
> > > > > > which will sort part of this for you as well.
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/NUTCH-1301
> > > > > >
> > > > > > Susam Pal also contributed a patch for Nutchgora regarding
> > > > > > incremental indexing but I can't find it just now, sorry.
> > > > > >
> > > > > > Lewis
> > > > > >
> > > > > > On Thu, May 10, 2012 at 5:18 PM, Matthias Paul
> > > > > > <[email protected]> wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Can the crawl command also be used for iterative crawls?
> > > > > > > In older Nutch versions this was not possible, but in 1.5 it
> > > > > > > seems to work?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Matthias
> > > > >
> > > > > --
> > > > > Markus Jelsma - CTO - Openindex
> > >
> > > --
> > > Lewis
>
> --
> Markus Jelsma - CTO - Openindex
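[Editor's note] The cron-driven recrawl asked about at the top of the thread could be sketched as a small wrapper script plus a crontab entry. This is only an illustration under assumptions: the paths (/opt/nutch, /data/nutch, the log file) and the schedule are placeholders, not anything stated in the thread; only the bin/nutch crawl invocation itself comes from the original message.

```shell
#!/bin/sh
# recrawl.sh -- minimal daily-recrawl sketch (paths are assumptions).
# Because the same -dir is reused, Nutch's crawldb is preserved between
# runs, so pages whose fetch interval has expired are refetched
# automatically, as Markus notes above.
NUTCH_HOME=/opt/nutch          # assumed install location
CRAWL_DIR=/data/nutch/arndme   # assumed crawl directory

"$NUTCH_HOME/bin/nutch" crawl urls -dir "$CRAWL_DIR" -depth 4 -topN 3 \
  >> /var/log/nutch-recrawl.log 2>&1

# Example crontab entry to run it daily at 02:00:
#   0 2 * * * /opt/nutch/recrawl.sh
```

The only real difference from invoking bin/nutch crawl by hand is that cron supplies a minimal environment, so the script should use absolute paths rather than relying on PATH or the interactive shell's working directory.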
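[Editor's note] Markus's point that the crawl command is just a wrapper around the individual cycle commands, with -depth meaning repeated cycles, can be sketched roughly as follows. This is a simplified 1.x-era sequence under assumptions: the crawl directory name and loop count are placeholders, segment selection is naive, and error handling and the indexing steps are omitted.

```shell
#!/bin/sh
# Rough sketch of the cycle that `bin/nutch crawl urls -dir crawl -depth 4 -topN 3` wraps.
CRAWL_DIR=crawl   # placeholder output directory

# Seed the crawldb from the urls directory (only needed on the first run).
bin/nutch inject "$CRAWL_DIR/crawldb" urls

for depth in 1 2 3 4; do   # -depth 4 == four cycles
  # Select the top-scoring URLs due for fetching into a new segment.
  bin/nutch generate "$CRAWL_DIR/crawldb" "$CRAWL_DIR/segments" -topN 3
  SEGMENT=$(ls -d "$CRAWL_DIR/segments/"* | tail -1)   # newest segment
  bin/nutch fetch "$SEGMENT"                            # fetch its pages
  bin/nutch parse "$SEGMENT"                            # parse fetched content
  bin/nutch updatedb "$CRAWL_DIR/crawldb" "$SEGMENT"    # fold results back in
done
```

Running this loop again later is what a "recrawl" amounts to: updatedb records fetch times, so the next generate round only selects URLs whose fetch interval has expired, plus newly discovered links.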

