Thanks folks. Will try to do one of these... Could I also pause crawling for a while, then index the whole crawl up to the point where it was paused (moving the indexes out to different locations), and then continue crawling from where it was paused?
Just a simple pause - resume kind of thing

On Thu, Feb 10, 2011 at 10:11 PM, Alexander Aristov <[email protected]> wrote:

> Hi
>
> You may put the separate crawling phases into separate scripts, something like:
>
> inject.sh
> crawl.sh
> indexing.sh
>
> And configure these scripts to start at a certain time using any scheduling
> tool; for example, I find it very easy to use the Linux cron scheduler.
>
> You can configure the crawl to work between, say, 12.00-13.00. A crawl
> keeps working until it has no unfetched resources left in the queue or the
> max fetch limit is reached, and it takes as much time as it needs.
>
> Best Regards
> Alexander Aristov
>
>
> On 9 February 2011 04:17, .: Abhishek :. <[email protected]> wrote:
>
> > Hi all,
> >
> > I am just trying to figure out if there is some way I can set Nutch crawls
> > to run in a time interval, say crawl from 12:00 AM to 12:00 PM, and then
> > start the further processing (indexing and so on) after that.
> >
> > I think the Nutch job is tied to Hadoop's JobConf. I am not sure how this
> > could be done. Rather, if I am to use an external shell script for doing
> > this, how do I chain the crawl process and trigger further processing
> > after the crawl?
> >
> > Thanks,
> > Abi
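To make the advice above concrete, here is a minimal sketch of a `crawl.sh` that runs one generate/fetch/update cycle, plus crontab entries that keep crawling and indexing in separate time windows. This assumes a Nutch 1.x install driven via the `bin/nutch` command; the paths (`/opt/nutch`, `/data/crawl`) and the schedule are illustrative, not required by Nutch.

```shell
#!/bin/sh
# crawl.sh -- one generate/fetch/update cycle (sketch; paths are examples).
NUTCH_HOME=/opt/nutch      # assumed Nutch 1.x install location
CRAWL_DIR=/data/crawl      # assumed crawl data directory

# Generate a new fetch list from the crawl db into a fresh segment.
$NUTCH_HOME/bin/nutch generate $CRAWL_DIR/crawldb $CRAWL_DIR/segments

# Pick the segment that was just created (segments are named by timestamp,
# so the newest one sorts last).
SEGMENT=$(ls -d $CRAWL_DIR/segments/* | tail -1)

# Fetch the segment, then fold the results back into the crawl db.
$NUTCH_HOME/bin/nutch fetch $SEGMENT
$NUTCH_HOME/bin/nutch updatedb $CRAWL_DIR/crawldb $SEGMENT

# Illustrative crontab entries: crawl starting at midnight, indexing at noon.
# Install with `crontab -e`:
#   0 0  * * * /opt/nutch/scripts/crawl.sh
#   0 12 * * * /opt/nutch/scripts/indexing.sh
```

Because each phase is its own script, "pausing" is just a matter of not scheduling further crawl cycles: the crawldb keeps its state on disk, so `indexing.sh` can index (and move) whatever has been fetched so far, and the next scheduled `crawl.sh` run resumes from where the crawl left off.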

