In Nutch 1.x you cannot abort and resume the fetch process.

On Thursday 10 February 2011 15:27:05 .: Abishek :. wrote:
> Thanks folks. Will try to do one of these...
>
> Could I also pause crawling for a while, then index the whole crawl till
> the time it was paused(move the indexes out of to different locations) and
> then continue crawling from where it was paused?
>
> Just a simple pause - resume kind of thing
>
> On Thu, Feb 10, 2011 at 10:11 PM, Alexander Aristov
> <[email protected]> wrote:
> > Hi
> >
> > You may put separate crawling phases to separate scripts something like
> >
> > inject.sh
> > crawl.sh
> > indexing.sh
> >
> > And configure these scripts to start at certain time using any
> > scheduling tool
> >
> > for example I find it very easy to use linux cron scheduler.
> >
> > But you can configure that crawl can work between 12.00- 13.00. Crawl is
> > working until it has unfetched resources in queue or max fetch limit is
> > reached. And it takes as much time as needed.
> >
> > Best Regards
> > Alexander Aristov
> >
> > On 9 February 2011 04:17, .: Abhishek :. <[email protected]> wrote:
> > > Hi all,
> > >
> > > I am just trying to figure out if there is some way I can set Nutch
> > > crawls between a time interval say like crawl from 12:00 AM to
> > > 12:00 PM and then start the further processing(start process of
> > > indexing and so on that follows the crawl) after that.
> > >
> > > I think Nutch job is tied to Hadoop's JobConf. I am not sure on how
> > > this could be done. Rather, if I am to use an external shell script
> > > for doing this, how do I chain the crawl process and trigger further
> > > processing after crawl?
> > >
> > > Thanks,
> > > Abi
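
For the chaining part of the original question, here is a minimal sketch of a wrapper script. It assumes the three per-phase scripts named above (inject.sh, crawl.sh, indexing.sh) live in one directory; their contents are not shown in this thread, and run_phases and PHASE_DIR are names made up for illustration. It does not pause a running fetch. It only makes sure each phase starts after the previous one has exited successfully.

```shell
#!/bin/sh
# Hypothetical wrapper: run the given phase scripts in order and stop at
# the first one that fails. PHASE_DIR is an assumed variable naming the
# directory that holds the phase scripts (defaults to the current one).
PHASE_DIR="${PHASE_DIR:-.}"

run_phases() {
    for phase in "$@"; do
        echo "starting $phase"
        # A non-zero exit status aborts the chain, so indexing never
        # runs against a crawl that did not finish cleanly.
        if ! sh "$PHASE_DIR/$phase"; then
            echo "$phase failed, stopping" >&2
            return 1
        fi
    done
    echo "all phases finished"
}

# Only run when phase names are passed on the command line, e.g.
#   ./run_crawl.sh inject.sh crawl.sh indexing.sh
if [ "$#" -gt 0 ]; then
    run_phases "$@"
fi
```

Saved under a name such as run_crawl.sh (hypothetical), it could be started at midnight every day with a crontab entry like `0 0 * * * /path/to/run_crawl.sh inject.sh crawl.sh indexing.sh`. How long the crawl phase then runs still depends on the crawl itself, not on the schedule.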

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

