In Nutch 1.x you cannot abort and resume the fetch process.
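
A minimal sketch of the scripted approach suggested further down, assuming the three phase scripts named there (inject.sh, crawl.sh, indexing.sh) exist, that crawl.sh runs one complete crawl cycle, and that it exits non-zero when there is nothing left to fetch. The function names and the SCRIPT_DIR variable are made up for illustration. Because a running fetch cannot be paused, the time check only happens between cycles, so the last cycle may overrun the deadline:

```shell
#!/bin/sh
# Hypothetical wrapper around the per-phase scripts from this thread.
# Directory holding inject.sh, crawl.sh and indexing.sh (assumption).
SCRIPT_DIR="${SCRIPT_DIR:-.}"

# Succeeds while the current time is before the given deadline.
# The deadline is in epoch seconds; `date +%s` works with GNU and BSD date.
before_deadline() {
  [ "$(date +%s)" -lt "$1" ]
}

# run_window DEADLINE_EPOCH
# Injects once, runs whole crawl cycles while time remains, then indexes.
# A failing crawl cycle ends the loop; whatever was fetched is still indexed.
run_window() {
  sh "$SCRIPT_DIR/inject.sh" || return 1
  while before_deadline "$1"; do
    sh "$SCRIPT_DIR/crawl.sh" || break
  done
  sh "$SCRIPT_DIR/indexing.sh"
}
```

A cron job could source this file and call run_window with the epoch timestamp at which the window ends. With GNU date, something like `run_window "$(date -d '12:00' +%s)"` should give noon today, but that -d syntax is GNU-specific.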

On Thursday 10 February 2011 15:27:05 .: Abishek :. wrote:
> Thanks folks. Will try to do one of these...
> 
> Could I also pause crawling for a while, index everything crawled up to
> the point it was paused (moving the indexes out to a different location),
> and then continue crawling from where it was paused?
> 
> Just a simple pause-and-resume kind of thing.
> 
> On Thu, Feb 10, 2011 at 10:11 PM, Alexander Aristov <[email protected]> wrote:
> > Hi
> > 
> > You can put the separate crawl phases into separate scripts, something
> > like:
> > 
> > inject.sh
> > crawl.sh
> > indexing.sh
> > 
> > Then configure these scripts to start at a certain time using any
> > scheduling tool.
> > 
> > For example, I find the Linux cron scheduler very easy to use.
> > 
> > But you cannot configure the crawl to run only between 12:00 and 13:00.
> > The crawl keeps running while there are unfetched resources in the queue,
> > or until the max fetch limit is reached, and it takes as much time as
> > needed.
> > 
> > Best Regards
> > Alexander Aristov
> > 
> > On 9 February 2011 04:17, .: Abhishek :. <[email protected]> wrote:
> > > Hi all,
> > > 
> > > I am trying to figure out whether there is some way to restrict Nutch
> > > crawls to a time interval, say crawl from 12:00 AM to 12:00 PM, and
> > > then start the further processing (indexing and so on, which follows
> > > the crawl) after that.
> > > 
> > > I think the Nutch job is tied to Hadoop's JobConf, but I am not sure
> > > how this could be done. Alternatively, if I were to use an external
> > > shell script for this, how do I chain the crawl process and trigger
> > > further processing after the crawl?
> > > 
> > > Thanks,
> > > Abi

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
