Re: Running crawls between a specified time interval

Markus Jelsma Thu, 10 Feb 2011 03:01:56 -0800

I'm not sure what Hadoop can do here, but Nutch itself can't do this. What you
can do is create a run script that checks the current time before starting each
step. Beware that Nutch jobs cannot always be aborted and resumed, especially
the fetch process.
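A minimal sketch of such a run script, written in Python and assuming the crawl
is already split into rounds that can be started as separate commands. The
window (midnight to noon, taken from the question), the wrapper scripts
`./crawl_round.sh` and `./index.sh`, and all function names are hypothetical;
in a Nutch 1.x setup the round wrapper would typically run generate, fetch,
parse and updatedb for one segment. The clock is checked only before a round
starts, so a running fetch is never interrupted, and once crawling stops the
follow-up processing is chained.

```python
import datetime as dt
import subprocess
from typing import Callable, Sequence

# Hypothetical crawl window: midnight to noon, as in the question.
WINDOW_START = dt.time(0, 0)
WINDOW_END = dt.time(12, 0)


def in_window(now: dt.time, start: dt.time = WINDOW_START,
              end: dt.time = WINDOW_END) -> bool:
    """True if `now` is in [start, end); also handles windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def run_step(cmd: Sequence[str]) -> None:
    """Run one command; raises CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)


def crawl_then_process(
    crawl_round: Sequence[Sequence[str]],
    post_crawl: Sequence[Sequence[str]],
    max_rounds: int,
    clock: Callable[[], dt.time] = lambda: dt.datetime.now().time(),
    run: Callable[[Sequence[str]], None] = run_step,
) -> int:
    """Start crawl rounds while the window is open, then chain post-crawl steps.

    Returns the number of crawl rounds that were run.
    """
    rounds = 0
    # The time is checked only before a round starts; a round that is already
    # running (including its fetch) is allowed to finish, never killed midway.
    while rounds < max_rounds and in_window(clock()):
        for cmd in crawl_round:
            run(cmd)
        rounds += 1
    # Window closed or round limit reached: trigger the follow-up processing,
    # but only if something was actually crawled.
    if rounds:
        for cmd in post_crawl:
            run(cmd)
    return rounds
```

Called as `crawl_then_process([["./crawl_round.sh"]], [["./index.sh"]],
max_rounds=10)` from cron shortly after midnight, it keeps starting rounds until
noon, and `check=True` stops the chain if any step exits non-zero. Because a
round that starts at 11:55 still runs to completion, size the rounds so that one
fits comfortably in the remaining time.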


On Wednesday 09 February 2011 02:17:01 .: Abhishek :. wrote:
> Hi all,
> 
>  I am just trying to figure out if there is some way I can restrict Nutch
> crawls to a time interval, say from 12:00 AM to 12:00 PM, and then start the
> further processing (indexing and so on that follows the crawl) after that.
> 
>  I think a Nutch job is tied to Hadoop's JobConf. I am not sure how this
> could be done. Alternatively, if I am to use an external shell script for
> this, how do I chain the crawl process and trigger further processing after
> the crawl?
> 
> Thanks,
> Abi

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
