Re: Running crawls between a specified time interval

Alexander Aristov Thu, 10 Feb 2011 06:12:48 -0800

Hi

You could put the separate crawl phases into separate scripts, something like:


inject.sh
crawl.sh
indexing.sh

Then configure these scripts to start at a certain time using any scheduling
tool.

For example, I find the Linux cron scheduler very easy to use.

But you can't configure the crawl to run only between 12:00 and 13:00. A crawl
keeps running while there are unfetched resources in the queue, or until the
max fetch limit is reached, so it takes as much time as it needs.
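
Since the crawl has no fixed end time, one way to trigger the later phases is to chain the scripts instead of scheduling each one separately. Below is a minimal sketch, assuming the three phase scripts named above exist and each exits non-zero on failure. The function name run_phases, the wrapper name run_crawl.sh and the paths in the crontab comment are hypothetical, not part of Nutch.

```shell
#!/bin/sh
# Hypothetical wrapper (run_crawl.sh): runs the phase scripts in order and
# stops at the first one that fails, so indexing never starts after a
# failed crawl. The contents of the phase scripts are not shown here.

run_phases() {
    for phase in "$@"; do
        echo "starting $phase"
        # Run the phase to completion; abort the chain if it exits non-zero.
        if ! sh "$phase"; then
            echo "$phase failed, stopping" >&2
            return 1
        fi
    done
    echo "all phases finished"
}

# Example crontab entry (assumed paths) that starts the chain at midnight:
#   0 0 * * * /opt/nutch/run_crawl.sh >> /var/log/nutch-crawl.log 2>&1
```

The wrapper would end with a call such as `run_phases inject.sh crawl.sh indexing.sh`. Cron then only decides when the chain starts; each phase begins as soon as the previous one has finished, however long that took.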

Best Regards
Alexander Aristov


On 9 February 2011 04:17, .: Abhishek :. <[email protected]> wrote:

> Hi all,
>
> I am trying to figure out whether there is some way to run Nutch crawls
> within a time interval, say crawl from 12:00 AM to 12:00 PM, and then
> start the further processing (indexing and the other steps that
> follow the crawl) after that.
>
> I think a Nutch job is tied to Hadoop's JobConf. I am not sure how this
> could be done. Alternatively, if I use an external shell script for
> this, how do I chain the crawl process and trigger further processing after
> the crawl?
>
> Thanks,
> Abi
>
