Hi

You can split the crawl phases into separate scripts, something like:

inject.sh
crawl.sh
indexing.sh

Then configure these scripts to start at a certain time using any scheduling
tool.

For example, I find the Linux cron scheduler very easy to use.
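
A minimal sketch of how the three scripts could be chained so that each phase
starts only after the previous one has finished successfully. The wrapper name
run_crawl.sh, the function name run_phases, and the paths in the crontab
comment are made up for illustration; inject.sh, crawl.sh and indexing.sh are
the scripts named above, and what goes inside them depends on your Nutch setup.

```shell
#!/bin/sh
# run_crawl.sh (hypothetical name): run the given phase scripts in order.

# run_phases SCRIPT...
# Runs each script with sh; stops and returns 1 at the first one that fails,
# so indexing never starts on top of a broken crawl.
run_phases() {
    for phase in "$@"; do
        echo "starting $phase"
        sh "$phase" || { echo "$phase failed" >&2; return 1; }
    done
}

# When called with arguments, treat them as the phases to run, e.g.
#   ./run_crawl.sh ./inject.sh ./crawl.sh ./indexing.sh
if [ "$#" -gt 0 ]; then
    run_phases "$@"
fi

# Example crontab line (assumed paths) that starts the chain every day at 00:00:
#   0 0 * * * cd /path/to/nutch && ./run_crawl.sh ./inject.sh ./crawl.sh ./indexing.sh >> crawl.log 2>&1
```

Because the phases run one after another inside a single cron job, indexing
begins whenever the crawl actually ends, not at a fixed clock time.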

But you cannot configure a crawl to run only between 12:00 and 13:00. A crawl
keeps running as long as it has unfetched resources in the queue, or until the
max fetch limit is reached, so it takes as much time as it needs.
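
If a rough time window is still wanted, one possible workaround is to run the
crawl in small rounds from your own script and stop starting new rounds once a
deadline has passed. This is only a sketch under that assumption, not a Nutch
feature: crawl_until and crawl_round.sh are hypothetical names, and `date +%s`
(seconds since the epoch) is assumed to be available, as it is with GNU and BSD
date.

```shell
#!/bin/sh
# crawl_until DEADLINE_EPOCH CMD [ARGS...]
# Repeats CMD until the clock passes DEADLINE_EPOCH or CMD exits non-zero,
# then prints the number of completed rounds.
# A round that is already running is never interrupted, so the last round
# can overrun the deadline.
crawl_until() {
    deadline=$1
    shift
    rounds=0
    while [ "$(date +%s)" -lt "$deadline" ]; do
        "$@" || break
        rounds=$((rounds + 1))
    done
    echo "$rounds"
}

# Example (hypothetical round script): keep crawling for about one hour.
#   crawl_until $(( $(date +%s) + 3600 )) sh ./crawl_round.sh
```

Each round should be kept small (for example by the max fetch limit mentioned
above) so that the overrun past the deadline stays short.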

Best Regards
Alexander Aristov


On 9 February 2011 04:17, .: Abhishek :. <[email protected]> wrote:

> Hi all,
>
>  I am just trying to figure out if there is some way I can set Nutch crawls
> between a time interval say like crawl from 12:00 AM to 12:00 PM and then
> start the further processing(start process of indexing and so on that
> follows the crawl) after that.
>
>  I think Nutch job is tied to Hadoop's JobConf. I am not sure on  how this
> could be done. Rather, if I am to use an external shell script for doing
> this, how do I chain the crawl process and trigger further processing after
> crawl?
>
> Thanks,
> Abi
>
