Re: Running crawls between a specified time interval

Sonal Goyal Thu, 10 Feb 2011 03:23:15 -0800

Abhishek,

You can probably take a look at Oozie or Azkaban. I am not sure they support
running a process between times x and y, but they definitely support
scheduling a job.
Thanks and Regards,
Sonal
<https://github.com/sonalgoyal/hiho>Connect Hadoop with databases,
Salesforce, FTP servers and others <https://github.com/sonalgoyal/hiho>
Nube Technologies <http://www.nubetech.co>


<http://in.linkedin.com/in/sonalgoyal>

On Thu, Feb 10, 2011 at 4:31 PM, Markus Jelsma <[email protected]> wrote:

> I'm unsure about what Hadoop can do here, but with Nutch you can't. What you
> can do is create a run script that checks the current time before starting.
> Nutch jobs cannot always be aborted and resumed; beware of the fetch
> process.
>
> On Wednesday 09 February 2011 02:17:01 .: Abhishek :. wrote:
> > Hi all,
> >
> > I am just trying to figure out if there is some way I can run Nutch
> > crawls within a time interval, say crawl from 12:00 AM to 12:00 PM, and
> > then start the further processing (indexing and so on that follows the
> > crawl) after that.
> >
> > I think a Nutch job is tied to Hadoop's JobConf. I am not sure how this
> > could be done. Rather, if I am to use an external shell script for doing
> > this, how do I chain the crawl process and trigger further processing
> > after the crawl?
> >
> > Thanks,
> > Abi
>
> --
> Markus Jelsma - CTO - Openindex
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
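
The run-script idea from the thread (check the current time before starting,
then chain the post-crawl steps) could look roughly like the sketch below. It
is only an illustration: in_window, crawl_round, post_crawl and
run_windowed_crawl are hypothetical names, and the two placeholder functions
would have to be filled in with whatever Nutch commands your setup uses. Times
are given as HHMM. The clock is checked only before a round starts, so a
round that is already running is never aborted and may finish after the
window has closed.

```shell
#!/bin/sh
# Sketch only: every function name here is made up for illustration.

# Succeeds when time $1 lies in the window [$2, $3); all values are HHMM.
# A window whose start is later than its end is treated as wrapping midnight.
in_window() {
  if [ "$2" -le "$3" ]; then
    [ "$1" -ge "$2" ] && [ "$1" -lt "$3" ]
  else
    [ "$1" -ge "$2" ] || [ "$1" -lt "$3" ]
  fi
}

# Placeholder for one crawl cycle; replace the body with your own commands.
crawl_round() { echo "crawl round"; }

# Placeholder for the steps that follow the crawl (indexing and so on).
post_crawl() { echo "post-crawl processing"; }

# Starts a new crawl round only while the window $1..$2 is open, stops at
# the first failed round, and runs the post-crawl steps once no further
# round may start.
run_windowed_crawl() {
  while in_window "$(date +%H%M)" "$1" "$2"; do
    crawl_round || return 1
  done
  post_crawl
}
```

For the interval in the original question, the call would be
run_windowed_crawl 0000 1200. Note that if the script is started outside the
window it skips crawling and goes straight to post_crawl, so you may want to
guard against that in your own version.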
