I'm trying to understand the best way to set up repeated processing of continuously generated data, like logs.
I can manually copy files from the local filesystem to HDFS and kick off Pig scripts, but ideally I want something automatic, preferably every hour, possibly more often. I also want to be able to process a day's or a month's worth of data rather than just the most recent file.

Is there a best-practice way of doing this documented anywhere? I believe I should be looking at Flume for transferring files into HDFS and Oozie for some kind of workflow of Pig jobs. Is that right? Any example setups? To make it concrete, I've sketched below what I do by hand today and what I imagine the Flume/Oozie versions would look like.
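What I do now is roughly this (the paths and script name are just illustrative):

    # copy the latest batch of logs into a dated HDFS directory
    hadoop fs -mkdir /data/logs/2012/05/14
    hadoop fs -put /var/log/myapp/*.log /data/logs/2012/05/14/

    # run the processing script over that directory
    pig -param INPUT=/data/logs/2012/05/14 process_logs.pig

where process_logs.pig starts with a LOAD '$INPUT' so the same script can be pointed at an hour, a day, or a month of data.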
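For the ingestion side, this is the sort of Flume (NG) agent config I think would replace the manual copy. The agent name, file paths, and tail command are all placeholders, and I haven't verified this end to end:

    # one agent: tail a local log, buffer through a file channel,
    # write to hourly-bucketed HDFS directories
    agent.sources = tail1
    agent.channels = ch1
    agent.sinks = hdfs1

    agent.sources.tail1.type = exec
    agent.sources.tail1.command = tail -F /var/log/myapp/app.log
    agent.sources.tail1.channels = ch1
    # stamp events so the %Y/%m/%d/%H escapes below resolve
    agent.sources.tail1.interceptors = ts
    agent.sources.tail1.interceptors.ts.type = timestamp

    agent.channels.ch1.type = file

    agent.sinks.hdfs1.type = hdfs
    agent.sinks.hdfs1.channel = ch1
    agent.sinks.hdfs1.hdfs.path = hdfs://namenode/data/logs/%Y/%m/%d/%H
    agent.sinks.hdfs1.hdfs.fileType = DataStream
    agent.sinks.hdfs1.hdfs.rollInterval = 3600

started with flume-ng agent -n agent -c conf -f flume.conf, so the hourly directories would appear in HDFS on their own.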
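And for the scheduling side, my guess is an Oozie coordinator that triggers a Pig workflow once an hour over the directory that has just landed, something like this (app path, dates, and the inputDir property name invented for the example):

    <coordinator-app name="hourly-log-processing"
                     frequency="${coord:hours(1)}"
                     start="2012-05-14T00:00Z" end="2013-05-14T00:00Z"
                     timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
      <action>
        <workflow>
          <app-path>hdfs://namenode/apps/log-workflow</app-path>
          <configuration>
            <property>
              <name>inputDir</name>
              <!-- hand the workflow the hour that just landed -->
              <value>/data/logs/${coord:formatTime(coord:nominalTime(), 'yyyy/MM/dd/HH')}</value>
            </property>
          </configuration>
        </workflow>
      </action>
    </coordinator-app>

with the referenced workflow.xml containing a pig action that runs the script against ${inputDir}. Presumably a daily or monthly rollup would just be a second coordinator with a different frequency and a wider input path, but that's exactly the kind of thing I'm hoping someone has a worked example of.

Cheers,
Alex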