Thanks Mike. That's what I was thinking, but I was wondering if (hoping!) there was something already to do it :)
Thanks,
John.

-----Original Message-----
From: Mike Sukmanowsky [mailto:[email protected]]
Sent: 27 March 2013 14:05
To: [email protected]
Subject: Re: Don't process already processed files?

It's probably less work to have some kind of script control Pig execution, keep track of what's been processed, and pass an input path to your Pig script dynamically.

For example, you could create a control.py/rb/sh file which would somehow keep track of what's been processed (maybe a simple file) and then figure out the input path to pass to pig during execution via a parameter: pig --param inputpath="/some/dynamic/input/path/for/pig".

You'd then set up your cron job to run your control script instead of the Pig script directly.

On Wed, Mar 27, 2013 at 6:24 AM, John Farrelly <[email protected]> wrote:

> Hi there,
>
> In our system, we have multiple pig scripts that run against a
> particular HDFS directory. The pig scripts can run at different
> times, and are scheduled to run regularly. Is there a way to point a
> pig script at the same directory for multiple executions, but make
> sure that it only processes new files that it hasn't seen before? I
> was thinking of using a custom PathFilter for my loader, but I thought
> I would ask to see if there is already a way to do this, rather than me
> reinventing the wheel (!).
>
> Thanks,
> John.
>
> ****************************************************************
> This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom they
> are addressed. If you have received this email in error then please
> delete it and notify the sender. Do not make a copy or forward it to
> anyone. This footnote also confirms that this email message has been
> swept for the presence of computer viruses.
>
> Adaptive Mobile Security Ltd, Ferry House, 48 Lower Mount Street,
> Dublin 2, Ireland
> Directors: B. Collins, G. Maclachlan (UK), N. Grierson (UK),
> J. Ennis (UK), D. Summers (UK).
> Registered in Ireland, Company No. 370343, VAT Reg.No.IE6390343O
> ****************************************************************

--
Mike Sukmanowsky
Product Lead, http://parse.ly
989 Avenue of the Americas, 3rd Floor
New York, NY 10018
p: +1 (416) 953-4248
e: [email protected]
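
For anyone finding this thread later, here is a rough sketch of the control-script idea Mike describes. It is only one possible shape, not something from the thread itself: the manifest file, the function names, and the way HDFS is listed (parsing `hadoop fs -ls` output) are all assumptions made for illustration. It passes the new files to Pig as a comma-separated `inputpath` parameter, using the single-dash `-param` form from the Pig documentation, and assumes the Pig script does something like `LOAD '$inputpath'`.

```python
#!/usr/bin/env python3
"""Sketch of a control script that feeds only unprocessed files to Pig."""
import subprocess
import sys
from pathlib import Path


def load_processed(manifest):
    """Return the set of paths recorded in the manifest (empty if missing)."""
    path = Path(manifest)
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def select_new(all_files, processed):
    """Return the files not yet processed, sorted for a stable order."""
    return sorted(set(all_files) - set(processed))


def build_pig_command(script, new_files):
    """Build the pig invocation; inputpath is a comma-separated file list."""
    return ["pig", "-param", "inputpath=" + ",".join(new_files), script]


def record_processed(manifest, files):
    """Append newly processed paths to the manifest, one per line."""
    with open(manifest, "a") as handle:
        for name in files:
            handle.write(name + "\n")


def list_hdfs_files(directory):
    """List files in an HDFS directory (assumes paths contain no spaces)."""
    output = subprocess.check_output(["hadoop", "fs", "-ls", directory], text=True)
    files = []
    for line in output.splitlines():
        fields = line.split()
        # File entries have 8 columns and a permission string starting with "-".
        if len(fields) == 8 and fields[0].startswith("-"):
            files.append(fields[7])
    return files


def main(directory, script, manifest):
    new_files = select_new(list_hdfs_files(directory), load_processed(manifest))
    if not new_files:
        return 0  # nothing new, so skip the Pig run entirely
    status = subprocess.call(build_pig_command(script, new_files))
    if status == 0:
        # Only mark files as processed once Pig has succeeded.
        record_processed(manifest, new_files)
    return status


if __name__ == "__main__":
    if len(sys.argv) != 4:
        sys.exit("usage: control.py <hdfs_dir> <pig_script> <manifest_file>")
    sys.exit(main(*sys.argv[1:4]))
```

Cron would then run `control.py` rather than the Pig script. Since several Pig scripts read the same directory, each one would presumably need its own manifest file so that they track their progress independently.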
