Hi there, In our system, we have multiple pig scripts that run against a particular HDFS directory. The pig scripts can run at different times, and are scheduled to run regularly. Is there a way to point a pig script at the same directory for multiple executions, but make sure that it only processed new files that it hasn't seen before? I was thinking of using a custom PathFilter for my loader, but I thought I would ask to see if there is already a way to do this, rather than me reinventing the wheel (!).
Thanks, John. </pre>****************************************************************************************<br>This email and any files transmitted with are confidential and intended solely for the<br>use of the individual or entity to whom they are addressed. If you have received this<br>email in error then please delete it and notify the sender. Do not make a copy or forward<br>it to anyone. This footnote also confirms that this email message has been swept for the<br>presence of computer viruses.<br><br>Adaptive Mobile Security Ltd, Ferry House, 48 Lower Mount Street, Dublin 2, Ireland<br>Directors: B. Collins, G. Maclachlan (UK), N. Grierson (UK), J. Ennis (UK), D. Summers (UK).<br>Registered in Ireland, Company No. 370343, VAT Reg.No.IE6390343O<br>****************************************************************************************</pre>
