Thanks Mike.  That's what I was thinking, but I was wondering if (hoping!) 
there was something already to do it :)

Thanks,
John.

-----Original Message-----
From: Mike Sukmanowsky [mailto:[email protected]] 
Sent: 27 March 2013 14:05
To: [email protected]
Subject: Re: Don't process already processed files?

It's probably less work to have a script control Pig execution: it keeps track 
of what's been processed and passes an input path to your Pig script 
dynamically.  For example, you could create a control.py/rb/sh file which 
records what's been processed (maybe in a simple file) and then works out 
the input path to pass to pig during execution via a parameter: 
pig --param inputpath="/some/dynamic/input/path/for/pig".

You'd then set up your cron job to run your control script instead of the Pig 
script directly.
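
A rough sketch of what such a control script might look like in Python.  
Everything named here is an assumption for illustration: the state file 
processed_files.txt, the helper names, and the idea that the Pig script does 
LOAD '$inputpath' with a loader that accepts a comma-separated list of paths 
(PigStorage does).  It uses -param, the spelling in Pig's parameter 
substitution docs, and parses `hadoop fs -ls` output to find the files.

```python
import os
import subprocess

STATE_FILE = "processed_files.txt"  # hypothetical local file, one HDFS path per line


def load_processed(state_file=STATE_FILE):
    """Return the set of paths recorded by earlier runs (empty on the first run)."""
    if not os.path.exists(state_file):
        return set()
    with open(state_file) as f:
        return set(line.strip() for line in f if line.strip())


def parse_ls(ls_output):
    """Extract file paths from `hadoop fs -ls` output, skipping the header and directories."""
    paths = []
    for line in ls_output.splitlines():
        # Columns: permissions, replication, owner, group, size, date, time, path
        fields = line.split(None, 7)
        if len(fields) == 8 and fields[0].startswith("-"):
            paths.append(fields[7])
    return paths


def list_hdfs_files(input_dir):
    """List the plain files currently in an HDFS directory."""
    out = subprocess.check_output(["hadoop", "fs", "-ls", input_dir],
                                  universal_newlines=True)
    return parse_ls(out)


def pig_command(script, new_paths):
    """Build the pig invocation; inputpath becomes a comma-separated path list."""
    return ["pig", "-param", "inputpath=" + ",".join(new_paths), script]


def run(input_dir, script, state_file=STATE_FILE):
    """Run the Pig script on unseen files only; return how many were processed."""
    processed = load_processed(state_file)
    new_paths = sorted(p for p in list_hdfs_files(input_dir) if p not in processed)
    if not new_paths:
        return 0  # nothing new since the last run
    subprocess.check_call(pig_command(script, new_paths))
    # Record the files only after Pig succeeded, so a failed run is retried.
    with open(state_file, "a") as f:
        for p in new_paths:
            f.write(p + "\n")
    return len(new_paths)
```

The cron entry would then call something like run("/data/in", "process.pig") 
(both placeholder names).  If several Pig scripts read the same directory, 
each would need its own state file so they track progress independently.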


On Wed, Mar 27, 2013 at 6:24 AM, John Farrelly < 
[email protected]> wrote:

> Hi there,
>
> In our system, we have multiple pig scripts that run against a 
> particular HDFS directory.  The pig scripts can run at different 
> times, and are scheduled to run regularly.  Is there a way to point a 
> pig script at the same directory for multiple executions, but make 
> sure that it only processes new files that it hasn't seen before?  I 
> was thinking of using a custom PathFilter for my loader, but I thought 
> I would ask to see if there is already a way to do this, rather than me 
> reinventing the wheel (!).
>
> Thanks,
> John.
> ************************************************************************
> This email and any files transmitted with are confidential and intended 
> solely for the use of the individual or entity to whom they are 
> addressed.  If you have received this email in error then please delete 
> it and notify the sender. Do not make a copy or forward it to anyone.  
> This footnote also confirms that this email message has been swept for 
> the presence of computer viruses.
>
> Adaptive Mobile Security Ltd, Ferry House, 48 Lower Mount Street, 
> Dublin 2, Ireland
> Directors: B. Collins, G. Maclachlan (UK), N. Grierson (UK), J. Ennis 
> (UK), D. Summers (UK).
> Registered in Ireland, Company No. 370343, VAT Reg.No.IE6390343O
> ************************************************************************
>



--
Mike Sukmanowsky

Product Lead, http://parse.ly
989 Avenue of the Americas, 3rd Floor
New York, NY  10018
p: +1 (416) 953-4248
e: [email protected]
