Thanks. I’ll look into that. Also, I just noticed that as of 0.8.2, Crunch has 
a public cleanup() on the Pipeline interface. I should be able to use that, as 
my code was just updated to that version. 
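
Roughly, the pattern I have in mind is a try/finally around the runs, so 
cleanup happens even if my own code throws between them. This is only a 
sketch: TempDirPipeline, RecordingPipeline and CleanupSketch are made-up 
stand-ins so it runs without Hadoop, not Crunch classes. With real Crunch the 
calls would go to Pipeline itself, and the exact cleanup signature should be 
checked against the 0.8.2 Javadoc.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the two Pipeline methods discussed in this thread.
// It is NOT the Crunch interface; the real cleanup signature may differ.
interface TempDirPipeline {
    void run();
    void cleanup();
}

// Hypothetical test double: records calls instead of launching MapReduce jobs.
class RecordingPipeline implements TempDirPipeline {
    final List<String> calls = new ArrayList<>();

    @Override
    public void run() {
        calls.add("run");
    }

    @Override
    public void cleanup() {
        calls.add("cleanup");
    }
}

class CleanupSketch {
    // Runs the pipeline, executes client code, then runs the pipeline again.
    // The finally block cleans up whether or not the client code throws,
    // and without forcing an extra run() the way done() would.
    static void process(TempDirPipeline pipeline, Runnable clientStep) {
        try {
            pipeline.run();     // first run; temp directories get created here
            clientStep.run();   // processing outside the pipeline; may throw
            pipeline.run();     // only reached if the client step succeeded
        } finally {
            pipeline.cleanup(); // always executed
        }
    }
}
```

If the client step throws, the second run() is skipped, cleanup() still 
happens, and the exception propagates to the caller.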

On Mar 29, 2014, at 5:58 AM, Josh Wills <[email protected]> wrote:

> IIRC (I'm away from my computer) we added the ability to add arbitrary hooks 
> that would always be executed at the end of a pipeline run to the 
> PipelineExecution interface (the one that is returned by runAsync), which 
> could be used to ensure that the temp directories were cleaned up no matter 
> what happened on the run. Does that work for this problem?
> 
> 
> On Fri, Mar 28, 2014 at 10:05 AM, Stephen Durfey <[email protected]> wrote:
> If I have a scenario where I have already called Pipeline#run (and some 
> temporary directories were created by Crunch during the run), and have 
> continued on to do some additional processing (created some new PCollections 
> and specified a write location), and an exception occurs in my code, outside 
> of the pipeline, before Pipeline#run is called again, I would need a way to 
> ensure the temporary directories created in my initial run are always cleaned 
> up. I could call Pipeline#done, which calls cleanup() in MRPipeline, but it 
> also calls run(). However, I would prefer not to have run() called at all, 
> due to the exception thrown in my code.
> 
> Would it be possible to make cleanup() public in the Pipeline interface so 
> that it can be used to clean up any temp directories created by the pipeline?
> 
