Thanks. I’ll look into that. Also, I just noticed that as of 0.8.2, Crunch has a public cleanup() method on the Pipeline interface. I should be able to use that, since my code was just updated to that version.
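A minimal sketch of the pattern under discussion, in case it helps: wrap the work between runs in try/finally so cleanup happens even when user code throws. PipelineLike and FakePipeline are hypothetical stand-ins so the example runs without Crunch on the classpath, and the cleanup(boolean force) signature is an assumption to check against the Pipeline Javadoc for the version in use.

```java
import java.util.ArrayList;
import java.util.List;

public class CleanupSketch {

    // Hypothetical stand-in for the parts of a Crunch Pipeline used here.
    // The cleanup(boolean) signature is an assumption, not confirmed above.
    interface PipelineLike {
        void run();
        void cleanup(boolean force);
    }

    // In-memory fake that records calls and pretends to create temp dirs.
    static class FakePipeline implements PipelineLike {
        int runCalls = 0;
        int cleanupCalls = 0;
        final List<String> tempDirs = new ArrayList<>();

        @Override
        public void run() {
            runCalls++;
            tempDirs.add("tmp-" + runCalls); // each run leaves a temp dir behind
        }

        @Override
        public void cleanup(boolean force) {
            cleanupCalls++;
            tempDirs.clear(); // remove whatever earlier runs left behind
        }
    }

    // First run, then user code that may fail, then a second run.
    // The finally block cleans up without triggering another run().
    static void processSafely(PipelineLike pipeline, boolean failInUserCode) {
        try {
            pipeline.run();
            if (failInUserCode) {
                throw new IllegalStateException("failure in user code");
            }
            pipeline.run();
        } finally {
            pipeline.cleanup(true);
        }
    }

    public static void main(String[] args) {
        FakePipeline p = new FakePipeline();
        try {
            processSafely(p, true);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
        System.out.println("runs=" + p.runCalls
                + " cleanups=" + p.cleanupCalls
                + " tempDirs=" + p.tempDirs.size());
    }
}
```

With a real pipeline the same try/finally shape would apply; the point is only that cleanup sits in finally, so it does not depend on done() and therefore never calls run() after the exception.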
On Mar 29, 2014, at 5:58 AM, Josh Wills <[email protected]> wrote:

> IIRC (I'm away from my computer) we added the ability to add arbitrary hooks
> that would always be executed at the end of a pipeline run to the
> PipelineExecution interface (the one that is returned by runAsync), which
> could be used to ensure that the temp directories were cleaned up no matter
> what happened on the run. Does that work for this problem?
>
>
> On Fri, Mar 28, 2014 at 10:05 AM, Stephen Durfey <[email protected]> wrote:
> If I have a scenario where I have already called Pipeline#run (and some
> temporary directories were created by Crunch during the run), and have
> continued on to do some additional processing (created some new PCollections
> and specified a write location), and an exception occurs in my code, outside
> of the pipeline, before Pipeline#run is called again, I would need a way to
> ensure the temporary directories created in my initial run are always cleaned
> up. I could call Pipeline#done, which calls cleanup() in MRPipeline, but it
> also calls run(). However, I would prefer not to have run() called at all,
> due to the exception thrown in my code.
>
> Would it be possible to make cleanup() public in the Pipeline interface so
> that it can be used to clean up any temp directories created by the pipeline?
>
