Suppose I have already called Pipeline#run (and Crunch created some temporary 
directories during the run), and have then continued on to do some additional 
processing (created some new PCollections and specified a write location). If 
an exception occurs in my code, outside of the pipeline, before Pipeline#run is 
called again, I need a way to ensure that the temporary directories created in 
the initial run are always cleaned up. I could call Pipeline#done, which calls 
cleanup() in MRPipeline, but it also calls run(). Given the exception thrown in 
my code, I would prefer not to have run() called at all.

Would it be possible to make cleanup() public in the Pipeline interface so that 
it can be used to clean up any temp directories created by the pipeline?
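
To illustrate the usage I have in mind, here is a minimal sketch. It does not 
use the real Crunch classes: SketchPipeline and FakePipeline are hypothetical 
stand-ins, and the public cleanup() shown here is the proposed method, not 
something the Pipeline interface is known to offer today. The fake pipeline only 
records directory names in a list instead of touching a filesystem.

```java
import java.util.ArrayList;
import java.util.List;

public class CleanupSketch {

    /** Hypothetical stand-in for a Pipeline interface that exposes cleanup(). */
    interface SketchPipeline {
        void run();
        void cleanup();
    }

    /** Toy implementation: tracks "temp directories" as strings in memory. */
    static class FakePipeline implements SketchPipeline {
        final List<String> tempDirs = new ArrayList<>();
        int runCount = 0;

        @Override
        public void run() {
            runCount++;
            // Pretend each run leaves a temporary directory behind.
            tempDirs.add("tmp-" + runCount);
        }

        @Override
        public void cleanup() {
            // Remove the temporary directories without triggering run().
            tempDirs.clear();
        }
    }

    /** Runs once, fails in user code, and still cleans up in the finally block. */
    static FakePipeline runThenFail() {
        FakePipeline pipeline = new FakePipeline();
        try {
            pipeline.run();
            // Additional processing outside the pipeline fails here,
            // before run() would have been called a second time.
            throw new IllegalStateException("failure in user code");
        } catch (IllegalStateException e) {
            // Swallowed only so the sketch can report its state afterwards.
        } finally {
            pipeline.cleanup();
        }
        return pipeline;
    }

    public static void main(String[] args) {
        FakePipeline p = runThenFail();
        System.out.println("runs=" + p.runCount + " tempDirs=" + p.tempDirs.size());
    }
}
```

The point of the try/finally is that cleanup() runs whether or not my code 
throws, and run() is never invoked a second time.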
