Hi guys,
I'm confused by how to tell if a Crunch Pipeline has succeeded. I suspect
that the current implementation is not correct, but want to verify that
before I file a Jira and submit a patch.
Here's what I'm trying to do: I'd like to wrap a crunch job inside another
Java program (so that I can execute it in a larger scheduling/workflow tool
like Azkaban or Oozie). So, I'd like to build and execute a Crunch
pipeline, then test if it succeeded.
I'd like to do something like:
PipelineResult result = pipeline.run();
if (!result.succeeded())
{
throw new Exception("Job failed: " + result.toString() + "\n");
}
return 0;
Unfortunately, this doesn't work. Looking at the implementation of
PipelineResult, here is how the succeeded method is implemented:
public boolean succeeded() {
return !stageResults.isEmpty();
}
Looking at where the PipelineResult object is created (in MRExecutor), it
looks like the list of stage results will be non-empty regardless of
whether the job succeeded or not:
...
result = new PipelineResult(stages);
if (killSignal.getCount() != 0) {
status.set(Status.KILLED);
} else {
status.set(result.succeeded() ? Status.SUCCEEDED : Status.FAILED);
}
...
So, even if one of the stages fails, the result of executing the pipeline
will be "succeeded."
I can't find an alternative way to test if a failure occurred during a
pipeline execution. Is there something that I'm missing, or is the
implementation incorrect?
-- Joe