On 2016-01-06 17:19, Josh Wills wrote:
Hi Josh,
I added a getPipelineResult() method to the MaterializableIterable in
CRUNCH-400: does it not do what you want?
https://github.com/apache/crunch/commit/ded504eb133fa0814e2d90ff2a662e72a67e04bb
It does give access to the PipelineResult, but I find it error-prone:
- It is hidden in an Iterable, which has to be cast to
MaterializableIterable first
- The code dealing with the iterable is most likely business code which
does not care at all about infrastructure concerns
- The result is only available once iterator() has been called, and
there is no way to be notified when that happens
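
To make the cast and the ordering problem concrete, here is a minimal sketch. It does not use Crunch at all: LazyIterable and FakeResult are hypothetical stand-ins that only model the laziness described above, and I am assuming the result is simply absent (null here) until iterator() has been called.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins, NOT the Crunch API: they only model a lazily
// executed, materialized collection whose result appears after iterator().
public class LazyResultSketch {

    /** Stand-in for a pipeline result that carries counters. */
    static final class FakeResult {
        final long recordsRead;
        FakeResult(long recordsRead) { this.recordsRead = recordsRead; }
    }

    /** Stand-in for a materialized iterable: the "pipeline" runs on the first iterator() call. */
    static final class LazyIterable implements Iterable<String> {
        private final List<String> data;
        private FakeResult result; // assumption: stays null until the pipeline has run

        LazyIterable(List<String> data) { this.data = data; }

        FakeResult getPipelineResult() { return result; }

        @Override
        public Iterator<String> iterator() {
            if (result == null) {
                result = new FakeResult(data.size()); // simulated pipeline run
            }
            return data.iterator();
        }
    }

    /** Business code only sees a plain Iterable. */
    static Iterable<String> materialize() {
        return new LazyIterable(Arrays.asList("a", "b", "c"));
    }

    public static void main(String[] args) {
        Iterable<String> lines = materialize();

        // The cast is the only way to reach the result from the Iterable.
        LazyIterable lazy = (LazyIterable) lines;

        // Too early: nothing has run yet, so there are no counters to read.
        System.out.println("result before iteration is null: " + (lazy.getPipelineResult() == null));

        for (String line : lines) {
            // business logic would go here
        }

        // Only after iterator() has been called is the result available.
        System.out.println("records read: " + lazy.getPipelineResult().recordsRead);
    }
}
```

Nothing in the type of `lines` tells the caller that a result exists or when it becomes valid, which is exactly what makes the pattern easy to get wrong.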
I might be wrong, but I believe that collecting all the counters of a
pipeline is a common pattern.
My team has been burned several times by "missing counters": developers
not knowing the MaterializableIterable trick, simple oversight, calling
getPipelineResult() before iterator() has actually been called, code that
just "worked" until someone moved a call to run() after the
materialize(), etc.
I am wondering how other Crunch users deal with counter collection. Do
they always carefully extract the PipelineResult from each iterable
after use? Are they happy with this pattern? Did they hack together
something like my HyperthymesticMRPipeline, or something else?
Regards,
Clément