On 2016-01-06 17:19, Josh Wills wrote:

Hi Josh,

> I added a getPipelineResult() method to the MaterializableIterable in
> CRUNCH-400: does it not do what you want?
> https://github.com/apache/crunch/commit/ded504eb133fa0814e2d90ff2a662e72a67e04bb

It indeed gives access to the PipelineResult, but I find it error-prone:

- It is hidden in an Iterable, which has to be cast to MaterializableIterable.

- The code handling the iterable is most likely business code, which does not care about infrastructure concerns at all.

- The result is only available once iterator() has been called, and there is no way to be notified when that happens.
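
To make these three points concrete, here is a minimal sketch in plain Java. It does not use Crunch itself: FakeResult and LazyIterable are hypothetical stand-ins for PipelineResult and MaterializableIterable, and the assumption that the result is null until iterator() has been called is part of the model, not confirmed Crunch behavior.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Crunch's PipelineResult; not the real class.
final class FakeResult {
    final long recordsRead;
    FakeResult(long recordsRead) { this.recordsRead = recordsRead; }
}

// Hypothetical stand-in for an iterable that runs its job lazily on iterator().
final class LazyIterable implements Iterable<String> {
    private final List<String> data;
    private FakeResult result; // stays null until the job has run (assumption)

    LazyIterable(List<String> data) { this.data = data; }

    @Override
    public Iterator<String> iterator() {
        if (result == null) {
            result = new FakeResult(data.size()); // simulates running the pipeline
        }
        return data.iterator();
    }

    FakeResult getPipelineResult() { return result; }
}

final class CounterPitfallDemo {
    // Callers only hold a plain Iterable, so reaching the result needs a cast.
    static FakeResult resultOf(Iterable<String> values) {
        return ((LazyIterable) values).getPipelineResult();
    }

    public static void main(String[] args) {
        Iterable<String> values = new LazyIterable(Arrays.asList("a", "b", "c"));
        System.out.println(resultOf(values) == null); // asked before iterating
        for (String v : values) {
            // business logic would go here
        }
        System.out.println(resultOf(values).recordsRead); // asked after iterating
    }
}
```

In this model, nothing tells the caller when the result becomes available; the code that wants the counters has to know that the business code has already iterated.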


I might be wrong, but I believe that collecting all the counters of a pipeline is a common pattern.

My team has been burned several times by "missing counters": developers not knowing the MaterializableIterable trick, plain oversight, calling getPipelineResult() before iterator() is actually called, code that just "worked" until someone moved a call to run() after the materialize(), and so on.

I am wondering how other Crunch users deal with counter collection. Do they always carefully extract the PipelineResult from each iterable after use? Are they happy with this pattern? Did they hack together something like my HyperthymesticMRPipeline, or something else?
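
For discussion, one possible shape for such a hack is a wrapper that records the counters of every run, so that nothing depends on callers extracting them by hand. The sketch below is plain Java with hypothetical names (RunCounters, RememberingRunner); it uses no Crunch API and is not the HyperthymesticMRPipeline implementation, which is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for the counters carried by one pipeline run.
final class RunCounters {
    final Map<String, Long> values;
    RunCounters(Map<String, Long> values) { this.values = values; }
}

// Hypothetical wrapper that remembers the counters of every run it performs.
final class RememberingRunner {
    private final List<RunCounters> history = new ArrayList<>();

    // Executes one run and records its counters before handing them back.
    RunCounters run(Supplier<RunCounters> job) {
        RunCounters counters = job.get();
        history.add(counters);
        return counters;
    }

    // Sums every counter seen so far, across all recorded runs.
    Map<String, Long> totals() {
        Map<String, Long> totals = new HashMap<>();
        for (RunCounters c : history) {
            c.values.forEach((name, v) -> totals.merge(name, v, Long::sum));
        }
        return totals;
    }
}

final class RememberingRunnerDemo {
    public static void main(String[] args) {
        RememberingRunner runner = new RememberingRunner();
        runner.run(() -> new RunCounters(Collections.singletonMap("records", 2L)));
        runner.run(() -> new RunCounters(Collections.singletonMap("records", 5L)));
        System.out.println(runner.totals()); // totals across both runs
    }
}
```

The design choice here is that recording happens at the single place where runs are triggered, so business code that only iterates over results never has to know about counters.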


Regards,

Clément


