Spark accumulators

Nithin Asokan Thu, 03 Sep 2015 05:26:08 -0700

We are currently testing a few capabilities using Spark, and one thing we
noticed is that Spark does not list any user-defined accumulators on the web
UI.


On MapReduce I would expect counters to be displayed on the job page;
however, on a SparkPipeline I was only able to pull counter information from
PipelineResult#getStageResults().

I think these accumulators are not visible on the web UI because Crunch
does not name them. Spark only displays an accumulator on the UI if it has
a name.

https://github.com/apache/crunch/blob/apache-crunch-0.13.0/crunch-spark/src/main/java/org/apache/crunch/impl/spark/SparkRuntime.java#L125-L126

https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/api/java/JavaSparkContext.scala#L616-L624
(accumulator API with a name)
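To illustrate the difference, here is a minimal sketch against the Spark 1.4.x
Java API linked above. It assumes Spark 1.4.x is on the classpath and uses a
local master; the class name, the countRecords helper and the accumulator name
"records.processed" are hypothetical. Both accumulators count the same records,
but only the one created with a name should be listed in the web UI.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.Accumulator;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

public class NamedAccumulatorExample {

  /** Counts the records twice: once with an unnamed and once with a named accumulator. */
  static int[] countRecords(JavaSparkContext jsc, List<String> records) {
    // Unnamed (what Crunch does today, as far as I can tell): it counts
    // correctly, but Spark does not list it in the web UI.
    final Accumulator<Integer> unnamed = jsc.accumulator(0);
    // Named: same counting behavior, and Spark lists it under this name in the UI.
    final Accumulator<Integer> named = jsc.accumulator(0, "records.processed");

    // foreach is an action, so each task's updates are applied exactly once.
    jsc.parallelize(records).foreach(new VoidFunction<String>() {
      @Override
      public void call(String record) {
        unnamed.add(1);
        named.add(1);
      }
    });

    // Accumulator values can only be read on the driver.
    return new int[] {unnamed.value(), named.value()};
  }

  public static void main(String[] args) {
    SparkConf conf = new SparkConf()
        .setAppName("named-accumulator-example")
        .setMaster("local[2]");
    JavaSparkContext jsc = new JavaSparkContext(conf);
    try {
      int[] counts = countRecords(jsc, Arrays.asList("a", "b", "c"));
      System.out.println("unnamed=" + counts[0] + ", named=" + counts[1]);
    } finally {
      jsc.stop();
    }
  }
}
```

The same constructor family also has a variant taking a name plus a custom
AccumulatorParam, which is presumably what a map-valued counter accumulator
would need.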

I would like to know whether it's possible in Crunch to name these
accumulators so that they appear on the web UI. That would let users
monitor accumulators from the web UI to obtain key information about
their jobs.

Thanks,
Nithin
