GitHub user nfergu opened a pull request:
https://github.com/apache/spark/pull/2438
[SPARK-3051] Support looking-up named accumulators in a registry
This proposal builds on SPARK-2380 (Support displaying accumulator values
in the web UI) to allow named accumulables to be looked up in a "registry",
as opposed to having to be passed to every method that needs to access
them. See the JIRA ticket (SPARK-3051) for more details, including an
example use case.
An AccumulableRegistry object is provided to allow accumulables to be
looked up by name at task execution time. This requires named accumulables
to be broadcast to the executors before the task is executed. The
DAGScheduler takes care of this in much the same way as it does for Tasks.
Accumulables were already stored in thread-local variables in the
Accumulators object, so exposing
these in the registry was simply a matter of wrapping this object, and
keying the accumulables by
name (they were previously keyed only by ID).
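
As a rough illustration of the idea described above (a thread-local map
keyed by name rather than only by ID), here is a minimal sketch. It is not
the code in this pull request: `NamedCounter` and `SketchRegistry` are
hypothetical names invented for the example and do not correspond to
Spark's actual `Accumulable` or `AccumulableRegistry` classes.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a named accumulable: an add-only counter
// that carries both an ID and a name.
final class NamedCounter(val id: Long, val name: String) {
  private var total: Long = 0L
  def add(delta: Long): Unit = total += delta
  def value: Long = total
}

// Hypothetical registry: each thread gets its own map, keyed by name.
object SketchRegistry {
  private val byName = new ThreadLocal[mutable.Map[String, NamedCounter]] {
    override protected def initialValue(): mutable.Map[String, NamedCounter] =
      mutable.Map.empty[String, NamedCounter]
  }

  // Make an accumulable visible by name on the current thread.
  def register(acc: NamedCounter): Unit = byName.get().update(acc.name, acc)

  // Look an accumulable up by name; None if nothing is registered.
  def lookup(name: String): Option[NamedCounter] = byName.get().get(name)

  // Drop everything registered on the current thread.
  def clear(): Unit = byName.get().clear()
}
```

With something like this, code running inside a task could call
`SketchRegistry.lookup("records")` instead of receiving the accumulable as
a method parameter.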
Note that Accumulables cannot be looked up from the registry in the driver
program; they can only be obtained while an operation is being performed on
an RDD (that is, while a task is executing).
One important implementation detail is that we must deserialize any named
accumulators that have been broadcast before those that are explicitly
passed with the task, because we want the explicitly passed ones to
override the broadcast ones (otherwise the explicitly passed ones would not
work). This may be a little brittle, but there are tests (in
AccumulatorSuite) that will break if this is ever changed.
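
The ordering constraint can be pictured with a small sketch. This is a
simplified model and not the actual deserialization code: it assumes that
registering an accumulable under a name replaces any earlier entry with the
same name, and it represents each accumulable by a plain string saying
where it came from. The object and method names are hypothetical.

```scala
// Hypothetical model of "later registration wins": entries are
// (name -> origin) pairs, registered in the order they are deserialized.
object DeserializationOrder {
  def registerAll(
      broadcast: Seq[(String, String)],
      explicit: Seq[(String, String)]): Map[String, String] = {
    // Broadcast entries go in first; explicitly passed entries follow,
    // so an explicit entry replaces a broadcast entry with the same name.
    (broadcast ++ explicit).foldLeft(Map.empty[String, String]) {
      case (registered, (name, origin)) => registered.updated(name, origin)
    }
  }
}
```

If the two sequences were registered in the opposite order, the broadcast
copy would replace the explicitly passed one, which is the failure the
tests in AccumulatorSuite are meant to catch.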
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/nfergu/spark accumreg
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2438.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2438
----
commit 7fae6af05d0ca91ac5444161a1e16b716ac3af83
Author: Neil Ferguson <[email protected]>
Date: 2014-08-14T22:37:31Z
First attempt at allowing named accumulators to be looked-up in an
AccumulableRegistry. Needs clean-up, testing, and documentation.
Conflicts:
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
commit 7ae16d69c30baad4609c226c79625958170198be
Author: Neil Ferguson <[email protected]>
Date: 2014-09-16T21:47:26Z
Added documentation, testing, and some fixes for functionality to allow
named accumulators to be looked-up in an AccumulableRegistry.
Conflicts:
core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
commit a89c51daeb3f172d2a3116afff48dc034e4d50c9
Author: Neil Ferguson <[email protected]>
Date: 2014-09-17T21:53:13Z
Small documentation tweaks for AccumulableRegistry functionality
----