GitHub user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/17596#discussion_r111721127
--- Diff: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala ---
@@ -154,18 +154,22 @@ abstract class AccumulatorV2[IN, OUT] extends Serializable {
   // Called by Java when serializing an object
   final protected def writeReplace(): Any = {
-    if (atDriverSide) {
+    val acc = if (atDriverSide) {
       if (!isRegistered) {
         throw new UnsupportedOperationException(
           "Accumulator must be registered before send to executor")
       }
       val copyAcc = copyAndReset()
       assert(copyAcc.isZero, "copyAndReset must return a zero value copy")
-      copyAcc.metadata = metadata
       copyAcc
     } else {
-      this
+      val copyAcc = copy()
--- End diff --
I took a look to help, and this change seems to be the cause.
It throws an exception, as shown below:
```
>>> from pyspark.accumulators import INT_ACCUMULATOR_PARAM
>>>
>>> acc1 = sc.accumulator(0, INT_ACCUMULATOR_PARAM)
>>> sc.parallelize(xrange(100), 20).foreach(lambda x: acc1.add(x))
17/04/17 17:10:39 ERROR DAGScheduler: Failed to update accumulators for task 2
java.lang.ClassCastException: org.apache.spark.util.CollectionAccumulator cannot be cast to org.apache.spark.api.python.PythonAccumulatorV2
    at org.apache.spark.api.python.PythonAccumulatorV2.merge(PythonRDD.scala:903)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1105)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1097)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1097)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1173)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1716)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1674)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1663)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
```
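This seems to be because `copy()` here returns a plain `CollectionAccumulator`
when called on a `PythonAccumulatorV2`, so the cast in
`PythonAccumulatorV2.merge` fails when the driver merges the update.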
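
To illustrate the suspected mechanism, here is a minimal sketch in plain Python. It is a toy model, not Spark's code: the class names only mirror the ones in the stack trace, `FixedPythonAccumulator` is a hypothetical fix, and the `isinstance` check stands in for the JVM cast that fails in `merge`. It assumes the subclass simply inherits `copy()` from its parent.

```python
class CollectionAccumulator:
    """Toy stand-in for a list-collecting accumulator (not Spark's class)."""

    def __init__(self):
        self.values = []

    def add(self, value):
        self.values.append(value)

    def copy(self):
        # Hard-codes the base class, so a subclass that inherits this
        # method gets back a plain CollectionAccumulator.
        new_acc = CollectionAccumulator()
        new_acc.values = list(self.values)
        return new_acc

    def merge(self, other):
        self.values.extend(other.values)


class PythonAccumulator(CollectionAccumulator):
    """Toy subclass whose merge() requires an instance of its own type."""

    def __init__(self, server_host, server_port):
        super().__init__()
        self.server_host = server_host
        self.server_port = server_port

    def merge(self, other):
        # Models the cast that raises ClassCastException in the trace.
        if not isinstance(other, PythonAccumulator):
            raise TypeError(
                "%s cannot be cast to PythonAccumulator" % type(other).__name__
            )
        super().merge(other)


class FixedPythonAccumulator(PythonAccumulator):
    """Hypothetical fix: override copy() so the subclass type is preserved."""

    def copy(self):
        new_acc = FixedPythonAccumulator(self.server_host, self.server_port)
        new_acc.values = list(self.values)
        return new_acc


# The inherited copy() loses the subclass, so the later merge is rejected.
driver_acc = PythonAccumulator("localhost", 0)
executor_copy = driver_acc.copy()
try:
    driver_acc.merge(executor_copy)
except TypeError as err:
    print("merge failed:", err)

# With copy() overridden, the copy keeps its type and merge succeeds.
fixed_acc = FixedPythonAccumulator("localhost", 0)
fixed_acc.merge(fixed_acc.copy())
```

If this model matches what happens in the PR, one option would be for `PythonAccumulatorV2` to override `copy()` so that it returns its own type; whether that is the right fix here is for the PR author to confirm.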