[GitHub] spark pull request #17596: [SPARK-12837][CORE] Do not send the accumulator n...

HyukjinKwon Mon, 17 Apr 2017 01:52:29 -0700

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17596#discussion_r111721127
  
    --- Diff: core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala ---
    @@ -154,18 +154,22 @@ abstract class AccumulatorV2[IN, OUT] extends Serializable {
     
       // Called by Java when serializing an object
       final protected def writeReplace(): Any = {
    -    if (atDriverSide) {
    +    val acc = if (atDriverSide) {
           if (!isRegistered) {
             throw new UnsupportedOperationException(
               "Accumulator must be registered before send to executor")
           }
           val copyAcc = copyAndReset()
           assert(copyAcc.isZero, "copyAndReset must return a zero value copy")
    -      copyAcc.metadata = metadata
           copyAcc
         } else {
    -      this
    +      val copyAcc = copy()
    --- End diff --
    
    I took a look to help, and this line seems to be the cause.
    With this change, the following throws an exception:
    
    ```
    >>> from pyspark.accumulators import INT_ACCUMULATOR_PARAM
    >>>
    >>> acc1 = sc.accumulator(0, INT_ACCUMULATOR_PARAM)
    >>> sc.parallelize(xrange(100), 20).foreach(lambda x: acc1.add(x))
    17/04/17 17:10:39 ERROR DAGScheduler: Failed to update accumulators for task 2
    java.lang.ClassCastException: org.apache.spark.util.CollectionAccumulator cannot be cast to org.apache.spark.api.python.PythonAccumulatorV2
        at org.apache.spark.api.python.PythonAccumulatorV2.merge(PythonRDD.scala:903)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1105)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1097)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1097)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1173)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1716)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1674)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1663)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    ```
    
    This seems to happen because `copy()` here returns a plain `CollectionAccumulator` when it is called on a `PythonAccumulatorV2`.
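
    To illustrate the suspected mechanism, here is a minimal sketch with stand-in classes. `Acc`, `CollectionAcc`, `PythonAcc` and `CopyDemo` are hypothetical names, not Spark's real classes. It assumes the subclass overrides `copyAndReset()` but inherits `copy()` from its parent, which is my reading of the trace above rather than something confirmed here.

    ```scala
    // Hypothetical stand-ins that only mimic the inheritance shape; NOT Spark code.
    abstract class Acc {
      def copy(): Acc
      def copyAndReset(): Acc
      def merge(other: Acc): Unit
    }

    class CollectionAcc extends Acc {
      private var items: List[Int] = Nil
      def add(v: Int): Unit = items = v :: items
      def value: List[Int] = items

      // Always builds the base type, whatever the runtime class of `this` is.
      override def copy(): CollectionAcc = {
        val c = new CollectionAcc
        c.items = items
        c
      }
      override def copyAndReset(): CollectionAcc = new CollectionAcc
      override def merge(other: Acc): Unit = other match {
        case o: CollectionAcc => items = o.items ::: items
        case _ => throw new UnsupportedOperationException("cannot merge")
      }
    }

    class PythonAcc(val host: String, val port: Int) extends CollectionAcc {
      // Overridden, so the driver-side path keeps the subclass.
      override def copyAndReset(): PythonAcc = new PythonAcc(host, port)

      // copy() is deliberately NOT overridden: it is inherited from CollectionAcc.
      override def merge(other: Acc): Unit = {
        // This cast fails when `other` is a plain CollectionAcc.
        val o = other.asInstanceOf[PythonAcc]
        super.merge(o)
      }
    }

    object CopyDemo {
      def main(args: Array[String]): Unit = {
        val acc = new PythonAcc("localhost", 0)
        println(acc.copyAndReset().getClass.getSimpleName) // subclass is preserved
        println(acc.copy().getClass.getSimpleName)         // falls back to the base class
        val failed =
          try { acc.merge(acc.copy()); false }
          catch { case _: ClassCastException => true }
        println(s"merge of copy() failed with ClassCastException: $failed")
      }
    }
    ```

    If the real classes have this shape, overriding `copy()` in the subclass so that it returns the subclass type might be one way to avoid the cast failure.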


