Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12612#discussion_r61103191
  
    --- Diff: core/src/main/scala/org/apache/spark/NewAccumulator.scala ---
    @@ -0,0 +1,333 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark
    +
    +import java.{lang => jl}
    +import java.io.{ObjectInputStream, ObjectOutputStream}
    +import java.util.concurrent.atomic.AtomicLong
    +import javax.annotation.concurrent.GuardedBy
    +
    +import org.apache.spark.scheduler.AccumulableInfo
    +import org.apache.spark.util.Utils
    +
    +
    +private[spark] case class AccumulatorMetadata(
    +    id: Long,
    +    name: Option[String],
    +    countFailedValues: Boolean) extends Serializable
    +
    +
    +abstract class NewAccumulator[IN, OUT] extends Serializable {
    +  private[spark] var metadata: AccumulatorMetadata = _
    +  private[this] var atDriverSide = true
    +
    +  private[spark] def register(
    +      sc: SparkContext,
    +      name: Option[String] = None,
    +      countFailedValues: Boolean = false): Unit = {
    +    if (this.metadata != null) {
    +      throw new IllegalStateException("Cannot register an Accumulator twice.")
    +    }
    +    this.metadata = AccumulatorMetadata(AccumulatorContext.newId(), name, countFailedValues)
    +    AccumulatorContext.register(this)
    +    sc.cleaner.foreach(_.registerAccumulatorForCleanup(this))
    +  }
    +
    +  final def isRegistered: Boolean =
    +    metadata != null && AccumulatorContext.originals.containsKey(metadata.id)
    +
    +  private def assertMetadataNotNull(): Unit = {
    +    if (metadata == null) {
    +      throw new IllegalAccessError("The metadata of this accumulator has not been assigned yet.")
    +    }
    +  }
    +
    +  def id: Long = {
    +    assertMetadataNotNull()
    +    metadata.id
    +  }
    +
    +  def name: Option[String] = {
    +    assertMetadataNotNull()
    +    metadata.name
    +  }
    +
    +  def countFailedValues: Boolean = {
    +    assertMetadataNotNull()
    +    metadata.countFailedValues
    +  }
    +
    +  private[spark] def toInfo(update: Option[Any], value: Option[Any]): AccumulableInfo = {
    +    val isInternal = name.exists(_.startsWith(InternalAccumulator.METRICS_PREFIX))
    +    new AccumulableInfo(id, name, update, value, isInternal, countFailedValues)
    +  }
    +
    +  final private[spark] def isAtDriverSide: Boolean = atDriverSide
    +
    +  def copyAndReset(): NewAccumulator[IN, OUT]
    +
    +  def isZero(): Boolean
    +
    +  def add(v: IN): Unit
    +
    +  def +=(v: IN): Unit = add(v)
    +
    +  def merge(other: NewAccumulator[IN, OUT]): Unit
    +
    +  final def value: OUT = {
    +    if (atDriverSide) {
    +      localValue
    +    } else {
    +      throw new UnsupportedOperationException("Can't read accumulator value in task")
    +    }
    +  }
    +
    +  def localValue: OUT
    +
    +  // Called by Java when serializing an object
    +  private def writeObject(out: ObjectOutputStream): Unit = Utils.tryOrIOException {
    +    if (atDriverSide) {
    +      if (!isRegistered) {
    +        throw new UnsupportedOperationException(
    +          "Accumulator must be registered before send to executor")
    +      }
    +      // TODO: this is wrong.
    --- End diff --
    
    This is really a big problem...
    
    We need some serialization hooks to support sending accumulators back from executors. I tried two approaches, but both failed:
    
    1. Add a writing hook, which resets the accumulator before sending it from the driver to the executor. The problem is we can't just reset: the accumulator's state should be kept at the driver side, and the Java serialization hook isn't flexible enough to let us serialize a copy instead. One possible workaround is to create an `AccumulatorWrapper` so that we have full control over accumulator serialization (see the first sketch after this list), but this would complicate the hierarchy.
    
    2. Add a reading hook, which resets the accumulator after deserialization. Unfortunately this doesn't work when `Accumulator` is a base class: by the time the base class's `readObject` is called, the child's fields are not initialized yet, so calling `reset` there is a no-op and the child's field values are filled in later (see the second sketch after this list).
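
    For reference, here is roughly what I mean by the `AccumulatorWrapper` workaround in approach 1. It's an untested sketch with made-up names, and it assumes the accumulator itself would no longer define its own `writeObject`:

    ```scala
    package org.apache.spark

    import java.io.{ObjectInputStream, ObjectOutputStream}

    import org.apache.spark.util.Utils

    // Hypothetical wrapper: tasks would close over this instead of the
    // accumulator itself, so serialization is fully under our control.
    private[spark] class AccumulatorWrapper[IN, OUT](
        @transient private var acc: NewAccumulator[IN, OUT]) extends Serializable {

      def get: NewAccumulator[IN, OUT] = acc

      // Leaving the driver: ship a fresh zero-valued copy that carries the same
      // metadata, so the driver-side instance and its state are never touched.
      private def writeObject(out: ObjectOutputStream): Unit = Utils.tryOrIOException {
        out.defaultWriteObject()
        val toShip = if (acc.isAtDriverSide) {
          val copy = acc.copyAndReset()
          copy.metadata = acc.metadata
          copy
        } else {
          acc
        }
        out.writeObject(toShip)
      }

      private def readObject(in: ObjectInputStream): Unit = Utils.tryOrIOException {
        in.defaultReadObject()
        acc = in.readObject().asInstanceOf[NewAccumulator[IN, OUT]]
      }
    }
    ```

    The copy's driver-side flag would still need to be flipped on the executor, but at least the driver state survives. The cost is what I said above: every task now has to reach the accumulator through the wrapper.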
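
    And a minimal standalone demo of why the reading hook in approach 2 is a no-op (class names made up for the demo):

    ```scala
    import java.io._

    // Java deserialization invokes `readObject` from base class to subclass, so
    // a reset done in the base class runs before the child's fields are read
    // back, and is immediately overwritten.
    class Base extends Serializable {
      protected def reset(): Unit = ()
      private def readObject(in: ObjectInputStream): Unit = {
        in.defaultReadObject()
        reset() // Child.sum is still 0 here; its real value is filled in later
      }
    }

    class Child extends Base {
      var sum = 0
      override protected def reset(): Unit = { sum = 0 }
    }

    object ReadHookDemo extends App {
      val c = new Child
      c.sum = 42
      val bos = new ByteArrayOutputStream()
      val oos = new ObjectOutputStream(bos)
      oos.writeObject(c)
      oos.flush()
      val ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
      val copied = ois.readObject().asInstanceOf[Child]
      println(copied.sum) // prints 42, not 0: the reset in Base had no effect
    }
    ```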
    
    Generally speaking, `writeObject` and `readObject` are not good serialization hooks. We'd either have to figure out some trick to work around them, or find other, better serialization hooks (or not send accumulators back at all). One candidate hook is sketched below.
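
    One candidate for a better hook: Java serialization also honors `writeReplace`, which runs before writing and whose return value is serialized in place of the original object, so the driver-side instance is never mutated and no reset-on-read is needed. A standalone sketch (the `Counter` class is made up; in `NewAccumulator` the shipped copy would also have to carry the `metadata` over and flip its driver-side flag on read):

    ```scala
    import java.io._

    // The object returned by `writeReplace` goes on the wire instead of the
    // original, which is left untouched.
    class Counter(var count: Long) extends Serializable {
      def copyAndReset(): Counter = new Counter(0L)
      protected def writeReplace(): Any = copyAndReset()
    }

    object WriteReplaceDemo extends App {
      val driverSide = new Counter(10L)
      val bos = new ByteArrayOutputStream()
      val oos = new ObjectOutputStream(bos)
      oos.writeObject(driverSide)
      oos.flush()
      val ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray))
      val executorSide = ois.readObject().asInstanceOf[Counter]
      println(driverSide.count)   // still 10: driver state untouched
      println(executorSide.count) // 0: a fresh, reset copy arrived
    }
    ```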
    
    @rxin any ideas?

