Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/11105#discussion_r55434271
--- Diff: core/src/main/scala/org/apache/spark/Accumulable.scala ---
@@ -53,42 +54,79 @@ import org.apache.spark.util.Utils
* for system and time metrics like serialization time or bytes spilled,
* and false for things with absolute values like number of input rows.
* This should be used for internal metrics only.
- * @tparam R the full accumulated data (result type)
+ * @param consistent if this [[Accumulable]] is consistent. Consistent [[Accumulable]]s will only
+ * have values added once for each RDD/Partition execution combination. This
+ * prevents double counting on reevaluation. Partial evaluation of a partition
+ * will not increment a consistent [[Accumulable]]. Consistent [[Accumulable]]s
+ * are currently experimental and the behaviour may change in future versions.
+ * Consistent [[Accumulable]]s can only be added to inside is
+ * [[MapPartitionsRDD]]s and are designed for counting "data properties".
+ * @tparam R the full accumulated data
* @tparam T partial data that can be added in
*/
-class Accumulable[R, T] private (
+class Accumulable[R, T] private[spark] (
val id: Long,
// SI-8813: This must explicitly be a private val, or else scala 2.11 doesn't compile
@transient private val initialValue: R,
param: AccumulableParam[R, T],
val name: Option[String],
internal: Boolean,
- private[spark] val countFailedValues: Boolean)
+ private[spark] val countFailedValues: Boolean,
+ private[spark] val consistent: Boolean)
extends Serializable {
private[spark] def this(
- initialValue: R,
+ @transient initialValue: R,
--- End diff --
I think this annotation isn't necessary in helper constructors (since it's
just a method param, not doubling as a variable declaration), though I am not
positive. At least, it wasn't there before, so I guess I'd like some evidence
that it's necessary now :)
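
One way to gather that evidence might be a small reflection check along these lines. This is only a sketch: `Holder` and `TransientDemo` are hypothetical stand-ins, not Spark code, and it assumes a recent Scala 2.x compiler. The idea is that only the primary constructor's `val` parameters become fields, so those are the only places where `@transient` can affect serialization; the helper constructor's parameter is just forwarded.

```scala
import java.lang.reflect.Modifier

// Hypothetical stand-in for Accumulable; the names here are illustrative only.
class Holder private (
    // Declared as a val, so it becomes a field and @transient applies to that field.
    @transient private val initialValue: String,
    val name: Option[String])
  extends Serializable {

  // Helper (auxiliary) constructor: initialValue is a plain method parameter
  // that is forwarded to the primary constructor. It declares no field itself.
  def this(initialValue: String) = this(initialValue, None)
}

object TransientDemo {
  // Names of all fields declared directly on the class.
  def allFields(cls: Class[_]): Set[String] =
    cls.getDeclaredFields.map(_.getName).toSet

  // Names of the declared fields that carry the JVM transient modifier.
  def transientFields(cls: Class[_]): Set[String] =
    cls.getDeclaredFields
      .filter(f => Modifier.isTransient(f.getModifiers))
      .map(_.getName)
      .toSet

  def main(args: Array[String]): Unit = {
    println(s"fields:    ${allFields(classOf[Holder])}")
    println(s"transient: ${transientFields(classOf[Holder])}")
  }
}
```

If the helper constructor really adds no field of its own, the field list should be the same with or without the annotation on its parameter, and `initialValue` from the primary constructor should be the only transient field.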