Github user attilapiros commented on a diff in the pull request:
https://github.com/apache/spark/pull/21971#discussion_r207898987
--- Diff: core/src/main/scala/org/apache/spark/rdd/AsyncRDDActions.scala ---
@@ -61,6 +62,36 @@ class AsyncRDDActions[T: ClassTag](self: RDD[T]) extends
Serializable with Loggi
(index, data) => results(index) = data, results.flatten.toSeq)
}
+
+ /**
+ * Returns a future of an aggregation across the RDD.
+ *
+ * @see [[RDD.aggregate]] which is the synchronous version of this
method.
+ */
+ def aggregateAsync[U](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U)
=> U): FutureAction[U] =
+ self.withScope {
--- End diff --
IMHO one reason could be the rule of fail fast and early. As the cloning
uses serialisation/deserialisation which anyway needed for sending the neutral
element to the executors this way serialisation of the neutral element is
tested when the operator is specified not during the lazy execution in the
middle of a long chain of operations.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]