[GitHub] spark pull request #21971: [SPARK-24947] [Core] aggregateAsync and foldAsync

attilapiros Mon, 06 Aug 2018 06:54:27 -0700

GitHub user attilapiros commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21971#discussion_r207898987
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/AsyncRDDActions.scala ---
    @@ -61,6 +62,36 @@ class AsyncRDDActions[T: ClassTag](self: RDD[T]) extends Serializable with Loggi
           (index, data) => results(index) = data, results.flatten.toSeq)
       }
     
    +
    +  /**
    +   * Returns a future of an aggregation across the RDD.
    +   *
    +   * @see [[RDD.aggregate]] which is the synchronous version of this method.
    +   */
    +  def aggregateAsync[U](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U): FutureAction[U] =
    +    self.withScope {
    --- End diff --
    
    IMHO one reason could be the rule of failing fast and early. The cloning
    uses serialisation/deserialisation, which is needed anyway to send the
    neutral element to the executors. This way, serialisation of the neutral
    element is tested when the operator is specified, not during the lazy
    execution in the middle of a long chain of operations.
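
To illustrate the point, here is a minimal sketch of the fail-fast idea. It is not the code from the pull request: it assumes plain `java.io` serialisation as a stand-in for Spark's configured serializer, and `FailFastClone`, `cloneViaSerialization`, `lazyAggregate` and `NotSerializableZero` are hypothetical names introduced only for this example.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}
import scala.util.Try

object FailFastClone {
  // Hypothetical helper: deep-copies a value through a serialisation round trip.
  // Assumption: java.io serialisation stands in for Spark's serializer here.
  // Throws java.io.NotSerializableException right away if the value cannot be serialised.
  def cloneViaSerialization[U](value: U): U = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    try out.writeObject(value.asInstanceOf[AnyRef]) finally out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    try in.readObject().asInstanceOf[U] finally in.close()
  }

  // Hypothetical stand-in for a lazily executed aggregation.
  // The neutral element is cloned when the operator is specified,
  // so a serialisation problem surfaces here and not when the thunk runs.
  def lazyAggregate[T, U](data: Seq[T], zeroValue: U)(seqOp: (U, T) => U): () => U = {
    val zero = cloneViaSerialization(zeroValue) // eager serialisation check
    () => data.foldLeft(zero)(seqOp)            // deferred execution
  }

  // Deliberately not Serializable, to trigger the early failure.
  class NotSerializableZero(val n: Int)
}

object FailFastCloneDemo extends App {
  import FailFastClone._

  // A serialisable neutral element: definition succeeds, execution happens later.
  val sum = lazyAggregate(Seq(1, 2, 3), 0)(_ + _)
  println(sum())

  // A non-serialisable neutral element: the failure occurs at definition time.
  val attempt = Try(lazyAggregate(Seq(1, 2, 3), new NotSerializableZero(0))((z, _) => z))
  println(attempt.isFailure)
}
```

In this sketch the second call fails while the aggregation is being defined, before any deferred work has started, which is the behaviour the comment above argues for.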


