akpatnam25 commented on a change in pull request #33644:
URL: https://github.com/apache/spark/pull/33644#discussion_r688106700



##########
File path: project/MimaExcludes.scala
##########
@@ -104,6 +104,10 @@ object MimaExcludes {
     ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.shuffle.api.SingleSpillShuffleMapOutputWriter.transferMapSpillFile"),
     ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.shuffle.api.SingleSpillShuffleMapOutputWriter.transferMapSpillFile"),
     ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.shuffle.api.ShuffleMapOutputWriter.commitAllPartitions")
+
+    // [SPARK-36419][CORE] Optionally move final aggregation in RDD.treeAggregate to executor
+    ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.api.java.JavaRDDLike.treeAggregate")

Review comment:
   @srowen I don't think we break API compatibility on the Scala side, because we use a default parameter anyway, so users would not have to adapt to the new signature (quick illustration below). So I am proposing that we keep the Scala entry in the MiMa excludes and not worry about the binary incompatibility there.
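   
   For the record, here is a minimal, self-contained sketch (not Spark code; all names are illustrative) of why a trailing default parameter keeps source compatibility while still tripping MiMa's binary check: old call sites recompile unchanged, but the compiled method descriptor changes, so previously compiled callers would fail to link.
   ```
   // Standalone illustration, not Spark code: adding a trailing default
   // parameter preserves source compatibility, but the compiled method
   // signature changes, which is the binary incompatibility MiMa reports.
   object DefaultParamCompat {
     // before: def aggregate(depth: Int): Int
     // after: the same method with an extra defaulted parameter
     def aggregate(depth: Int, finalAggregateOnExecutor: Boolean = false): Int =
       if (finalAggregateOnExecutor) depth else 2 * depth

     def main(args: Array[String]): Unit = {
       println(aggregate(2))       // old source call site still compiles
       println(aggregate(2, true)) // new call site opts in explicitly
     }
   }
   ```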
   
   On the Java side we would be breaking API compatibility, so we can add an overloaded method there instead, which lets us avoid the break. There are already two `treeAggregate` methods in `JavaRDDLike`, so roughly speaking we would just add a third. Something like this:
   ```
     /**
      * Aggregates the elements of this RDD in a multi-level tree pattern.
      *
      * @param depth suggested depth of the tree
      * @see [[org.apache.spark.api.java.JavaRDDLike#aggregate]]
      */
     def treeAggregate[U](
         zeroValue: U,
         seqOp: JFunction2[U, T, U],
         combOp: JFunction2[U, U, U],
         depth: Int): U = {
       rdd.treeAggregate(zeroValue)(seqOp, combOp, depth)(fakeClassTag[U])
     }

     /**
      * `org.apache.spark.api.java.JavaRDDLike.treeAggregate` with suggested depth 2.
      */
     def treeAggregate[U](
         zeroValue: U,
         seqOp: JFunction2[U, T, U],
         combOp: JFunction2[U, U, U]): U = {
       treeAggregate(zeroValue, seqOp, combOp, 2)
     }

     /**
      * Aggregates the elements of this RDD in a multi-level tree pattern and provides the
      * option of doing the final aggregation on an executor instead of the driver.
      *
      * @param depth suggested depth of the tree
      * @param finalAggregateOnExecutor whether to perform the final aggregation on an executor
      * @see [[org.apache.spark.api.java.JavaRDDLike#aggregate]]
      */
     def treeAggregate[U](
         zeroValue: U,
         seqOp: JFunction2[U, T, U],
         combOp: JFunction2[U, U, U],
         depth: Int,
         finalAggregateOnExecutor: Boolean): U = {
       rdd.treeAggregate(zeroValue)(seqOp, combOp, depth, finalAggregateOnExecutor)(fakeClassTag[U])
     }
   ```
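   
   For completeness, a hypothetical caller-side sketch of how the three overloads would line up (this assumes the third overload above is added; the local-mode setup is just for illustration):
   ```
   // Hypothetical usage sketch: exercises the two existing treeAggregate
   // overloads plus the proposed third one with finalAggregateOnExecutor.
   import java.util.Arrays

   import org.apache.spark.SparkConf
   import org.apache.spark.api.java.JavaSparkContext
   import org.apache.spark.api.java.function.Function2

   object TreeAggregateOverloads {
     def main(args: Array[String]): Unit = {
       val jsc = new JavaSparkContext(
         new SparkConf().setAppName("treeAggregate-demo").setMaster("local[2]"))
       val rdd = jsc.parallelize(
         Arrays.asList(Integer.valueOf(1), Integer.valueOf(2),
           Integer.valueOf(3), Integer.valueOf(4)), 4)

       val add = new Function2[Integer, Integer, Integer] {
         override def call(u: Integer, t: Integer): Integer = u + t
       }

       println(rdd.treeAggregate(Integer.valueOf(0), add, add))     // existing, depth 2
       println(rdd.treeAggregate(Integer.valueOf(0), add, add, 4))  // existing, explicit depth
       // proposed overload: final aggregation happens on an executor
       println(rdd.treeAggregate(Integer.valueOf(0), add, add, 4, true))
       jsc.stop()
     }
   }
   ```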
   Then we can remove the MiMa excludes for the `JavaRDDLike` class. What do you think?
   



