[GitHub] [spark] mridulm commented on a change in pull request #33644: [SPARK-36419][CORE] Optionally move final aggregation in RDD.treeAggregate to executor

GitBox Thu, 19 Aug 2021 09:25:40 -0700


mridulm commented on a change in pull request #33644:
URL: https://github.com/apache/spark/pull/33644#discussion_r692293917




##########
File path: core/src/main/scala/org/apache/spark/rdd/RDD.scala
##########
@@ -1233,6 +1233,22 @@ abstract class RDD[T: ClassTag](
           (i, iter) => iter.map((i % curNumPartitions, _))

Review comment:
       Actually, I like @HyukjinKwon's proposal.
   A variant of it would be to add a new `treeAggregate` method to RDD that has all the parameters explicitly specified (no defaults, no currying), and to have every other `treeAggregate` variant delegate to it.
   
   This should address @srowen's concerns about binary and source compatibility (the existing methods remain as they are), while introducing a new method in the Scala API that allows per-call customization, rather than a global config that affects all aggregates.
   
   Thoughts?
   
   In RDD:
   ```scala
     def treeAggregate[U: ClassTag](zeroValue: U)(
         seqOp: (U, T) => U,
         combOp: (U, U) => U,
         depth: Int = 2): U = withScope {
       treeAggregate(zeroValue, seqOp, combOp, depth, finalAggregateOnExecutor = false)
     }

     def treeAggregate[U: ClassTag](
         zeroValue: U,
         seqOp: (U, T) => U,
         combOp: (U, U) => U,
         depth: Int,
         finalAggregateOnExecutor: Boolean): U = withScope {
       // modified method taking finalAggregateOnExecutor into account
       ???
     }
   ```
   
   The Java API would mirror this and delegate to the Scala API as appropriate.
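To make the proposed delegation concrete, here is a minimal, self-contained sketch that runs on local collections rather than on Spark. `LocalRDD` and `JavaFriendlyLocalRDD` are hypothetical names, not Spark classes. Each inner `Seq` stands in for one partition, the `ClassTag` bound is dropped because nothing here needs it, and `java.util.function.BiFunction` stands in for Spark's own Java function interfaces. The tree-level loop is loosely modeled on the existing `RDD.treeAggregate`. The `finalAggregateOnExecutor` branch only simulates the extra single-partition stage that the PR would run on an executor.

```scala
import java.util.function.BiFunction

// Hypothetical stand-in for an RDD: each inner Seq plays the role of one partition.
class LocalRDD[T](val partitions: Seq[Seq[T]]) {

  // Existing curried signature with a default depth; callers are unaffected.
  def treeAggregate[U](zeroValue: U)(
      seqOp: (U, T) => U,
      combOp: (U, U) => U,
      depth: Int = 2): U =
    treeAggregate(zeroValue, seqOp, combOp, depth, finalAggregateOnExecutor = false)

  // New overload: every parameter explicit, no defaults, no currying.
  def treeAggregate[U](
      zeroValue: U,
      seqOp: (U, T) => U,
      combOp: (U, U) => U,
      depth: Int,
      finalAggregateOnExecutor: Boolean): U = {
    require(depth >= 1, s"depth must be >= 1 but got $depth")
    // Per-partition aggregation (in Spark this runs on executors).
    var level: Seq[U] = partitions.map(_.foldLeft(zeroValue)(seqOp))
    var numPartitions = level.length
    val scale = math.max(math.ceil(math.pow(numPartitions, 1.0 / depth)).toInt, 2)
    // Add intermediate levels only while they reduce the work left for the final step.
    while (numPartitions > scale + math.ceil(numPartitions.toDouble / scale)) {
      numPartitions /= scale
      val cur = numPartitions
      level = level.zipWithIndex
        .groupBy { case (_, i) => i % cur }
        .toSeq
        .sortBy(_._1)
        .map { case (_, vs) => vs.map(_._1).foldLeft(zeroValue)(combOp) }
    }
    if (finalAggregateOnExecutor && level.length > 1) {
      // Simulated extra single-partition stage: the caller would then
      // receive one pre-combined value instead of many partial results.
      level = Seq(level.foldLeft(zeroValue)(combOp))
    }
    // Final fold (in Spark this happens on the driver).
    level.foldLeft(zeroValue)(combOp)
  }
}

// Hypothetical Java-facing wrapper that only converts function types
// and forwards to the explicit Scala overload.
class JavaFriendlyLocalRDD[T](underlying: LocalRDD[T]) {
  def treeAggregate[U](
      zeroValue: U,
      seqOp: BiFunction[U, T, U],
      combOp: BiFunction[U, U, U],
      depth: Int,
      finalAggregateOnExecutor: Boolean): U =
    underlying.treeAggregate(
      zeroValue,
      (u: U, t: T) => seqOp.apply(u, t),
      (a: U, b: U) => combOp.apply(a, b),
      depth,
      finalAggregateOnExecutor = finalAggregateOnExecutor)
}
```

With this layout, an existing call such as `rdd.treeAggregate(0)(_ + _, _ + _)` still compiles against the curried overload, while a caller opts in per call with `rdd.treeAggregate(0, seqOp, combOp, 2, finalAggregateOnExecutor = true)`. Only the overload without defaults has to carry the new logic.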
   
   Thoughts @HyukjinKwon, @srowen, @akpatnam25, @venkata91 ?




-- 
This is an automated message from the Apache Git Service.
