[GitHub] [spark] akpatnam25 commented on a change in pull request #33644: [SPARK-36419][CORE] Optionally move final aggregation in RDD.treeAggregate to executor

GitBox Wed, 18 Aug 2021 09:53:48 -0700


akpatnam25 commented on a change in pull request #33644:
URL: https://github.com/apache/spark/pull/33644#discussion_r690532188




##########
File path: core/src/main/scala/org/apache/spark/rdd/RDD.scala
##########
@@ -1233,6 +1233,22 @@ abstract class RDD[T: ClassTag](
           (i, iter) => iter.map((i % curNumPartitions, _))

Review comment:
       i don't think we can do that as we did not want to cause binary 
incompabilities. that was the whole reason i had to revert back to the config 
approach. refer to this thread: 
https://github.com/apache/spark/pull/33644#discussion_r687122613

##########
File path: core/src/main/scala/org/apache/spark/rdd/RDD.scala
##########
@@ -1233,6 +1233,22 @@ abstract class RDD[T: ClassTag](
           (i, iter) => iter.map((i % curNumPartitions, _))

Review comment:
       Overloading also does not work due to the reasons given in the above 
thread

##########
File path: core/src/main/scala/org/apache/spark/rdd/RDD.scala
##########
@@ -1233,6 +1233,22 @@ abstract class RDD[T: ClassTag](
           (i, iter) => iter.map((i % curNumPartitions, _))

Review comment:
       yes, we can add a new method with signature that looks like this: 
   ```
     def treeAggregate[U](
       zeroValue: U,
       seqOp: JFunction2[U, T, U],
       combOp: JFunction2[U, U, U],
       depth: Int,
       finalAggregateOnExecutor: Boolean): U = {
   ...
   ```
   you're right, we dont need to worry about Mima excludes then. 
   but that is not the problem. the problem is that this method will need to be 
overloaded by the other treeAggregate methods. for example, the other 
treeAggregate methods will have to call this version of treeAggregate. this is 
an issue because of zeroValue:U is only inspected by the compiler to resolve 
the overloaded methods. Due to that the compiler throws ambiguous reference to 
overloaded definition, and this cannot be resolved..
   
   there are only 2 workarounds: 
   1. create a new method with a different name but same signature with the new 
parameter. something like `treeAggregateExecutor` and have somewhat duplicate 
code in that method
   2. use config  approach 

##########
File path: core/src/main/scala/org/apache/spark/rdd/RDD.scala
##########
@@ -1233,6 +1233,22 @@ abstract class RDD[T: ClassTag](
           (i, iter) => iter.map((i % curNumPartitions, _))

Review comment:
       yes, we can add a new method with signature that looks like this: 
   ```
     def treeAggregate[U](
       zeroValue: U,
       seqOp: JFunction2[U, T, U],
       combOp: JFunction2[U, U, U],
       depth: Int,
       finalAggregateOnExecutor: Boolean): U = {
   ...
   ```
   you're right, we dont need to worry about Mima excludes then. 
   but that is not the problem. the problem is that this method will need to be 
overloaded by the other treeAggregate methods. for example, the other 
treeAggregate methods will have to call this version of treeAggregate. this is 
an issue because of zeroValue:U is only inspected by the compiler to resolve 
the overloaded methods. Due to that the compiler throws ambiguous reference to 
overloaded definition, and this cannot be resolved..
   
   there are only 2 workarounds: 
   1. create a new method with a different name but same signature with the new 
parameter. something like `treeAggregateExecutor` and have somewhat duplicate 
code in that method. me and @venkata91  thought this approach was kind of ugly
   2. use config  approach 

##########
File path: core/src/main/scala/org/apache/spark/rdd/RDD.scala
##########
@@ -1233,6 +1233,22 @@ abstract class RDD[T: ClassTag](
           (i, iter) => iter.map((i % curNumPartitions, _))

Review comment:
       yes, we can add a new method with signature that looks like this: 
   ```
     def treeAggregate[U](
       zeroValue: U,
       seqOp: JFunction2[U, T, U],
       combOp: JFunction2[U, U, U],
       depth: Int,
       finalAggregateOnExecutor: Boolean): U = {
   ...
   ```
   you're right, we dont need to worry about Mima excludes then. 
   but that is not the problem. the problem is that this method will need to be 
overloaded by the other treeAggregate methods. for example, the other 
treeAggregate methods will have to call this version of treeAggregate. this is 
an issue because of zeroValue:U is only inspected by the compiler to resolve 
the overloaded methods. Due to that the compiler throws ambiguous reference to 
overloaded definition, and this cannot be resolved..
   
   there are only 2 workarounds: 
   1. create a new method with a different name but same signature with the new 
parameter. something like `treeAggregateExecutor` and have somewhat duplicate 
code in that method. me and @venkata91  thought this approach was kind of ugly
   2. use config  approach  which is the current implementation




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] akpatnam25 commented on a change in pull request #33644: [SPARK-36419][CORE] Optionally move final aggregation in RDD.treeAggregate to executor

Reply via email to