srowen commented on a change in pull request #33644:
URL: https://github.com/apache/spark/pull/33644#discussion_r683803934
##########
File path: core/src/main/scala/org/apache/spark/internal/config/package.scala
##########
@@ -315,6 +315,12 @@ package object config {
.bytesConf(ByteUnit.MiB)
.createOptional
+ private[spark] val ENABLE_EXECUTOR_TREE_AGGREGATE =
ConfigBuilder("spark.executor.treeAggregate")
Review comment:
I don't think I'd add another config for this - it becomes yet another
of the 100 things that 99% of users won't know about. I'd imagine one
behavior is generally better or worse than the other, and we can agree
on a single default instead.
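For context, the diff above is truncated after the `ConfigBuilder(...)` call. A typical entry in Spark's internal config DSL looks roughly like the sketch below; the `.doc` text and the default value here are assumptions for illustration, not the PR's actual values.

```scala
// Sketch only: how an entry like the one in the PR is usually completed
// in core/src/main/scala/org/apache/spark/internal/config/package.scala.
// Doc string and default are hypothetical.
private[spark] val ENABLE_EXECUTOR_TREE_AGGREGATE =
  ConfigBuilder("spark.executor.treeAggregate")
    .doc("Whether treeAggregate performs its final combine on an executor " +
      "instead of on the driver.")
    .booleanConf
    .createWithDefault(false)
```

This is exactly the kind of switch the comment above objects to: a boolean most users will never discover, where the project could instead pick the better behavior unconditionally.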
##########
File path: core/src/main/scala/org/apache/spark/rdd/RDD.scala
##########
@@ -1233,6 +1233,21 @@ abstract class RDD[T: ClassTag](
(i, iter) => iter.map((i % curNumPartitions, _))
}.foldByKey(zeroValue, new HashPartitioner(curNumPartitions))(cleanCombOp).values
}
+ if (conf.get(ENABLE_EXECUTOR_TREE_AGGREGATE) &&
partiallyAggregated.partitions.length > 1) {
+ // define a new partitioner that results in only 1 partition
+ val constantPartitioner = new Partitioner {
+ override def numPartitions: Int = 1
+
+ override def getPartition(key: Any): Int = 0
+ }
+ // map the partially aggregated rdd into a key-value rdd
+ // do the computation in the single executor with one partition
+ // get the new RDD[U]
+ partiallyAggregated = partiallyAggregated
Review comment:
I sort of get the argument that this is more robust on the executor
side, but it does strictly more work: there is now always an additional hop
from the last aggregation to the driver, and the total amount of work done is
no less. I'm not convinced this is generally helping - that is mostly a guess,
but it follows from the previous sentence: it's simply more work.
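The behavior being debated can be sketched without Spark at all. `treeAggregate` shrinks the set of per-partition partial results level by level, keying each partial by `i % curNumPartitions` exactly as in the hunk above; the PR's constant partitioner adds one more level with a single target partition, so the final combine runs on an executor and the driver receives just one value. A minimal, Spark-free model of that keying (object and method names here are hypothetical):

```scala
object TreeAggregateSketch {
  // One level of tree aggregation: shrink the number of "partitions"
  // by `scale`, folding partials that map to the same target partition
  // via i % curNumPartitions, as in RDD.treeAggregate.
  def shrink[U](partials: Vector[U], combOp: (U, U) => U, scale: Int): Vector[U] = {
    val curNumPartitions = math.max(partials.length / scale, 1)
    partials.zipWithIndex
      .groupBy { case (_, i) => i % curNumPartitions }
      .toVector
      .sortBy(_._1)
      .map { case (_, vs) => vs.map(_._1).reduce(combOp) }
  }

  // The PR's extra level: a "partitioner" with numPartitions = 1 collapses
  // everything into a single partial on one executor, leaving the driver
  // only a single-value fetch instead of the last combine.
  def finalCombine[U](partials: Vector[U], combOp: (U, U) => U): U =
    shrink(partials, combOp, partials.length).head
}
```

Note that `finalCombine` still performs every `combOp` call the driver would have performed; it only moves them, which is the reviewer's point about the extra hop.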
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]