cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] 
Optimize shuffle reader to local shuffle reader when smj converted to bhj in 
adaptive execution
URL: https://github.com/apache/spark/pull/25295#discussion_r323609170
 
 

 ##########
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ReduceNumShufflePartitions.scala
 ##########
 @@ -180,25 +180,45 @@ case class ReduceNumShufflePartitions(conf: SQLConf) 
extends Rule[SparkPlan] {
 
 case class CoalescedShuffleReaderExec(
     child: QueryStageExec,
-    partitionStartIndices: Array[Int]) extends UnaryExecNode {
+    partitionStartIndices: Array[Int],
+    var isLocal: Boolean = false) extends UnaryExecNode {
 
 Review comment:
  Without the local shuffle reader, a task of `ShuffledRDD` reads the blocks of 
one reduce partition across all map outputs: `map1-reduce1`, `map2-reduce1`, 
etc. With the local shuffle reader, a task instead reads all reduce partitions 
of one map output: `map1-reduce1`, `map1-reduce2`, etc. The task output data 
sizes are therefore different, and we can't use the coalescing algorithm in 
`ReduceNumShufflePartitions` anymore.
  
  Furthermore, the RDD's number of partitions also changes after switching to 
the local shuffle reader, so how can we apply `ReduceNumShufflePartitions` at 
all?
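To make the difference concrete, here is a minimal sketch (not Spark's actual API; `ShuffleReaderSketch` and the block-tuple encoding are illustrative assumptions) of which `(mapId, reduceId)` blocks each task reads, and how the task count changes between the two reader modes:

```scala
// Illustrative sketch only: models a shuffle with M map outputs and R reduce
// partitions as (mapId, reduceId) block pairs. Not Spark's real shuffle API.
object ShuffleReaderSketch {
  // Normal shuffle reader: one task per reduce partition; task r fetches
  // that partition from every map output => R tasks total.
  def normalReaderTasks(numMappers: Int, numReducers: Int): Seq[Seq[(Int, Int)]] =
    (0 until numReducers).map(r => (0 until numMappers).map(m => (m, r)))

  // Local shuffle reader: one task per map output; task m reads every reduce
  // partition of that single map output locally => M tasks total.
  def localReaderTasks(numMappers: Int, numReducers: Int): Seq[Seq[(Int, Int)]] =
    (0 until numMappers).map(m => (0 until numReducers).map(r => (m, r)))

  def main(args: Array[String]): Unit = {
    val (m, r) = (2, 3)
    // With 2 mappers and 3 reducers: 3 tasks normally, but only 2 locally,
    // and each task covers a different set of blocks (different data sizes).
    println(normalReaderTasks(m, r).length) // 3
    println(localReaderTasks(m, r).length)  // 2
  }
}
```

This is why a rule that was tuned to per-reduce-partition sizes cannot be reused unchanged: both the partition count and the per-task input composition differ under the local reader.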

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services