cloud-fan commented on a change in pull request #25295: [SPARK-28560][SQL] Optimize shuffle reader to local shuffle reader when smj converted to bhj in adaptive execution URL: https://github.com/apache/spark/pull/25295#discussion_r323609170
########## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/ReduceNumShufflePartitions.scala ##########

```diff
@@ -180,25 +180,45 @@ case class ReduceNumShufflePartitions(conf: SQLConf) extends Rule[SparkPlan] {
 case class CoalescedShuffleReaderExec(
     child: QueryStageExec,
-    partitionStartIndices: Array[Int]) extends UnaryExecNode {
+    partitionStartIndices: Array[Int],
+    var isLocal: Boolean = false) extends UnaryExecNode {
```

Review comment: Without the local shuffle reader, a task of `ShuffledRDD` reads the shuffle blocks `map1-reduce1`, `map2-reduce1`, etc. With the local shuffle reader, a task instead reads `map1-reduce1`, `map1-reduce2`, etc. The per-task output data sizes are therefore different, and we can no longer use the size-based coalescing algorithm in `ReduceNumShufflePartitions`. Furthermore, the RDD's number of partitions also changes after switching to the local shuffle reader, so how could we apply `ReduceNumShufflePartitions` at all?
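The difference in reading patterns that the comment describes can be sketched as follows. This is a minimal illustration only, written in Python for clarity; the function names are hypothetical and are not Spark APIs. Each shuffle block is identified by a `(mapId, reduceId)` pair.

```python
def regular_reader_blocks(reduce_id, num_maps):
    """Regular shuffle reader: one task per reduce partition.

    The task for reduce partition `reduce_id` fetches that partition
    from every map output (map1-reduceN, map2-reduceN, ...).
    """
    return [(m, reduce_id) for m in range(num_maps)]


def local_reader_blocks(map_id, num_reduces):
    """Local shuffle reader: one task per map output.

    The task for map output `map_id` reads all reduce partitions of
    that single map output (mapN-reduce1, mapN-reduce2, ...), avoiding
    a network fetch when it runs on the same executor.
    """
    return [(map_id, r) for r in range(num_reduces)]


# With 3 map outputs and 2 reduce partitions:
# - the regular task for reduce partition 0 reads one block per map,
# - the local task for map output 0 reads one block per reduce partition.
# Note the RDD also ends up with a different partition count (2 vs 3),
# which is why ReduceNumShufflePartitions no longer applies as-is.
print(regular_reader_blocks(0, num_maps=3))    # blocks [(0, 0), (1, 0), (2, 0)]
print(local_reader_blocks(0, num_reduces=2))   # blocks [(0, 0), (0, 1)]
```

Since the local reader groups blocks by map output rather than by reduce partition, both the number of tasks and the amount of data each task produces change, which is the comment's objection to reusing the coalescing logic.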