c21 commented on a change in pull request #32430:
URL: https://github.com/apache/spark/pull/32430#discussion_r627071113



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala
##########
@@ -105,13 +106,20 @@ package object debug {
    * @param plan the query plan for codegen
    * @return Sequence of WholeStageCodegen subtrees and corresponding codegen
    */
-  def codegenStringSeq(plan: SparkPlan): Seq[(String, String, ByteCodeStats)] = {
+  def codegenStringSeq(
+      plan: SparkPlan,
+      sparkSession: SparkSession): Seq[(String, String, ByteCodeStats)] = {
     val codegenSubtrees = new collection.mutable.HashSet[WholeStageCodegenExec]()
 
     def findSubtrees(plan: SparkPlan): Unit = {
       plan foreach {
         case s: WholeStageCodegenExec =>
           codegenSubtrees += s
+        case p: AdaptiveSparkPlanExec =>
+          // Find subtrees from original input plan of AQE.

Review comment:
       Yes, this doesn't match the actual plan for `df.explain("codegen")` if `df` has already been executed. The problem is that the final plan, `AdaptiveSparkPlanExec.executedPlan`, wraps the whole sub-plan under the shuffle in a `ShuffleQueryStageExec`.
   
   Example:
   
   
   ```
   spark.range(5).select(col("id").as("key"), col("id").as("value")).groupBy('key).agg(max('value))
   
   ```
   AdaptiveSparkPlan isFinalPlan=true
   +- == Final Plan ==
      *(2) HashAggregate(keys=[key#2L], functions=[max(value#3L)], output=[key#2L, max(value)#9L])
      +- CustomShuffleReader coalesced
         +- ShuffleQueryStage 0
            +- Exchange hashpartitioning(key#2L, 5), ENSURE_REQUIREMENTS, [id=#28]
               +- *(1) HashAggregate(keys=[key#2L], functions=[partial_max(value#3L)], output=[key#2L, max#13L])
                  +- *(1) Project [id#0L AS key#2L, id#0L AS value#3L]
                     +- *(1) Range (0, 5, step=1, splits=2)
   ```
   
   The partial aggregate `HashAggregate` is wrapped inside `ShuffleQueryStage`, so it cannot be pattern-matched when collecting the codegen subtrees for explain. One workaround is to add a pattern match for `ShuffleQueryStageExec` as well. Either way, we still need to re-run the physical plan preparation rules when `AdaptiveSparkPlan.isFinalPlan=false`.
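   To make the traversal idea concrete, here is a minimal self-contained sketch of the workaround. The node classes (`CodegenNode`, `AdaptiveNode`, `StageNode`) are hypothetical stand-ins for `WholeStageCodegenExec`, `AdaptiveSparkPlanExec`, and `ShuffleQueryStageExec`, since the real traversal depends on Spark internals; only the shape of the recursion is the point.
   
   ```scala
   import scala.collection.mutable
   
   // Hypothetical stand-ins for Spark's plan nodes (for illustration only):
   //   CodegenNode  ~ WholeStageCodegenExec
   //   AdaptiveNode ~ AdaptiveSparkPlanExec (holds the executed plan)
   //   StageNode    ~ ShuffleQueryStageExec (wraps the materialized sub-plan)
   sealed trait Node { def children: Seq[Node] }
   case object Leaf extends Node { val children: Seq[Node] = Nil }
   case class CodegenNode(children: Seq[Node]) extends Node
   case class AdaptiveNode(executedPlan: Node) extends Node {
     val children: Seq[Node] = Seq(executedPlan)
   }
   case class StageNode(plan: Node) extends Node {
     val children: Seq[Node] = Seq(plan)
   }
   
   // Collect all codegen subtrees, descending through AQE wrappers,
   // including the extra query-stage case proposed above.
   def findSubtrees(plan: Node, acc: mutable.HashSet[CodegenNode]): Unit = plan match {
     case c: CodegenNode  => acc += c; c.children.foreach(findSubtrees(_, acc))
     case a: AdaptiveNode => findSubtrees(a.executedPlan, acc)
     case s: StageNode    => findSubtrees(s.plan, acc) // the additional match
     case other           => other.children.foreach(findSubtrees(_, acc))
   }
   ```
   
   Without the `StageNode` case, a codegen subtree buried under a materialized query stage (as in the plan above) would never be visited.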




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
