[GitHub] [spark] hvanhovell commented on a change in pull request #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL

GitBox Mon, 17 Jun 2019 00:53:57 -0700

hvanhovell commented on a change in pull request #24706: [SPARK-23128][SQL] A new approach to do adaptive execution in Spark SQL
URL: https://github.com/apache/spark/pull/24706#discussion_r294172261


 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ##########
 @@ -824,8 +775,48 @@ case class WholeStageCodegenExec(child: SparkPlan)(val codegenStageId: Int)
 
 /**
  * Find the chained plans that support codegen, collapse them together as WholeStageCodegen.
+ *
+ * The `codegenStageCounter` generates ID for codegen stages within a query plan.
+ * It does not affect equality, nor does it participate in destructuring pattern matching
+ * of WholeStageCodegenExec.
+ *
+ * This ID is used to help differentiate between codegen stages. It is included as a part
+ * of the explain output for physical plans, e.g.
+ *
+ * == Physical Plan ==
+ * *(5) SortMergeJoin [x#3L], [y#9L], Inner
+ * :- *(2) Sort [x#3L ASC NULLS FIRST], false, 0
+ * :  +- Exchange hashpartitioning(x#3L, 200)
+ * :     +- *(1) Project [(id#0L % 2) AS x#3L]
+ * :        +- *(1) Filter isnotnull((id#0L % 2))
+ * :           +- *(1) Range (0, 5, step=1, splits=8)
+ * +- *(4) Sort [y#9L ASC NULLS FIRST], false, 0
+ *    +- Exchange hashpartitioning(y#9L, 200)
+ *       +- *(3) Project [(id#6L % 2) AS y#9L]
+ *          +- *(3) Filter isnotnull((id#6L % 2))
+ *             +- *(3) Range (0, 5, step=1, splits=8)
+ *
+ * where the ID makes it obvious that not all adjacent codegen'd plan operators are of the
+ * same codegen stage.
+ *
+ * The codegen stage ID is also optionally included in the name of the generated classes as
+ * a suffix, so that it's easier to associate a generated class back to the physical operator.
+ * This is controlled by SQLConf: spark.sql.codegen.useIdInClassName
+ *
+ * The ID is also included in various log messages.
+ *
+ * Within a query, a codegen stage in a plan starts counting from 1, in "insertion order".
+ * WholeStageCodegenExec operators are inserted into a plan in depth-first post-order.
+ * See CollapseCodegenStages.insertWholeStageCodegen for the definition of insertion order.
+ *
+ * 0 is reserved as a special ID value to indicate a temporary WholeStageCodegenExec object
+ * is created, e.g. for special fallback handling when an existing WholeStageCodegenExec
+ * failed to generate/compile code.
  */
-case class CollapseCodegenStages(conf: SQLConf) extends Rule[SparkPlan] {
+case class CollapseCodegenStages(
+    conf: SQLConf,
+    codegenStageCounter: AtomicInteger = new AtomicInteger(0))
 
 Review comment:
   Yeah, we could have made that part of the class body. OTOH this also works :)...
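
   For readers following along, here is a minimal sketch of the numbering scheme the doc comment describes: stage IDs start at 1 and are handed out in depth-first post-order, with the counter passed as a constructor parameter that defaults to a fresh `AtomicInteger`. Everything below (`Plan`, `Op`, `Stage`, `Collapse`) is a hypothetical toy model written for illustration, not Spark's actual `SparkPlan`, `WholeStageCodegenExec`, or `CollapseCodegenStages` code.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical toy plan nodes; these are NOT Spark classes.
sealed trait Plan
final case class Op(name: String, codegen: Boolean, children: List[Plan] = Nil) extends Plan
// Simplification: here the id is part of equality. The doc comment above says
// the real stage ID does not affect equality.
final case class Stage(id: Int, child: Plan) extends Plan

// The counter is a constructor parameter with a default, mirroring the shape
// of the signature in the diff. Callers may pass a shared counter instead.
final case class Collapse(counter: AtomicInteger = new AtomicInteger(0)) {

  // Inside a stage: keep codegen-capable operators fused together and
  // restart the stage search below any operator that cannot be fused.
  private def inside(p: Plan): Plan = p match {
    case op: Op if op.codegen => op.copy(children = op.children.map(inside))
    case other                => insert(other)
  }

  // Wrap each maximal chain of codegen-capable operators in a Stage.
  // The subtree is processed before the counter is incremented, so stages
  // are numbered in depth-first post-order, starting from 1.
  def insert(p: Plan): Plan = p match {
    case op: Op if op.codegen =>
      val fused = inside(op)
      Stage(counter.incrementAndGet(), fused)
    case op: Op   => op.copy(children = op.children.map(insert))
    case s: Stage => s
  }
}

// Sort -> Exchange -> Project -> Range, where Exchange breaks the chain.
val examplePlan: Plan =
  Op("Sort", codegen = true, List(
    Op("Exchange", codegen = false, List(
      Op("Project", codegen = true, List(Op("Range", codegen = true)))))))

// Project/Range become stage 1 and Sort becomes stage 2, as in the left
// branch of the explain output quoted in the doc comment.
val collapsed: Plan = Collapse().insert(examplePlan)
```

   In this sketch, moving the counter into the class body (for example `private val counter = new AtomicInteger(0)`) would number a single plan identically. The constructor parameter additionally lets several rule instances share one counter, so numbering continues across them rather than restarting at 1. Whether that is the motivation in this PR is an assumption on my part.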

