Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20224#discussion_r163449646 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala --- @@ -365,6 +388,26 @@ case class WholeStageCodegenExec(child: SparkPlan) extends UnaryExecNode with Co "pipelineTime" -> SQLMetrics.createTimingMetric(sparkContext, WholeStageCodegenExec.PIPELINE_DURATION_METRIC)) + /** + * ID for codegen stages within a query plan. + * It does not affect equality, nor does it participate in destructuring pattern matching. + * + * Within a query, a codegen stage in a plan starts counting from 1, in "insertion order". + * WholeStageCodegenExec operators are inserted into a plan in depth-first post-order. + * See CollapseCodegenStages.insertWholeStageCodegen for the definition of insertion order. + * + * 0 is reserved as a special ID value to indicate a temporary WholeStageCodegenExec object + * is created, e.g. for special fallback handling when an existing WholeStageCodegenExec + * failed to generate/compile code. + */ + val codegenStageId = WholeStageCodegenId.getNextStageId() --- End diff -- my major concern is, what if we transform the physical plan tree after adding `WholeStageCodegenExec`? e.g. during transformation we may copy a plan code, then a copied `WholeStageCodegenExec` will have a different stage id.
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org