sarutak opened a new pull request #25260: [SPARK-28520][SQL] WholeStageCodegen 
does not work property for LocalTableScanExec
URL: https://github.com/apache/spark/pull/25260
 
 
   Code is not generated for LocalTableScanExec although proper situations.
   
   If a LocalTableScanExec plan has the direct parent plan which supports 
WholeStageCodegen,
   the LocalTableScanExec plan also should be within a WholeStageCodegen domain.
   But code is not generated for LocalTableScanExec and InputAdapter is 
inserted for now.
   
   ```
   val df1 = spark.createDataset(1 to 10).toDF
   val df2 = spark.createDataset(1 to 10).toDF
   val df3 = df1.join(df2, df1("value") === df2("value"))
   df3.explain(true)
   
   ...
   
   == Physical Plan ==
   *(1) BroadcastHashJoin [value#1], [value#6], Inner, BuildRight
   :- LocalTableScan [value#1]                                             // 
LocalTableScanExec is not within a WholeStageCodegen domain
   +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, 
false] as bigint)))
      +- LocalTableScan [value#6]
   ```
   
   ```
   scala> df3.queryExecution.executedPlan.children.head.children.head.getClass
   res4: Class[_ <: org.apache.spark.sql.execution.SparkPlan] = class 
org.apache.spark.sql.execution.InputAdapter
   ```
   
   For the current implementation of LocalTableScanExec, codegen is enabled in 
case `parent` is not null
   but `parent` is set in `consume`, which is called after `insertInputAdapter` 
so it doesn't work as intended.
   
   After applying this cnahge, we can get following plan, which means 
LocalTableScanExec is within a WholeStageCodegen domain.
   
   ```
   == Physical Plan ==
   *(1) BroadcastHashJoin [value#63], [value#68], Inner, BuildRight
   :- *(1) LocalTableScan [value#63]
   +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, 
false] as bigint)))
      +- LocalTableScan [value#68]
   
   ## How was this patch tested?
   
   New test cases are added into WholeStageCodegenSuite.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to