[GitHub] [spark] AngersZhuuuu opened a new pull request #33764: [SPARK-36352][SQL][3.0] Spark should check result plan's output schema name

GitBox Tue, 17 Aug 2021 08:02:12 -0700


AngersZhuuuu opened a new pull request #33764:
URL: https://github.com/apache/spark/pull/33764



   ### What changes were proposed in this pull request?
   Spark should check that the result plan's output schema names match those of 
the original plan.
   
   
   ### Why are the changes needed?
   In the current code, some optimizer rules may change a plan's output schema 
names. We always use semantic equality to check the output, and that check does 
not detect a change in the names of the plan's output schema.
   For example, for SchemaPruning, if we have a plan
   ```
   Project[a, B]
   |--Scan[A, b, c]
   ```
   The original output schema is `a, B`. After SchemaPruning, it becomes
   ```
   Project[A, b]
   |--Scan[A, b]
   ```
   This changes the plan's schema. When we use CTAS, the table's schema is the 
same as the query plan's output. Since the schema has changed, it is no longer 
consistent with the original SQL, so we need to check the final result plan's 
schema against the original plan's schema.
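
The difference between the two kinds of check can be sketched with a toy model. This is not Spark code: `Attr`, `semantically_equal`, and `same_output_names` are hypothetical names, and the sketch assumes that a semantic comparison looks only at an attribute's id while ignoring its name.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Attr:
    """Toy stand-in for an output attribute: an id plus a display name."""
    expr_id: int
    name: str


def semantically_equal(before: List[Attr], after: List[Attr]) -> bool:
    # Assumption: only ids are compared, so a renamed column goes unnoticed.
    return [a.expr_id for a in before] == [a.expr_id for a in after]


def same_output_names(before: List[Attr], after: List[Attr]) -> bool:
    # Stricter check: names must match exactly, including letter case.
    return [a.name for a in before] == [a.name for a in after]


original = [Attr(1, "a"), Attr(2, "B")]   # Project[a, B]
optimized = [Attr(1, "A"), Attr(2, "b")]  # Project[A, b] after pruning

print(semantically_equal(original, optimized))  # True
print(same_output_names(original, optimized))   # False
```

In this model the optimized plan passes the semantic check but fails the name check, which is the inconsistency a CTAS would write into the new table's schema.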
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing UTs
   


