[GitHub] [spark] dtenedor opened a new pull request, #42351: [SPARK-44503][SQL] Project any PARTITION BY expressions not already returned from Python UDTF TABLE arguments

via GitHub Fri, 04 Aug 2023 15:59:59 -0700


dtenedor opened a new pull request, #42351:
URL: https://github.com/apache/spark/pull/42351


   ### What changes were proposed in this pull request?
   
   This PR adds a projection whenever a Python UDTF TABLE argument contains 
PARTITION BY expressions that are not simple attributes already present in 
the output of the relation.
   
   For example:
   
   ```
   CREATE TABLE t(d DATE, y INT) USING PARQUET;
   INSERT INTO t VALUES ...
   SELECT * FROM UDTF(TABLE(t) PARTITION BY EXTRACT(YEAR FROM d) ORDER BY y ASC);
   ```
   
   This will generate a plan like:
   
   ```
   +- Sort (y ASC)
     +- RepartitionByExpressions (partition_by_0)
       +- Project (t.d, t.y, EXTRACT(YEAR FROM t.d) AS partition_by_0)
         +- LogicalRelation "t"
   ```
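
The decision described above (reuse a PARTITION BY expression that is already an output attribute, otherwise project it under a generated alias) can be sketched in plain Python. This is only an illustrative model, not the analyzer code from the PR: expressions are represented as strings, the helper name `plan_partition_projection` is hypothetical, and numbering the `partition_by_N` aliases by the expression's position is an assumption based on the example plan.

```python
from typing import List, Tuple


def plan_partition_projection(
    output_columns: List[str], partition_exprs: List[str]
) -> Tuple[List[str], List[str]]:
    """Return (project_list, partition_columns) for a TABLE argument.

    Hypothetical model: expressions are plain strings, and an expression
    counts as a "simple attribute" when it exactly matches an output column.
    """
    project_list = list(output_columns)
    partition_columns = []
    for index, expr in enumerate(partition_exprs):
        if expr in output_columns:
            # Already returned by the relation: reuse it, no extra projection.
            partition_columns.append(expr)
        else:
            # Compute the expression once and expose it under a generated alias.
            alias = f"partition_by_{index}"
            project_list.append(f"{expr} AS {alias}")
            partition_columns.append(alias)
    return project_list, partition_columns


project_list, partition_columns = plan_partition_projection(
    ["d", "y"], ["EXTRACT(YEAR FROM d)"]
)
print(project_list)       # columns of the Project node in the plan above
print(partition_columns)  # columns to repartition by
```

Partitioning by a plain column such as `y` would leave the project list unchanged in this model, which corresponds to the case where no extra projection is needed.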
   
   ### Why are the changes needed?
   
   We project the PARTITION BY expressions so that their values appear as 
attributes that the Python UDTF interpreter can inspect directly to detect 
when a partition boundary has been crossed.
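
To illustrate why projected values make this easy, here is a minimal sketch of boundary detection over rows that already carry the partition value as an ordinary column. It is not the actual Python UDTF evaluation code: the function name `split_on_partition_boundaries`, the dict-based row representation, and the sample data are all assumptions made for the example. It relies on the rows arriving grouped by partition, as the repartition and sort steps in the plan are meant to ensure.

```python
from typing import Any, Dict, Iterable, Iterator, List, Sequence

Row = Dict[str, Any]


def split_on_partition_boundaries(
    rows: Iterable[Row], partition_columns: Sequence[str]
) -> Iterator[List[Row]]:
    """Yield consecutive runs of rows that share the same partition key.

    Assumes rows with equal partition values are adjacent in the input.
    """
    current_key = None
    batch: List[Row] = []
    for row in rows:
        # The key is read straight from projected columns; nothing is evaluated.
        key = tuple(row[column] for column in partition_columns)
        if batch and key != current_key:
            # The partition value changed, so the previous partition is complete.
            yield batch
            batch = []
        current_key = key
        batch.append(row)
    if batch:
        yield batch


# Hypothetical sample rows shaped like the projected output in the example plan.
rows = [
    {"d": "2022-01-05", "y": 1, "partition_by_0": 2022},
    {"d": "2022-07-19", "y": 2, "partition_by_0": 2022},
    {"d": "2023-03-02", "y": 1, "partition_by_0": 2023},
]
partitions = list(split_on_partition_boundaries(rows, ["partition_by_0"]))
print([len(partition) for partition in partitions])
```

Without the projection, the consumer would have to re-evaluate `EXTRACT(YEAR FROM d)` for every row; with it, the comparison is a plain equality check on a column value.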
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   This PR adds unit test coverage.


