dtenedor opened a new pull request, #42420: URL: https://github.com/apache/spark/pull/42420
### What changes were proposed in this pull request?

This PR implements query execution support for the PARTITION BY and ORDER BY clauses for UDTF TABLE arguments.

* The query planning support was added in https://github.com/apache/spark/pull/42100, https://github.com/apache/spark/pull/42174, and https://github.com/apache/spark/pull/42351. After those changes, the planner added a projection to compute the PARTITION BY expressions, plus a repartition operator and a sort operator.
* In this PR, the Python executor receives the indexes of these expressions within the input table's rows, and compares the values of the projected partitioning expressions between consecutive rows.
* When the values change, this marks the boundary between partitions, so we call the UDTF instance's `terminate` method, then destroy the instance and create a new one for the next partition.

### Why are the changes needed?

This brings full end-to-end execution for the PARTITION BY and/or ORDER BY clauses for UDTF TABLE arguments.

### Does this PR introduce _any_ user-facing change?

Yes, see above.

### How was this patch tested?

This PR adds end-to-end testing in `test_udtf.py`.
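
To illustrate the partition-boundary detection described in the bullets above, here is a minimal standalone sketch. It is not the code from this PR: `run_partitioned`, `CountRows`, and the tuple-based row representation are hypothetical names chosen for illustration, and only the `eval`/`terminate` method names follow the Python UDTF convention. It assumes the rows arrive already grouped by the partitioning expressions, as the repartition and sort operators would arrange.

```python
from typing import Any, Callable, Iterable, Iterator, Sequence, Tuple

Row = Tuple[Any, ...]


class CountRows:
    """Hypothetical UDTF-like class: emits one row count per partition."""

    def __init__(self) -> None:
        self._count = 0

    def eval(self, row: Row) -> Iterable[Row]:
        self._count += 1
        return ()  # emit nothing for individual input rows

    def terminate(self) -> Iterable[Row]:
        return [(self._count,)]  # emit the partition's total once


def run_partitioned(
    rows: Iterable[Row],
    partition_indexes: Sequence[int],
    make_udtf: Callable[[], Any],
) -> Iterator[Row]:
    """Feed rows to a fresh UDTF instance per partition (illustrative only).

    `partition_indexes` gives the positions of the projected partitioning
    expressions within each row. Rows are assumed to be pre-grouped by
    those values.
    """
    udtf = None
    prev_key = None
    for row in rows:
        key = tuple(row[i] for i in partition_indexes)
        if udtf is None:
            # First row overall: start the first partition.
            udtf = make_udtf()
        elif key != prev_key:
            # Partitioning values changed: finish the old partition,
            # discard its instance, and start a new one.
            yield from udtf.terminate()
            udtf = make_udtf()
        prev_key = key
        yield from udtf.eval(row)
    if udtf is not None:
        # Flush the final partition.
        yield from udtf.terminate()


rows = [("a", 1), ("a", 2), ("b", 3)]
result = list(run_partitioned(rows, [0], CountRows))
# result == [(2,), (1,)]: two rows in partition "a", one in partition "b"
```

Comparing only consecutive rows keeps the executor's state to a single previous key, which works because the upstream repartition and sort guarantee that all rows of a partition are adjacent.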
