prakharjain09 opened a new pull request #30762:
URL: https://github.com/apache/spark/pull/30762
### What changes were proposed in this pull request?
This PR prunes output partitionings that are no longer required because the
columns they reference have been dropped by operators such as Project or Aggregate.
### Why are the changes needed?
Consider this query:
select t1.id from t1 JOIN t2 on t1.id = t2.id
This query has a top-level Project node that projects only t1.id.
However, the outputPartitioning of this Project node is:
PartitioningCollection(HashPartitioning(t1.id), HashPartitioning(t2.id)).
Since the t2.id column is not propagated past the Project node, we can drop
HashPartitioning(t2.id) from the node's output partitioning.
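The pruning idea can be illustrated with a small standalone sketch. This is a toy model in Python, not the Catalyst code touched by this PR: the `HashPartitioning` and `PartitioningCollection` classes below are simplified stand-ins that hold column names as strings, `prune` is a hypothetical helper, and returning `None` when nothing survives is an assumption made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple, Union


@dataclass(frozen=True)
class HashPartitioning:
    # Columns the data is hash-partitioned on, e.g. ("t1.id",).
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class PartitioningCollection:
    # Several partitionings that all hold for the same data.
    partitionings: Tuple["Partitioning", ...]


# Toy stand-in for a partitioning type hierarchy (assumption of this sketch).
Partitioning = Union[HashPartitioning, PartitioningCollection]


def prune(partitioning: Partitioning, output_columns: Set[str]) -> Optional[Partitioning]:
    """Keep only the partitionings whose columns are all still in the output.

    Returns None when no partitioning survives (a placeholder chosen for
    this sketch, not necessarily what the PR does).
    """
    if isinstance(partitioning, HashPartitioning):
        if all(col in output_columns for col in partitioning.columns):
            return partitioning
        return None

    # Recurse into the collection and drop children that reference
    # columns the operator no longer outputs.
    kept = []
    for child in partitioning.partitionings:
        pruned_child = prune(child, output_columns)
        if pruned_child is not None:
            kept.append(pruned_child)

    if not kept:
        return None
    if len(kept) == 1:
        # A collection with a single member is just that member.
        return kept[0]
    return PartitioningCollection(tuple(kept))


# Output partitioning of the join in the example query.
join_output = PartitioningCollection(
    (HashPartitioning(("t1.id",)), HashPartitioning(("t2.id",)))
)

# The Project node outputs only t1.id, so the t2.id partitioning is dropped.
project_output = prune(join_output, {"t1.id"})
print(project_output)  # HashPartitioning(columns=('t1.id',))
```

In this model, a Project that still outputs both t1.id and t2.id keeps the full collection unchanged, which matches the intent that only partitionings referencing dropped columns are removed.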
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Added unit tests.