[PR] [SPARK-50593][SQL] SPJ: Support truncate transform [spark]

via GitHub Mon, 16 Dec 2024 17:43:56 -0800


szehon-ho opened a new pull request, #49211:
URL: https://github.com/apache/spark/pull/49211


      ### What changes were proposed in this pull request?
   The `truncate(col, len)` partition transform is currently not supported in 
storage-partitioned join (SPJ).
   
   It seems that support for generic multi-argument transforms was added for 
write distribution and ordering in https://github.com/apache/spark/pull/37749, 
but storage-partitioned join still does not support them, despite the comment in: 
https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/test/java/org/apache/iceberg/spark/sql/TestStoragePartitionedJoins.java#L129
 
   
   This change extends SPJ support to transforms that have exactly one column 
reference (literal arguments are skipped).  Note that bucket, another instance 
of such a transform, seems to be supported in SPJ via a separate mechanism, an 
explicit BucketTransform, but I am not sure that is necessary here.
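
To illustrate the "one reference, skipping literals" rule described above, here is a minimal Python sketch. It is not the code in this PR: `Ref`, `Lit`, `Transform`, and `single_reference` are hypothetical names that only model the idea of picking out a transform's sole column reference while ignoring literal arguments such as the truncate length.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Ref:
    """A column reference such as `col` (hypothetical stand-in)."""
    name: str


@dataclass(frozen=True)
class Lit:
    """A literal argument such as the truncate length (hypothetical stand-in)."""
    value: object


@dataclass(frozen=True)
class Transform:
    """A partition transform: a name plus its ordered arguments."""
    name: str
    args: Tuple[Union[Ref, Lit], ...]


def single_reference(transform: Transform) -> Optional[Ref]:
    """Return the only column reference, ignoring literals; None otherwise."""
    refs = [arg for arg in transform.args if isinstance(arg, Ref)]
    return refs[0] if len(refs) == 1 else None


# truncate(col, 4) has one reference and one literal, so it qualifies.
truncate_col = Transform("truncate", (Ref("col"), Lit(4)))
# A transform over two columns does not qualify under this rule.
two_refs = Transform("custom", (Ref("a"), Ref("b")))
```

Under this model, `single_reference(truncate_col)` yields the `col` reference, while a transform with zero or several references yields `None` and would fall outside the extended support.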
   
     ### Why are the changes needed?
   A join between tables partitioned by a truncate transform, on the partition 
column, does not benefit from skipping the shuffle.
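
As a rough illustration of why such a join could skip the shuffle, the Python sketch below groups rows by their truncated key and joins only partitions with equal truncated values. It is a toy model, not Spark's implementation: the `truncate` helper is an assumption modeled loosely on Iceberg-style truncation (integers rounded down to a multiple of the width, strings cut to a prefix), and it assumes both tables use the same width.

```python
from collections import defaultdict


def truncate(value, width):
    """Truncate an int down to a multiple of `width`, or a str to `width` chars."""
    if isinstance(value, int):
        return value - (value % width)
    return value[:width]


def group_by_partition(rows, width):
    """Group (key, payload) rows by truncated key, as a partitioned table would."""
    parts = defaultdict(list)
    for key, payload in rows:
        parts[truncate(key, width)].append((key, payload))
    return parts


def partition_wise_join(left, right, width):
    """Join on key by pairing only partitions with equal truncated values."""
    left_parts = group_by_partition(left, width)
    right_parts = group_by_partition(right, width)
    out = []
    # Equal keys always truncate to the same value, so no cross-partition
    # comparison (and hence no data movement) is needed.
    for pkey in sorted(left_parts.keys() & right_parts.keys()):
        for lkey, lval in left_parts[pkey]:
            for rkey, rval in right_parts[pkey]:
                if lkey == rkey:
                    out.append((lkey, lval, rval))
    return out


left = [(1, "a"), (12, "b"), (25, "c")]
right = [(12, "x"), (25, "y"), (31, "z")]
joined = partition_wise_join(left, right, 10)
```

The key property is that rows with equal join keys land in partitions with equal truncated values on both sides, so matching partitions can be joined pairwise in place.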
   
   
     ### Does this PR introduce _any_ user-facing change?
   No
   
     ### How was this patch tested?
   New unit test in KeyGroupedPartitioningSuite
   
   
     ### Was this patch authored or co-authored using generative AI tooling?
   No


