cloud-fan commented on PR #44133:
URL: https://github.com/apache/spark/pull/44133#issuecomment-1895201489

   The design of bucketed tables is that the data is pre-partitioned by some 
keys. If the join/grouping keys happen to be the bucket keys, we can skip the 
shuffle. What if the join/grouping keys are bucket keys wrapped in a cast? I 
think the cast is fine as long as it doesn't break the data partitioning, i.e. 
rows that were in the same partition before the cast are still in the same 
partition after it. This holds for an upcast (the value doesn't change, only 
the type does), but not for a downcast (e.g. `a.intCol = try_cast(b.bigIntCol 
AS int)`). After a downcast, values from different partitions can all become 
null, and for join/grouping those nulls need to end up in the same partition.
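
   For concreteness, a minimal Scala sketch of the two cases (not from this PR: the table names `a`/`b`, the column names, the bucket count, and the use of `try_cast`, which needs Spark 3.2+, are all illustrative). The upcast on the INT-bucketed key keeps every value in its original bucket, while the downcast on the BIGINT-bucketed key collapses out-of-range values from many buckets into NULL.

   ```scala
   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.{col, expr}

   object BucketCastSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder().appName("bucket-cast-sketch").getOrCreate()

       // Hypothetical tables: `a` is bucketed by an INT key, `b` by a BIGINT key.
       spark.range(0, 1000)
         .select(col("id").cast("int").as("intCol"))
         .write.bucketBy(8, "intCol").saveAsTable("a")
       spark.range(0, 1000)
         .select(col("id").as("bigIntCol"))
         .write.bucketBy(8, "bigIntCol").saveAsTable("b")

       // Upcast: the INT values are widened to BIGINT. The values themselves do not
       // change, so rows that shared a bucket before the cast still share one after it.
       val upcastJoin = spark.table("a").join(
         spark.table("b"),
         col("intCol").cast("bigint") === col("bigIntCol"))

       // Downcast: the BIGINT values are narrowed to INT. Every value outside the INT
       // range becomes NULL, so rows from many different buckets collapse onto the same
       // join key and the original bucket layout no longer lines up with it.
       val downcastJoin = spark.table("a").join(
         spark.table("b"),
         col("intCol") === expr("try_cast(bigIntCol AS int)"))

       upcastJoin.explain()   // inspect whether the shuffle on the bucketed side is avoided
       downcastJoin.explain()

       spark.stop()
     }
   }
   ```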

