[GitHub] [spark] sunchao commented on pull request #32875: [SPARK-35703][SQL] Relax constraint for bucket join and remove HashClusteredDistribution

GitBox Fri, 28 Jan 2022 10:25:00 -0800


sunchao commented on pull request #32875:
URL: https://github.com/apache/spark/pull/32875#issuecomment-1024493862



   > Then we restart the streaming query with the flag on, and the 2 tables 
report hash partitioning (not the same as Spark's murmur3).
   
   One question @cloud-fan: was this already a correctness issue before this 
change? For example, one side of the join might report `HashPartitioning` with 
a non-murmur3 hash while the other side reports `HashPartitioning` with 
murmur3 (for instance, because there is a Spark shuffle operator between the 
data source scan and the join).
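
   To make the concern concrete, here is a minimal sketch in plain Python. It 
is not Spark code: `hash_a` and `hash_b` are hypothetical toy functions 
standing in for "murmur3" and "some other hash", and `co_partitioned_join` is 
an assumed model of a join that trusts the reported partitioning and skips the 
shuffle.

```python
from collections import defaultdict

NUM_PARTITIONS = 4

def hash_a(key: int) -> int:
    # Toy stand-in for one hash function (hypothetical, not murmur3).
    return key % NUM_PARTITIONS

def hash_b(key: int) -> int:
    # Toy stand-in for a different hash function (hypothetical).
    return (key * 7 + 3) % NUM_PARTITIONS

def bucketize(rows, hash_fn):
    """Group (key, value) rows into partitions using the given hash."""
    parts = defaultdict(list)
    for key, value in rows:
        parts[hash_fn(key)].append((key, value))
    return parts

def co_partitioned_join(left, right, left_hash, right_hash):
    """Join partition i of left only with partition i of right (no shuffle)."""
    lparts = bucketize(left, left_hash)
    rparts = bucketize(right, right_hash)
    out = []
    for p in range(NUM_PARTITIONS):
        for lk, lv in lparts.get(p, []):
            for rk, rv in rparts.get(p, []):
                if lk == rk:
                    out.append((lk, lv, rv))
    return sorted(out)

left = [(k, f"l{k}") for k in range(8)]
right = [(k, f"r{k}") for k in range(8)]

# Both sides use the same hash: every key finds its match.
same_hash = co_partitioned_join(left, right, hash_a, hash_a)
# Sides use different hashes: matching keys sit in different partitions.
mixed_hash = co_partitioned_join(left, right, hash_a, hash_b)
```

   With these particular toy functions no key lands in the same partition on 
both sides, so `mixed_hash` is empty while `same_hash` contains all 8 matches. 
A real pair of hash functions would typically lose only some of the rows.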
   
   Thanks @HeartSaVioR for your comments, duly noted. Let me bring back 
`HashClusteredDistribution` then. I'll also add more comments to make it more 
future-proof and to make clear that no partitioning other than 
`HashPartitioning` can satisfy it.
   
   > Seems like DataSourcePartitioning doesn't allow the partitioning from data 
source to satisfy HashClusteredDistribution - it only checks with 
ClusteredDistribution.
   
   That's correct.
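
   As a rough illustration of that rule, here is a simplified toy model in 
Python. The class names mirror the ones discussed in this thread, but the 
`satisfies` logic is an assumption made for the sketch, not Spark's actual 
implementation: expressions are plain strings and the number of partitions is 
ignored.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class ClusteredDistribution:
    clustering: Tuple[str, ...]

@dataclass(frozen=True)
class HashClusteredDistribution:
    expressions: Tuple[str, ...]

@dataclass(frozen=True)
class HashPartitioning:
    expressions: Tuple[str, ...]

    def satisfies(self, required) -> bool:
        if isinstance(required, HashClusteredDistribution):
            # Toy rule: the hash keys must match exactly, in order.
            return self.expressions == required.expressions
        if isinstance(required, ClusteredDistribution):
            # Toy rule: every hash key is one of the required clustering keys.
            return all(e in required.clustering for e in self.expressions)
        return False

@dataclass(frozen=True)
class DataSourcePartitioning:
    clustering: Tuple[str, ...]

    def satisfies(self, required) -> bool:
        # Only ClusteredDistribution is checked; anything else is rejected.
        if isinstance(required, ClusteredDistribution):
            return all(e in required.clustering for e in self.clustering)
        return False

hash_part = HashPartitioning(("a",))
source_part = DataSourcePartitioning(("a",))

hash_ok = hash_part.satisfies(HashClusteredDistribution(("a",)))
source_hash_ok = source_part.satisfies(HashClusteredDistribution(("a",)))
source_clustered_ok = source_part.satisfies(ClusteredDistribution(("a", "b")))
```

   In this model only `HashPartitioning` can satisfy 
`HashClusteredDistribution`; the data source partitioning is accepted for 
`ClusteredDistribution` and nothing else.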
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

