sunchao edited a comment on pull request #32875:
URL: https://github.com/apache/spark/pull/32875#issuecomment-1024493862


   > Then we restart the streaming query with the flag on, and the 2 tables 
report hash partitioning (not the same as Spark's murmur3).
   
   One question @cloud-fan: was this already a correctness issue before this 
change? Say one side of the join reports `HashPartitioning` with a non-murmur3 
hash while the other side reports `HashPartitioning` with the murmur3 hash (for 
instance, because there's a Spark shuffle operator between the data source scan 
and the join). I wonder if the issue can happen even if data sources report 
`HashPartitioning` with Spark's murmur3 hash.
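   
   To make the scenario concrete, here is a minimal sketch in plain Python (not 
Spark code). `engine_hash` and `source_hash` are hypothetical stand-ins for 
Spark's murmur3 and a data source's own hash; neither is a real Spark API. It 
only illustrates why a partition-wise join without a shuffle can drop matches 
when the two sides hash the same key differently:

```python
from collections import defaultdict

NUM_PARTITIONS = 4


def engine_hash(key: int) -> int:
    """Hypothetical stand-in for the engine's hash (murmur3 in Spark)."""
    return key


def source_hash(key: int) -> int:
    """Hypothetical stand-in for a data source's own, different hash."""
    return key * 31 + 7


def partition(rows, hash_fn, num_partitions=NUM_PARTITIONS):
    """Group (key, value) rows by hash_fn(key) modulo the partition count."""
    parts = defaultdict(list)
    for key, value in rows:
        parts[hash_fn(key) % num_partitions].append((key, value))
    return parts


def partitionwise_join(left_parts, right_parts):
    """Join partition i of the left side only with partition i of the right
    side, i.e. assume both sides are already co-partitioned (no shuffle)."""
    out = []
    for pid, left_rows in left_parts.items():
        for lk, lv in left_rows:
            for rk, rv in right_parts.get(pid, []):
                if lk == rk:
                    out.append((lk, lv, rv))
    return sorted(out)


left = [(k, f"l{k}") for k in range(8)]
right = [(k, f"r{k}") for k in range(8)]

# Both sides use the same hash: every key finds its match.
same = partitionwise_join(partition(left, engine_hash),
                          partition(right, engine_hash))

# The sides use different hashes: matching keys sit in different partitions.
mixed = partitionwise_join(partition(left, engine_hash),
                           partition(right, source_hash))

print(len(same), len(mixed))
```

   With these toy hash functions the mixed case finds no matches at all; with 
real hash functions some keys would typically still line up by chance, which 
makes the resulting data loss harder to notice.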
   
   Thanks @HeartSaVioR for your comments, duly noted. Let me bring back 
`HashClusteredDistribution` then. I'll also add more comments to make it more 
future-proof and to state that no partitioning other than `HashPartitioning` 
can satisfy it. Would you please provide a test suite for this potential issue?
   
   > Seems like DataSourcePartitioning doesn't allow the partitioning from data 
source to satisfy HashClusteredDistribution - it only checks with 
ClusteredDistribution.
   
   That's correct.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


