[GitHub] [spark] sunchao commented on pull request #32875: [SPARK-35703][SQL] Relax constraint for bucket join and remove HashClusteredDistribution

GitBox Thu, 27 Jan 2022 11:03:28 -0800


sunchao commented on pull request #32875:
URL: https://github.com/apache/spark/pull/32875#issuecomment-1023547628



   @HeartSaVioR no worries, I should have pinged you too :)
   
   > In Structured Streaming, state is partitioned with grouping keys based on 
Spark's internal hash function, and the number of partitions is static. That 
said, if Spark does not respect the distribution of state for a stateful 
operator, it leads to a correctness problem.
   
   Could you give me a concrete example of this? Currently the rule only skips 
the shuffle in a join if both sides report the same distribution. Also, with the 
first follow-up by @cloud-fan, I think we've already restored the previous 
behavior.
   
   I'm no Spark streaming expert, so I'm still trying to learn more about the 
problem here. :)
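
   To make the quoted concern easier to discuss, here is a minimal sketch of it in plain Python, not Spark code. It assumes state for a key lives in the partition chosen by a fixed hash over a static partition count, and compares that with a hypothetical alternative placement that still clusters rows by key. The names `state_partition`, `shifted_partition`, `misrouted_keys` and the use of `zlib.crc32` are illustrative assumptions; Spark's internal hash function is different.

```python
import zlib

# Assumption: the number of state partitions is fixed for the life of the query.
NUM_STATE_PARTITIONS = 4


def state_partition(key: str, num_partitions: int = NUM_STATE_PARTITIONS) -> int:
    """Partition that holds the state for `key`.

    crc32 is only a deterministic stand-in for Spark's internal hash function.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shifted_partition(key: str, num_partitions: int = NUM_STATE_PARTITIONS) -> int:
    """Hypothetical alternative distribution.

    Rows with the same key still land together (so it is 'clustered' by key),
    but every key is placed one partition away from where its state lives.
    """
    return (state_partition(key, num_partitions) + 1) % num_partitions


def misrouted_keys(keys, route):
    """Keys whose rows would reach a task that does not hold their state."""
    return [k for k in keys if route(k) != state_partition(k)]


keys = ["user-1", "user-2", "user-3"]

# Input partitioned the same way as the state: nothing is misrouted.
same_layout = misrouted_keys(keys, state_partition)

# Input clustered by key but placed differently: every key misses its state.
other_layout = misrouted_keys(keys, shifted_partition)
```

   In this toy model a shuffle may only be skipped when the input's placement matches the state's placement exactly, not merely when both are clustered by the same keys. Whether the current rule can actually produce the second situation in Spark is the open question in this thread.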


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


