HeartSaVioR opened a new pull request #35551:
URL: https://github.com/apache/spark/pull/35551


   ### What changes were proposed in this pull request?
   
   This PR proposes to rename `StatefulOpClusteredDistribution` back to 
`HashClusteredDistribution`. It retains the classdoc content about stateful 
operators in `HashClusteredDistribution` and adds new, more general classdoc 
content alongside it.
   
   ### Why are the changes needed?
   
   We found that `HashClusteredDistribution` is still desirable in some cases 
beyond stateful operators. `HashPartitioning` on a subset of the grouping keys 
can satisfy `ClusteredDistribution`, which means the cardinality of that subset 
technically caps the maximum parallelism. Increasing the number of partitions 
does not always resolve partition skew in that situation.
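
   The parallelism cap can be illustrated with a toy model. The sketch below is 
plain Python, not Spark code: Python's built-in `hash` stands in for Spark's 
partitioner hash, and the `country` / `user_id` columns, the row counts, and the 
helper names are all hypothetical. It only shows that hashing on a 
low-cardinality subset of the grouping keys leaves most partitions empty no 
matter how many partitions are requested.

```python
# Toy model only: Python's built-in hash() stands in for Spark's partitioner
# hash. Column names, data, and helper names are hypothetical.

def partition_sizes(rows, key_columns, num_partitions):
    """Count rows per partition when hashing only on `key_columns`."""
    sizes = [0] * num_partitions
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        sizes[hash(key) % num_partitions] += 1
    return sizes


def non_empty(sizes):
    """Number of partitions that received at least one row."""
    return sum(1 for s in sizes if s > 0)


# Grouping keys are (country, user_id); `country` has only 3 distinct values.
rows = [
    {"country": c, "user_id": u}
    for c in ("US", "KR", "DE")
    for u in range(1000)
]

for n in (4, 64):
    subset = partition_sizes(rows, ["country"], n)            # subset of keys
    full = partition_sizes(rows, ["country", "user_id"], n)   # all grouping keys
    print(f"partitions={n} non-empty(subset)={non_empty(subset)} "
          f"non-empty(full)={non_empty(full)}")
```

   Exact counts vary between runs because Python randomizes string hashes, but 
the subset case can never use more than 3 partitions, since every row with the 
same `country` lands in the same partition.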
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing tests, since this PR only renames a class.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



