Github user cloud-fan commented on a diff in the pull request:
    --- Diff: 
    @@ -118,10 +116,13 @@ case class HashClusteredDistribution(
      * Represents data where tuples have been ordered according to the 
    - * [[Expression Expressions]].  This is a strictly stronger guarantee than
    - * [[ClusteredDistribution]] as an ordering will ensure that tuples that share the
    - * same value for the ordering expressions are contiguous and will never be split across
    - * partitions.
    + * [[Expression Expressions]].
    + *
    + * Tuples that share the same values for the ordering expressions must be contiguous within a
    + * partition. They can also across partitions, but these partitions must be contiguous. For example,
    + * if value `v` is the biggest values in partition 3, it can also be in partition 4 as the smallest
    + * value. If all the values in partition 4 are `v`, it can also be in partition 5 as the smallest
    + * value.
     case class OrderedDistribution(ordering: Seq[SortOrder]) extends Distribution {
    --- End diff --
    This is only used by sort, and sort doesn't require rows with the same value to be colocated in the same partition.
    Actually we already use this knowledge to optimize 
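
    To make the guarantee in the new doc comment concrete, here is a minimal sketch. It assumes plain `Int` values in place of Spark rows and uses a hypothetical helper, `satisfiesOrdering`, that is not part of Spark. Each inner `Seq` stands in for one partition.

```scala
// Hypothetical illustration only: not Spark's API.
object OrderedDistributionSketch {
  // True when the partitions, read in order, respect the ordering described
  // in the doc comment: sorted within each partition, and equal values may
  // spill over only into the immediately following partitions.
  def satisfiesOrdering(partitions: Seq[Seq[Int]]): Boolean = {
    // Every partition must be sorted internally.
    val sortedWithin = partitions.forall { p =>
      p.zip(p.drop(1)).forall { case (a, b) => a <= b }
    }
    // Across neighbouring non-empty partitions, the largest value on the
    // left may equal, but must not exceed, the smallest value on the right.
    val nonEmpty = partitions.filter(_.nonEmpty)
    val orderedAcross = nonEmpty.zip(nonEmpty.drop(1)).forall {
      case (left, right) => left.last <= right.head
    }
    sortedWithin && orderedAcross
  }
}

object OrderedDistributionSketchDemo extends App {
  import OrderedDistributionSketch.satisfiesOrdering

  // Value 5 is the largest in the first partition, fills the second,
  // and is the smallest in the third: allowed.
  println(satisfiesOrdering(Seq(Seq(1, 2, 5), Seq(5, 5), Seq(5, 7))))
  // Value 5 reappears after a partition holding 6: not allowed.
  println(satisfiesOrdering(Seq(Seq(1, 5), Seq(6), Seq(5, 7))))
}
```

    Under these assumptions the check is equivalent to asking whether the concatenation of all partitions is sorted, which is all a sort needs. It does not require that equal values land in a single partition.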

