[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

viirya Fri, 07 Dec 2018 02:37:57 -0800

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23249#discussion_r239754619
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala ---
    @@ -118,10 +115,12 @@ case class HashClusteredDistribution(
     
     /**
      * Represents data where tuples have been ordered according to the `ordering`
    - * [[Expression Expressions]].  This is a strictly stronger guarantee than
    - * [[ClusteredDistribution]] as an ordering will ensure that tuples that share the
    - * same value for the ordering expressions are contiguous and will never be split across
    - * partitions.
    + * [[Expression Expressions]]. Its requirement is defined as the following:
    + *   - Given any 2 adjacent partitions, all the rows of the second partition must be larger than or
    + *     equal to any row in the first partition, according to the `ordering` expressions.
    --- End diff --
    
    Why do we need the equality here? Could we instead require that all the rows in the second partition be strictly larger than any row in the first partition?
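
    To make the difference between the two wordings concrete, here is a minimal sketch on plain Scala collections. It is not Spark code: `OrderedPartitionsSketch` and `satisfies` are hypothetical names used only for illustration, and partitions are modeled as in-memory sequences rather than RDD partitions.

```scala
// Hypothetical helper, not part of Spark: checks the adjacent-partition
// requirement from the doc comment on plain in-memory sequences.
object OrderedPartitionsSketch {
  // strict = false: every row of the next partition must be >= every row of the previous one.
  // strict = true:  every row of the next partition must be >  every row of the previous one.
  def satisfies[T](partitions: Seq[Seq[T]], strict: Boolean)(implicit ord: Ordering[T]): Boolean =
    partitions.zip(partitions.drop(1)).forall { case (first, second) =>
      // An empty partition satisfies the requirement vacuously.
      first.isEmpty || second.isEmpty || {
        // Comparing the largest row of `first` with the smallest row of `second`
        // is enough to cover all pairs of rows.
        val cmp = ord.compare(first.max(ord), second.min(ord))
        if (strict) cmp < 0 else cmp <= 0
      }
    }
}

object OrderedPartitionsDemo extends App {
  // The value 2 appears at the end of one partition and the start of the next.
  val sharedBoundary = Seq(Seq(1, 2, 2), Seq(2, 3))
  println(OrderedPartitionsSketch.satisfies(sharedBoundary, strict = false))
  println(OrderedPartitionsSketch.satisfies(sharedBoundary, strict = true))
}
```

    Under these assumptions, the two wordings differ only when rows with equal ordering values sit on both sides of a partition boundary: the "larger than or equal to" form accepts that layout, while the strict form rejects it.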



