GitHub user viirya commented on a diff in the pull request:
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala ---
    @@ -39,7 +39,9 @@ class ConfigBehaviorSuite extends QueryTest with SharedSQLContext {
         def computeChiSquareTest(): Double = {
           val n = 10000
           // Trigger a sort
    -      val data = spark.range(0, n, 1, 1).sort('id.desc)
    +      // Range has range partitioning in its output now. To have a range shuffle, we
    +      // need to run a repartition first.
    +      val data = spark.range(0, n, 1, 1).repartition(10).sort('id.desc)
    --- End diff --
    With `spark.range(0, n, 1, 10).sort('id.desc)`, there is no 3x linear relation between `a` and `b`. As shown above, the data is also evenly distributed, and the chi-sq value is also under `100`.

    Here we need to redistribute the data to make sampling difficult. Previously, a repartition was added automatically before `sort`. Now `range` reports correct output partitioning info, so the repartition must be added manually.
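
    For reference, a minimal sketch of what the chi-sq value measures here, assuming it is Pearson's statistic of the observed partition sizes against a uniform expectation. `chiSquareUniform` is a hypothetical helper written for illustration, not the code in `ConfigBehaviorSuite`, and the partition sizes below are made-up numbers, not measured output:

```scala
// Hypothetical helper (not part of ConfigBehaviorSuite): Pearson's chi-square
// statistic of observed partition sizes against a uniform expectation.
def chiSquareUniform(partitionSizes: Seq[Long]): Double = {
  require(partitionSizes.nonEmpty, "need at least one partition")
  // Under a perfectly even split, every partition holds total / numPartitions rows.
  val expected = partitionSizes.sum.toDouble / partitionSizes.size
  partitionSizes.map { observed =>
    val diff = observed - expected
    diff * diff / expected
  }.sum
}

// Illustrative inputs only: 10 partitions over n = 10000 rows.
val even = Seq.fill(10)(1000L)
val skewed = Seq(4000L, 1000L, 1000L, 1000L, 1000L, 500L, 500L, 500L, 250L, 250L)

println(chiSquareUniform(even))   // 0.0, perfectly even
println(chiSquareUniform(skewed)) // 10875.0, far above 100
```

    An evenly split output keeps the statistic small, while skewed range boundaries push it well past `100`.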

