[GitHub] spark pull request #21291: [SPARK-24242][SQL] RangeExec should have correct ...

viirya Mon, 14 May 2018 16:50:26 -0700

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21291#discussion_r188131133
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ConfigBehaviorSuite.scala ---
    @@ -39,7 +39,9 @@ class ConfigBehaviorSuite extends QueryTest with SharedSQLContext {
         def computeChiSquareTest(): Double = {
           val n = 10000
           // Trigger a sort
    -      val data = spark.range(0, n, 1, 1).sort('id.desc)
    +      // Range has range partitioning in its output now. To have a range shuffle, we
    +      // need to run a repartition first.
    +      val data = spark.range(0, n, 1, 1).repartition(10).sort('id.desc)
    --- End diff --
    
    This test requires a range shuffle. Previously, `range` had unknown output
    partitioning/ordering, so the planner inserted a range shuffle before `sort`.
    
    Now `range` has an ordered output, so the planner doesn't insert the
    shuffle we need here.
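
One way to see the difference is to print the physical plan of both query shapes and look for a range-partitioned exchange. The following is only a sketch, assuming a local Spark session; `RangeShuffleCheck` and `mentionsRangePartitioning` are hypothetical names introduced for illustration, and the plan text (and therefore the result of the string check) may vary between Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object RangeShuffleCheck {
  // Hypothetical helper (not a Spark API): a plain substring check on the
  // printed plan. It only tells us whether the text mentions range
  // partitioning; it does not inspect the plan nodes themselves.
  def mentionsRangePartitioning(planText: String): Boolean =
    planText.toLowerCase.contains("rangepartitioning")

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session is acceptable for this check.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("range-shuffle-check")
      .getOrCreate()

    val n = 10000L

    // Shape used by the test before the change: sort directly on range.
    val direct = spark.range(0, n, 1, 1).sort(col("id").desc)

    // Shape used by the test after the change: repartition first, so the
    // sort no longer sees the partitioning reported by range.
    val repartitioned = spark.range(0, n, 1, 1).repartition(10).sort(col("id").desc)

    for ((name, df) <- Seq("direct" -> direct, "repartitioned" -> repartitioned)) {
      val planText = df.queryExecution.executedPlan.toString
      println(s"== $name: mentions range partitioning = ${mentionsRangePartitioning(planText)} ==")
      println(planText)
    }

    spark.stop()
  }
}
```

Reading the printed plans side by side shows whether an exchange sits between the range and the sort in each case, which is the property the chi-square test depends on.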



