[GitHub] spark pull request #21498: [SPARK-24410][SQL][Core][WIP] Optimization for Un...

viirya Tue, 05 Jun 2018 17:42:27 -0700

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21498#discussion_r193263354
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -1099,6 +1099,17 @@ object SQLConf {
           .intConf
           .createWithDefault(SHUFFLE_SPILL_NUM_ELEMENTS_FORCE_SPILL_THRESHOLD.defaultValue.get)
     
    +  val UNION_IN_SAME_PARTITION =
    +    buildConf("spark.sql.unionInSamePartition")
    +      .internal()
    +      .doc("When true, Union operator will union children results in the same corresponding " +
    +        "partitions if they have same partitioning. This eliminates unnecessary shuffle in later " +
    +        "operators like aggregation. Note that because non-deterministic functions such as " +
    +        "monotonically_increasing_id are depended on partition id. By doing this, the values of " +
    --- End diff --
    
    It seems we want to make sure that non-deterministic functions produce the
    same values after a union. Once we union the children within the same
    partitions, the values of such functions can change, so I added this config
    to control the behavior. It defaults to false.
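
    To illustrate why the values can change, here is a minimal plain-Scala
    sketch. It is not Spark code and not the implementation in this PR: every
    name in it is hypothetical, and the ID formula (partition index in the
    upper bits, row position in the lower 33 bits) is only a simplified model
    of how monotonically_increasing_id is documented to depend on the
    partition id. It also assumes both children have the same number of
    partitions.

```scala
// Plain-Scala model with no Spark dependency. All names are hypothetical.
object UnionPartitionSketch {
  type Partition = Seq[String]

  // Model of a partition-dependent ID: partition index in the upper bits,
  // row position within the partition in the lower 33 bits.
  def assignIds(partitions: Seq[Partition]): Map[String, Long] =
    partitions.zipWithIndex.flatMap { case (rows, pid) =>
      rows.zipWithIndex.map { case (row, pos) => row -> ((pid.toLong << 33) + pos) }
    }.toMap

  // Regular union: the partitions of the second child are appended after
  // the partitions of the first child.
  def appendUnion(left: Seq[Partition], right: Seq[Partition]): Seq[Partition] =
    left ++ right

  // Same-partition union: partition i of the result holds the rows of
  // partition i from each child, so the partition count does not grow.
  def samePartitionUnion(left: Seq[Partition], right: Seq[Partition]): Seq[Partition] = {
    require(left.size == right.size, "children must have the same partition count")
    left.zip(right).map { case (l, r) => l ++ r }
  }

  def main(args: Array[String]): Unit = {
    val left = Seq(Seq("a1", "a2"), Seq("b1"))
    val right = Seq(Seq("c1"), Seq("d1", "d2"))
    // The same row gets a different ID depending on the union strategy.
    println(assignIds(appendUnion(left, right)))
    println(assignIds(samePartitionUnion(left, right)))
  }
}
```

    In this model the row "c1" sits in partition 2 under the regular union but
    in partition 0 under the same-partition union, so its ID differs between
    the two strategies. That difference is what the config guards against.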



