Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21498#discussion_r193263354
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -1099,6 +1099,17 @@ object SQLConf {
.intConf
.createWithDefault(SHUFFLE_SPILL_NUM_ELEMENTS_FORCE_SPILL_THRESHOLD.defaultValue.get)
+ val UNION_IN_SAME_PARTITION =
+ buildConf("spark.sql.unionInSamePartition")
+ .internal()
+ .doc("When true, Union operator will union children results in the same corresponding " +
+ "partitions if they have same partitioning. This eliminates unnecessary shuffle in later " +
+ "operators like aggregation. Note that because non-deterministic functions such as " +
+ "monotonically_increasing_id are depended on partition id. By doing this, the values of " +
--- End diff --
It seems we want to make sure non-deterministic functions produce the same
values after a union. Once we union children within the same partitions, the
values of such functions can change, so I added this config to control the
behavior. It defaults to false.
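
To illustrate the concern, here is a minimal sketch in plain Scala (no Spark dependency). It relies on the documented layout of `monotonically_increasing_id`, which puts the partition ID in the upper 31 bits and the per-partition record number in the lower 33 bits. Everything else is hypothetical and only for illustration: the `Child` model, `unionConcat`, `unionSamePartition`, and `withIds` are not Spark APIs, and the sketch assumes both children have the same number of partitions.

```scala
object UnionPartitionIdSketch {
  // Documented layout of monotonically_increasing_id:
  // partition ID in the upper 31 bits, record number in the lower 33 bits.
  def monoId(partitionId: Int, rowInPartition: Long): Long =
    (partitionId.toLong << 33) + rowInPartition

  // Hypothetical model: a child is a sequence of partitions, each a sequence of rows.
  type Child = Seq[Seq[String]]

  // Regular union: the children's partitions are concatenated, so the
  // second child's partitions receive new, higher partition IDs.
  def unionConcat(left: Child, right: Child): Child = left ++ right

  // Assumed "same partition" union: partition i of the result holds
  // partition i of both children (requires equal partition counts).
  def unionSamePartition(left: Child, right: Child): Child =
    left.zip(right).map { case (l, r) => l ++ r }

  // The ID each row would get if the function were evaluated after the union.
  def withIds(child: Child): Map[String, Long] =
    child.zipWithIndex.flatMap { case (rows, pid) =>
      rows.zipWithIndex.map { case (row, idx) => row -> monoId(pid, idx.toLong) }
    }.toMap

  val left: Child  = Seq(Seq("a0"), Seq("a1"))
  val right: Child = Seq(Seq("b0"), Seq("b1"))

  val concatIds: Map[String, Long] = withIds(unionConcat(left, right))
  val sameIds: Map[String, Long]   = withIds(unionSamePartition(left, right))

  def main(args: Array[String]): Unit = {
    println(s"b0 after regular union:        ${concatIds("b0")}")
    println(s"b0 after same-partition union: ${sameIds("b0")}")
  }
}
```

In this model, row `b0` lands in partition 2 under the regular union but in partition 0 (as the second record) under the same-partition union, so its generated ID differs between the two. That difference is why the behavior sits behind a config that is off by default.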