[GitHub] [spark] wangyum commented on a change in pull request #32932: [SPARK-35786][SQL] Add a new operator to rebalance the query output if AQE is enabled

GitBox Fri, 25 Jun 2021 18:47:54 -0700


wangyum commented on a change in pull request #32932:
URL: https://github.com/apache/spark/pull/32932#discussion_r659102206




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##########
@@ -1351,6 +1351,31 @@ object RepartitionByExpression {
   }
 }
 
+/**
+ * This operator is used to rebalance the output partitions of the given `child`, so that every
+ * partition is of a reasonable size (not too small and not too big). It also try its best to
+ * partition the child output by `partitionExpressions`. If there are skews, Spark will split the
+ * skewed partitions, to make these partitions not too big. This operator is useful when you need
+ * to write the result of `child` to a table, to avoid too small/big files.
+ *
+ * Note that, this operator only makes sense when AQE is enabled.
+ */
+case class RebalancePartitions(
+    partitionExpressions: Seq[Expression],
+    child: LogicalPlan) extends UnaryNode {

Review comment:
       Make `RebalancePartitions` extend `RepartitionOperation`?
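
       A minimal sketch of what this suggestion could look like. The types below are simplified stand-ins written for illustration only, not Spark's real classes: `output` is modelled as `Seq[String]`, `Leaf` and `Demo` are hypothetical helpers, and the real `RepartitionOperation` in Spark may declare additional abstract members that `RebalancePartitions` would then have to define.

```scala
// Simplified stand-ins (assumptions), NOT Spark's real classes:
// just enough structure to show the shape of the suggested change.
trait Expression
trait LogicalPlan { def output: Seq[String] }
trait UnaryNode extends LogicalPlan { def child: LogicalPlan }

// Hypothetical, trimmed-down base class holding behaviour shared by
// repartition-style operators.
abstract class RepartitionOperation extends UnaryNode {
  // A repartition-style operator does not change the schema of its child.
  override def output: Seq[String] = child.output
}

// In the diff this class extends UnaryNode directly. Here it inherits the
// shared behaviour from the (stand-in) RepartitionOperation instead.
case class RebalancePartitions(
    partitionExpressions: Seq[Expression],
    child: LogicalPlan) extends RepartitionOperation

// Hypothetical leaf node, used only to build a small plan for the demo.
case class Leaf(output: Seq[String]) extends LogicalPlan

object Demo extends App {
  val plan = RebalancePartitions(Seq.empty, Leaf(Seq("id", "name")))
  // The operator reports the same output columns as its child.
  println(plan.output)
}
```

       The point of the sketch is only that shared behaviour (here, passing the child's output through unchanged) would live in one base class instead of being repeated in each repartition-style operator. Whether that fits depends on what the real base class requires of its subclasses.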




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


