[GitHub] [spark] cloud-fan commented on a change in pull request #32932: [SPARK-35786][SQL] Add a new operator to distingush if AQE can optimize safely

GitBox Wed, 23 Jun 2021 06:19:28 -0700


cloud-fan commented on a change in pull request #32932:
URL: https://github.com/apache/spark/pull/32932#discussion_r657082306




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##########
@@ -1351,6 +1351,31 @@ object RepartitionByExpression {
   }
 }
 
+/**
+ * This operator used to rebalance the query result output partitions, so that every partition
+ * is of a reasonable size (not too small and not too big). It can take column names as parameters,
+ * and try its best to partition the query result by these columns. If there are skews, Spark will
+ * split the skewed partitions, to make these partitions not too big. This operator is useful when
+ * you need to write the result of this query to a table, to avoid too small/big files.
+ *
+ * Note that, only AQE is enabled does the operator make sense.

Review comment:
       Note that, this operator only makes sense when AQE is enabled.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
