ulysses-you commented on a change in pull request #32932:
URL: https://github.com/apache/spark/pull/32932#discussion_r656829598



##########
File path: docs/sql-performance-tuning.md
##########
@@ -228,6 +228,8 @@ The "REPARTITION_BY_RANGE" hint must have column names and a partition number is
     SELECT /*+ REPARTITION */ * FROM t
     SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
     SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
+    SELECT /*+ REPARTITION_BY_AQE */ * FROM t

Review comment:
       Currently they are the same, but I prefer the latter, which is safer with AQE.
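
   For concreteness, a minimal sketch (not part of this PR's diff) of how the two hints would be used once AQE is enabled; `spark`, the table `t`, and the `REPARTITION_BY_AQE` hint name from the diff above are assumptions taken from this discussion:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes a table `t` is already registered in the session.
val spark = SparkSession.builder().appName("repartition-hint-sketch").getOrCreate()

// Let AQE adjust shuffle partitions at runtime.
spark.conf.set("spark.sql.adaptive.enabled", "true")

// Per the discussion above, these two currently behave the same, but the new
// hint makes the intent explicit: the final partition number is left to AQE.
spark.sql("SELECT /*+ REPARTITION */ * FROM t")
spark.sql("SELECT /*+ REPARTITION_BY_AQE */ * FROM t")
```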

##########
File path: docs/sql-ref-syntax-qry-select-hints.md
##########
@@ -51,6 +51,10 @@ specified, multiple nodes are inserted into the logical plan, but the leftmost h
 
  The `REPARTITION_BY_RANGE` hint can be used to repartition to the specified number of partitions using the specified partitioning expressions. It takes column names and an optional partition number as parameters.
 
+* **REPARTITION_BY_AQE**

Review comment:
       So sorry for this, I plan to write more docs after PR [#32883](https://github.com/apache/spark/pull/32883)...

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##########
@@ -1351,6 +1351,28 @@ object RepartitionByExpression {
   }
 }
 
+/**
+ * This operator does not guarantee the output partitioning, because the partition number will be
+ * optimized by AQE.
+ */
+case class AdaptiveRepartition(
+    partitionExpressions: Seq[Expression],
+    child: LogicalPlan) extends UnaryNode {
+  override def maxRows: Option[Long] = child.maxRows
+  override def output: Seq[Attribute] = child.output
+
+  lazy val numPartitions: Int = conf.numShufflePartitions
+
+  def partitioning: Partitioning = if (partitionExpressions.nonEmpty) {
+    HashPartitioning(partitionExpressions, numPartitions)

Review comment:
       > Or does AQE have a logic to automatically figure out the number to expand to?
   
   Yeah, AQE can rebalance partitions automatically; see PR [#32883](https://github.com/apache/spark/pull/32883)
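
   To make "rebalance automatically" concrete, here is a hedged sketch of the AQE-side settings that drive the runtime partition count; the config keys below exist in Spark 3.x, while the actual rebalance rule for this operator is defined in PR [#32883](https://github.com/apache/spark/pull/32883), not here:

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: the new operator records just the partitioning expressions;
// with these settings AQE decides the actual number of output partitions
// after seeing shuffle statistics at runtime.
val spark = SparkSession.builder().appName("aqe-rebalance-sketch").getOrCreate()

spark.conf.set("spark.sql.adaptive.enabled", "true")
// Coalesce small post-shuffle partitions into fewer, larger ones.
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
// Target size AQE aims for per partition when coalescing.
spark.conf.set("spark.sql.adaptive.advisoryPartitionSizeInBytes", "64MB")
```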



