[GitHub] [spark] cloud-fan commented on a change in pull request #32932: [SPARK-35786][SQL] Add a new operator to distingush if AQE can optimize safely

GitBox Tue, 22 Jun 2021 20:59:25 -0700


cloud-fan commented on a change in pull request #32932:
URL: https://github.com/apache/spark/pull/32932#discussion_r656741164




##########
File path: docs/sql-performance-tuning.md
##########
@@ -228,6 +228,8 @@ The "REPARTITION_BY_RANGE" hint must have column names and a partition number is
     SELECT /*+ REPARTITION */ * FROM t
     SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
     SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
+    SELECT /*+ REPARTITION_BY_AQE */ * FROM t

Review comment:
   Other repartition hints can also be optimized by AQE, so I think this 
name is not precise enough.
   
   The key point here is the user's intention. To optimize for data writing, we 
don't need a specific number of partitions, and we don't need a strict output 
partitioning (such as partitioning by a column). We only need the output to be 
evenly distributed and, on a best-effort basis, partitioned by some columns.
   
   How about `REBALANCE_OUTPUT_PARTITIONS`?
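
   To make the "evenly distributed, best effort" intent concrete, here is a toy sketch in Python. It is not Spark's AQE implementation; the function name `rebalance` and the `target` parameter are hypothetical, and partition sizes are modelled as plain integers. The point it illustrates is that the number of output partitions falls out of the data rather than being requested by the user.

```python
from typing import List


def rebalance(sizes: List[int], target: int) -> List[int]:
    """Toy model of a best-effort rebalance (hypothetical, not Spark code).

    `sizes` are input partition sizes and `target` is an advisory output
    size. Oversized partitions are split into chunks of at most `target`,
    and adjacent small partitions are merged while they still fit.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    out: List[int] = []
    current = 0  # size of the output partition being accumulated
    for size in sizes:
        # Split an oversized input partition into target-sized chunks.
        while size > target:
            if current:
                out.append(current)
                current = 0
            out.append(target)
            size -= target
        # Merge the (now small) remainder with its neighbour if it fits.
        if current + size > target:
            out.append(current)
            current = 0
        current += size
    if current:
        out.append(current)
    return out
```

   With `rebalance([10, 10, 250, 5], 100)` the two small partitions are merged, the large one is split, and no output partition exceeds the target. The caller never specifies a partition count, which is the distinction from `REPARTITION(n)` that the proposed name is meant to convey.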




