[GitHub] [spark] cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add the user guide for Adaptive Query Execution

GitBox Wed, 19 Feb 2020 05:22:09 -0800

cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add 
the user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#discussion_r381285141


 ##########
 File path: docs/sql-performance-tuning.md
 ##########
 @@ -186,3 +186,75 @@ The "REPARTITION_BY_RANGE" hint must have column names 
and a partition number is
     SELECT /*+ REPARTITION(3, c) */ * FROM t
     SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
     SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
+
+## Adaptive Query Execution
+Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that 
makes use of the runtime statistics to choose the most efficient query 
execution plan. AQE is disabled by default. Spark SQL can use the umbrella 
configuration of `spark.sql.adaptive.enabled` to control whether turn it 
on/off. As of Spark 3.0, there are three major features in AQE, including 
coalescing post-shuffle partition number, optimizing local shuffle reader and 
optimizing skewed join.
+ ### Coalescing Post Shuffle Partition Number
+ This feature coalesces the post shuffle partitions based on the map output 
statistics when `spark.sql.adaptive.enabled` and 
`spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled` configuration 
properties are both enabled. There are four following sub-configurations in 
this optimization rule. And this feature can bring about 1.28x performance gain 
with query 38 in 3TB TPC-DS.
 
 Review comment:
   `And this feature can bring about 1.28x performance gain with query 38 in 
3TB TPC-DS.` This is not useful here. How about something like:
   ```
   This feature simplifies tuning the number of shuffle partitions when running 
queries. You don't need to
   set a shuffle partition number that exactly fits your data. You just 
need to set a large enough number, and
   Spark can pick a proper shuffle partition number at runtime.
   ```
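
To make the suggested wording concrete, here is a minimal sketch of the settings it implies, rendered as `spark-submit --conf` arguments. The two `spark.sql.adaptive.*` keys are copied from the diff under review and may be named differently in a released Spark version. `spark.sql.shuffle.partitions` is the standard Spark SQL setting for the shuffle partition count and is not mentioned in this thread. The value 2000 is an arbitrary "large enough" number chosen for illustration, and `to_submit_args` is a hypothetical helper, not part of Spark.

```python
# Settings implied by the suggested wording. The adaptive keys come from the
# diff under review; 2000 is an arbitrary, deliberately generous value.
AQE_CONF = {
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.shuffle.reducePostShufflePartitions.enabled": "true",
    "spark.sql.shuffle.partitions": "2000",
}


def to_submit_args(conf):
    """Render a config mapping as spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Print the flags on one line so they can be pasted after `spark-submit`.
    print(" ".join(to_submit_args(AQE_CONF)))
```

With these settings the user only picks an upper bound for the partition count, and the coalescing rule is expected to reduce it at runtime based on map output statistics.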

