cloud-fan commented on a change in pull request #27616: [SPARK-30864] [SQL]add
the user guide for Adaptive Query Execution
URL: https://github.com/apache/spark/pull/27616#discussion_r392834156
##########
File path: docs/sql-performance-tuning.md
##########
@@ -186,3 +186,63 @@ The "REPARTITION_BY_RANGE" hint must have column names
and a partition number is
SELECT /*+ REPARTITION(3, c) */ * FROM t
SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t
SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t
+
+## Adaptive Query Execution
+Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that
makes use of the runtime statistics to choose the most efficient query
execution plan. AQE is disabled by default. Spark SQL uses the umbrella
configuration `spark.sql.adaptive.enabled` to control whether it is turned
on or off. As of Spark 3.0, there are three major features in AQE:
coalescing post-shuffle partitions, converting sort-merge join to
broadcast join, and skewed join optimization.
+
+### Coalescing Post Shuffle Partition Number
+This feature coalesces the post shuffle partitions based on the map output
statistics when both `spark.sql.adaptive.enabled` and
`spark.sql.adaptive.coalescePartitions.enabled` configuration properties are
enabled. There are four following sub-configurations in this optimization rule.
This feature simplifies the tuning of shuffle partition number when running
queries. You do not need to set a proper shuffle partition number to fit your
dataset. Spark can pick the proper shuffle partition number at runtime once you
set a large enough initial number of shuffle partitions via
`spark.sql.adaptive.coalescePartitions.initialPartitionNum` configuration.
Review comment:
`There are four following sub-configurations in this optimization rule.` Can
we remove this sentence? It does not seem useful, since users can see all the
configs in the following table.
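
For context, here is a minimal sketch of how the three properties named in this hunk might be set from a Spark SQL session. The property names are taken from the quoted text. The value `1000` is only an illustrative assumption, not a recommendation from the PR.

```sql
-- Umbrella switch for Adaptive Query Execution (disabled by default per the quoted doc).
SET spark.sql.adaptive.enabled=true;

-- Enable coalescing of post-shuffle partitions; only takes effect when AQE is on.
SET spark.sql.adaptive.coalescePartitions.enabled=true;

-- Start with a deliberately large number of shuffle partitions and let Spark
-- coalesce them at runtime. 1000 is a placeholder chosen for illustration.
SET spark.sql.adaptive.coalescePartitions.initialPartitionNum=1000;
```

This supports the point of the comment: the paragraph only needs to name the enabling properties, and the remaining sub-configurations can be left to the table.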