xuanyuanking commented on a change in pull request #28672:
URL: https://github.com/apache/spark/pull/28672#discussion_r432790741
##########
File path: docs/sql-ref-syntax-qry-select-hints.md
##########
@@ -21,14 +21,86 @@ license: |
### Description
-Join Hints allow users to suggest the join strategy that Spark should use.
Prior to Spark 3.0, only the `BROADCAST` Join Hint was supported. `MERGE`,
`SHUFFLE_HASH` and `SHUFFLE_REPLICATE_NL` Joint Hints support was added in 3.0.
When different join strategy hints are specified on both sides of a join, Spark
prioritizes hints in the following order: `BROADCAST` over `MERGE` over
`SHUFFLE_HASH` over `SHUFFLE_REPLICATE_NL`. When both sides are specified with
the `BROADCAST` hint or the `SHUFFLE_HASH` hint, Spark will pick the build side
based on the join type and the sizes of the relations. Since a given strategy
may not support all join types, Spark is not guaranteed to use the join
strategy suggested by the hint.
+Hints give users a way to suggest specific approaches for Spark SQL to use when
generating its execution plan.
### Syntax
```sql
-/*+ join_hint [ , ... ] */
+/*+ hint [ , ... ] */
```
+### Partitioning Hints
+
+Partitioning hints allow users to suggest a partitioning strategy that Spark
should follow. `COALESCE`, `REPARTITION`,
+and `REPARTITION_BY_RANGE` hints are supported and are equivalent to
`coalesce`, `repartition`, and
+`repartitionByRange` [Dataset
APIs](api/scala/org/apache/spark/sql/Dataset.html), respectively. These hints
give users
+a way to tune performance and control the number of output files in Spark SQL.
When multiple partitioning hints are
+specified, multiple nodes are inserted into the logical plan, but the leftmost
hint is picked by the optimizer.
Review comment:
Nit: the description of how multiple hints are handled is duplicated at
https://github.com/apache/spark/pull/28672/files#diff-84ec3ee2cc31db6fd14e15058e35435cR69.
Maybe we should keep only the one with the example.
##########
File path: docs/sql-ref-syntax-qry-select-hints.md
##########
@@ -21,14 +21,86 @@ license: |
+
+### Partitioning Hints Types
Review comment:
Should this be `#### Partitioning Hints Types`?
##########
File path: docs/sql-ref-syntax-qry-select-hints.md
##########
@@ -21,14 +21,86 @@ license: |
+
+### Partitioning Hints Types
+
+* **COALESCE**
+
+ The `COALESCE` hint can be used to reduce the number of partitions to the
specified number of partitions. It takes a partition number as a parameter.
+
+* **REPARTITION**
+
+ The `REPARTITION` hint can be used to repartition to the specified number of
partitions using the specified partitioning expressions. It takes a partition
number, column names, or both as parameters.
+
+* **REPARTITION_BY_RANGE**
+
+ The `REPARTITION_BY_RANGE` hint can be used to repartition to the specified
number of partitions using the specified partitioning expressions. It takes
column names and an optional partition number as parameters.
+
+
+### Examples
Review comment:
Ditto: `#### Examples`.
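
For reference, a minimal sketch of how the three hint types described in this
hunk might look in a query. The table `t`, the column `c`, and the partition
counts are hypothetical placeholders and are not taken from the PR.

```sql
-- COALESCE takes a partition number.
SELECT /*+ COALESCE(3) */ * FROM t;

-- REPARTITION takes a partition number, column names, or both.
SELECT /*+ REPARTITION(3) */ * FROM t;
SELECT /*+ REPARTITION(c) */ * FROM t;
SELECT /*+ REPARTITION(3, c) */ * FROM t;

-- REPARTITION_BY_RANGE takes column names and an optional partition number.
SELECT /*+ REPARTITION_BY_RANGE(c) */ * FROM t;
SELECT /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t;

-- Several partitioning hints in one comment: per the text in this hunk,
-- the leftmost hint (here REPARTITION(100)) is the one the optimizer picks.
SELECT /*+ REPARTITION(100), COALESCE(500), REPARTITION_BY_RANGE(3, c) */ * FROM t;
```

Running `EXPLAIN EXTENDED` on the last query should show which hint ends up in
the optimized plan.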