GitHub user jzhuge opened a pull request:
https://github.com/apache/spark/pull/21911
[SPARK-24940][SQL] Coalesce Hint for SQL Queries
## What changes were proposed in this pull request?
Many Spark SQL users at my company have asked for a way to control the
number of output files in Spark SQL. They prefer not to call
`repartition(n)` or `coalesce(n, shuffle)`, because both require writing and
deploying Scala/Java/Python code. We propose adding the following Hive-style
Coalesce and Repartition hints to Spark SQL:
```
/*+ COALESCE(numPartitions[, shuffle]) */
/*+ REPARTITION(numPartitions[, shuffle]) */
```
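To make the proposed syntax concrete, here is a minimal standalone sketch in Python that extracts these two hints from a SQL string. It is not the parser in this patch (which works through Spark's own grammar); the names `PartitionHint` and `parse_partition_hints` are hypothetical, and the sketch assumes `numPartitions` is an integer literal and `shuffle` is a literal `true` or `false`.

```python
import re
from typing import List, NamedTuple, Optional


class PartitionHint(NamedTuple):
    """Hypothetical container for one parsed hint (not a Spark class)."""
    name: str                 # "COALESCE" or "REPARTITION"
    num_partitions: int
    shuffle: Optional[bool]   # None when the optional argument is omitted


# Matches a Hive-style hint comment: /*+ ... */
_HINT_COMMENT = re.compile(r"/\*\+(.*?)\*/", re.DOTALL)

# Matches NAME(numPartitions[, shuffle]) inside a hint comment.
_HINT_CALL = re.compile(
    r"\b(COALESCE|REPARTITION)\s*\(\s*(\d+)\s*(?:,\s*(true|false)\s*)?\)",
    re.IGNORECASE,
)


def parse_partition_hints(sql: str) -> List[PartitionHint]:
    """Return the COALESCE/REPARTITION hints in the order they appear."""
    hints = []
    for comment in _HINT_COMMENT.finditer(sql):
        for call in _HINT_CALL.finditer(comment.group(1)):
            name, num, shuffle = call.groups()
            hints.append(
                PartitionHint(
                    name.upper(),
                    int(num),
                    None if shuffle is None else shuffle.lower() == "true",
                )
            )
    return hints


if __name__ == "__main__":
    query = "INSERT INTO s /*+ COALESCE(500, true), COALESCE(10) */ SELECT * FROM t"
    for hint in parse_partition_hints(query):
        print(hint)
```

The sketch only reads the hint text; it makes no claim about how Spark interprets the `shuffle` argument for either hint.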
Multiple hints are allowed. Each hint is inserted into the logical plan as
its own node, and the optimizer picks the winner. For example:
```
INSERT INTO s /*+ REPARTITION(100), COALESCE(500, true), COALESCE(10) */
SELECT * FROM t
== Logical Plan ==
'InsertIntoTable 'UnresolvedRelation `s`, false, false
+- 'UnresolvedHint REPARTITION, [100]
   +- 'UnresolvedHint COALESCE, [500, true]
      +- 'UnresolvedHint COALESCE, [10]
         +- 'Project [*]
            +- 'UnresolvedRelation `t`
== Optimized Logical Plan ==
InsertIntoHadoopFsRelationCommand ...
+- Repartition 100, true
   +- HiveTableRelation ...
```
```
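The nesting in the unresolved plan above follows the order of the hints in the comment: the first hint becomes the outermost node. The toy Python model below reproduces that shape. It is an illustration only, not Spark code; `render_unresolved_plan` is a hypothetical helper, and the `false, false` arguments are copied from the example rather than derived.

```python
from typing import List, Tuple

Hint = Tuple[str, list]  # (hint name, hint arguments)


def render_unresolved_plan(target: str, hints: List[Hint], source: str) -> str:
    """Render a chain of plan nodes, one level deeper per node."""
    nodes = [f"'InsertIntoTable 'UnresolvedRelation `{target}`, false, false"]
    for name, args in hints:
        # Booleans are printed in SQL style (true/false), as in the example plan.
        rendered = ", ".join(
            str(a).lower() if isinstance(a, bool) else str(a) for a in args
        )
        nodes.append(f"'UnresolvedHint {name}, [{rendered}]")
    nodes.append("'Project [*]")
    nodes.append(f"'UnresolvedRelation `{source}`")

    lines = [nodes[0]]
    for depth, node in enumerate(nodes[1:]):
        lines.append("   " * depth + "+- " + node)
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        render_unresolved_plan(
            "s",
            [("REPARTITION", [100]), ("COALESCE", [500, True]), ("COALESCE", [10])],
            "t",
        )
    )
```

The model covers only the unresolved plan. Which node survives optimization is decided by the optimizer and is not modeled here.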
Coalesce hints apply only to INSERT, while Broadcast hints apply only to
SELECT. Unfortunately, hints attached to the wrong command are silently
ignored. I have not found a minimal way to improve this error checking.
Should we add more hint syntax definitions to `SqlBase.g4`, or enhance the
generic hint framework? Any suggestions are welcome.
## How was this patch tested?
Ran all unit tests, plus manual tests using explain.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jzhuge/spark SPARK-24940
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21911.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21911
----
commit 4baa2c43b2338ceb68c434a9e854bc0915cf8611
Author: John Zhuge <jzhuge@...>
Date: 2018-07-28T01:46:42Z
[SPARK-24940][SQL] Coalesce Hint for SQL Queries
----