GitHub user liancheng opened a pull request:
https://github.com/apache/spark/pull/10541
[SPARK-12592][SQL][WIP] Converts resolved logical plan back to SQL
This PR tries to enable Spark SQL to convert resolved logical plans back to
SQL query strings. For now, the major use case is canonicalizing view
definitions for Spark SQL's native view support.
The major entry point is `SQLBuilder.toSQL`, which returns an
`Option[String]`: `Some(sql)` when the logical plan is recognized, and
`None` otherwise.
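A rough usage sketch, assuming a two-argument `SQLBuilder(plan, sqlContext)`
constructor and package location (only the `toSQL: Option[String]` entry point
is stated above), with a `HiveContext` named `sqlContext` in scope; the plan
is taken from `queryExecution.analyzed` so it stays analyzed but unoptimized:
```
// Hypothetical usage; the constructor shape and package are assumptions.
import org.apache.spark.sql.hive.SQLBuilder

val analyzedPlan = sqlContext
  .sql("SELECT key, value FROM src WHERE key < 10")
  .queryExecution
  .analyzed

new SQLBuilder(analyzedPlan, sqlContext).toSQL match {
  case Some(sql) => println(s"Regenerated SQL: $sql")
  case None      => println("This plan cannot be converted back to SQL yet")
}
```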
The current version is still in WIP status, and is quite limited. Known
limitations include:
1. The logical plan must be analyzed but not optimized
The optimizer erases `Subquery` operators, which contain necessary
scope information for SQL generation. Future versions should be able to
recover erased scope information by inserting subqueries when necessary.
1. The logical plan must be created from a HiveQL query string
Query plans built by composing arbitrary DataFrame API calls are not
supported yet. Operators within these query plans need to be rearranged
into a canonical form that is more suitable for direct SQL generation.
For example, the following query plan
```
Filter (a#1 < 10)
+- MetastoreRelation default, src, None
```
needs to be canonicalized into the following form before SQL generation:
```
Project [a#1, b#2, c#3]
+- Filter (a#1 < 10)
+- MetastoreRelation default, src, None
```
Otherwise, the SQL generation process would have to handle a large
number of special cases (a rough sketch of this canonicalization step
follows the list below).
1. Only a fraction of expressions and basic logical plan operators are
supported in this PR
Support for window functions, generators, cubes, etc. will be added
in follow-up PRs.
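As noted in the second limitation above, one way to canonicalize such plans
is to wrap operators that lack an explicit projection in a `Project` over
their child's full output. A minimal sketch of the idea as a standalone
Catalyst rule (the rule name and its placement are illustrative assumptions,
not the PR's actual code):
```
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Project}
import org.apache.spark.sql.catalyst.rules.Rule

// Illustrative rule: if the plan root is a bare Filter, add an explicit
// Project over the child's full output so SQL generation always has a
// SELECT list to emit.
object AddExplicitProject extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan match {
    case f @ Filter(_, child) => Project(child.output, f)
    case other                => other
  }
}
```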
This PR leverages `HiveCompatibilitySuite` for testing SQL generation in a
"round-trip" manner:
* For all select queries, we try to convert them back to SQL
* If the query plan is convertible, we parse the generated SQL into a new
logical plan
* Run the new logical plan instead of the original one
If the query plan is inconvertible, the test case simply falls back to the
original logic.
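A hedged sketch of that round trip, reusing the assumed `SQLBuilder`
constructor from the sketch above (this is illustrative, not the actual
`HiveCompatibilitySuite` changes):
```
import org.apache.spark.sql.Row
import org.apache.spark.sql.hive.{HiveContext, SQLBuilder}  // locations assumed

def runRoundTrip(hiveContext: HiveContext, originalSql: String): Array[Row] = {
  val analyzed = hiveContext.sql(originalSql).queryExecution.analyzed
  new SQLBuilder(analyzed, hiveContext).toSQL match {
    case Some(generatedSql) =>
      // Convertible: parse the regenerated SQL into a new plan and run that.
      hiveContext.sql(generatedSql).collect()
    case None =>
      // Inconvertible: fall back to running the original query.
      hiveContext.sql(originalSql).collect()
  }
}
```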
TODO
- [ ] Fix failed test cases
- [ ] Support for more basic expressions and logical plan operators (e.g.
distinct aggregation)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/liancheng/spark sql-generation
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10541.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10541
----
commit 17e8fba3484d19e7ca7a993359fdc28004f9dfb8
Author: Cheng Lian <[email protected]>
Date: 2015-12-28T10:12:10Z
WIP: Converting resolved logical plan back to SQL
----