GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/10541

    [SPARK-12592][SQL][WIP] Converts resolved logical plan back to SQL

    This PR tries to enable Spark SQL to convert resolved logical plans back to
    SQL query strings.  For now, the main use case is canonicalizing view
    definitions for Spark SQL native view support.
    
    The main entry point is `SQLBuilder.toSQL`, which returns an
    `Option[String]`: `Some(sql)` if the logical plan is recognized and `None`
    otherwise.
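
    For illustration only, the `Option[String]` contract can be sketched in
    self-contained Scala.  The plan classes and `ToySQLBuilder` below are
    hypothetical stand-ins, not Catalyst's operators or the actual `SQLBuilder`
    implementation.  Unrecognized shapes (including a bare `Filter` at the
    root) simply yield `None`, and a `Subquery` node supplies the alias, i.e.
    the scope, of its child.

    ```scala
    // Hypothetical toy plan classes for illustration; NOT Spark's Catalyst operators.
    sealed trait Plan
    case class Relation(db: String, table: String) extends Plan
    case class Filter(condition: String, child: Plan) extends Plan
    case class Project(columns: Seq[String], child: Plan) extends Plan
    case class Subquery(alias: String, child: Plan) extends Plan
    case class Unsupported(name: String) extends Plan

    // Hypothetical builder mimicking the Option[String] contract of SQLBuilder.toSQL.
    object ToySQLBuilder {
      // Some(sql) for recognized plan shapes, None for everything else.
      def toSQL(plan: Plan): Option[String] = plan match {
        case Project(cols, Filter(cond, child)) =>
          fromClause(child).map(from => s"SELECT ${cols.mkString(", ")} FROM $from WHERE $cond")
        case Project(cols, child) =>
          fromClause(child).map(from => s"SELECT ${cols.mkString(", ")} FROM $from")
        case _ =>
          None // e.g. a bare Filter at the root, or an unsupported operator
      }

      // Renders the operand of a FROM clause.
      private def fromClause(plan: Plan): Option[String] = plan match {
        case Relation(db, table)          => Some(s"$db.$table")
        // The Subquery operator carries the alias (scope) of its child.
        case Subquery(alias, r: Relation) => fromClause(r).map(t => s"$t AS $alias")
        case Subquery(alias, child)       => toSQL(child).map(sql => s"($sql) AS $alias")
        case _                            => None
      }
    }
    ```

    For example, `Project(Seq("key"), Filter("key < 10", Subquery("src",
    Relation("default", "src"))))` renders as
    `SELECT key FROM default.src AS src WHERE key < 10`.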
    
    This version is still a work in progress and is quite limited.  Known
    limitations include:
    
    1.  The logical plan must be analyzed but not optimized
    
        The optimizer erases `Subquery` operators, which carry the scope
        information needed for SQL generation.  Future versions should be able
        to recover the erased scope information by inserting subqueries where
        necessary.
    
    1.  The logical plan must be created from a HiveQL query string
    
        Query plans built from arbitrary combinations of DataFrame API calls
        are not supported yet.  Operators within these query plans need to be
        rearranged into a canonical form that is more suitable for direct SQL
        generation.  For example, the following query plan
    
        ```
        Filter (a#1 < 10)
         +- MetastoreRelation default, src, None
        ```
    
        needs to be canonicalized into the following form before SQL generation:
    
        ```
        Project [a#1, b#2, c#3]
         +- Filter (a#1 < 10)
             +- MetastoreRelation default, src, None
        ```
    
        Otherwise, the SQL generation process will have to handle a large 
number of special cases.
    
    1.  Only a subset of expressions and basic logical plan operators is
        supported in this PR
    
        Support for window functions, generators, cubes, etc. will be added in
        follow-up PRs.
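
    A rough sketch of the canonicalization step, again over hypothetical toy
    plan classes rather than Catalyst's: a `Filter` that does not already sit
    under a `Project` is wrapped in one that lists its output columns
    explicitly, mirroring the two example plans above.  `Canonicalizer` is an
    invented name, not part of this PR.

    ```scala
    // Hypothetical toy plan classes for illustration; NOT Spark's Catalyst operators.
    sealed trait Plan { def output: Seq[String] }
    case class Relation(name: String, output: Seq[String]) extends Plan
    case class Filter(condition: String, child: Plan) extends Plan {
      def output: Seq[String] = child.output
    }
    case class Project(columns: Seq[String], child: Plan) extends Plan {
      def output: Seq[String] = columns
    }

    object Canonicalizer {
      // Rewrites the plan so that every Filter sits directly under a Project,
      // i.e. each SELECT ... FROM ... WHERE ... block is a Project/Filter pair.
      def canonicalize(plan: Plan): Plan = plan match {
        case Project(cols, Filter(cond, child)) => Project(cols, Filter(cond, canonicalize(child)))
        case Project(cols, child)               => Project(cols, canonicalize(child))
        case f @ Filter(cond, child)            => Project(f.output, Filter(cond, canonicalize(child)))
        case r: Relation                        => r
      }
    }
    ```

    Making the projection explicit means the generator only has to recognize
    one shape per query block instead of handling many special cases.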
    
    This PR uses `HiveCompatibilitySuite` to test SQL generation in a
    "round-trip" manner:
    
    *   For every SELECT query, we try to convert its logical plan back to SQL
    *   If the query plan is convertible, we parse the generated SQL into a new
        logical plan
    *   We then run the new logical plan instead of the original one
    
    If the query plan cannot be converted, the test case simply falls back to
    the original logic.
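
    The round-trip fallback boils down to choosing which plan to run.  In this
    sketch, `toSQL`, `parse`, and the plan type `P` are hypothetical parameters
    standing in for Spark's SQL builder and HiveQL parser; it is not the actual
    `HiveCompatibilitySuite` code.

    ```scala
    // Hypothetical sketch of the round-trip check; not the actual HiveCompatibilitySuite code.
    object RoundTrip {
      // Picks the plan a test case should execute:
      //  - convertible plans are re-parsed from the generated SQL,
      //  - inconvertible plans fall back to the original plan.
      def planToRun[P](original: P, toSQL: P => Option[String], parse: String => P): P =
        toSQL(original) match {
          case Some(sql) => parse(sql)
          case None      => original
        }
    }
    ```

    Because the fallback returns the original plan unchanged, a query whose
    plan cannot be converted behaves exactly as it did without SQL generation.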
    
    TODO
    
    - [ ] Fix failed test cases
    - [ ] Support more basic expressions and logical plan operators (e.g.
          distinct aggregation)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark sql-generation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10541.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10541
    
----
commit 17e8fba3484d19e7ca7a993359fdc28004f9dfb8
Author: Cheng Lian <[email protected]>
Date:   2015-12-28T10:12:10Z

    WIP: Converting resolved logical plan back to SQL

----

