[GitHub] spark pull request #14539: [SPARK-16947][SQL] Improve type coercion for inli...

hvanhovell Mon, 08 Aug 2016 04:25:32 -0700

GitHub user hvanhovell opened a pull request:

    https://github.com/apache/spark/pull/14539


    [SPARK-16947][SQL] Improve type coercion for inline tables.

    ## What changes were proposed in this pull request?
    Inline tables were added in to Spark SQL in 2.0, e.g.: `select * from 
values (1, 'A'), (2, 'B') as tbl(a, b)`
    
    This is currently implemented using a `LocalRelation` and this relation is 
created during parsing. This has a weakness: type coercion is based on the 
first row in the relation, and all subsequent values are cast in to this type. 
The latter violates the principle of least surprise.
    
    This PR fixes this by creating an `Union` of (constant) `Project`s. As a 
result type coercion now follows the rules for `Union`, which is similar to 
other systems like PostgreSQL. In order to retain optimal speed, I have 
extended the `ConvertToLocalRelation`, which makes sure the table gets 
rewritten into a `LocalRelation` during optimization.
    
    The following SQL statement:
    ```SQL
    select * from values (1, exp(1)), (2.00, 'b'), (3.0, 'tt'), (40.0001, 4) 
x(a, b)
    ```
    
    ... now yields the following plan:
    ```
    == Parsed Logical Plan ==
    'Project [*]
    +- 'SubqueryAlias x
       +- 'Union
          :- 'Project [1 AS a#0, 'exp(1) AS b#1]
          :  +- OneRowRelation$
          :- Project [2.00 AS a#2, b AS b#3]
          :  +- OneRowRelation$
          :- Project [3.0 AS a#4, tt AS b#5]
          :  +- OneRowRelation$
          +- Project [40.0001 AS a#6, 4 AS b#7]
             +- OneRowRelation$
    
    == Analyzed Logical Plan ==
    a: decimal(14,4), b: string
    Project [a#18, b#19]
    +- SubqueryAlias x
       +- Union
          :- Project [cast(a#0 as decimal(14,4)) AS a#18, cast(b#1 as string) 
AS b#19]
          :  +- Project [1 AS a#0, EXP(cast(1 as double)) AS b#1]
          :     +- OneRowRelation$
          :- Project [cast(a#2 as decimal(14,4)) AS a#20, b#3]
          :  +- Project [2.00 AS a#2, b AS b#3]
          :     +- OneRowRelation$
          :- Project [cast(a#4 as decimal(14,4)) AS a#21, b#5]
          :  +- Project [3.0 AS a#4, tt AS b#5]
          :     +- OneRowRelation$
          +- Project [cast(a#6 as decimal(14,4)) AS a#22, cast(b#7 as string) 
AS b#23]
             +- Project [40.0001 AS a#6, 4 AS b#7]
                +- OneRowRelation$
    
    == Optimized Logical Plan ==
    LocalRelation [a#18, b#19]
    
    == Physical Plan ==
    LocalTableScan [a#18, b#19]
    ```
    
    ... and the following result:
    ```
    +-------+-----------------+
    |      a|                b|
    +-------+-----------------+
    | 1.0000|2.718281828459045|
    | 2.0000|                b|
    | 3.0000|               tt|
    |40.0001|                4|
    +-------+-----------------+
    ```
    
    ## How was this patch tested?
    Inline tables are converted into `Union`s during parsing. Type coercion for 
unions is already covered by various test cases. I have updated the 
`PlanParseSuite` to test the parsers the new output, and I have added tests to 
the `ConvertToLocalRelationSuite` to tests if inline table (a-like) structures 
are converted into `LocalRelation`s.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hvanhovell/spark SPARK-16947

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14539.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14539
    
----
commit 3b0a28b9626b2fac65688cf3a2702a96131e4307
Author: Herman van Hovell <[email protected]>
Date:   2016-08-08T11:10:38Z

    Improve inline table processing.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] spark pull request #14539: [SPARK-16947][SQL] Improve type coercion for inli...

Reply via email to