GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/21586

    [SPARK-24586][SQL] Upcast should not allow casting from string to other 
types

    ## What changes were proposed in this pull request?
    
    When converting a Dataset to another Dataset, Spark up-casts the fields of 
the original Dataset to the types of the corresponding fields in the target Dataset.
    
    However, the current up-cast behavior is inconsistent: it disallows up-casting 
from string to numeric types, but allows it when the target is a non-numeric type 
such as boolean or date.
    
    As a result, `Seq("str").toDS.as[Int]` fails, but 
`Seq("str").toDS.as[Boolean]` works and throws an NPE during execution.
    
    Since the purpose of up-casting is to prevent problems such as runtime NPEs, 
it is more reasonable to make it stricter and disallow casting from string to any other type.
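    
    To illustrate the difference between the two rules described above, here is a minimal, self-contained sketch. It is a toy model, not Spark's implementation: `UpcastSketch`, its type objects, `lenientCanUpCast`, and `strictCanUpCast` are all hypothetical names, and the catch-all cases are assumptions made for the example.

```scala
// Toy model only: every name here is hypothetical and does not refer to Spark's classes.
object UpcastSketch {
  sealed trait DataType
  case object StringType extends DataType
  case object BooleanType extends DataType
  case object DateType extends DataType
  sealed abstract class NumericType(val rank: Int) extends DataType
  case object IntegerType extends NumericType(1)
  case object LongType extends NumericType(2)

  // Models the behavior described as inconsistent: string -> numeric is
  // rejected, but string -> boolean/date is let through.
  def lenientCanUpCast(from: DataType, to: DataType): Boolean = (from, to) match {
    case (StringType, _: NumericType)     => false
    case (a: NumericType, b: NumericType) => a.rank <= b.rank
    case _                                => true // assumption: everything else passes
  }

  // Models the stricter rule: string is never a valid source type,
  // except for the trivial string -> string case.
  def strictCanUpCast(from: DataType, to: DataType): Boolean = (from, to) match {
    case _ if from == to                  => true
    case (StringType, _)                  => false
    case (a: NumericType, b: NumericType) => a.rank <= b.rank
    case (_, StringType)                  => true // assumption: casting to string stays allowed
    case _                                => false
  }
}
```

    In this sketch, `lenientCanUpCast(StringType, BooleanType)` returns `true` while `strictCanUpCast(StringType, BooleanType)` returns `false`, which mirrors the intent of rejecting `Seq("str").toDS.as[Boolean]` up front instead of failing at runtime.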
    
    Note that this up-cast change also affects materialized view resolution. 
However, since we don't support changing the column types of an existing table, there is 
no behavior change there.
    
    ## How was this patch tested?
    
    New tests were added.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark cast

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21586.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21586
    
----
commit c89d12e7b32987cbe4a081fc417fb38022061cc5
Author: Wenchen Fan <wenchen@...>
Date:   2018-06-18T22:17:14Z

    Upcast should not allow casting from string to other types

----

