GitHub user cloud-fan opened a pull request:
https://github.com/apache/spark/pull/21586
[SPARK-24586][SQL] Upcast should not allow casting from string to other types
## What changes were proposed in this pull request?
When converting one Dataset into another, Spark up casts each field of the
original Dataset to the type of the corresponding field in the target Dataset.
However, the current up cast behavior is inconsistent: we don't allow up
casting from string to numeric types, but we do allow non-numeric target types
such as boolean, date, etc.
As a result, `Seq("str").toDS.as[Int]` fails, but
`Seq("str").toDS.as[Boolean]` passes analysis and then throws an NPE during execution.
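The inconsistency can be sketched as a predicate over source and target types. This is a minimal, hypothetical model: the `UpCastSketch` object, its case-object types, and the functions `canUpCastBefore` and `canUpCastAfter` are illustrative names, not Spark's actual implementation, which covers many more types (decimals, nested structs, and so on). It only encodes the two rules described above: before the change, string is rejected as a source for numeric targets but accepted for other targets; after the change, string may only be up cast to string.

```scala
// Hypothetical, heavily simplified model of the up cast rule described in the
// PR. The type names mimic Spark's but are NOT org.apache.spark.sql.types.
object UpCastSketch {
  sealed trait DataType
  case object StringType extends DataType
  case object BooleanType extends DataType
  case object DateType extends DataType
  case object IntegerType extends DataType
  case object LongType extends DataType

  private val numericTypes: Set[DataType] = Set(IntegerType, LongType)

  // Lossless numeric widening (only int -> long in this toy model).
  private def isNumericWidening(from: DataType, to: DataType): Boolean =
    (from, to) match {
      case (IntegerType, LongType) => true
      case _                       => false
    }

  // Before the change: string -> numeric is rejected, but string -> any
  // non-numeric target (boolean, date, ...) is accepted.
  def canUpCastBefore(from: DataType, to: DataType): Boolean =
    if (from == to) true
    else if (from == StringType) !numericTypes.contains(to)
    else isNumericWidening(from, to)

  // After the change: string may only be "up cast" to string itself.
  def canUpCastAfter(from: DataType, to: DataType): Boolean =
    if (from == to) true
    else if (from == StringType) false
    else isNumericWidening(from, to)
}
```

In this model, `canUpCastBefore(StringType, BooleanType)` is `true` while `canUpCastAfter(StringType, BooleanType)` is `false`, mirroring how `Seq("str").toDS.as[Boolean]` would then fail the same way `as[Int]` already does.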
The purpose of up cast is to prevent problems such as runtime NPEs, so it's
more reasonable to make up cast stricter.
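To illustrate why an up-front check is preferable to a runtime failure, here is a toy contrast between a lenient conversion that only fails when a bad row is reached and a check that rejects the conversion before any data is read. Everything here is hypothetical: `RuntimeNpeSketch`, `lenientCastToBoolean`, `toPrimitive`, `runLenient` and `analyze` are illustrative names, and the accepted boolean spellings are an assumption, not a statement of Spark's exact cast rules.

```scala
// Hypothetical toy model; not Spark code.
object RuntimeNpeSketch {
  // Lenient string -> boolean cast (assumed spellings): unrecognized
  // strings become null instead of raising an error.
  def lenientCastToBoolean(s: String): java.lang.Boolean =
    s.trim.toLowerCase match {
      case "true" | "t" | "yes" | "y" | "1" => java.lang.Boolean.TRUE
      case "false" | "f" | "no" | "n" | "0" => java.lang.Boolean.FALSE
      case _                                => null
    }

  // Unboxing into a primitive Boolean field: calling booleanValue() on a
  // null reference throws NullPointerException.
  def toPrimitive(b: java.lang.Boolean): Boolean = b.booleanValue()

  // Lenient path: the conversion is accepted up front, and a failure only
  // surfaces when a bad row is actually processed.
  def runLenient(rows: Seq[String]): Seq[Boolean] =
    rows.map(s => toPrimitive(lenientCastToBoolean(s)))

  // Strict path (simplified): reject string -> non-string before reading any
  // row, so the outcome does not depend on the data.
  def analyze(sourceType: String, targetType: String): Either[String, Unit] =
    if (sourceType == "string" && targetType != "string")
      Left(s"Cannot up cast $sourceType to $targetType")
    else Right(())
}
```

With the lenient path, `runLenient(Seq("true", "no"))` succeeds, and only input like `"str"` triggers the NPE; the strict check fails deterministically, regardless of the data.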
Note that the up cast change also affects materialized view resolution.
But since we don't support changing the column types of an existing table,
there is no behavior change here.
## How was this patch tested?
Added new tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/cloud-fan/spark cast
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21586.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21586
----
commit c89d12e7b32987cbe4a081fc417fb38022061cc5
Author: Wenchen Fan <wenchen@...>
Date: 2018-06-18T22:17:14Z
Upcast should not allow casting from string to other types
----