GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/15767
[SPARK-18269][SQL] null should be properly read when schema is larger than
parsed tokens and types are not string
## What changes were proposed in this pull request?
Currently, there are three cases when the CSV datasource reads a line in
`PERMISSIVE` parse mode.
- schema == parsed tokens (from each line)
Each token can be cast to the corresponding field in the schema, as the
counts are equal.
- schema < parsed tokens (from each line)
The tokens are sliced down to the number of fields in the schema.
- schema > parsed tokens (from each line)
`null` is appended to the parsed tokens so that the values can safely be cast
with the schema.
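
The three cases above can be sketched in plain Python. This is only an illustration of the described behaviour, not Spark's implementation: `align_tokens` is a hypothetical helper name, and `None` stands in for `null`.

```python
from typing import List, Optional


def align_tokens(tokens: List[str], num_fields: int) -> List[Optional[str]]:
    """Hypothetical helper: make one parsed line match the schema width."""
    if len(tokens) >= num_fields:
        # schema == tokens: returned unchanged.
        # schema <  tokens: extra trailing tokens are dropped.
        return list(tokens[:num_fields])
    # schema > tokens: pad the missing trailing fields with None (null).
    return list(tokens) + [None] * (num_fields - len(tokens))
```

After this step every line has exactly as many entries as the schema has fields, but some of those entries may be `None`, which is what the cast step has to handle.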
However, when `null` is appended in the third case, the cast must take `null`
into account.
For `StringType` this is already fine, because `UTF8String.fromString(datum)`
returns `null` when the input is `null`. The problem therefore occurs only
when the schema is given explicitly and includes data types other than
`StringType`.
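
A minimal sketch of a null-aware cast, again in plain Python rather than Spark's Scala code. The name `cast_or_null` is hypothetical, `None` stands in for `null`, and Python's built-in `int` stands in for a non-string type conversion.

```python
from typing import Callable, Optional


def cast_or_null(datum: Optional[str], cast: Callable[[str], object]) -> object:
    """Hypothetical null-aware cast: pass None through instead of parsing it."""
    if datum is None:
        # A padded (missing) field stays null regardless of the target type.
        return None
    return cast(datum)


# A row padded to three fields, cast against an all-integer schema.
row = ["1", None, None]
converted = [cast_or_null(value, int) for value in row]
```

Without the `None` check, calling `int(None)` raises a `TypeError`, which is the Python analogue of a non-string cast failing on a padded field.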
## How was this patch tested?
Unit tests in `CSVSuite.scala` and `CSVTypeCastSuite.scala`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark SPARK-18269
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15767.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15767
----
commit 36920996e42ccda514e69b9aae0cf3bfe13242ce
Author: hyukjinkwon <[email protected]>
Date: 2016-11-04T13:41:07Z
Take the case of null into account
----