GitHub user koertkuipers opened a pull request:
https://github.com/apache/spark/pull/11980
SPARK-14139 Dataset loses nullability in operations with RowEncoder
## What changes were proposed in this pull request?
RowEncoder now respects the nullability of struct fields when creating
extractor expressions in the extractorsFor method.
Note that, to get the correct nullable value for the returned expression,
I chose to drop the If expression that checks for nulls when the field has
nullable=false. If this is undesirable because we should defensively check
for nulls anyway, the If check can be kept by modifying the If class instead,
but to me that solution seems less clear and less elegant.
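
To illustrate why the null check affects the reported nullability, here is a minimal, dependency-free sketch in Python. It is a toy model, not Spark code: `Field`, `GetField`, `NullLiteral`, `IfNull`, and `extractor_for` are hypothetical names invented for this example, and the rule that a conditional is nullable when either branch is nullable is an assumption about how such an expression would report itself.

```python
class Field:
    """Toy stand-in for a struct field: a name plus its declared nullability."""

    def __init__(self, name, nullable):
        self.name = name
        self.nullable = nullable


class GetField:
    """Toy expression that reads one field; nullable only if the field is."""

    def __init__(self, field):
        self.field = field

    @property
    def nullable(self):
        return self.field.nullable


class NullLiteral:
    """Toy null literal; it is always nullable."""

    nullable = True


class IfNull:
    """Toy 'if the input is null then null else value' wrapper.

    Assumption: the wrapper is nullable when either branch is nullable.
    Because one branch is a null literal, the wrapper is always nullable.
    """

    def __init__(self, value):
        self.null_branch = NullLiteral()
        self.value = value

    @property
    def nullable(self):
        return self.null_branch.nullable or self.value.nullable


def extractor_for(field, guard_non_nullable=False):
    """Build a toy extractor for one field.

    With guard_non_nullable=False (the approach described above), the null
    check is dropped for fields declared nullable=false, so the extractor
    keeps that nullability. With guard_non_nullable=True, the check is
    always added and the declared nullability is lost.
    """
    read = GetField(field)
    if field.nullable or guard_non_nullable:
        return IfNull(read)
    return read


if __name__ == "__main__":
    id_field = Field("id", nullable=False)
    print(extractor_for(id_field).nullable)
    print(extractor_for(id_field, guard_non_nullable=True).nullable)
```

In this model, keeping the defensive check for every field would require changing how the conditional itself reports nullability, which corresponds to the alternative of modifying the If class.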
## How was this patch tested?
Added a new unit test in DataFrameSuite for the bug described in the JIRA
issue.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/tresata/spark feat-rowencoder-nullable
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11980.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11980
----
commit 847d7c7cdfe6626ea1f73656f9eaf868d641ae1c
Author: Koert Kuipers <[email protected]>
Date: 2016-03-26T20:14:18Z
change RowEncoder to respect nullable for struct fields when generating
extractors
commit b500c8bb1d3d8f1ea1d88a2f761056b946598871
Author: Koert Kuipers <[email protected]>
Date: 2016-03-26T20:46:54Z
merge from master
----