GitHub user j-baker opened a pull request:
https://github.com/apache/spark/pull/22946
[SPARK-25943][SQL] Fail if mismatching nested struct fields when writing to
datasource
At present, Spark reorders mismatched columns when writing to
a datasource, but does not reorder the fields of nested structs.
As a result, a write currently fails if the nested field types
do not match, but succeeds if only the names do not match, so
structs get silently mangled.
It's not obvious to me where I should add tests, so I would
appreciate guidance on that!
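To make the failure mode concrete, here is a minimal sketch in plain Python. It is not Spark code: `write_by_position` and the dictionaries are hypothetical stand-ins for a nested struct whose fields are matched to the target by position rather than by name, which is the behaviour described above.

```python
# Hypothetical model of the problem; none of these names come from Spark.
def write_by_position(target_fields, source_struct):
    """Assign source values to target field names purely by position."""
    return dict(zip(target_fields, source_struct.values()))

# Incoming struct has fields ordered (b, a); the table expects (a, b).
source = {"b": 2, "a": 1}
target_fields = ["a", "b"]

# Both fields are ints, so a type check alone would not complain,
# yet each value ends up stored under the other field's name.
mangled = write_by_position(target_fields, source)
```

In this model `mangled["a"]` holds the value that was written as `b`, and nothing signals the swap because the types line up.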
## What changes were proposed in this pull request?
Unify DatasourcesV1 behaviour with DatasourcesV2, by throwing if the names
of nested struct fields are mismatched.
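The kind of check proposed here could look roughly like the following sketch. It is an assumption-laden illustration in plain Python, not the patch itself: schemas are modelled as lists of `(name, type)` pairs, a nested struct is a nested list, and `check_struct_field_names` is a hypothetical helper name.

```python
# Hypothetical sketch of a "fail on nested name mismatch" check.
# A schema is a list of (name, type) pairs; a struct type is a nested list.
def check_struct_field_names(expected, actual, path=""):
    """Raise ValueError if field names differ at any nesting level."""
    if len(expected) != len(actual):
        raise ValueError(f"field count mismatch at {path or '<root>'}")
    for (exp_name, exp_type), (act_name, act_type) in zip(expected, actual):
        field_path = f"{path}.{exp_name}" if path else exp_name
        if exp_name != act_name:
            raise ValueError(
                f"field name mismatch at {field_path}: "
                f"expected {exp_name!r}, got {act_name!r}"
            )
        # Recurse only when both sides are structs (nested lists).
        if isinstance(exp_type, list) and isinstance(act_type, list):
            check_struct_field_names(exp_type, act_type, field_path)

table_schema = [("id", "int"), ("point", [("x", "int"), ("y", "int")])]
incoming_schema = [("id", "int"), ("point", [("y", "int"), ("x", "int")])]

try:
    check_struct_field_names(table_schema, incoming_schema)
    error = None
except ValueError as exc:
    error = str(exc)
```

With these inputs the swapped `x`/`y` fields inside `point` are rejected instead of being written positionally.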
## How was this patch tested?
It has not been tested. I'd really appreciate a suggestion of where to put
tests for this; I'm unfamiliar with the codebase and I couldn't find any
obvious-looking tests of that codepath.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/j-baker/spark jbaker/nested_struct
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22946.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22946
----
commit 63dd40f47ab8e8e9c120a9801b2f037336001ea6
Author: James Baker <jbaker@...>
Date: 2018-11-05T10:35:02Z
[SPARK-25943][SQL] Fail if mismatching nested struct fields when writing to
datasource
----