GitHub user gatorsmile opened a pull request:
https://github.com/apache/spark/pull/18412
Fix wrong results of insertion of Array of Struct
### What changes were proposed in this pull request?
```SQL
CREATE TABLE `tab1`
(`custom_fields` ARRAY<STRUCT<`id`: BIGINT, `value`: STRING>>)
USING parquet;

INSERT INTO `tab1`
SELECT ARRAY(named_struct('id', 1, 'value', 'a'), named_struct('id', 2, 'value', 'b'));

SELECT custom_fields.id, custom_fields.value FROM tab1;
```
The query above always returns the last struct of the array, because the
rule `SimplifyCasts` incorrectly rewrites the query. The underlying cause is that
the cast always reuses the same `GenericInternalRow` object.
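
The effect of reusing one row object can be illustrated outside Spark. The following is only a minimal sketch in plain Python, not Spark's actual cast code: the function names `cast_structs_shared` and `cast_structs_fresh` are hypothetical, and a Python list stands in for `GenericInternalRow`. It shows why writing every converted struct into a single shared buffer makes each array element appear to be the last struct.

```python
# Illustrative only: plain Python lists stand in for Spark's internal rows.
# Function names are hypothetical and do not exist in Spark.

def cast_structs_shared(structs):
    """Buggy pattern: one mutable row buffer is reused for every element."""
    row = [None, None]  # single shared buffer, analogous to a reused row object
    out = []
    for id_, value in structs:
        row[0] = int(id_)   # overwrite the buffer in place
        row[1] = str(value)
        out.append(row)     # every entry aliases the same buffer
    return out


def cast_structs_fresh(structs):
    """Safe pattern: allocate a new row for each element."""
    return [[int(id_), str(value)] for id_, value in structs]


data = [(1, "a"), (2, "b")]
shared = cast_structs_shared(data)
fresh = cast_structs_fresh(data)

print(shared)  # [[2, 'b'], [2, 'b']]  (both entries show the last struct)
print(fresh)   # [[1, 'a'], [2, 'b']]
```

Under this simplified model, the shared-buffer version reproduces the symptom described above, while allocating a row per element (or copying before storing) avoids it.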
### How was this patch tested?
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark castStruct
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18412.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18412
----
commit 3be3475d3da7e281f7c1a6599988a621c4d6b0f5
Author: gatorsmile <[email protected]>
Date: 2017-06-24T03:29:38Z
fix.
----