Github user viirya commented on the pull request:
https://github.com/apache/spark/pull/4729#issuecomment-76295694
@liancheng Unlike the `ParquetConversions` issue, I think the array
insertion issue may not be Hive-specific. The problem is that when we
create a Parquet table that includes an array (or map, or struct), by default
we use a schema with `containsNull` set to true. But when we later insert
data, the data's schema may have `containsNull` as either true or false. Hive
seems to solve this by only supporting fields that may contain null elements:
whether or not the inserted data actually contains nulls, its schema is set to
have `containsNull` as true before it is written to the Parquet file. Since
I think we don't want to explicitly change the data schema and affect other
parts, doing it in `RowWriteSupport` should be ok, unless you have other
concerns.
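
To illustrate the idea, here is a minimal sketch of the kind of write-side normalization described above. It does not use Spark itself: the `ArrayType`/`MapType`/`StructType` case classes below only mimic the names of Catalyst's types, and `relaxNullability` is a hypothetical helper showing how a schema could be recursively rewritten to `containsNull = true` before writing, as `RowWriteSupport` would need to do.

```scala
// Tiny stand-in for Catalyst data types (names mirror Spark's, but this
// is an illustrative model, not Spark code).
sealed trait DataType
case object IntType extends DataType
case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
case class MapType(keyType: DataType, valueType: DataType,
                   valueContainsNull: Boolean) extends DataType
case class StructField(name: String, dataType: DataType, nullable: Boolean)
case class StructType(fields: Seq[StructField]) extends DataType

// Recursively relax nullability so the schema used on the write side always
// has containsNull = true, matching the table schema created by default.
def relaxNullability(dt: DataType): DataType = dt match {
  case ArrayType(et, _)   => ArrayType(relaxNullability(et), containsNull = true)
  case MapType(kt, vt, _) => MapType(relaxNullability(kt), relaxNullability(vt),
                                     valueContainsNull = true)
  case StructType(fs)     => StructType(fs.map(f =>
                               f.copy(dataType = relaxNullability(f.dataType),
                                      nullable = true)))
  case other              => other
}

// A data schema with containsNull = false is rewritten to containsNull = true,
// so it matches the table schema regardless of whether the data has nulls.
val dataSchema = ArrayType(IntType, containsNull = false)
println(relaxNullability(dataSchema)) // ArrayType(IntType, true)
```

Because the relaxation happens only on the copy of the schema handed to the writer, the original data schema seen by other parts of the query plan is untouched, which is the point of doing this inside `RowWriteSupport`.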