GitHub user viirya commented on the pull request:
https://github.com/apache/spark/pull/4806#issuecomment-76413623
Besides, I think it is odd to set `containsNull` manually during JSON
schema inference. Sampling should not be the issue here, because one could
equally argue that sampling may miss arrays with different column types.
So the main point is still the problem of inserting JSON data into a Parquet
data source table. In #4729 I just copy the schema of the JSON data, modify
the copy's `containsNull`, and use that copy for the insertion, without
actually modifying the schema of the JSON data itself.
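
To illustrate the copy-and-modify idea, here is a minimal sketch in plain Python. It is not the code from #4729. It assumes the schema is available as nested dicts in the shape of Spark's JSON schema representation (`type`, `fields`, `elementType`, `containsNull`), and the helper name `with_nullable_arrays` is hypothetical.

```python
import copy


def with_nullable_arrays(schema):
    """Return a copy of `schema` with containsNull=True on every array type.

    `schema` is assumed to be a nested dict shaped like Spark's JSON schema
    representation. The input is never mutated.
    """
    def relax(data_type):
        # Primitive types are plain strings such as "string"; leave them alone.
        if not isinstance(data_type, dict):
            return data_type
        kind = data_type.get("type")
        if kind == "array":
            data_type["containsNull"] = True
            data_type["elementType"] = relax(data_type["elementType"])
        elif kind == "struct":
            for field in data_type["fields"]:
                field["type"] = relax(field["type"])
        elif kind == "map":
            data_type["keyType"] = relax(data_type["keyType"])
            data_type["valueType"] = relax(data_type["valueType"])
        return data_type

    # Work on a deep copy so the inferred JSON schema stays untouched.
    return relax(copy.deepcopy(schema))


# Hypothetical inferred schema with one array column that has no nulls.
json_schema = {
    "type": "struct",
    "fields": [
        {
            "name": "tags",
            "type": {"type": "array", "elementType": "string", "containsNull": False},
            "nullable": True,
            "metadata": {},
        }
    ],
}

# Schema to use for the insertion only; json_schema keeps containsNull=False.
insert_schema = with_nullable_arrays(json_schema)
```

The deep copy is the point of the sketch: only the schema used for the insertion is changed, while the schema of the JSON data is left as inferred.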
Both solutions pass the unit test. @liancheng @yhuai, you can
decide which one is more appropriate.