GitHub user yhuai opened a pull request:
https://github.com/apache/spark/pull/4806
[SPARK-6052][SQL]In JSON schema inference, we should always set containsNull of an ArrayType to true
Always set `containsNull = true` when inferring the schema of JSON datasets. If
we set `containsNull` based on the records we scanned, we may miss arrays with
null values when we sample. Also, because future data can contain arrays with
null values, always setting `containsNull = true` is the more robust choice if
we convert JSON data to Parquet.
JIRA: https://issues.apache.org/jira/browse/SPARK-6052
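The sampling problem described above can be illustrated with a small toy model. This is a minimal sketch in Python, not Spark's actual inference code. The `ArrayType` dataclass, `merge`, `infer_type`, and `infer_schema` are hypothetical names. The model makes several simplifying assumptions:

- primitive types are plain strings;
- only top-level fields are handled;
- conflicting types fall back to `"string"`.

The `always_contains_null` flag contrasts the two strategies.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Union


@dataclass(frozen=True)
class ArrayType:
    """Hypothetical toy array type; NOT Spark's ArrayType class."""
    element_type: "DataType"
    contains_null: bool


# Toy primitive types are plain strings: "null", "boolean", "long", "double", "string".
DataType = Union[str, ArrayType]


def merge(a: DataType, b: DataType) -> DataType:
    """Combine two inferred types into one that can hold values of both."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if isinstance(a, ArrayType) and isinstance(b, ArrayType):
        # Nullability is sticky: if any record had a null element, keep it.
        return ArrayType(merge(a.element_type, b.element_type),
                         a.contains_null or b.contains_null)
    if {a, b} == {"long", "double"}:
        return "double"
    return "string"  # simplified fallback for conflicting types (assumption)


def infer_type(value: Any, always_contains_null: bool) -> DataType:
    """Infer a toy type for one JSON value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        element: DataType = "null"
        for item in value:
            element = merge(element, infer_type(item, always_contains_null))
        # The change under discussion: don't derive nullability from scanned values.
        contains_null = always_contains_null or any(item is None for item in value)
        return ArrayType(element, contains_null)
    raise TypeError(f"nested objects are not handled in this sketch: {value!r}")


def infer_schema(lines: List[str], always_contains_null: bool) -> Dict[str, DataType]:
    """Infer a flat field -> type mapping from JSON lines (top-level fields only)."""
    schema: Dict[str, DataType] = {}
    for line in lines:
        for field, value in json.loads(line).items():
            schema[field] = merge(schema.get(field, "null"),
                                  infer_type(value, always_contains_null))
    return schema


if __name__ == "__main__":
    full_data = ['{"a": [1, 2]}', '{"a": [3, null]}']
    sample = full_data[:1]  # a sample that happens to miss the null element

    print(infer_schema(sample, always_contains_null=False)["a"])
    # ArrayType(element_type='long', contains_null=False)
    print(infer_schema(full_data, always_contains_null=False)["a"])
    # ArrayType(element_type='long', contains_null=True)
    print(infer_schema(sample, always_contains_null=True)["a"])
    # ArrayType(element_type='long', contains_null=True)
```

When nullability is taken from the scanned values, the schema inferred from the one-record sample declares `contains_null=False`, which the second record contradicts. When the flag is always set, the sampled schema already admits the null element. The trade-off in this model is that a schema may declare nullable elements that never actually occur.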
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/yhuai/spark jsonArrayContainsNull
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/4806.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #4806
----
commit 05eab9d06b1f7f2311660aded13fac38a5b86ad4
Author: Yin Huai <[email protected]>
Date: 2015-02-27T05:47:31Z
Change containsNull to true.
----