[GitHub] spark pull request: [SPARK-6052][SQL]In JSON schema inference, we ...

yhuai Thu, 26 Feb 2015 21:53:08 -0800

GitHub user yhuai opened a pull request:

    https://github.com/apache/spark/pull/4806


    [SPARK-6052][SQL]In JSON schema inference, we should always set 
containsNull of an ArrayType to true

    Always set `containsNull = true` when inferring the schema of JSON datasets. If 
we set `containsNull` based on the records we scanned, we may miss arrays with null 
values when sampling. Also, because future data can contain arrays with null 
values, always setting `containsNull = true` is the more robust choice when we 
convert JSON data to Parquet.
    
    JIRA: https://issues.apache.org/jira/browse/SPARK-6052
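
    The sampling problem described above can be illustrated with a minimal
sketch in plain Python. This is not Spark's inference code: the dataset, the
field name `tags`, and the helper `infer_contains_null` are all hypothetical,
and the final flag only mirrors the idea of always setting `containsNull = true`.

```python
import json

# Hypothetical JSON-lines dataset: only the last record has a null in the array.
records = [
    '{"tags": ["a", "b"]}',
    '{"tags": ["c"]}',
    '{"tags": ["d", null]}',
]

def infer_contains_null(lines, field):
    """Scan-based inference: True only if a scanned array holds a null."""
    return any(
        item is None
        for line in lines
        for item in json.loads(line).get(field) or []
    )

# Sampling only the first two records misses the null in the third one.
sampled = infer_contains_null(records[:2], "tags")

# Scanning every record finds it, but future data could still add new nulls.
full = infer_contains_null(records, "tags")

# The approach in this pull request: do not derive the flag from scanned data.
always = True

print(sampled, full, always)
```

    A schema built from `sampled` would declare the array as null-free and
disagree with the third record, whereas the fixed `always` value stays valid
for both the unsampled records and any data appended later.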

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yhuai/spark jsonArrayContainsNull

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4806.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4806
    
----
commit 05eab9d06b1f7f2311660aded13fac38a5b86ad4
Author: Yin Huai <[email protected]>
Date:   2015-02-27T05:47:31Z

    Change containsNull to true.

----


