GitHub user stephend-realitymine opened a pull request:

    https://github.com/apache/spark/pull/9249

    [SPARK-10947] [SQL] With schema inference from JSON into a Dataframe, add 
option to infer all primitive object types as strings

    Currently, when a schema is inferred from a JSON file using 
sqlContext.read.json, the primitive object types are inferred as string, long, 
boolean, etc.
    
    However, because JSON does not enforce types itself, an inferred type can be 
too specific, which can cause issues when merging DataFrame schemas.
    
    This pull request adds the option "primitivesAsString" to the JSON 
DataFrameReader. When set to true (it defaults to false), all primitives are 
inferred as strings.
    
    Below is an example usage of this new functionality.
    ```
    val jsonDf = sqlContext.read.option("primitivesAsString", "true").json(sampleJsonFile)
    
    scala> jsonDf.printSchema()
    root
    |-- bigInteger: string (nullable = true)
    |-- boolean: string (nullable = true)
    |-- double: string (nullable = true)
    |-- integer: string (nullable = true)
    |-- long: string (nullable = true)
    |-- null: string (nullable = true)
    |-- string: string (nullable = true)
    ```
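
    To illustrate the schema mismatch described above, here is a minimal sketch 
for spark-shell. It assumes `sc` and `sqlContext` are already defined, as in the 
example above; the field name "id" and the two sample records are made up for 
illustration and do not come from this pull request.
    ```scala
    // Assumes a spark-shell session where `sc` and `sqlContext` already exist.
    // The field name "id" and the sample records are hypothetical.
    val batchA = sc.parallelize(Seq("""{"id": 1}"""))
    val batchB = sc.parallelize(Seq("""{"id": "abc"}"""))

    // Default inference: the same field gets a different type in each batch.
    sqlContext.read.json(batchA).printSchema()   // id is inferred as long
    sqlContext.read.json(batchB).printSchema()   // id is inferred as string

    // With the option proposed in this pull request, both batches should
    // agree on string for "id".
    val reader = sqlContext.read.option("primitivesAsString", "true")
    reader.json(batchA).printSchema()
    reader.json(batchB).printSchema()
    ```
    With matching string types, the two schemas can be combined without a type 
conflict; any casting back to numeric types is then left to the caller.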

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/RealityMineLtd/spark stephend-primitives

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9249.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9249
    
----
commit a718b8658cda4d849c9057dc9cd601bd6d31503e
Author: Stephen De Gennaro <steph...@realitymine.com>
Date:   2015-10-23T09:03:22Z

    SPARK-10947 Added option to json schema primitivesAsString when true will 
infer primitive types as strings

commit 9e6411425546400ae94fd67dc143d2b68c6243aa
Author: Stephen De Gennaro <steph...@realitymine.com>
Date:   2015-10-23T09:32:38Z

    SPARK-10947 adding missed bracket

commit 3989c6aa33acb0af3e151fcd26737cb295de550e
Author: Stephen De Gennaro <steph...@realitymine.com>
Date:   2015-10-23T09:59:06Z

    SPARK-10947 removing duplicate line

commit 18d28619264dbaf10f1e27576f5c4275cbc4ef72
Author: Stephen De Gennaro <steph...@realitymine.com>
Date:   2015-10-23T10:01:30Z

    SPARK-10947 removing extra bracket

----


