GitHub user marmbrus opened a pull request:

    https://github.com/apache/spark/pull/15274

    [SPARK-17699] Support for parsing JSON string columns

    Spark SQL has great support for reading text files that contain JSON data.  
However, in many cases the JSON data is just one column amongst others.  This 
is particularly true when reading from sources such as Kafka.  This PR adds a 
new function, `from_json`, that converts a string column into a nested 
`StructType` with a user-specified schema.
    
    Example usage:
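    ```scala
    import org.apache.spark.sql.functions.from_json
    import org.apache.spark.sql.types.{IntegerType, StructType}
    import spark.implicits._  // assumes a SparkSession named `spark`, as in spark-shell

    val df = Seq("""{"a": 1}""").toDS()
    val schema = new StructType().add("a", IntegerType)
    
    df.select(from_json($"value", schema) as 'json) // => [json: <a: int>]
    ```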
    
    This PR adds support for Java, Scala, and Python.  I leveraged our existing 
JSON parsing support by moving it into Catalyst (so that we could define 
expressions using it).  I left SQL out for now, because I'm not sure how users 
would specify a schema.
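    
    As a rough illustration of the "one column amongst others" case mentioned 
above, here is a minimal sketch that parses a JSON string column sitting next 
to a plain key column and then selects the nested fields.  The object name 
`FromJsonSketch`, the `parse` helper, the column names `key`/`value`, and the 
sample rows are all hypothetical and only loosely mimic a Kafka-style record; 
the only call taken from this PR is `from_json(column, schema)`.
    
    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.from_json
    import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

    object FromJsonSketch {
      // Builds a small two-column dataset and parses its JSON column.
      def parse(spark: SparkSession): DataFrame = {
        import spark.implicits._

        // Hypothetical rows: a key plus a JSON payload stored as a string.
        val df = Seq(
          ("k1", """{"a": 1, "b": "x"}"""),
          ("k2", """{"a": 2, "b": "y"}""")
        ).toDF("key", "value")

        // User-specified schema describing the JSON payload.
        val schema = new StructType()
          .add("a", IntegerType)
          .add("b", StringType)

        // Convert the string column into a struct, keeping the key alongside it,
        // then pull the nested fields out as top-level columns.
        df.select($"key", from_json($"value", schema).as("json"))
          .select($"key", $"json.a", $"json.b")
      }

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("from_json sketch")
          .getOrCreate()
        parse(spark).show()
        spark.stop()
      }
    }
    ```
    
    Keeping the parsed struct as a single `json` column first and flattening it 
in a second `select` is just one way to lay this out; the struct column can 
also be kept nested.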

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/marmbrus/spark jsonParser

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15274.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15274
    
----
commit 62f56a7e4529b35f58a229097b012bc984fd458f
Author: Michael Armbrust <mich...@databricks.com>
Date:   2016-09-28T02:49:22Z

    [SPARK-17699] Support for parsing JSON string columns

----


