GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/21626

    [SPARK-24642][SQL] New function infers schema for JSON column

    ## What changes were proposed in this pull request?
    
    In this PR, I propose a new aggregate function, *infer_schema()*. The 
function infers the schema of an expression that contains JSON strings. 
*infer_schema()* returns the schema in DDL format.
    
    One use case is combining *infer_schema()* with 
*from_json()* in SQL:
    
    ```sql
    select from_json(json_col, infer_schema(json_col))
    from json_table;
    ```
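
To make the idea of inferring one schema from many JSON strings concrete, here is a minimal Python sketch. It is not the implementation in this PR: the helper names (`infer_type`, `merge_types`, `to_ddl`), the type-widening rules, and the exact DDL spelling of the result are all assumptions chosen for illustration.

```python
import json


def infer_type(value):
    """Map one parsed JSON value to a type (illustrative rules only)."""
    if value is None:
        return None  # unknown until another value supplies a type
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        elem = None
        for item in value:
            elem = merge_types(elem, infer_type(item))
        return ("array", elem)
    # The only remaining JSON kind is an object.
    return ("struct", {name: infer_type(v) for name, v in value.items()})


def merge_types(a, b):
    """Combine two inferred types into one that can hold both."""
    if a is None:
        return b
    if b is None:
        return a
    if a == b:
        return a
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0]:
        if a[0] == "array":
            return ("array", merge_types(a[1], b[1]))
        fields = dict(a[1])  # union of struct fields, merged by name
        for name, field_type in b[1].items():
            fields[name] = merge_types(fields.get(name), field_type)
        return ("struct", fields)
    if isinstance(a, str) and isinstance(b, str) and {a, b} == {"bigint", "double"}:
        return "double"  # assumed numeric widening
    return "string"  # assumed fallback for incompatible types


def to_ddl(t):
    """Render an inferred type as a DDL-like string."""
    if t is None:
        return "string"  # assumed default when only nulls were seen
    if isinstance(t, str):
        return t
    if t[0] == "array":
        return "array<{}>".format(to_ddl(t[1]))
    fields = ",".join("{}:{}".format(n, to_ddl(ft)) for n, ft in t[1].items())
    return "struct<{}>".format(fields)


def infer_schema(json_strings):
    """Aggregate over all rows, merging the type inferred from each one."""
    merged = None
    for text in json_strings:
        merged = merge_types(merged, infer_type(json.loads(text)))
    return to_ddl(merged)
```

For example, under these assumed rules `infer_schema(['{"a": 1, "b": [1.5]}', '{"a": 2.5, "c": "x"}'])` yields `struct<a:double,b:array<double>,c:string>`: the row-by-row merge is what makes the function an aggregate rather than a per-row expression.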
    
    ## How was this patch tested?
    
    I added tests to `json-functions.sql` to check schema inference for array 
and struct types.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 json_infer_schema

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21626.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21626
    
----
commit f98aea2b59025f4a41e28fac8e2b2b689ddf4d27
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-23T17:16:22Z

    Initial implementation of the infer_schema function

commit a0c9a1137c5444890f048bd480d63496e31ec599
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-23T17:29:24Z

    Move typeMerger out of the merge function

commit 17a1f98448194af984de43d4dedad99271e25189
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-23T20:14:38Z

    SQL test for the infer_schema function

commit 45fc2e419dda2e53f5ff7e7ecbbee64d2bf23cf7
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-24T10:31:44Z

    Pretty name is changed to infer_schema

commit 4db679927e35cc41e2160b91d2435dc653f368a9
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-24T10:40:01Z

    Refactoring

commit 7e5ad618b6fba583db85dd1bdb251cc824c80bc8
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-24T11:32:32Z

    bug fix

commit 96e5cd33fbc4711302f9f0cf47e851df66fda524
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-24T12:05:12Z

    Added description for InferSchema

commit 333139da49951df1aee39aeabc286a162dd92ad9
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-24T12:05:46Z

    Drop views

----

