GitHub user MaxGekk opened a pull request:

    https://github.com/apache/spark/pull/21686

    [SPARK-24709][SQL] schema_of_json() - schema inference from an example

    ## What changes were proposed in this pull request?
    
    In this PR, I propose adding a new function, *schema_of_json()*, which
    infers the schema of a JSON string literal. The function returns a string
    containing the schema in DDL format.
    
    One use case is combining *schema_of_json()* with *from_json()*.
    Currently, *from_json()* requires a schema as a mandatory argument. With
    *schema_of_json()*, the schema can instead be given as an example JSON
    string that has the same schema as the first argument of *from_json()*.
    For instance:
    
    ```sql
    select from_json(json_column, schema_of_json('{"c1": [0], "c2": [{"c2":0}]}'))
    from json_table;
    ``` 
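
To make the idea of "inferring a DDL schema from an example" concrete, here is a minimal, standalone sketch in plain Python. It is not the `SchemaOfJson` expression from this PR and does not call Spark; the names `infer_ddl` and `schema_of_json_sketch` are hypothetical, and the type mapping (integers to `bigint`, array element type taken from the first element, nulls and empty arrays falling back to `string`) is a simplifying assumption.

```python
import json


def infer_ddl(value):
    """Return a DDL-like type string for a parsed JSON value (simplified sketch)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Assumption: use only the first element; empty arrays fall back to string.
        element = infer_ddl(value[0]) if value else "string"
        return f"array<{element}>"
    if isinstance(value, dict):
        # Fields are emitted in the order they appear in the example.
        fields = ",".join(f"{name}:{infer_ddl(v)}" for name, v in value.items())
        return f"struct<{fields}>"
    return "string"  # JSON null: assumed fallback


def schema_of_json_sketch(example):
    """Parse an example JSON string and describe its shape as a DDL-like string."""
    return infer_ddl(json.loads(example))


# The example literal from the SQL query above.
print(schema_of_json_sketch('{"c1": [0], "c2": [{"c2": 0}]}'))
# prints: struct<c1:array<bigint>,c2:array<struct<c2:bigint>>>
```

The sketch only shows the shape of the result: a single string describing nested structs and arrays, which is the kind of value *from_json()* can then take as its schema argument.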
    
    ## How was this patch tested?
    
    Added new tests to `JsonFunctionsSuite` and `JsonExpressionsSuite`, and
    SQL tests to `json-functions.sql`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MaxGekk/spark-1 infer_schema_json

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21686.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21686
    
----
commit 891f3ce6161a079f56a84ae3ff221996561d1506
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-30T18:12:00Z

    Implemented new expression - SchemaOfJson

commit 26f3275bc65b8a97ad4be81bbfe1fc440fe73ff2
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-30T20:04:40Z

    Fix imports

commit 1848a7ac4d0aa8fc95e964d0c3e9b893ae766fe3
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-30T20:05:49Z

    from_json() accepts output of schema_of_json()

commit 42da3f24d63369a30944817f482818bca4917c88
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-30T20:06:09Z

    Added sql tests

commit 97d93b3df7907c6501dd239e66b4ba3d5a4266c2
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-30T20:52:45Z

    New functions - from_json which accepts a Column and schema_of_json

commit ab82bd81fdd4207d8b3321e37d31da424eba15f6
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-30T20:53:02Z

    Tests for new functions

commit 174f8ab7d446b4f5df48d561fdc726708e5c4a52
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-30T21:00:00Z

    Fix for json functions suite

commit d77ed456b6ee4b487212be9ba9a3497c0d72d374
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-06-30T22:03:41Z

    Added the schema_of_json() function to PySpark

commit 086f6c17a5d9ec947d096990eaececbcd497c132
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-07-01T09:26:07Z

    Fix schema_of_json() in PySpark

commit 2ff71e8d8d94c6d5777525a323611e95731f7dda
Author: Maxim Gekk <maxim.gekk@...>
Date:   2018-07-01T09:47:25Z

    Adding ticket's number to test titles.

----

