GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21686
[SPARK-24709][SQL] schema_of_json() - schema inference from an example
## What changes were proposed in this pull request?
In this PR, I propose adding a new function, *schema_of_json()*, which infers
the schema of a JSON string literal. The function returns a string
containing the schema in DDL format.
One use case is combining *schema_of_json()* with *from_json()*.
Currently, *from_json()* requires a schema as a mandatory argument. With
*schema_of_json()*, you can instead supply an example JSON string that has
the same schema as the first argument of *from_json()*. For instance:
```sql
select from_json(json_column, schema_of_json('{"c1": [0], "c2": [{"c2":0}]}'))
from json_table;
```
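To make the idea of inferring a DDL-style schema from an example concrete, here is a minimal pure-Python sketch. It is not the implementation in this PR: the helper names `infer_type` and `schema_of_json_sketch` are hypothetical, the type mapping is a simplifying assumption, and array element types are taken from the first element only. The exact string that Spark produces may be formatted differently.

```python
import json


def infer_type(value):
    """Map a parsed JSON value to a DDL-style type name (simplified assumption)."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # Simplification: only the first element decides the element type.
        element = infer_type(value[0]) if value else "string"
        return f"array<{element}>"
    if isinstance(value, dict):
        fields = ",".join(f"{name}:{infer_type(v)}" for name, v in value.items())
        return f"struct<{fields}>"
    # JSON null carries no type information; fall back to string in this sketch.
    return "string"


def schema_of_json_sketch(example):
    """Return a DDL-like schema string for one example JSON document."""
    return infer_type(json.loads(example))


if __name__ == "__main__":
    print(schema_of_json_sketch('{"c1": [0], "c2": [{"c2":0}]}'))
    # prints: struct<c1:array<bigint>,c2:array<struct<c2:bigint>>>
```

The sketch uses the same example literal as the SQL query above, so the printed string shows the kind of schema that would then be handed to *from_json()*.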
## How was this patch tested?
Added new tests to `JsonFunctionsSuite` and `JsonExpressionsSuite`, and SQL
tests to `json-functions.sql`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 infer_schema_json
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21686.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21686
----
commit 891f3ce6161a079f56a84ae3ff221996561d1506
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-30T18:12:00Z
Implemented new expression - SchemaOfJson
commit 26f3275bc65b8a97ad4be81bbfe1fc440fe73ff2
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-30T20:04:40Z
Fix imports
commit 1848a7ac4d0aa8fc95e964d0c3e9b893ae766fe3
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-30T20:05:49Z
from_json() accepts output of schema_of_json()
commit 42da3f24d63369a30944817f482818bca4917c88
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-30T20:06:09Z
Added sql tests
commit 97d93b3df7907c6501dd239e66b4ba3d5a4266c2
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-30T20:52:45Z
New functions - from_json which accepts a Column and schema_of_json
commit ab82bd81fdd4207d8b3321e37d31da424eba15f6
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-30T20:53:02Z
Tests for new functions
commit 174f8ab7d446b4f5df48d561fdc726708e5c4a52
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-30T21:00:00Z
Fix for json functions suite
commit d77ed456b6ee4b487212be9ba9a3497c0d72d374
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-30T22:03:41Z
Added the schema_of_json() function to PySpark
commit 086f6c17a5d9ec947d096990eaececbcd497c132
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-07-01T09:26:07Z
Fix schema_of_json() in PySpark
commit 2ff71e8d8d94c6d5777525a323611e95731f7dda
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-07-01T09:47:25Z
Adding ticket's number to test titles.
----