GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/21626
[SPARK-24642][SQL] New function infers schema for JSON column
## What changes were proposed in this pull request?
In this PR, I propose a new aggregate function, *infer_schema()*. The
function infers the schema of an expression that contains JSON strings
and returns that schema in DDL format.
One use case is combining *infer_schema()* with *from_json()* in SQL:
```sql
select from_json(json_col, infer_schema(json_col))
from json_table;
```
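To illustrate what inferring a schema over a column of JSON strings and returning it in DDL format can look like, here is a minimal, Spark-free sketch in Python. It is not the code from this PR. The helper names (`infer_type`, `merge_types`, `to_ddl`), the merge rules (integer and floating-point values widen to `DOUBLE`, other conflicts fall back to `STRING`), and the exact DDL spelling are all assumptions made for the example.

```python
import json
from functools import reduce

# Hypothetical type model: atomic types are strings, containers are tuples
# ("array", element_type) or ("struct", {field_name: field_type}).
NUMERIC = ("bigint", "double")


def infer_type(value):
    """Map one parsed JSON value to a type in the model above."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool must be checked before int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return ("array", reduce(merge_types, map(infer_type, value), "null"))
    if isinstance(value, dict):
        return ("struct", {k: infer_type(v) for k, v in value.items()})
    raise TypeError(f"unsupported JSON value: {value!r}")


def merge_types(a, b):
    """Combine two inferred types into one that can hold both (assumed rules)."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if a in NUMERIC and b in NUMERIC:
        return "double"
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0]:
        if a[0] == "array":
            return ("array", merge_types(a[1], b[1]))
        fields = dict(a[1])  # struct: union of fields, merging shared ones
        for name, t in b[1].items():
            fields[name] = merge_types(fields[name], t) if name in fields else t
        return ("struct", fields)
    return "string"  # assumed fallback for incompatible types


def to_ddl(t, top_level=False):
    """Render a type as a DDL-like string; fields are sorted by name."""
    if isinstance(t, tuple) and t[0] == "array":
        return f"ARRAY<{to_ddl(t[1])}>"
    if isinstance(t, tuple):
        fields = sorted(t[1].items())
        if top_level:
            return ", ".join(f"{n} {to_ddl(ft)}" for n, ft in fields)
        return "STRUCT<" + ", ".join(f"{n}: {to_ddl(ft)}" for n, ft in fields) + ">"
    return "STRING" if t == "null" else t.upper()


def infer_schema(json_strings):
    """Aggregate over all rows: infer each row's type, then merge them."""
    types = (infer_type(json.loads(s)) for s in json_strings)
    return to_ddl(reduce(merge_types, types, "null"), top_level=True)


rows = ['{"a": 1, "b": [1, 2]}', '{"a": 2.5, "c": {"d": "x"}}']
print(infer_schema(rows))  # a DOUBLE, b ARRAY<BIGINT>, c STRUCT<d: STRING>
```

The sketch folds per-row types with a merge function, which is why the operation is naturally an aggregate: the result depends on every row of the column, not on a single value.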
## How was this patch tested?
I added tests to `json-functions.sql` that check schema inference for
array and struct types.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 json_infer_schema
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21626.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21626
----
commit f98aea2b59025f4a41e28fac8e2b2b689ddf4d27
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-23T17:16:22Z
Initial implementation of the infer_schema function
commit a0c9a1137c5444890f048bd480d63496e31ec599
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-23T17:29:24Z
Move typeMerger out of the merge function
commit 17a1f98448194af984de43d4dedad99271e25189
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-23T20:14:38Z
SQL test for the infer_schema function
commit 45fc2e419dda2e53f5ff7e7ecbbee64d2bf23cf7
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-24T10:31:44Z
Pretty name is changed to infer_schema
commit 4db679927e35cc41e2160b91d2435dc653f368a9
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-24T10:40:01Z
Refactoring
commit 7e5ad618b6fba583db85dd1bdb251cc824c80bc8
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-24T11:32:32Z
bug fix
commit 96e5cd33fbc4711302f9f0cf47e851df66fda524
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-24T12:05:12Z
Added description for InferSchema
commit 333139da49951df1aee39aeabc286a162dd92ad9
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-06-24T12:05:46Z
Drop views
----