GitHub user MaxGekk opened a pull request:
https://github.com/apache/spark/pull/22442
[SPARK-25447][SQL] Support JSON options by schema_of_json()
## What changes were proposed in this pull request?
In this PR, I propose extending the `schema_of_json()` function to
accept JSON options, since they can affect schema inference. The purpose is to
support the same options that `from_json` can use during schema inference.
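
A minimal PySpark sketch of how the extended function might be called, assuming a Spark build that includes this change. The helper names `infer_schema` and `demo`, the sample document, and the choice of `allowUnquotedFieldNames` as the option are illustrative assumptions, not part of the patch:

```python
# Illustrative option only (an assumption): any JSON option that from_json
# accepts could be passed the same way.
JSON_OPTIONS = {"allowUnquotedFieldNames": "true"}


def infer_schema(spark, sample, options=None):
    """Return the schema string that schema_of_json() infers for `sample`.

    `spark` is an existing SparkSession; `options` is a dict of JSON options.
    This helper is hypothetical and only wraps the public function.
    """
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql.functions import lit, schema_of_json

    # The second argument is the options map proposed in this PR.
    inferred = schema_of_json(lit(sample), options or {})
    return spark.range(1).select(inferred.alias("schema")).first()["schema"]


def demo():
    """Start a local session and print the schema inferred with the option set."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    try:
        # The field name `a` is unquoted, so the option is relevant to parsing.
        print(infer_schema(spark, "{a: 1}", JSON_OPTIONS))
    finally:
        spark.stop()
```

Calling `demo()` requires a working local Spark installation. Passing the same dictionary to `from_json` should make parsing and inference agree on how the input is read.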
## How was this patch tested?
Added SQL, Python, and Scala tests (`JsonExpressionsSuite` and
`JsonFunctionsSuite`) that check that the JSON options are used.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/MaxGekk/spark-1 schema_of_json-options
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22442.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22442
----
commit 6a9ec940af1714a603d71f995201c4753b0e06c4
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-16T21:06:27Z
Accept options in schema_of_json
commit 68e1438e9bfdf9c4ab2cc68c251308f0463df4ef
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-16T21:29:59Z
Fix examples
commit 3365e086f662da859dcd74c0973c7331925c5bcd
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-17T10:12:02Z
Added sql tests
commit 62ef168336633e014c1656ff79f17205df6a81d8
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-17T11:46:49Z
Added a signature which accepts options
commit 9d3b1a2be52094c13c8543cccf3fc9c8d177e480
Author: Maxim Gekk <maxim.gekk@...>
Date: 2018-09-17T12:31:45Z
Support options in PySpark
----