GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/17255
[SPARK-19918][SQL] Use TextFileFormat in implementation of JsonFileFormat
## What changes were proposed in this pull request?
This PR proposes using the text datasource for JSON schema inference.
It takes an approach similar to the one in
https://github.com/apache/spark/pull/15813. Using a Dataset for the initial
loading when inferring the schema has several advantages; please refer to
SPARK-18362 for details.
It seems the JSON case was supposed to be fixed in the same patch but was
taken out, according to https://github.com/apache/spark/pull/15813:
> A similar problem also affects the JSON file format and this patch
originally fixed that as well, but I've decided to split that change into a
separate patch so as not to conflict with changes in another JSON PR.
Also, the current JSON schema inference does not use FileScanRDD, which
affects some functionality. This problem is described in SPARK-19885
(though that issue covers the CSV case).
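As a rough illustration of the idea of loading the input as plain lines of text first and inferring the schema from those lines, here is a minimal, Spark-free sketch in Python. It is not the code in this PR: the function names (`infer_type`, `merge_types`, `infer_schema`), the type names, and the rule of falling back to `string` on conflicting types are all assumptions made for the example.

```python
import json


def infer_type(value):
    """Map a parsed JSON value to a simple, hypothetical type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "struct"


def merge_types(a, b):
    """Combine two inferred types into one (assumed rules for this sketch)."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if {a, b} == {"long", "double"}:
        return "double"
    return "string"  # assumed fallback for conflicting types


def infer_schema(lines):
    """Infer a flat field -> type mapping from an iterable of text lines."""
    schema = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed records in this sketch
        if not isinstance(record, dict):
            continue  # only top-level objects contribute fields
        for field, value in record.items():
            inferred = infer_type(value)
            if field in schema:
                schema[field] = merge_types(schema[field], inferred)
            else:
                schema[field] = inferred
    return schema
```

Because `infer_schema` only needs an iterable of strings, whatever produces the lines (a file handle, a list, or a text-based reader) can be swapped without changing the inference logic, which is the kind of separation the text-datasource approach relies on.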
## How was this patch tested?
Existing tests should cover this.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark json-filescanrdd
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/17255.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #17255
----
commit 5e90d04eb9b1dc011188def339f92f3e8ef7e236
Author: hyukjinkwon <[email protected]>
Date: 2017-03-11T05:42:43Z
Use TextFileFormat in implementation of JsonFileFormat
----