GitHub user HyukjinKwon opened a pull request:
https://github.com/apache/spark/pull/11270
[SPARK-8000][SQL] Support for auto-detecting data sources.
https://issues.apache.org/jira/browse/SPARK-8000
This PR adds support for detecting the data source by file extension.
As I described in the comments, when `format()` is not called, this tries to
infer the data source from the file extension.
The auto-detection is based on the given paths and recognizes glob patterns,
but it does not recursively check sub-paths, even if the given paths are
directories.
Source detection proceeds in the following steps:
1. Check `provider` and use it if it is not `null`.
2. If `provider` is not given, try to detect the source type by extension.
   At this point, detection succeeds only if all the given paths have the
   same extension.
3. If detection fails, use the data source set by
   `spark.sql.sources.default`.
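The three steps above can be sketched as a standalone Python function. This is a minimal sketch under stated assumptions, not the PR's Scala implementation: `detect_source`, `file_extension`, and the `EXTENSION_TO_SOURCE` table are hypothetical names, and the extension-to-format mapping is an assumption. Only the last path component is inspected, so a glob such as `/data/*.json` resolves to `json`, while a bare directory has no extension and falls through to the default, mirroring the non-recursive behavior described above.

```python
import os
from typing import Optional, Sequence

# Hypothetical extension-to-format table; the PR's actual mapping may differ.
EXTENSION_TO_SOURCE = {
    "json": "json",
    "parquet": "parquet",
    "orc": "orc",
    "csv": "csv",
}


def file_extension(path: str) -> str:
    """Return the lowercase extension of the last path component, or ''.

    Sub-paths are never listed, so a directory yields '' and a glob
    such as '*.json' yields 'json'.
    """
    name = os.path.basename(path.rstrip("/"))
    _, ext = os.path.splitext(name)
    return ext[1:].lower()


def detect_source(provider: Optional[str],
                  paths: Sequence[str],
                  default_source: str) -> str:
    # Step 1: an explicit provider (e.g. set via format()) always wins.
    if provider is not None:
        return provider
    # Step 2: infer from extensions, but only if every path agrees.
    extensions = {file_extension(p) for p in paths}
    if len(extensions) == 1:
        ext = extensions.pop()
        if ext in EXTENSION_TO_SOURCE:
            return EXTENSION_TO_SOURCE[ext]
    # Step 3: fall back to the configured default
    # (standing in for spark.sql.sources.default).
    return default_source
```

Collecting the extensions into a set makes the "all paths share one extension" check a single size test; mixed or unknown extensions simply fall through to the default rather than raising an error.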
Tests have been added for each data source.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HyukjinKwon/spark SPARK-8000
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11270.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11270
----
commit 23ba7266358a3de4800bb65da316c20f60bbf7a8
Author: hyukjinkwon <[email protected]>
Date: 2016-02-19T10:15:44Z
Support for auto-detecting data sources.
----