[GitHub] spark pull request #17702: [SPARK-20408][SQL] Get the glob path in parallel ...

xuanyuanking Thu, 20 Apr 2017 04:20:04 -0700

GitHub user xuanyuanking opened a pull request:

    https://github.com/apache/spark/pull/17702


    [SPARK-20408][SQL] Get the glob path in parallel to reduce resolve relation time

    ## What changes were proposed in this pull request?
    This PR changes glob path resolution to run in parallel, which can make resolving complex wildcard paths faster. The main changes are:
    1. Add a config named `spark.sql.globPathInParallel`, default false.
    2. Add a new function `getGlobbedPaths` in DataSource, which returns all paths matched by the wildcard, resolved in parallel or serially depending on the config.
    3. Add a new function `expandGlobPath` in SparkHadoopUtil, which expands the first directory level matched by the wildcard.
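The flow described in the list (expand the first wildcard directory level, then resolve the rest of the pattern under each matched directory concurrently, behind an on/off switch) can be sketched outside Spark. This is an illustrative Python sketch under stated assumptions, not the PR's Scala code: `get_globbed_paths`, `expand_first_glob_level` and `split_at_first_glob` are hypothetical stand-ins for `getGlobbedPaths` and `expandGlobPath`, the `parallel` argument plays the role of `spark.sql.globPathInParallel`, and a local thread pool over the standard `glob` module stands in for Hadoop `FileSystem` globbing and for whatever parallelism mechanism the patch actually uses.

```python
import glob
import os
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

# Characters that make a path component a glob pattern (Python `glob` syntax;
# Hadoop globbing also supports `{a,b}`, which this sketch does not handle).
GLOB_CHARS = set("*?[")


def split_at_first_glob(pattern: str) -> Tuple[str, str]:
    """Split `pattern` right after the first component containing a wildcard.

    Returns (prefix, rest); rest is "" when the only wildcard is in the last
    component or when there is no wildcard at all.
    """
    parts = pattern.split(os.sep)
    for i, part in enumerate(parts):
        if any(ch in GLOB_CHARS for ch in part):
            return os.sep.join(parts[: i + 1]), os.sep.join(parts[i + 1 :])
    return pattern, ""


def expand_first_glob_level(pattern: str) -> Tuple[List[str], str]:
    """Hypothetical analogue of `expandGlobPath`: list only the directories
    matched by the first wildcard level and return them with the remaining pattern."""
    prefix, rest = split_at_first_glob(pattern)
    dirs = sorted(p for p in glob.glob(prefix) if os.path.isdir(p))
    return dirs, rest


def get_globbed_paths(pattern: str, parallel: bool, max_workers: int = 8) -> List[str]:
    """Hypothetical analogue of `getGlobbedPaths`: return every path matched by
    `pattern`; `parallel` stands in for `spark.sql.globPathInParallel`."""
    _, rest = split_at_first_glob(pattern)
    if not parallel or not rest:
        # Serial path, also used when there is no deeper level to fan out over.
        return sorted(glob.glob(pattern))
    dirs, rest = expand_first_glob_level(pattern)
    # Escape the concrete directory names so characters like `[` in them are
    # matched literally, then glob the remaining pattern under each directory.
    sub_patterns = [os.path.join(glob.escape(d), rest) for d in dirs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        matched = [p for paths in pool.map(glob.glob, sub_patterns) for p in paths]
    return sorted(matched)
```

Both modes should return the same sorted list. The parallel mode only helps when the first wildcard level matches many directories, because each sub-pattern then becomes an independent listing call that can run concurrently; for a wildcard in the last component there is nothing to fan out, so the sketch falls back to a single glob.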
    
    ## How was this patch tested?
    Existing unit tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuanyuanking/spark SPARK-20408

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17702.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17702
    
----
commit b27ef4f9e696e2b2c2fc2e0df504baea88937234
Author: xuanyuanking <[email protected]>
Date:   2017-04-20T11:07:47Z

    [SPARK-20408][SQL]Get the glob path in parallel to reduce resolve relation time

----

