GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/12748

    [SPARK-14970][SQL] Prevent DataSource from enumerating all files in a directory if there is a user-specified schema

    ## What changes were proposed in this pull request?
    The FileCatalog object gets created even if the user specifies a schema, which means all files in the directory are enumerated even though this is not necessary. For large directories this is very slow. Large directories are exactly where users are most likely to specify a schema, so the eager listing largely defeats the purpose.
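To make the problem concrete, here is a minimal Python sketch of the lazy-listing idea. It is not Spark's code. `LazyFileListing`, `infer_schema`, and `resolve_schema` are hypothetical stand-ins for the FileCatalog and the DataSource schema resolution, and a "schema" is modeled as a plain list of column names. The sketch shows only one thing: the directory is walked only when the schema actually has to be inferred.

```python
import os
import tempfile
from typing import List, Optional


class LazyFileListing:
    """Hypothetical stand-in for a file catalog that lists a directory only on first use."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.list_calls = 0  # how many times the directory tree was actually walked
        self._files: Optional[List[str]] = None

    def files(self) -> List[str]:
        # Walk the directory tree on first access and cache the result.
        if self._files is None:
            self.list_calls += 1
            self._files = sorted(
                os.path.join(dirpath, name)
                for dirpath, _, names in os.walk(self.root)
                for name in names
            )
        return self._files


def infer_schema(files: List[str]) -> List[str]:
    # Placeholder: a real reader would open and sample the files here.
    return ["inferred_from_%d_files" % len(files)]


def resolve_schema(listing: LazyFileListing, user_schema: Optional[List[str]]) -> List[str]:
    # A user-specified schema short-circuits inference, so no listing happens.
    if user_schema is not None:
        return user_schema
    return infer_schema(listing.files())


with tempfile.TemporaryDirectory() as root:
    for i in range(3):
        open(os.path.join(root, "part-%d.json" % i), "w").close()
    listing = LazyFileListing(root)
    resolve_schema(listing, ["id", "value"])
    print(listing.list_calls)  # prints 0: schema given, directory never walked
    resolve_schema(listing, None)
    print(listing.list_calls)  # prints 1: inference forced a single walk
```

In this sketch the listing is deferred rather than eliminated: a later scan would still call `files()`, but resolving the schema no longer pays for the listing up front.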
    
    ## How was this patch tested?
    This is hard to test with a unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-14970

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12748.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12748
    
----
commit 4e4e8dba8209513d105f0e195ca0b06a3eb6c70e
Author: Tathagata Das
Date:   2016-04-28T02:29:39Z

    Fixed bug

----

