[GitHub] spark pull request #16090: [SPARK-18661] [SQL] Creating a partitioned dataso...

ericl Wed, 30 Nov 2016 15:45:12 -0800

GitHub user ericl opened a pull request:

    https://github.com/apache/spark/pull/16090


    [SPARK-18661] [SQL] Creating a partitioned datasource table should not scan all files for table

    ## What changes were proposed in this pull request?
    
    In 2.1, creating a partitioned datasource table does not populate the
    partition data by default; that happens only once the user issues MSCK
    REPAIR TABLE. Even so, it seems table creation still scans the filesystem,
    which serves no purpose. We should skip this scan when the user specifies
    a schema.
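
The intent can be illustrated with a small, self-contained sketch. This is not Spark's code: `resolve_table_schema`, `list_files`, `infer_schema`, and the example path are hypothetical names invented for illustration. It only shows the control flow being proposed, namely that the file listing is reached only when no schema was supplied.

```python
from typing import Callable, List, Optional, Sequence

Schema = List[str]  # hypothetical stand-in for a real schema type


def resolve_table_schema(
    user_schema: Optional[Sequence[str]],
    list_files: Callable[[], List[str]],
    infer_schema: Callable[[List[str]], Schema],
) -> Schema:
    """Return the table schema, listing files only when inference is needed."""
    if user_schema is not None:
        # A user-specified schema makes the filesystem scan unnecessary.
        return list(user_schema)
    # No schema given: fall back to scanning files and inferring one.
    return infer_schema(list_files())


# Toy harness that counts how often the (expensive) listing runs.
calls = {"list_files": 0}


def counting_list_files() -> List[str]:
    calls["list_files"] += 1
    # Hypothetical partitioned layout, used only for illustration.
    return ["/data/events/day=2016-11-30/part-00000.parquet"]


def toy_infer_schema(files: List[str]) -> Schema:
    # Assumed fixed result; real inference would read file metadata.
    return ["id", "day"]


schema = resolve_table_schema(["id", "day"], counting_list_files, toy_infer_schema)
```

With a schema supplied, the listing callback is never invoked; passing `None` instead triggers exactly one listing followed by inference.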
    
    ## How was this patch tested?
    
    Perf stat tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ericl/spark spark-18661

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16090.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16090
    
----
commit 51d2a4141a3e808616d291ed089b0b1d8172b80a
Author: Eric Liang <[email protected]>
Date:   2016-11-30T23:43:59Z

    Wed Nov 30 15:43:59 PST 2016

----

