GitHub user windpiger reopened a pull request:

    https://github.com/apache/spark/pull/17081

    [SPARK-18726][SQL][FOLLOW-UP] resolveRelation for FileFormat DataSource doesn't need to listFiles twice

    ## What changes were proposed in this pull request?
    
    Currently, when `resolveRelation` runs for a `FileFormat DataSource` without a user-provided schema, `listFiles` is executed twice in `InMemoryFileIndex`.
    
    This PR adds a `FileStatusCache` for `DataSource`, which avoids calling `listFiles` twice.
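
To illustrate the idea, here is a minimal, self-contained sketch in Python. It is not Spark's code: `FileStatusCache`, `FileIndex`, `resolve_relation`, and `count_listings` are hypothetical stand-ins named after the concepts above, and the "file system" is just a dictionary. It only shows why sharing one cache between two index constructions turns two listings into one.

```python
from typing import Callable, Dict, List, Optional, Tuple


class FileStatusCache:
    """Hypothetical cache mapping a directory path to its file listing."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[str]] = {}

    def get(self, path: str) -> Optional[List[str]]:
        return self._entries.get(path)

    def put(self, path: str, files: List[str]) -> None:
        self._entries[path] = list(files)


class FileIndex:
    """Hypothetical index that lists its files eagerly on construction."""

    def __init__(
        self,
        paths: List[str],
        list_files: Callable[[str], List[str]],
        cache: Optional[FileStatusCache] = None,
    ) -> None:
        self.files: List[str] = []
        for path in paths:
            listing = cache.get(path) if cache is not None else None
            if listing is None:
                # Cache miss (or no cache): do the expensive listing.
                listing = list_files(path)
                if cache is not None:
                    cache.put(path, listing)
            self.files.extend(listing)


def resolve_relation(
    paths: List[str],
    list_files: Callable[[str], List[str]],
    cache: Optional[FileStatusCache] = None,
) -> Tuple[List[str], List[str]]:
    # Assumption for this sketch: one index is built to infer the schema
    # and a second one is built to back the resolved relation.
    schema_index = FileIndex(paths, list_files, cache)
    relation_index = FileIndex(paths, list_files, cache)
    return schema_index.files, relation_index.files


def count_listings(use_cache: bool) -> int:
    """Return how many times the fake file system was listed."""
    fake_fs = {"/data/t": ["part-0.parquet", "part-1.parquet"]}
    calls = {"count": 0}

    def list_files(path: str) -> List[str]:
        calls["count"] += 1
        return fake_fs[path]

    cache = FileStatusCache() if use_cache else None
    resolve_relation(["/data/t"], list_files, cache)
    return calls["count"]
```

In this sketch, `count_listings(False)` lists the directory once per index, while `count_listings(True)` lists it once in total because the second index is served from the shared cache.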
    
    However, there is a bug in `InMemoryFileIndex`; see:
     [SPARK-19748](https://github.com/apache/spark/pull/17079)
     [SPARK-19761](https://github.com/apache/spark/pull/17093)
    so this PR should be merged after SPARK-19748 and SPARK-19761.
    
    
    ## How was this patch tested?
    A unit test was added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/windpiger/spark resolveDataSourceScanFilesTwice

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17081.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17081
    
----
commit 0082b7633e8f84fe5cafa0362cd45cce4cfee459
Author: windpiger <[email protected]>
Date:   2017-02-27T08:04:30Z

    [SPAKR-18726][SQL]resolveRelation for FileFormate DataSource don't need to listFiles twice

commit 6b5454ad0104459565febb520fa22ef30bdb8368
Author: windpiger <[email protected]>
Date:   2017-02-27T08:39:45Z

    add test case

commit f1da0a4cf457f4efb6128beca3c08ccf95ef37a0
Author: windpiger <[email protected]>
Date:   2017-02-27T23:59:34Z

    fix a style

commit f79f12c552ee1721295c347744fc5f92f048c74b
Author: windpiger <[email protected]>
Date:   2017-03-01T22:49:13Z

    Merge branch 'master' into resolveDataSourceScanFilesTwice

commit a8c1deab0fc8e59863bf4a3d3b551f77fbebbc6d
Author: windpiger <[email protected]>
Date:   2017-03-02T01:50:30Z

    fix test failed

commit 60fa03757d223f833e2fa161326a48a9015d4c6c
Author: windpiger <[email protected]>
Date:   2017-03-02T04:49:08Z

    add a lazy

commit 9a73947efea334ba0cfc5b5508003807a93ff806
Author: windpiger <[email protected]>
Date:   2017-03-02T06:49:44Z

    fix code style

commit 850094cd3b77f6ecf33caf88532920e73de976f4
Author: windpiger <[email protected]>
Date:   2017-03-02T06:54:38Z

    Merge branch 'master' of github.com:apache/spark into resolveDataSourceScanFilesTwice

commit c39eb26da38f9d92e3871814be446c8d911be890
Author: windpiger <[email protected]>
Date:   2017-03-02T11:03:18Z

    make filestatuscache local var

commit f3332cb870ae2be9383969de07a07c8761230e8b
Author: windpiger <[email protected]>
Date:   2017-03-02T11:04:55Z

    modify a test case

commit 9cadd4168041fd859cc1e4b8396e5ed514129bff
Author: windpiger <[email protected]>
Date:   2017-03-02T11:05:24Z

    modify a test case

commit 28c8158a7c9d7acdbf2a07ef66ace46c1215979f
Author: windpiger <[email protected]>
Date:   2017-03-02T11:06:40Z

    modify a test case

commit 92618b3ad67c899e681a9923ad9abc5a7f2c7897
Author: windpiger <[email protected]>
Date:   2017-03-02T11:07:10Z

    remove an empty line

----


