[GitHub] spark issue #15539: [SPARK-17994] [SQL] Add back a file status cache for cat...

2017-01-09 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15539 Just submitted a PR for `set location`: https://github.com/apache/spark/pull/16514 That issue is caused by the cache for mapping the table name to LogicalRelation. We need to refresh it after

2017-01-09 Thread ericl
Github user ericl commented on the issue: https://github.com/apache/spark/pull/15539 Hmm, I don't think fileStatusCache can ever return incorrect results, only stale ones. Furthermore, it's scoped by client-id to particular instances of tables, so refresh table is guaranteed to wipe
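
A minimal Python sketch of the "stale, not incorrect" argument, assuming a shared cache whose keys are namespaced by a per-table client id. `ScopedFileStatusCache`, `list_files`, `invalidate_all` and the fake `fs` dict are hypothetical illustrations. They are not Spark's Scala `FileStatusCache` API. The sketch shows three things: a cached listing can lag behind the file system, wiping one client's entries forces a fresh listing, and the wipe leaves other clients untouched.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a directory listing: just file names.
Listing = List[str]


class ScopedFileStatusCache:
    """Shared cache whose keys are (client id, path); clients never share entries."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, str], Listing] = {}
        self._next_id = 0

    def new_client_id(self) -> int:
        # One id per table instance (assumption for this sketch).
        self._next_id += 1
        return self._next_id

    def get(self, client_id: int, path: str) -> Optional[Listing]:
        return self._entries.get((client_id, path))

    def put(self, client_id: int, path: str, listing: Listing) -> None:
        self._entries[(client_id, path)] = list(listing)

    def invalidate_all(self, client_id: int) -> None:
        # Linear scan over every key; only this client's entries are removed.
        for key in [k for k in self._entries if k[0] == client_id]:
            del self._entries[key]


fs = {"/t/p=1": ["a.parquet"]}  # fake file system: path -> file names
cache = ScopedFileStatusCache()
table = cache.new_client_id()
other = cache.new_client_id()


def list_files(client_id: int, path: str) -> Listing:
    cached = cache.get(client_id, path)
    if cached is None:
        cached = list(fs[path])  # "list" the directory on a miss
        cache.put(client_id, path, cached)
    return cached


first = list_files(table, "/t/p=1")
fs["/t/p=1"].append("b.parquet")     # a new file lands on disk
stale = list_files(table, "/t/p=1")  # old listing: stale, but not wrong data
cache.put(other, "/t/p=1", ["x.parquet"])
cache.invalidate_all(table)          # e.g. what a table refresh might trigger
fresh = list_files(table, "/t/p=1")
```

Because invalidation only ever removes entries, the worst outcome before a refresh is an out-of-date listing. Entries can never leak between clients.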

2017-01-08 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15539 It could return incorrect results, but I need to prove it using a use case. We always call

2017-01-08 Thread ericl
Github user ericl commented on the issue: https://github.com/apache/spark/pull/15539 Hm, what use cases are we trying to address? As I understand, the worst that can happen if the cache size flag is toggled at runtime is that the old settings might still apply. And when the
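
One way the "old settings might still apply" behavior can arise is sketched below. It assumes the global cache is created lazily from whichever session's size setting reaches it first. `GlobalFileCache`, `get_or_create` and the byte values are hypothetical and chosen only for illustration. Spark's actual initialization may differ.

```python
import threading
from typing import Optional


class GlobalFileCache:
    """Hypothetical process-wide cache; only its configured budget is modeled."""

    def __init__(self, max_size_bytes: int) -> None:
        self.max_size_bytes = max_size_bytes


_lock = threading.Lock()
_shared: Optional[GlobalFileCache] = None


def get_or_create(max_size_bytes: int) -> GlobalFileCache:
    """Return the shared cache, creating it on first use.

    Only the first caller's size takes effect; later callers passing a
    different value silently receive the existing instance.
    """
    global _shared
    with _lock:
        if _shared is None:
            _shared = GlobalFileCache(max_size_bytes)
        return _shared


session_a = get_or_create(100 * 1024 * 1024)  # first session's setting
session_b = get_or_create(10 * 1024 * 1024)   # flag toggled later: ignored
```

The result is a documentation issue rather than a correctness one. Every session reads through the same cache, so a runtime change to the size flag has no effect until the shared instance is rebuilt.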

2017-01-08 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15539 Yeah, I think we should document the behavior issues when different sessions are using different conf values. Will do it. I think we also need to evict all the cache entries that are associated with the

2017-01-08 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15539 `(ClientId, Path), Array[FileStatus]` uh... `FileStatusCache` does not share any entries with any other client, but does share memory resources for the purpose of cache eviction. Sorry,
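
The layout described above maps `(ClientId, Path)` to `Array[FileStatus]`. No entries are shared, but all clients draw on one eviction budget. A minimal Python sketch of that idea follows, using an LRU order over all keys. `SharedBudgetLRU` is hypothetical. The weight of an entry is simply the number of file statuses it holds, an illustrative stand-in for whatever memory estimate a real implementation would use.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple

Key = Tuple[int, str]  # (client id, path)


class SharedBudgetLRU:
    """Entries are private per client; the weight budget is shared by all clients."""

    def __init__(self, max_weight: int) -> None:
        self.max_weight = max_weight
        self.weight = 0
        self._data: "OrderedDict[Key, List[str]]" = OrderedDict()

    def get(self, client_id: int, path: str) -> Optional[List[str]]:
        key = (client_id, path)
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, client_id: int, path: str, statuses: List[str]) -> None:
        key = (client_id, path)
        if key in self._data:
            self.weight -= len(self._data.pop(key))
        self._data[key] = list(statuses)
        self.weight += len(statuses)
        # Evict least recently used entries, whichever client owns them.
        while self.weight > self.max_weight and self._data:
            _, evicted = self._data.popitem(last=False)
            self.weight -= len(evicted)


cache = SharedBudgetLRU(max_weight=3)
cache.put(1, "/a", ["f1", "f2"])  # client 1
cache.put(2, "/a", ["g1"])        # client 2, same path, separate entry
cache.put(2, "/b", ["g2", "g3"])  # over budget: evicts client 1's entry
```

In this sketch one client's heavy use can push out another client's entries. That is the shared-memory trade-off the comment describes. Correctness is unaffected, because an evicted entry is simply re-listed on the next miss.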

2017-01-08 Thread ericl
Github user ericl commented on the issue: https://github.com/apache/spark/pull/15539 That one is safe to make global but mutable, right? It will take effect after a table is refreshed. Most of these anomalies seem OK to me provided we document them -- it seems to solve

2016-10-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15539 @ericl Great work on this. I don't know how I got an author credit in the commit...

2016-10-22 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15539 merging to master!

2016-10-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15539 I had some network problems, I'll ask @yhuai to merge it.

2016-10-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15539 LGTM, next maybe we can refactor the `PartitionAwareFileCatalog` and make it use the new global cache better. I'm going to merge it to unblock other works. thanks!

2016-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67346/

2016-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Merged build finished. Test PASSed.

2016-10-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67346 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67346/consoleFull)** for PR 15539 at commit

2016-10-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67346/consoleFull)** for PR 15539 at commit

2016-10-21 Thread ericl
Github user ericl commented on the issue: https://github.com/apache/spark/pull/15539 > The biggest problem of this proposal is, invalidating the cache may be slow if there are a lot of cache entries. I don't think this is really an issue. Conservatively assuming ~1us per

2016-10-21 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15539 We have REFRESH TABLE/PATH because we cache things, so I think we should consider caching and refreshing together. Currently we have 4 caches: 1. **table name to `LogicalRelation`

2016-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67326/

2016-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Build finished. Test PASSed.

2016-10-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67326 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67326/consoleFull)** for PR 15539 at commit

2016-10-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67326 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67326/consoleFull)** for PR 15539 at commit

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67292/

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67292 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67292/consoleFull)** for PR 15539 at commit

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67291/

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Merged build finished. Test PASSed.

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67291 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67291/consoleFull)** for PR 15539 at commit

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67292 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67292/consoleFull)** for PR 15539 at commit

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67281/

2016-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Merged build finished. Test PASSed.

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67281 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67281/consoleFull)** for PR 15539 at commit

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67291/consoleFull)** for PR 15539 at commit

2016-10-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67281 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67281/consoleFull)** for PR 15539 at commit

2016-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Merged build finished. Test PASSed.

2016-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67222/

2016-10-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67222 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67222/consoleFull)** for PR 15539 at commit

2016-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67217/

2016-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Merged build finished. Test PASSed.

2016-10-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67217/consoleFull)** for PR 15539 at commit

2016-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67215/

2016-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Merged build finished. Test PASSed.

2016-10-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67215/consoleFull)** for PR 15539 at commit

2016-10-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67222 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67222/consoleFull)** for PR 15539 at commit

2016-10-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67217/consoleFull)** for PR 15539 at commit

2016-10-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67215/consoleFull)** for PR 15539 at commit

2016-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Merged build finished. Test FAILed.

2016-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67211/

2016-10-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67211 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67211/consoleFull)** for PR 15539 at commit

2016-10-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67211 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67211/consoleFull)** for PR 15539 at commit

2016-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67210/

2016-10-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Build finished. Test FAILed.

2016-10-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67210/consoleFull)** for PR 15539 at commit

2016-10-19 Thread ericl
Github user ericl commented on the issue: https://github.com/apache/spark/pull/15539 @mallman those numbers seem about right. I think as long as planning time is not that much worse than with the old ListingFileCatalog we are good.

2016-10-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67210/consoleFull)** for PR 15539 at commit

2016-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67160/

2016-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Merged build finished. Test PASSed.

2016-10-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67160 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67160/consoleFull)** for PR 15539 at commit

2016-10-18 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15539 @ericl I took this PR for a test drive with some large-ish tables. Everything appeared to work as expected. As far as performance goes, planning a simple select on a partitioned table

2016-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/67152/

2016-10-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15539 Merged build finished. Test PASSed.

2016-10-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67152/consoleFull)** for PR 15539 at commit

2016-10-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67160 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67160/consoleFull)** for PR 15539 at commit

2016-10-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15539 **[Test build #67152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/67152/consoleFull)** for PR 15539 at commit