[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21289 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5886/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21289 Build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21289 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90637/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21289 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21289 **[Test build #90637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90637/testReport)** for PR 21289 at commit [`5cabfd6`](https://github.com/apache/spark/commit/5cabfd6d9d5189b3669cf25f0ade5e8df9029d9d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21289 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21289 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3228/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21289 **[Test build #90637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90637/testReport)** for PR 21289 at commit [`5cabfd6`](https://github.com/apache/spark/commit/5cabfd6d9d5189b3669cf25f0ade5e8df9029d9d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21289 @cloud-fan @adrian-ionescu I added a test, please check when you have time. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user adrian-ionescu commented on the issue: https://github.com/apache/spark/pull/21289 Fair enough. Actually I think that's how `CatalogFileIndex` works. So, for consistency, we might even want to consider making this the default. Please add a test, though, and improve the PR description. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21289 I mean the memory refresh always happen after writing data, which I think is not necessary ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21289 @cloud-fan @adrian-ionescu Thanks for looking into this ! The issue is that when I append into a table with sql below: ``` insert into X select ``` It always refresh the file index of table X in memory, though I don't query X after this job. From this point, the refresh is not necessary and costs much time and memory. In this pr, I proposes to mark the file index as 'outdated', and postpone the refresh until next query of the file index. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user adrian-ionescu commented on the issue: https://github.com/apache/spark/pull/21289 I just took a quick look... Do I understand correctly that your flag basically controls whether the refresh is eager (`true`, current behavior) vs lazy (`false`)? If so, let's rename the flag accordingly. Also, with lazy mode we just delay file listing from immediately after an append operation until the next Spark job involving that table, right? It doesn't seem like we gain much overall.. What's the general use-case that motivated you to make this change? Please explain that more clearly in the PR description. Also, let's add a test that verifies the intended behavior. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21289 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90447/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21289 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21289 **[Test build #90447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90447/testReport)** for PR 21289 at commit [`b49665d`](https://github.com/apache/spark/commit/b49665d16be723a7abe9fdfa9ea600bd7be349df). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21289 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3099/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21289 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/21289 cc @cloud-fan @gengliangwang @adrian-ionescu Do you have any thoughts? Please take a look when you have time, thanks ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21289: [SPARK-24240] Add a config to control whether InMemoryFi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21289 **[Test build #90447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90447/testReport)** for PR 21289 at commit [`b49665d`](https://github.com/apache/spark/commit/b49665d16be723a7abe9fdfa9ea600bd7be349df). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org