[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-22 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-200194463 @yhuai oh, I got you. I'll close this for now and keep an eye on this. --- If your project is set up for it, you can reply to this email and have your reply appear on Gi

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-22 Thread maropu
Github user maropu closed the pull request at: https://github.com/apache/spark/pull/11576

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-195152508 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-195152507 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-195152344 **[Test build #52876 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52876/consoleFull)** for PR 11576 at commit [`6332f94`](https://g

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-10 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-195137366 But, our behavior is still pretty weird. Although `SqlNewHadoopRDD` uses this conf, we actually do list files when this conf is false. But, looks like the input file pa

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-10 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-195135834 @maropu Sorry. My bad. I think we cannot remove this conf right now since it still affects the behavior of SqlNewHadoopRDD. I will close the JIRA.

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-10 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/11576#discussion_r55778696 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala --- @@ -314,8 +312,6 @@ private[sql] class DefaultS

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-10 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-195120445 **[Test build #52876 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52876/consoleFull)** for PR 11576 at commit [`6332f94`](https://gi

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-08 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-194112721 Yes, ... Scanning all files is pretty expensive, but we can add a metadata file for detecting it; when `DataFrameWriter` writes something in the files, it updates
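
The metadata-file idea in this comment can be sketched in plain Python, outside Spark. This is only an illustration under stated assumptions: the comment is cut off, so what exactly the writer updates is assumed here to be a small version counter in a marker file. The names `MARKER`, `write_file`, `read_version`, and `MarkerCheckedListing` are hypothetical and are not Spark APIs.

```python
import os
import tempfile

MARKER = "_last_write"  # hypothetical marker file name (assumption)


def read_version(directory):
    """Return the version stored in the marker file, or 0 if none exists."""
    try:
        with open(os.path.join(directory, MARKER)) as f:
            return int(f.read())
    except FileNotFoundError:
        return 0


def write_file(directory, name):
    """Create a data file, then bump the marker so readers can see the change."""
    open(os.path.join(directory, name), "w").close()
    version = read_version(directory) + 1
    with open(os.path.join(directory, MARKER), "w") as f:
        f.write(str(version))


class MarkerCheckedListing:
    """Caches a directory listing and re-lists only when the marker changes."""

    def __init__(self, directory):
        self.directory = directory
        self._version = None
        self._files = None

    def files(self):
        current = read_version(self.directory)  # one small read, no full scan
        if self._files is None or current != self._version:
            self._files = sorted(
                n for n in os.listdir(self.directory) if n != MARKER
            )
            self._version = current
        return self._files


def demo():
    with tempfile.TemporaryDirectory() as d:
        write_file(d, "part-0.parquet")
        listing = MarkerCheckedListing(d)
        before = list(listing.files())
        write_file(d, "part-1.parquet")
        after = list(listing.files())
        return before, after
```

The check costs one small file read instead of a scan over every data file, but it only detects writes that go through the helper; files added by any other path would still leave the cache stale.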

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-08 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-194108153 Yea I think eventually it might make sense -- but one problem is that it is very expensive to detect this, especially when there are a very large number of files, which is

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-08 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-194106085 Oh, I see. However, I think it is slightly difficult for users to notice this option in that case (the option is true by default). How about automatically detecting the

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-08 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-194091960 I think it is a problem when the underlying files change.
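
The problem raised in this comment can be shown with a minimal sketch in plain Python. It is not Spark's implementation: `CachedListing` and `demo` are hypothetical names, and the `cache_enabled` flag only stands in for a setting such as `spark.sql.parquet.cacheMetadata`.

```python
import os
import tempfile


class CachedListing:
    """Lists a directory, optionally caching the first result."""

    def __init__(self, path, cache_enabled=True):
        self.path = path
        self.cache_enabled = cache_enabled
        self._cached = None

    def files(self):
        # With caching on, every call after the first returns the old listing.
        if self.cache_enabled and self._cached is not None:
            return self._cached
        listing = sorted(os.listdir(self.path))
        if self.cache_enabled:
            self._cached = listing
        return listing


def demo():
    with tempfile.TemporaryDirectory() as d:
        open(os.path.join(d, "part-0.parquet"), "w").close()
        cached = CachedListing(d, cache_enabled=True)
        uncached = CachedListing(d, cache_enabled=False)
        cached.files()
        uncached.files()
        # The underlying files change after the first listing.
        open(os.path.join(d, "part-1.parquet"), "w").close()
        return cached.files(), uncached.files()
```

After the second file is written, the cached reader still reports only the first file, while the uncached reader sees both, at the cost of listing the directory on every call.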

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-08 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-194090951 Are there any possible cases where these metadata caches cause problems?

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-08 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-193940778 I'm actually not sure it is OK to completely remove this ...

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-08 Thread maropu
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-193731589 @yhuai Could you check this?

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-193729884 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-193729881 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-193729610 **[Test build #52652 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52652/consoleFull)** for PR 11576 at commit [`6332f94`](https://g

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-08 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11576#issuecomment-193687629 **[Test build #52652 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52652/consoleFull)** for PR 11576 at commit [`6332f94`](https://gi

[GitHub] spark pull request: [SPARK-13656][SQL] Remove spark.sql.parquet.ca...

2016-03-08 Thread maropu
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/11576 [SPARK-13656][SQL] Remove spark.sql.parquet.cacheMetadata in SQLConf ## What changes were proposed in this pull request? Remove `spark.sql.parquet.cacheMetadata` because most users do not use