Github user maropu commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-200194463
@yhuai oh, I got you. I'll close this for now and keep an eye on this.
---
If your project is set up for it, you can reply to this email and have your
reply appear on Gi
Github user maropu closed the pull request at:
https://github.com/apache/spark/pull/11576
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-195152508
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-195152507
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-195152344
**[Test build #52876 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52876/consoleFull)**
for PR 11576 at commit
[`6332f94`](https://g
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-195137366
But, our behavior is still pretty weird. Although `SqlNewHadoopRDD` uses
this conf, we actually do list files when this conf is false. But, looks like
the input file pa
Github user yhuai commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-195135834
@maropu Sorry. My bad. I think we cannot remove this conf right now since
it still affects the behavior of SqlNewHadoopRDD. I will close the JIRA.
Github user yhuai commented on a diff in the pull request:
https://github.com/apache/spark/pull/11576#discussion_r55778696
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala
---
@@ -314,8 +312,6 @@ private[sql] class DefaultS
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-195120445
**[Test build #52876 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52876/consoleFull)**
for PR 11576 at commit
[`6332f94`](https://gi
Github user maropu commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-194112721
Yes, ... Scanning all files is pretty expensive, though; we can add a
metadata file for detecting it;
when `DataFrameWriter` writes something in the files, it updates
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-194108153
Yea I think eventually it might make sense -- but one problem is that it is
very expensive to detect this, especially when there are a very large number of
files, which is
Github user maropu commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-194106085
Oh, I see. However, I think it is slightly difficult for users to notice
this option in that case (the option is true by default). How about
automatically detecting the
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-194091960
I think it is a problem when the underlying files change.
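The concern above can be illustrated with a toy example. This is only a sketch under stated assumptions, not Spark's implementation: `FooterCache`, `demo`, and the file contents are hypothetical names made up for illustration, and comparing file sizes stands in for whatever change detection a real system would use. It shows a metadata cache returning a stale value after the underlying file is rewritten, and a validated lookup that pays one extra filesystem call per file to notice the change.

```python
import os
import tempfile


class FooterCache:
    """Toy per-file metadata cache (hypothetical; not Spark's code)."""

    def __init__(self):
        # path -> (file size when the metadata was read, metadata)
        self._entries = {}

    def get(self, path, validate=False):
        cached = self._entries.get(path)
        if cached is not None:
            size_at_read, metadata = cached
            # Without validation the cached value is returned blindly.
            # With validation we pay one stat call per file per lookup.
            if not validate or os.path.getsize(path) == size_at_read:
                return metadata
        metadata = self._read_metadata(path)
        self._entries[path] = (os.path.getsize(path), metadata)
        return metadata

    @staticmethod
    def _read_metadata(path):
        # Stand-in for reading a file footer: use the first line.
        with open(path) as f:
            return f.readline().strip()


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "part-0")
        with open(path, "w") as f:
            f.write("schema=v1\n")
        cache = FooterCache()
        first = cache.get(path)
        # The underlying file changes after the cache was populated.
        with open(path, "w") as f:
            f.write("schema=v2 with an extra column\n")
        stale = cache.get(path)                 # still the old metadata
        fresh = cache.get(path, validate=True)  # notices the size change
        return first, stale, fresh
```

Calling `demo()` returns the value read first, the stale value served after the rewrite, and the refreshed value. The validated path touches every file on every lookup, which is the cost that grows with the number of files.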
Github user maropu commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-194090951
Is there any possible case where these metadata caches cause problems?
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-193940778
I'm actually not sure it is OK to completely remove this ...
Github user maropu commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-193731589
@yhuai Could you check this?
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-193729884
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-193729881
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-193729610
**[Test build #52652 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52652/consoleFull)**
for PR 11576 at commit
[`6332f94`](https://g
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11576#issuecomment-193687629
**[Test build #52652 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52652/consoleFull)**
for PR 11576 at commit
[`6332f94`](https://gi
GitHub user maropu opened a pull request:
https://github.com/apache/spark/pull/11576
[SPARK-13656][SQL] Remove spark.sql.parquet.cacheMetadata in SQLConf
## What changes were proposed in this pull request?
Remove `spark.sql.parquet.cacheMetadata` because most users do not use