[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-05-14 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 I'm closing this PR in favor of #21320. That PR deals with simple projection and filter queries only. I will submit subsequent PRs for aggregation and join queries following the acceptance of

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-05-10 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 BTW I’ve been and am currently traveling with a busy itinerary. I haven’t started work on this and probably won’t get to work on it until Monday at the very earliest. > On May 5,

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-05-04 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16578 Yeah. That is fine. Will try to review the relevant PRs ASAP. Please ping me. Thanks again! --- - To unsubscribe, e-mail:

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-05-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > To ensure the PR and review quality, we normally avoid doing everything in a single huge PR. It would be much better if you can cut it to a few smaller PRs. I'll have a go at it. Of

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-05-01 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16578 To ensure the PR and review quality, we normally avoid doing everything in a single huge PR. It would be much better if you can cut it to a few smaller PRs. Both @cloud-fan and I think

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89794/

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test PASSed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-04-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #89794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89794/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/2636/

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-04-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test PASSed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-04-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #89794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89794/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-04-18 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16578 I only looked at the PR description; here are my 2 cents. Currently column pruning is done in 2 steps in Spark: 1) the optimizer generates extra `Project` nodes to prune unnecessary columns as
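Extending that first step to nested fields means pruning inside struct columns rather than only dropping whole top-level columns. The Scala below is a minimal sketch of that idea under simplifying assumptions: the `DataType`, `Field`, and `Struct` types and the `prune` function are hypothetical stand-ins invented for illustration, not Spark's `StructType` API or the code in this PR. It keeps only the struct fields reachable from the field paths a query references.

```scala
object NestedPruningSketch {
  // Hypothetical, simplified schema model for illustration; not Spark's StructType API.
  sealed trait DataType
  case object IntType extends DataType
  case object StringType extends DataType
  final case class Field(name: String, dataType: DataType)
  final case class Struct(fields: List[Field]) extends DataType

  // Keep only the fields reachable from the requested paths, where each path
  // is a list of field names, e.g. List("name", "first") for `name.first`.
  def prune(schema: Struct, paths: Set[List[String]]): Struct =
    Struct(schema.fields.flatMap { f =>
      // Remainders of the requested paths that start at this field.
      val sub = paths.collect { case head :: rest if head == f.name => rest }
      val kept: Option[Field] =
        if (sub.isEmpty) None               // not referenced: drop the field
        else if (sub.contains(Nil)) Some(f) // referenced as a whole: keep all subfields
        else f.dataType match {
          case s: Struct => Some(Field(f.name, prune(s, sub))) // recurse into the struct
          case _         => Some(f)                            // leaf: nothing left to prune
        }
      kept.toList
    })

  def main(args: Array[String]): Unit = {
    val schema = Struct(List(
      Field("id", IntType),
      Field("name", Struct(List(Field("first", StringType), Field("last", StringType)))),
      Field("address", Struct(List(Field("city", StringType), Field("zip", StringType))))
    ))
    // Roughly the fields `SELECT id, name.first FROM t` needs from the scan.
    println(prune(schema, Set(List("id"), List("name", "first"))))
  }
}
```

With the schema in `main`, a query touching `id` and `name.first` keeps `name` as a struct holding only `first` and drops `address` entirely. A real implementation also has to handle arrays, maps, and filter expressions, which this sketch ignores.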

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-04-10 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16578 I will review this huge PR. : )

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-04-10 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16578 hi - where are we on this?

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-03-12 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16578 Please do the review, @gengliangwang @jiangxb1987. We should support this feature in Spark 2.4.0.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-03-02 Thread zaycev
Github user zaycev commented on the issue: https://github.com/apache/spark/pull/16578 I observed about 5x better performance in reading a small subset of fields of a highly nested parquet table: master:

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-03-01 Thread Gauravshah
Github user Gauravshah commented on the issue: https://github.com/apache/spark/pull/16578 We have back-ported it to 2.2; in production it has, on average, made our jobs at least 2x faster.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-03-01 Thread Gauravshah
Github user Gauravshah commented on the issue: https://github.com/apache/spark/pull/16578 @marmbrus, can we target it for 2.4? We need help with reviews; it has been waiting for a very long time.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test PASSed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87859/

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #87859 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87859/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-03-01 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #87859 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87859/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test PASSed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-03-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1209/

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-02-05 Thread DaimonPl
Github user DaimonPl commented on the issue: https://github.com/apache/spark/pull/16578 So if it's not going to be included in `2.3.0`, maybe we could change `spark.sql.nestedSchemaPruning.enabled` to default to `true`? I hope that this time this PR could be finalized at the early stage

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 I'd just suggest trying it. Since this PR is a patch for master, please message me personally at m...@allman.ms to discuss progress and questions on a backport to 2.2. If we get it working,

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-08 Thread Gauravshah
Github user Gauravshah commented on the issue: https://github.com/apache/spark/pull/16578 @mallman, do you foresee any issues? I'm planning to backport it to Spark 2.2 on a personal fork and will probably make a JitPack release.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85662/

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #85662 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85662/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test FAILed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #85662 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85662/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-03 Thread VigneshMohan1
Github user VigneshMohan1 commented on the issue: https://github.com/apache/spark/pull/16578 @JoshRosen, can we get this PR into 2.3.0? A lot of people are interested in this, and it will boost performance when reading nested Parquet fields.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-03 Thread Gauravshah
Github user Gauravshah commented on the issue: https://github.com/apache/spark/pull/16578 @marmbrus, can we start the review process so that it can make it into the next release?

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > However, I am -1 on merging a change this large after branch cut. It's disappointing, but I agree we can't merge a change this large into a branch cut. It will have to wait for 2.3.1 at

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-03 Thread marmbrus
Github user marmbrus commented on the issue: https://github.com/apache/spark/pull/16578 I agree that this PR needs to be allocated more review bandwidth, and it is unfortunate that it has been blocked on that. However, I am -1 on merging a change this large after branch cut.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-03 Thread ianoc
Github user ianoc commented on the issue: https://github.com/apache/spark/pull/16578 Given it has one or two deep reviews already, can someone just rubber-stamp this with a bias toward shipping? It's been stalled more or less since July waiting on reviewers.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-03 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16578 We are still merging changes to the 2.3 branch :)

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-03 Thread Gauravshah
Github user Gauravshah commented on the issue: https://github.com/apache/spark/pull/16578 @DaimonPl, branch 2.3 has already been cut, so at least it's not making it into 2.3 :(

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-02 Thread DaimonPl
Github user DaimonPl commented on the issue: https://github.com/apache/spark/pull/16578 New year, new review? ;)

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-13 Thread Gauravshah
Github user Gauravshah commented on the issue: https://github.com/apache/spark/pull/16578 Thanks, @mallman, for rebasing each time. @gatorsmile, can you take a look at it?

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84820/

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test PASSed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #84820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84820/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #84820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84820/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test FAILed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84812/

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #84812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84812/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #84812 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84812/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-12 Thread abhaynahar
Github user abhaynahar commented on the issue: https://github.com/apache/spark/pull/16578 Sorry for spamming, but @rxin @marmbrus @ericl @cloud-fan @liancheng, can you please help take this forward? @viirya has reviewed it closely and is looking for someone else to review this as

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16578 @abhaynahar I think the reviewers are already included...

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-11 Thread abhaynahar
Github user abhaynahar commented on the issue: https://github.com/apache/spark/pull/16578 @viirya, can you please help tag the people you think should review?

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16578 As I mentioned before, we still don't have enough eyes on this change so far.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-09 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16578 Yes! Sorry about the delay; I think there's a lot of interest in this PR. @gatorsmile @viirya?

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test PASSed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84404/

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #84404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84404/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-12-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #84404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84404/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-26 Thread sriramrajendiran
Github user sriramrajendiran commented on the issue: https://github.com/apache/spark/pull/16578 @felixcheung, can you help? We are hoping to see it in the 2.3 release. Putting the feature behind a flag that is disabled by default looks like a safe option.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > But I think we still need other eyes on this too. Agreed. @rxin can you help rope anyone else in on this? It's a big PR with a bigger history, but absent some savaging by another

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > Can you give an example it would fail? We didn't change clipParquetSchema, so I think even when pruning happens, why we read a super set of the file's schema and cause the exception, according to

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test PASSed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84041/

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #84041 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84041/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #84041 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84041/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-08 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16578 I'm going over this again. But I think we still need other eyes on this too.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83532/

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test PASSed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #83532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83532/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #83532 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83532/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-06 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16578 retest this please.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-06 Thread Swat123
Github user Swat123 commented on the issue: https://github.com/apache/spark/pull/16578 @viirya, can we close this before we get another set of merge conflicts? Thanks

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @viirya Can you please take a look at my latest revisions and replies to your comments? Cheers.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test PASSed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83387/

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #83387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83387/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 I can't tell what's causing the build to fail: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83390/console Any ideas?

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83390/

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test FAILed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > Yeah, I think with a config for this optimization is good. I added a config switch, `spark.sql.nestedSchemaPruning.enabled`, which disables the optimizations if set to `false`. By default
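As a rough illustration of what such a switch toggles, here is a hedged Scala sketch that models a conf lookup with a plain `Map`. Only the key `spark.sql.nestedSchemaPruning.enabled` comes from this thread; the `false` default is an assumption inferred from later comments asking for it to default to `true`, and `PruningFlagSketch`, `pruningEnabled`, and `columnsToRead` are hypothetical names, not Spark APIs.

```scala
object PruningFlagSketch {
  // Key as named in this PR; the "false" default is an assumption
  // inferred from later comments asking for it to default to true.
  val FlagKey = "spark.sql.nestedSchemaPruning.enabled"

  def pruningEnabled(conf: Map[String, String]): Boolean =
    conf.getOrElse(FlagKey, "false").trim.equalsIgnoreCase("true")

  // Given the dotted field paths a query references, return what the scan
  // asks for: whole top-level columns when the flag is off, or the exact
  // nested paths when it is on.
  def columnsToRead(conf: Map[String, String], referenced: Seq[String]): Seq[String] =
    if (pruningEnabled(conf)) referenced.distinct
    else referenced.map(_.takeWhile(_ != '.')).distinct

  def main(args: Array[String]): Unit = {
    val refs = Seq("id", "name.first", "name.last")
    println(columnsToRead(Map.empty, refs))               // flag unset: top-level columns only
    println(columnsToRead(Map(FlagKey -> "true"), refs))  // flag on: nested leaves
  }
}
```

With the flag unset, `Seq("id", "name.first", "name.last")` collapses to the top-level columns `id` and `name`, so the whole `name` struct would be read; with it set to `true`, only the referenced leaves are requested.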

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83389/

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Build finished. Test FAILed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #83387 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83387/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-01 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16578 Yeah, I think with a config for this optimization is good.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-01 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16578 Thanks, @CodingCat. +1 on a config switch. I think that would be a good idea.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-01 Thread CodingCat
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/16578 I made a simple test in a single-node Spark environment. I used a synthetic dataset (that’s 20M), generated as: ```scala import spark.implicits._ import

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > I'm reluctant to generalize this PR without practical experience applying it to other column-oriented file formats. The only format I'm familiar with and have production experience with is

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > @mallman I will try to go through this again. Do you think this can be generalized to data source v2 API? I'm not familiar with that API. I'm reluctant to generalize this PR

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-30 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16578 @mallman I will try to go through this again. Do you think this can be generalized to data source v2 API?

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-30 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16578 thanks! ping/add @rxin @hvanhovell @gatorsmile @cloud-fan @liancheng @joseph-torres

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-28 Thread bitcot
Github user bitcot commented on the issue: https://github.com/apache/spark/pull/16578 Thanks, @mallman, this is very helpful. @felixcheung @rxin, can you please help take this forward?

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-27 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @viirya I've rebased to resolve conflicts. All tests are passing. Can you take another look and sign off? Cheers.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83128/

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test PASSed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #83128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83128/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-27 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #83128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83128/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-27 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @DaimonPl I'm going to resolve the merge conflicts shortly. Otherwise, I have no intention of making further modifications to this PR outside of further review.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-26 Thread DaimonPl
Github user DaimonPl commented on the issue: https://github.com/apache/spark/pull/16578 @mallman, how about finalizing it as is? IMHO the performance improvements are worth more than a (possibly) redundant workaround; it could be cleaned up later.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-13 Thread amankothari04
Github user amankothari04 commented on the issue: https://github.com/apache/spark/pull/16578 @viirya, did you get a chance to review this?

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-09 Thread DaimonPl
Github user DaimonPl commented on the issue: https://github.com/apache/spark/pull/16578 @mallman @viirya From my understanding, the current workaround is for the case of reading columns that are not in the file schema: > Parquet-mr will throw an exception if we try to read a superset of
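One way to read the workaround being discussed: before handing a pruned schema to the Parquet reader, intersect it with the file's own schema so the reader is never asked for a superset. The Scala below is a minimal sketch of that intersection under stated assumptions; the schema types and `intersect` are hypothetical stand-ins, not parquet-mr's `MessageType` or Spark's `clipParquetSchema`, and dropping structs that clip to nothing is a choice made for this sketch.

```scala
object ClipToFileSchemaSketch {
  // Hypothetical, simplified schema model; not Spark's or parquet-mr's API.
  sealed trait DataType
  case object IntType extends DataType
  case object StringType extends DataType
  final case class Field(name: String, dataType: DataType)
  final case class Struct(fields: List[Field]) extends DataType

  // Keep only the parts of `requested` that also exist in `file`, so the
  // reader is never asked for a superset of the file's schema.
  def intersect(requested: Struct, file: Struct): Struct = {
    val fileTypes: Map[String, DataType] = file.fields.map(f => f.name -> f.dataType).toMap
    Struct(requested.fields.flatMap { r =>
      val kept: Option[Field] = fileTypes.get(r.name).flatMap { fileType =>
        (r.dataType, fileType) match {
          case (rs: Struct, fs: Struct) =>
            val clipped = intersect(rs, fs)
            // Sketch choice: drop a struct with no surviving fields instead of requesting it empty.
            if (clipped.fields.isEmpty) None else Some(Field(r.name, clipped))
          case _ => Some(r) // leaf (or type mismatch): keep the requested field unchanged
        }
      }
      kept.toList
    })
  }

  def main(args: Array[String]): Unit = {
    val file = Struct(List(
      Field("id", IntType),
      Field("name", Struct(List(Field("first", StringType), Field("last", StringType))))))
    val requested = Struct(List(
      Field("id", IntType),
      Field("name", Struct(List(Field("first", StringType), Field("middle", StringType)))),
      Field("email", StringType)))
    println(intersect(requested, file))
  }
}
```

Here `name.middle` and `email` are absent from the file, so they are left out of the request; how the missing values are surfaced to the query (for example, as nulls) is outside the scope of this sketch.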

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82383/
