[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-16 Thread ajacques
Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21889
  
Thanks for the response, all. @mallman, if it's really your preference, I 
will create a PR against that branch and close this one. My intention was never 
to take away from your efforts, and I still consider my work here to be just 
minor stylistic tweaks on top of your work. I did this as a service to help 
bridge the divide and hopefully alleviate frustrations. But it has been a bit 
frustrating to be stuck between two sides, with merge strategies changing 
often, and I don't wish to continue being in between like this. 

As such, I will create a PR, but I hope it does not get dragged out to settle 
any differences of opinion between maintainers and submitters. My goal is to 
make sure this valuable feature gets merged so many can benefit.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-16 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
Essentially, this PR was created to take the management of #21320 out of my 
hands, with a view towards facilitating its incorporation into Spark 2.4. It 
was my suggestion, one based in frustration. In hindsight, I no longer believe 
this strategy is the best—or most expedient—approach towards progress. 
Indeed, I believe the direction of this PR has become orthogonal to its 
motivating goal, becoming a dispute between myself and @HyukjinKwon rather than 
a means to move things along.

I believe I can shepherd #21320 in a way that will promote greater 
progress. @ajacques, I mean no disrespect, and I thank you for volunteering 
your time, patience and effort for the sake of all that are interested in 
seeing this patch become a part of Spark. And I apologize for letting you down, 
letting everyone down. In my conduct leading up to the creation of this PR I 
did not act with the greatest maturity or patience. And I did not act in the 
best interests of the community.

No one has spent more time or more effort, taken more responsibility or 
exhibited more patience with this 2+ year patch-set-in-the-making than myself. 
I respectfully submit it is mine to present and manage, and no one else's. 
Insofar as I have expressed otherwise in the past, I admit my error—one made 
in frustration—and recant in hindsight.

@ajacques, at this point I respectfully assert that managing the patch set 
I submitted in #21320 is not your responsibility, nor is it anyone else's but 
mine. I ask you to close this PR so that we can resume the review in #21320. As 
I stated there, you are welcome to open a PR on 
https://github.com/VideoAmp/spark-public/tree/spark-4502-parquet_column_pruning-foundation
 to submit the changes you've made for review.

Thank you.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-16 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
> I've only taken it as a base to make stylistic changes based on the 
> code review to help move things along.

This PR doesn't only include stylistic changes. Since stylistic changes do 
not usually block a PR, mind fixing the PR description?


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #4278 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4278/testReport)**
 for PR 21889 at commit 
[`8d822ee`](https://github.com/apache/spark/commit/8d822eea805e1b2dc40b866ca8ac4893e53ad51b).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-16 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #4278 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4278/testReport)**
 for PR 21889 at commit 
[`8d822ee`](https://github.com/apache/spark/commit/8d822eea805e1b2dc40b866ca8ac4893e53ad51b).


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94805/
Test PASSed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94805 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94805/testReport)**
 for PR 21889 at commit 
[`8d822ee`](https://github.com/apache/spark/commit/8d822eea805e1b2dc40b866ca8ac4893e53ad51b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
> Due to the urgency of the upcoming 2.4 code freeze, I'm going to open 
> this PR to collect any feedback. This can be closed if you prefer to continue 
> the work in the original PR.

That would be my preference, yes, especially if it means less work for you.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94805 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94805/testReport)**
 for PR 21889 at commit 
[`8d822ee`](https://github.com/apache/spark/commit/8d822eea805e1b2dc40b866ca8ac4893e53ad51b).


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94790/
Test FAILed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94790 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94790/testReport)**
 for PR 21889 at commit 
[`1c0c4bf`](https://github.com/apache/spark/commit/1c0c4bf14172dd2257fe1d00fc0aeed78aa1cb84).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94790 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94790/testReport)**
 for PR 21889 at commit 
[`1c0c4bf`](https://github.com/apache/spark/commit/1c0c4bf14172dd2257fe1d00fc0aeed78aa1cb84).


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
retest this please


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94785/
Test FAILed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94785 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94785/testReport)**
 for PR 21889 at commit 
[`1c0c4bf`](https://github.com/apache/spark/commit/1c0c4bf14172dd2257fe1d00fc0aeed78aa1cb84).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94785 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94785/testReport)**
 for PR 21889 at commit 
[`1c0c4bf`](https://github.com/apache/spark/commit/1c0c4bf14172dd2257fe1d00fc0aeed78aa1cb84).


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94731/
Test FAILed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94731 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94731/testReport)**
 for PR 21889 at commit 
[`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-14 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94731 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94731/testReport)**
 for PR 21889 at commit 
[`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320).


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-14 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
ok to test


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-13 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
@ajacques I added a commit to enable schema pruning by default. It's a 
little more complete than your commit to do the same. Please rebase off my 
branch and remove your commit.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-10 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
>> @mallman, while we wait for the go-no-go, do you have the changes for 
>> the next PR ready? Is there anything you need help with?

> I have the hack I used originally, but I haven't tried finding a better 
> solution yet. It could take some time to understand the underlying 
> problem/incompatibility/misunderstanding/etc.

I spent some time yesterday digging deeper into why the hack I wrote 
worked, and I think I understand now. Practically speaking, my follow-on PR 
will be about the same as the commit I removed. However, I can support it with 
some explanatory comments instead of just "this throws an exception sometimes".


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94536/
Test FAILed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-10 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94536 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94536/testReport)**
 for PR 21889 at commit 
[`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21889
  
retest this please



---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94536 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94536/testReport)**
 for PR 21889 at commit 
[`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320).


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
From a cursory look, the last failure looks unrelated.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
retest this please


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread ajacques
Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21889
  
@gatorsmile Do you think there is a non-deterministic failure in this change 
that causes it to fail inconsistently? 


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21889
  
retest this please



---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94499/
Test PASSed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94499 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94499/testReport)**
 for PR 21889 at commit 
[`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94503/
Test FAILed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94503 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94503/testReport)**
 for PR 21889 at commit 
[`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94503 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94503/testReport)**
 for PR 21889 at commit 
[`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320).


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21889
  
retest this please


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94499 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94499/testReport)**
 for PR 21889 at commit 
[`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320).


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
> @mallman, while we wait for the go-no-go, do you have the changes for the 
> next PR ready? Is there anything you need help with?

I have the hack I used originally, but I haven't tried finding a better 
solution yet. It could take some time to understand the underlying 
problem/incompatibility/misunderstanding/etc.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
@ajacques Please rebase off my branch.

@gatorsmile I don't recall seeing that error before. Any idea for how I can 
reproduce and debug?


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-08 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21889
  
I hit the following error in my local environment. 
```
sbt.ForkMain$ForkError: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 220.0 failed 1 times, most recent failure: Lost task 0.0 in stage 220.0 (TID 465, localhost, executor driver): java.lang.IllegalArgumentException: Length -67059888 and offset 140049531604288must be non-negative
    at org.apache.spark.unsafe.memory.MemoryBlock.<init>(MemoryBlock.java:64)
    at org.apache.spark.unsafe.memory.OffHeapMemoryBlock.<init>(OffHeapMemoryBlock.java:26)
    at org.apache.spark.sql.execution.vectorized.OffHeapColumnVector.getBytesAsUTF8String(OffHeapColumnVector.java:221)
    at org.apache.spark.sql.execution.vectorized.WritableColumnVector.getUTF8String(WritableColumnVector.java:382)
    at org.apache.spark.sql.vectorized.ColumnarArray.getUTF8String(ColumnarArray.java:127)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:617)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:130)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```

Could you turn on the flag in the PR? I want to trigger the tests multiple 
times in the PR. @ajacques 


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-08 Thread ajacques
Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21889
  
@mallman, while we wait for the go-no-go, do you have the changes for the 
next PR ready? Is there anything you need help with?


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-08 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
Are we waiting for @gatorsmile's go-ahead and merge?


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94409/
Test PASSed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94409 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94409/testReport)**
 for PR 21889 at commit 
[`23d03fb`](https://github.com/apache/spark/commit/23d03fb9f865053dc1e1da77532271177d8002b6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94409 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94409/testReport)**
 for PR 21889 at commit 
[`23d03fb`](https://github.com/apache/spark/commit/23d03fb9f865053dc1e1da77532271177d8002b6).


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-08 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
retest this please


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94408/
Test FAILed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94408 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94408/testReport)**
 for PR 21889 at commit 
[`23d03fb`](https://github.com/apache/spark/commit/23d03fb9f865053dc1e1da77532271177d8002b6).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
> Just for clarification: so now there are no outstanding bugs, some tests
> are ignored per #21320 (comment), and the comments left were mostly
> addressed. Did I understand correctly?

The ignored tests—and the scenarios they are intended to test—will fail 
with a runtime exception if this feature is enabled. I put forward a fix in 
`ParquetReadSupport.scala`, but @gatorsmile didn't want to address that in this 
PR. Otherwise, there are no known bugs with this patch.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94408 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94408/testReport)**
 for PR 21889 at commit 
[`23d03fb`](https://github.com/apache/spark/commit/23d03fb9f865053dc1e1da77532271177d8002b6).


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94406/
Test FAILed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
retest this please


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94406 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94406/testReport)**
 for PR 21889 at commit 
[`23d03fb`](https://github.com/apache/spark/commit/23d03fb9f865053dc1e1da77532271177d8002b6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
Just for clarification: so now there are no outstanding bugs, some tests are 
ignored per https://github.com/apache/spark/pull/21320#issuecomment-406353694, 
and the comments left were mostly addressed. Did I understand correctly?


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
See https://github.com/apache/spark/pull/21320#issuecomment-406353694 for 
@gatorsmile's request to move the changes to `ParquetReadSupport.scala` to 
another PR.

There was another, unrelated bug reported by @jainaks and addressed in 
https://github.com/apache/spark/pull/21320#issuecomment-408588685. AFAIK, 
there's nothing outstanding blocking this PR from being merged as I stated in 
https://github.com/apache/spark/pull/21889#issuecomment-410557228.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
That comment predates 
https://github.com/apache/spark/pull/21889#issuecomment-408330791. I am okay 
with this in general, but I want to be clear about whether I'm ignoring his 
decision.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread ajacques
Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21889
  
>> but @gatorsmile wants to review it in a follow-on PR.

> Where did he say it after the comment above?

It was my interpretation of this comment: 
https://github.com/apache/spark/pull/21320#issuecomment-406353694

@gatorsmile, @HyukjinKwon Do we wish to block this PR to fix the issue with 
it enabled? It's not clear what your expectations are for this PR. 
1. Are you okay with it not 100% working if it's disabled by default?
2. Do you want this issue to be fixed at the cost of bringing more changes 
into this PR?


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
> but @gatorsmile wants to review it in a follow-on PR.

I need confirmation from @gatorsmile. I don't want to ignore his decision 
here:

> Just FYI, we are unable to merge it if it has a correctness bug.

@ajacques, thanks. I overlooked the recent changes. I will take another 
look soon, but don't block on this, since most of the comments look addressed 
at a cursory glance.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94406 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94406/testReport)**
 for PR 21889 at commit 
[`23d03fb`](https://github.com/apache/spark/commit/23d03fb9f865053dc1e1da77532271177d8002b6).


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread ajacques
Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21889
  
@HyukjinKwon Looks like most of your comments have been already addressed, 
but I've gone ahead and made a few more tweaks to help this get merged. Please 
let me know if any blocking comments have been missed.

As mentioned: This feature is not known to have any regressions in the 
default, disabled state.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
> but @gatorsmile wants to review it in a follow-on PR.

Where did he say it after the comment above?

Also, if you're going to push more changes, why not address my comments 
then?


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
> Assuming from #21889 (comment), we shouldn't have any identified bug
> here. What kind of bugs are left to be fixed?

That bug was addressed by b50ddb4. We still need to fix the bug underlying 
the failing (ignored) test case. I have a tentative fix for that, but 
@gatorsmile wants to review it in a follow-on PR.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
Assuming from 
https://github.com/apache/spark/pull/21889#issuecomment-408330791, we shouldn't 
have any identified bug here. What kind of bugs are left to be fixed?


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21889
  
Can we address the comments I left on that PR too? It looks like that's the 
only way to get through this. FWIW, since 
https://github.com/apache/spark/commit/51bee7aca13451167fa3e701fcd60f023eae5e61 
is merged, we can now contribute to all people involved here.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread ajacques
Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21889
  
Is there anything I can do to help with this PR? 


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread ajacques
Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21889
  
Jenkins build successful. Any PR comments/blockers to merge for phase 1?

cc @HyukjinKwon, @gatorsmile, @cloud-fan 


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94252/
Test PASSed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94252 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94252/testReport)**
 for PR 21889 at commit 
[`8d7f4bc`](https://github.com/apache/spark/commit/8d7f4bc1874f8ae3c2cda8e5aa96a8647a56128d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
> Alright to make sure we're all on the same page, it sounds like we're
> ready to merge this PR pending:
>
> * Successful build by Jenkins
> * Any PR comments from a maintainer
>
> This feature will be merged in disabled state and can't be enabled until
> the next PR is merged, but we do not expect any regression in behavior in the
> default disabled state.

I agree.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread ajacques
Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21889
  
Alright to make sure we're all on the same page, it sounds like we're ready 
to merge this PR pending:
* Successful build by Jenkins
* Any PR comments from a maintainer

This feature will be merged in disabled state and can't be enabled until 
the next PR is merged, but we do not expect any regression in behavior in the 
default disabled state.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94252 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94252/testReport)**
 for PR 21889 at commit 
[`8d7f4bc`](https://github.com/apache/spark/commit/8d7f4bc1874f8ae3c2cda8e5aa96a8647a56128d).


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
> @mallman Is it related to this revert in ParquetReadSupport.scala? I
> re-added this logic and all 32 tests in ParquetSchemaPruningSuite passed.

Yes. That's what we need to work on in the next PR.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread ajacques
Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21889
  
@mallman Is it related to [this revert in 
ParquetReadSupport](https://github.com/apache/spark/pull/21889/commits/0312a5188f0d6c9fc5304195dbdc703bf0aa3fb7#diff-245e70c1f41e353e34cf29bd00fd9029L86)?
I re-added this logic and all 32 tests in ParquetSchemaPruningSuite passed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
I've pushed a commit to restore the original test coverage while also 
ensuring determinism of the output. Don't ask me how I did it. It's a secret!

The test that was failing before it was kinda passing is now failing again 
so I marked it ignored so it wouldn't break Jenkins. And I reverted the commit 
that enabled this feature by default, because it's still broken.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
> select id, name.middle, address from temp - Works
> select name.middle, address from temp - Fails
> select name.middle from temp - Works
> select name.middle, id, address from temp - Works
> select name.middle, address, id from temp - Works

Removing the `order by` clause from your test query caused it to fail, but 
it has nothing to do with ordering. It appears that the failure in this case is 
manifested when the file scan schema is exactly the `name.middle` and `address` 
columns. Introducing the `order by` clauses in the test suite queries gave them 
necessary determinism for checking query answers, but these modifications also 
altered the file scan schema.

I need to fix the tests, but I think that the failure underlying the 
previously ignored test case has not been resolved after all. It was just a 
case of confusing coincidence. Unfortunately we're still not ready to merge 
this PR yet.
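
The effect described above can be illustrated with a small, self-contained sketch. It is not Spark's implementation: `CONTACTS_SCHEMA`, `prune_schema`, and `scan_schema` are hypothetical names, the schema is a simplified stand-in for the test table, and the code only models which columns a scan would request, assuming the pruned schema keeps the file's column order.

```python
# Hypothetical, simplified model of the `contacts` test schema from this thread.
# A dict stands for a struct; a string stands for a leaf type.
CONTACTS_SCHEMA = {
    "id": "integer",
    "name": {"first": "string", "middle": "string", "last": "string"},
    "address": "string",
    "pets": "integer",
}


def prune_schema(schema, paths):
    """Return the sub-schema covering only the dotted `paths`, in file order."""
    requested = {}
    for path in paths:
        head, _, rest = path.partition(".")
        requested.setdefault(head, []).append(rest)

    pruned = {}
    for field, dtype in schema.items():
        if field not in requested:
            continue  # this column is never read
        subpaths = requested[field]
        if isinstance(dtype, dict) and all(subpaths):
            # Only nested leaves were requested: keep just those leaves.
            pruned[field] = prune_schema(dtype, subpaths)
        else:
            # The whole column was requested (or it is a leaf): keep it intact.
            pruned[field] = dtype
    return pruned


def scan_schema(select_paths, order_by=()):
    """Columns needed by both the projection and the ordering (toy model)."""
    return prune_schema(CONTACTS_SCHEMA, list(select_paths) + list(order_by))


failing = scan_schema(["name.middle", "address"])
with_order_by = scan_schema(["name.middle", "address"], order_by=["id"])
print(failing)
print(with_order_by)
```

Under this toy model, only the query without `id` yields a scan schema of exactly `name.middle` and `address`, which is the case reported as failing. Adding `order by id` pulls `id` into the scan schema, so the test no longer exercises that case.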


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread ajacques
Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21889
  
@mallman 

`select id, name.middle, address from temp` - **Works**
`select name.middle, address from temp` - **Fails**


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
> Test build #94228 has finished for PR 21889 at commit 92901da.

The test failure appears to be unrelated to this PR.

Is it just me or has the test suite become flakier in the past few months?


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
> The tests as committed pass for me, but I removed the order by id and I
> got that error. Are you saying it works with the specific query in my comment?

@ajacques Please try this query:

```
select id, name.middle, address from temp
```


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21889
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94228/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94228 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94228/testReport)**
 for PR 21889 at commit 
[`92901da`](https://github.com/apache/spark/commit/92901da3785ce94db501a4c3d9be6316cfbf29a9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
> The tests as committed pass for me, but I removed the order by id and I
> got that error. Are you saying it works with the specific query in my comment?

Oh! I didn't notice you changed the query.

Okay. I'll take a closer look.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread ajacques
Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21889
  
The tests as committed pass for me, but I removed the `order by id` and I 
got that error. Are you saying it works with the specific query in my comment?


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
> @mallman: I've rebased on top of your changes and pushed. I'm seeing the
> following:

That test passes for me locally. Also, I inspected your branch and could 
not find any errors in the rebase. What commit hash are you testing locally? 
I'm using `92901da3785ce94db501a4c3d9be6316cfbf29a9`.

Please ensure we're on the same commit. If so, try doing an `sbt clean` and 
running your test again.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
> @mallman: I've rebased on top of your changes and pushed. I'm seeing the
> following

That's the test case that I "unignored". It was passing. There must be some 
simple explanation. I will look into it.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread ajacques
Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21889
  
@mallman: I've rebased on top of your changes and pushed. I'm seeing the 
following:

Given the following schema:
```
root
 |-- id: integer (nullable = true)
 |-- name: struct (nullable = true)
 |    |-- first: string (nullable = true)
 |    |-- middle: string (nullable = true)
 |    |-- last: string (nullable = true)
 |-- address: string (nullable = true)
 |-- pets: integer (nullable = true)
 |-- friends: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- first: string (nullable = true)
 |    |    |-- middle: string (nullable = true)
 |    |    |-- last: string (nullable = true)
 |-- relatives: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- first: string (nullable = true)
 |    |    |-- middle: string (nullable = true)
 |    |    |-- last: string (nullable = true)
 |-- p: integer (nullable = true)
```

The query: `select name.middle, address from temp` throws:
```
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/private/var/folders/ss/cw601dzn59b2nygs8k1bs78x75lhr0/T/spark-cab140ca-cbba-4dc1-9fe5-6ae739dab70a/contacts/p=2/part-0-91d2abf5-625f-4080-b34c-e373b89c9895-c000.snappy.parquet
  at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
  at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
  at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:186)
  ... 20 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
  at java.util.ArrayList.rangeCheck(ArrayList.java:657)
  at java.util.ArrayList.get(ArrayList.java:433)
  at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
  at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
  at org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:97)
  at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:92)
  at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:278)
  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
  at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
  at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
  at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
  at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
  ... 25 more
```

No root cause yet, but I noticed this while working with the unit tests.


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21889
  
**[Test build #94228 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94228/testReport)**
 for PR 21889 at commit 
[`92901da`](https://github.com/apache/spark/commit/92901da3785ce94db501a4c3d9be6316cfbf29a9).


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-03 Thread ajacques
Github user ajacques commented on the issue:

https://github.com/apache/spark/pull/21889
  
@mallman: [This one](https://github.com/apache/spark/pull/21889/files#diff-0c6c7481232e9637b91c179f1005426aR120)?
I just enabled it on my branch and the test passed. Was it fixed by your 
latest changes or am I missing something?

```
Expected:
struct,address:string>

Actual:
fileSourceScanSchemata = {ArrayBuffer@12560} "ArrayBuffer" size = 1
 0 = {StructType@15492} "StructType" size = 3
  0 = {StructField@15494} "StructField(id,IntegerType,true)"
  1 = {StructField@15495} "StructField(name,StructType(StructField(middle,StringType,true)),true)"
  2 = {StructField@15496} "StructField(address,StringType,true)"
```


---




[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-03 Thread mallman
Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21889
  
> Are there any other blockers to enabling this by default now that
> @mallman fixed the currently known broken queries?

The functionality exercised by the ignored test in 
`ParquetSchemaPruningSuite.scala` is still broken. That's something we're 
hoping to fix in a follow on PR. This PR has to be merged first.


---



