[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 Thanks for the response all. @mallman If it's really your preference, I will create a PR against that branch and close this one. My intention was never to take away from your efforts, and I still consider my work here to be just minor stylistic tweaks on top of your work. I did this as a service to help bridge the divide and hopefully alleviate frustrations. But it has been a bit frustrating being stuck between two sides of this, with merge strategies changing often, and I don't wish to continue being in between like this. As such, I will create a PR, but I hope it does not get dragged out to settle any differences in opinion between maintainers and submitters. My goal is to make sure this valuable feature gets merged so many can benefit. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 Essentially, this PR was created to take the management of #21320 out of my hands, with a view towards facilitating its incorporation into Spark 2.4. It was my suggestion, one based in frustration. In hindsight, I no longer believe this strategy is the best, or most expedient, approach towards progress. Indeed, I believe the direction of this PR has become orthogonal to its motivating goal, becoming a dispute between myself and @HyukjinKwon rather than a means to move things along. I believe I can shepherd #21320 in a way that will promote greater progress. @ajacques, I mean no disrespect, and I thank you for volunteering your time, patience and effort for the sake of all that are interested in seeing this patch become a part of Spark. And I apologize for letting you down, letting everyone down. In my conduct leading up to the creation of this PR I did not act with the greatest maturity or patience. And I did not act in the best interests of the community. No one has spent more time or more effort, taken more responsibility or exhibited more patience with this 2+ year patch-set-in-the-making than myself. I respectfully submit it is mine to present and manage, and no one else's. Insofar as I have expressed otherwise in the past, I admit my error, one made in frustration, and recant in hindsight. @ajacques, at this point I respectfully assert that managing the patch set I submitted in #21320 is not your responsibility, nor is it anyone else's but mine. I ask you to close this PR so that we can resume the review in #21320. As I stated there, you are welcome to open a PR on https://github.com/VideoAmp/spark-public/tree/spark-4502-parquet_column_pruning-foundation to submit the changes you've made for review. Thank you.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21889 > I've only taken it as a base to make stylistic changes based on the code review to help move things along. This PR doesn't only include stylistic changes. Since stylistic changes do not usually block a PR, mind fixing the PR description?
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #4278 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4278/testReport)** for PR 21889 at commit [`8d822ee`](https://github.com/apache/spark/commit/8d822eea805e1b2dc40b866ca8ac4893e53ad51b). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #4278 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4278/testReport)** for PR 21889 at commit [`8d822ee`](https://github.com/apache/spark/commit/8d822eea805e1b2dc40b866ca8ac4893e53ad51b).
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Merged build finished. Test PASSed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94805/ Test PASSed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94805/testReport)** for PR 21889 at commit [`8d822ee`](https://github.com/apache/spark/commit/8d822eea805e1b2dc40b866ca8ac4893e53ad51b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > Due to the urgency of the upcoming 2.4 code freeze, I'm going to open this PR to collect any feedback. This can be closed if you prefer to continue the work in the original PR. That would be my preference, yes, especially if it means less work for you.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94805/testReport)** for PR 21889 at commit [`8d822ee`](https://github.com/apache/spark/commit/8d822eea805e1b2dc40b866ca8ac4893e53ad51b).
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Merged build finished. Test FAILed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94790/ Test FAILed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94790/testReport)** for PR 21889 at commit [`1c0c4bf`](https://github.com/apache/spark/commit/1c0c4bf14172dd2257fe1d00fc0aeed78aa1cb84). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94790/testReport)** for PR 21889 at commit [`1c0c4bf`](https://github.com/apache/spark/commit/1c0c4bf14172dd2257fe1d00fc0aeed78aa1cb84).
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21889 retest this please
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94785/ Test FAILed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94785/testReport)** for PR 21889 at commit [`1c0c4bf`](https://github.com/apache/spark/commit/1c0c4bf14172dd2257fe1d00fc0aeed78aa1cb84). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Merged build finished. Test FAILed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94785 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94785/testReport)** for PR 21889 at commit [`1c0c4bf`](https://github.com/apache/spark/commit/1c0c4bf14172dd2257fe1d00fc0aeed78aa1cb84).
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94731/ Test FAILed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Merged build finished. Test FAILed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94731 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94731/testReport)** for PR 21889 at commit [`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94731/testReport)** for PR 21889 at commit [`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320).
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21889 ok to test
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 @ajacques I added a commit to enable schema pruning by default. It's a little more complete than your commit to do the same. Please rebase off my branch and remove your commit.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 >> @mallman, while we wait for the go-no-go, do you have the changes for the next PR ready? Is there anything you need help with? > I have the hack I used originally, but I haven't tried finding a better solution yet. It could take some time to understand the underlying problem/incompatibility/misunderstanding/etc. I spent some time yesterday digging deeper into why the hack I wrote worked, and I think I understand now. Practically speaking, my follow-on PR will be about the same as the commit I removed. However, I can support it with some explanatory comments instead of just "this throws an exception sometimes".
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94536/ Test FAILed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Merged build finished. Test FAILed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94536 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94536/testReport)** for PR 21889 at commit [`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21889 retest this please
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94536/testReport)** for PR 21889 at commit [`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320).
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21889 From a cursory look, the last failure looks unrelated.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21889 retest this please
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 @gatorsmile Do you think there is a non-deterministic failure in this change that causes it to fail inconsistently?
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21889 retest this please
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94499/ Test PASSed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Merged build finished. Test PASSed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94499 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94499/testReport)** for PR 21889 at commit [`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94503/ Test FAILed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Merged build finished. Test FAILed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94503 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94503/testReport)** for PR 21889 at commit [`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94503 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94503/testReport)** for PR 21889 at commit [`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320).
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21889 retest this please
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94499 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94499/testReport)** for PR 21889 at commit [`51f0dc5`](https://github.com/apache/spark/commit/51f0dc59c6403aa862e18ff0192fc37b87d22320).
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > @mallman, while we wait for the go-no-go, do you have the changes for the next PR ready? Is there anything you need help with? I have the hack I used originally, but I haven't tried finding a better solution yet. It could take some time to understand the underlying problem/incompatibility/misunderstanding/etc.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 @ajacques Please rebase off my branch. @gatorsmile I don't recall seeing that error before. Any idea for how I can reproduce and debug?
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21889 I hit the following error in my local environment.
```
sbt.ForkMain$ForkError: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 220.0 failed 1 times, most recent failure: Lost task 0.0 in stage 220.0 (TID 465, localhost, executor driver): java.lang.IllegalArgumentException: Length -67059888 and offset 140049531604288must be non-negative
    at org.apache.spark.unsafe.memory.MemoryBlock.<init>(MemoryBlock.java:64)
    at org.apache.spark.unsafe.memory.OffHeapMemoryBlock.<init>(OffHeapMemoryBlock.java:26)
    at org.apache.spark.sql.execution.vectorized.OffHeapColumnVector.getBytesAsUTF8String(OffHeapColumnVector.java:221)
    at org.apache.spark.sql.execution.vectorized.WritableColumnVector.getUTF8String(WritableColumnVector.java:382)
    at org.apache.spark.sql.vectorized.ColumnarArray.getUTF8String(ColumnarArray.java:127)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:617)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
    at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
    at org.apache.spark.scheduler.Task.run(Task.scala:130)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:406)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
Could you turn on the flag in the PR? I want to trigger the tests multiple times in the PR. @ajacques
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 @mallman, while we wait for the go-no-go, do you have the changes for the next PR ready? Is there anything you need help with?
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 Are we waiting for @gatorsmile's go-ahead and merge?
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94409/ Test PASSed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Merged build finished. Test PASSed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94409 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94409/testReport)** for PR 21889 at commit [`23d03fb`](https://github.com/apache/spark/commit/23d03fb9f865053dc1e1da77532271177d8002b6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94409 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94409/testReport)** for PR 21889 at commit [`23d03fb`](https://github.com/apache/spark/commit/23d03fb9f865053dc1e1da77532271177d8002b6).
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21889 retest this please
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94408/ Test FAILed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Merged build finished. Test FAILed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94408 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94408/testReport)** for PR 21889 at commit [`23d03fb`](https://github.com/apache/spark/commit/23d03fb9f865053dc1e1da77532271177d8002b6). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > just for clarification, so now .. there are no outstanding bugs, some tests are ignored per #21320 (comment) and left comments were mostly addressed. Did I understand correctly? The ignored tests, and the scenarios they are intended to test, will fail with a runtime exception if this feature is enabled. I put forward a fix in `ParquetReadSupport.scala`, but @gatorsmile didn't want to address that in this PR. Otherwise, there are no known bugs with this patch.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94408 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94408/testReport)** for PR 21889 at commit [`23d03fb`](https://github.com/apache/spark/commit/23d03fb9f865053dc1e1da77532271177d8002b6). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94406/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21889 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94406 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94406/testReport)** for PR 21889 at commit [`23d03fb`](https://github.com/apache/spark/commit/23d03fb9f865053dc1e1da77532271177d8002b6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21889 Just for clarification: so now there are no outstanding bugs, some tests are ignored per https://github.com/apache/spark/pull/21320#issuecomment-406353694, and the comments left were mostly addressed. Did I understand correctly?
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 See https://github.com/apache/spark/pull/21320#issuecomment-406353694 for @gatorsmile's request to move the changes to `ParquetReadSupport.scala` to another PR. There was another, unrelated bug reported by @jainaks and addressed in https://github.com/apache/spark/pull/21320#issuecomment-408588685. AFAIK, there's nothing outstanding blocking this PR from being merged as I stated in https://github.com/apache/spark/pull/21889#issuecomment-410557228. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21889 That comment is before https://github.com/apache/spark/pull/21889#issuecomment-408330791. I am okay in general but want to be clear if I'm ignoring his decision or not. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889
>> but @gatorsmile wants to review it in a follow-on PR.
> Where did he say it after the comment above?
It was my interpretation of this comment: https://github.com/apache/spark/pull/21320#issuecomment-406353694
@gatorsmile, @HyukjinKwon Do we wish to block this PR to fix the issue with it enabled? It's not clear what your expectations are for this PR.
1. Are you okay with it not 100% working if it's disabled by default?
2. Do you want this issue to be fixed at the cost of bringing more changes into this PR?
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21889
> but @gatorsmile wants to review it in a follow-on PR.
I need a confirmation from @gatorsmile. I don't want to ignore his decision here in
> Just FYI, we are unable to merge it if it has a correctness bug.
@ajacques, thanks. I overlooked the recent changes made. Will take another look soon, but don't block on this since most of them look addressed from a cursory look.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94406 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94406/testReport)** for PR 21889 at commit [`23d03fb`](https://github.com/apache/spark/commit/23d03fb9f865053dc1e1da77532271177d8002b6). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 @HyukjinKwon Looks like most of your comments have been already addressed, but I've gone ahead and made a few more tweaks to help this get merged. Please let me know if any blocking comments have been missed. As mentioned: This feature is not known to have any regressions in the default, disabled state. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21889 > but @gatorsmile wants to review it in a follow-on PR. Where did he say it after the comment above? Also why don't you address my comments if you're going to push more changes then. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889
> Assuming from #21889 (comment), we shouldn't have any identified bug here. What kind of bugs are left to be fixed?
That bug was addressed by b50ddb4. We still need to fix the bug underlying the failing (ignored) test case. I have a tentative fix for that, but @gatorsmile wants to review it in a follow-on PR.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21889 Assuming from https://github.com/apache/spark/pull/21889#issuecomment-408330791, we shouldn't have any identified bug here. What kind of bugs are left to be fixed?
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21889 Can we address the comments I left on that PR too? Looks like that's the only way to get through this? FWIW, since https://github.com/apache/spark/commit/51bee7aca13451167fa3e701fcd60f023eae5e61 is merged, we can now contribute to all people involved here.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 Is there anything I can do to help with this PR? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 Jenkins build successful. Any PR comments/blockers to merge for phase 1? cc @HyukjinKwon, @gatorsmile, @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94252/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94252 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94252/testReport)** for PR 21889 at commit [`8d7f4bc`](https://github.com/apache/spark/commit/8d7f4bc1874f8ae3c2cda8e5aa96a8647a56128d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889
> Alright, to make sure we're all on the same page, it sounds like we're ready to merge this PR pending:
>
> * Successful build by Jenkins
> * Any PR comments from a maintainer
>
> This feature will be merged in disabled state and can't be enabled until the next PR is merged, but we do not expect any regression in behavior in the default disabled state.
I agree.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 Alright, to make sure we're all on the same page, it sounds like we're ready to merge this PR pending:
* Successful build by Jenkins
* Any PR comments from a maintainer
This feature will be merged in disabled state and can't be enabled until the next PR is merged, but we do not expect any regression in behavior in the default disabled state.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94252 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94252/testReport)** for PR 21889 at commit [`8d7f4bc`](https://github.com/apache/spark/commit/8d7f4bc1874f8ae3c2cda8e5aa96a8647a56128d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889
> @mallman Is it related to this revert in ParquetReadSupport.scala? I re-added this logic and all 32 tests in ParquetSchemaPruningSuite passed.
Yes. That's what we need to work on in the next PR.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 @mallman Is it related to [this revert in ParquetReadSupport](https://github.com/apache/spark/pull/21889/commits/0312a5188f0d6c9fc5304195dbdc703bf0aa3fb7#diff-245e70c1f41e353e34cf29bd00fd9029L86)? I re-added this logic and all 32 tests in ParquetSchemaPruningSuite passed.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 I've pushed a commit to restore the original test coverage while also ensuring determinism of the output. Don't ask me how I did it. It's a secret! The test that was failing before it was kinda passing is now failing again so I marked it ignored so it wouldn't break Jenkins. And I reverted the commit that enabled this feature by default, because it's still broken. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889
> select id, name.middle, address from temp - Works
> select name.middle, address from temp - Fails
> select name.middle from temp - Works
> select name.middle, id, address from temp - Works
> select name.middle, address, id from temp - Works
Removing the `order by` clause from your test query caused it to fail, but it has nothing to do with ordering. It appears that the failure in this case is manifested when the file scan schema is exactly the `name.middle` and `address` columns. Introducing the `order by` clauses in the test suite queries gave them necessary determinism for checking query answers, but these modifications also altered the file scan schema.
I need to fix the tests, but I think that the failure underlying the previously ignored test case has not been resolved after all. It was just a case of confusing coincidence. Unfortunately we're still not ready to merge this PR yet.
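To make the point about the file scan schema concrete, here is a minimal sketch in plain Python of how a pruned schema could follow from the set of columns a query references. It is not Spark's implementation and nothing in it is taken from the patch: the `CONTACTS` dict (a partial stand-in for the `contacts` schema in this thread), the `prune` helper, and the choice to keep fields in schema order are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Union

# A schema maps a field name to a type string, or to a nested dict for a struct.
Schema = Dict[str, Union[str, "Schema"]]

# Hypothetical, partial stand-in for the `contacts` schema discussed in this thread.
CONTACTS: Schema = {
    "id": "integer",
    "name": {"first": "string", "middle": "string", "last": "string"},
    "address": "string",
    "pets": "integer",
}


def prune(schema: Schema, paths: Iterable[str]) -> Schema:
    """Keep only the fields reachable through the requested dotted paths."""
    requested = [p.split(".") for p in paths]
    pruned: Schema = {}
    # Walk fields in schema order (an assumption of this sketch), so the
    # result does not depend on the order of the select list.
    for field, dtype in schema.items():
        tails = [p[1:] for p in requested if p[0] == field]
        if not tails:
            continue  # field is not referenced at all
        if isinstance(dtype, dict) and all(tails):
            # Every reference reaches into the struct: recurse on the sub-paths.
            pruned[field] = prune(dtype, [".".join(t) for t in tails])
        else:
            # Leaf field, or the whole struct was requested somewhere.
            pruned[field] = dtype
    return pruned


# Columns referenced by `select name.middle, address from temp`.
without_order_by = prune(CONTACTS, ["name.middle", "address"])
# The same query with `order by id` also references `id`.
with_order_by = prune(CONTACTS, ["name.middle", "address", "id"])

print(without_order_by)
print(with_order_by)
```

In this sketch, adding `id` (as an `order by id` clause would) yields a three-field schema, so a test written with the `order by` never exercises the two-field schema that the failing query reads.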
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 @mallman `select id, name.middle, address from temp` - **Works** `select name.middle, address from temp` - **Fails** --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > Test build #94228 has finished for PR 21889 at commit 92901da. The test failure appears to be unrelated to this PR. Is it just me or has the test suite become flakier in the past few months? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889
> The tests as committed pass for me, but I removed the order by id and I got that error. Are you saying it works with the specific query in my comment?
@ajacques Please try this query:
```
select id, name.middle, address from temp
```
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21889 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94228/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94228 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94228/testReport)** for PR 21889 at commit [`92901da`](https://github.com/apache/spark/commit/92901da3785ce94db501a4c3d9be6316cfbf29a9). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > The tests as committed pass for me, but I removed the order by id and I got that error. Are you saying it works with the specific query in my comment? Oh! I didn't notice you changed the query. Okay. I'll take a closer look. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 The tests as committed pass for me, but I removed the `order by id` and I got that error. Are you saying it works with the specific query in my comment? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > @mallman: I've rebased on top of your changes and pushed. I'm seeing the following: That test passes for me locally. Also, I inspected your branch and could not find any errors in the rebase. What commit hash are you testing locally? I'm using `92901da3785ce94db501a4c3d9be6316cfbf29a9`. Please ensure we're on the same commit. If so, try doing an `sbt clean` and running your test again. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > @mallman: I've rebased on top of your changes and pushed. I'm seeing the following That's the test case that I "unignored". It was passing. There must be some simple explanation. I will look into it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 @mallman: I've rebased on top of your changes and pushed. I'm seeing the following:
Given the following schema:
```
root
 |-- id: integer (nullable = true)
 |-- name: struct (nullable = true)
 |    |-- first: string (nullable = true)
 |    |-- middle: string (nullable = true)
 |    |-- last: string (nullable = true)
 |-- address: string (nullable = true)
 |-- pets: integer (nullable = true)
 |-- friends: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- first: string (nullable = true)
 |    |    |-- middle: string (nullable = true)
 |    |    |-- last: string (nullable = true)
 |-- relatives: map (nullable = true)
 |    |-- key: string
 |    |-- value: struct (valueContainsNull = true)
 |    |    |-- first: string (nullable = true)
 |    |    |-- middle: string (nullable = true)
 |    |    |-- last: string (nullable = true)
 |-- p: integer (nullable = true)
```
The query: `select name.middle, address from temp` throws:
```
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file file:/private/var/folders/ss/cw601dzn59b2nygs8k1bs78x75lhr0/T/spark-cab140ca-cbba-4dc1-9fe5-6ae739dab70a/contacts/p=2/part-0-91d2abf5-625f-4080-b34c-e373b89c9895-c000.snappy.parquet
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
	at org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:186)
	... 20 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
	at java.util.ArrayList.rangeCheck(ArrayList.java:657)
	at java.util.ArrayList.get(ArrayList.java:433)
	at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
	at org.apache.parquet.io.GroupColumnIO.getFirst(GroupColumnIO.java:99)
	at org.apache.parquet.io.PrimitiveColumnIO.getFirst(PrimitiveColumnIO.java:97)
	at org.apache.parquet.io.PrimitiveColumnIO.isFirst(PrimitiveColumnIO.java:92)
	at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:278)
	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
	at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
	at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
	... 25 more
```
No root cause yet, but I noticed this while working with the unit tests.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21889 **[Test build #94228 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94228/testReport)** for PR 21889 at commit [`92901da`](https://github.com/apache/spark/commit/92901da3785ce94db501a4c3d9be6316cfbf29a9). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 @mallman: [This one](https://github.com/apache/spark/pull/21889/files#diff-0c6c7481232e9637b91c179f1005426aR120)? I just enabled it on my branch and the test passed. Was it fixed by your latest changes or am I missing something?
```
Expected: struct,address:string>
Actual:
fileSourceScanSchemata = {ArrayBuffer@12560} "ArrayBuffer" size = 1
 0 = {StructType@15492} "StructType" size = 3
  0 = {StructField@15494} "StructField(id,IntegerType,true)"
  1 = {StructField@15495} "StructField(name,StructType(StructField(middle,StringType,true)),true)"
  2 = {StructField@15496} "StructField(address,StringType,true)"
```
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889
> Are there any other blockers to enabling this by default now that @mallman fixed the currently known broken queries?
The functionality exercised by the ignored test in `ParquetSchemaPruningSuite.scala` is still broken. That's something we're hoping to fix in a follow-on PR. This PR has to be merged first.