[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user xwu0226 commented on the issue: https://github.com/apache/spark/pull/16156

SPARK-19409 (https://issues.apache.org/jira/browse/SPARK-19409), which upgrades Spark to parquet-1.8.2, has been resolved, and that upgrade fixes this issue.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16156

Merged build finished. Test PASSed.
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16156

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72253/ Test PASSed.
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16156

**[Test build #72253 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72253/testReport)** for PR 16156 at commit [`096ab18`](https://github.com/apache/spark/commit/096ab18887c40761eb7ba79e9c406fe8ca6ce7c0).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16156

**[Test build #72253 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72253/testReport)** for PR 16156 at commit [`096ab18`](https://github.com/apache/spark/commit/096ab18887c40761eb7ba79e9c406fe8ca6ce7c0).
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16156

Merged build finished. Test PASSed.
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16156

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69688/ Test PASSed.
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16156

**[Test build #69688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69688/consoleFull)** for PR 16156 at commit [`096ab18`](https://github.com/apache/spark/commit/096ab18887c40761eb7ba79e9c406fe8ca6ce7c0).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16156

@liancheng Ah, thank you. I should have tested this first.
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16156

Would there be another way to avoid the try-catch? This is the normal read path, and it does not seem safe to rely on exception handling there.
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16156

Hey @xwu0226 @gatorsmile, I did some investigation, and I no longer think this is a bug. Please refer to [my JIRA comment][1] for more details.

[1]: https://issues.apache.org/jira/browse/SPARK-18539?focusedCommentId=15723747&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15723747
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user xwu0226 commented on the issue: https://github.com/apache/spark/pull/16156

For the normal Parquet reader case, we have the following code:

```Scala
} else {
  logDebug(s"Falling back to parquet-mr")
  // ParquetRecordReader returns UnsafeRow
  val reader = pushed match {
    case Some(filter) =>
      new ParquetRecordReader[UnsafeRow](
        new ParquetReadSupport,
        FilterCompat.get(filter, null))
    case _ =>
      new ParquetRecordReader[UnsafeRow](new ParquetReadSupport)
  }
  reader.initialize(split, hadoopAttemptContext)
  reader
}
```

I am wondering whether we could wrap `reader.initialize` in a try-catch and, on failure, recreate the `ParquetRecordReader` without the filter and initialize it again. What do you think?
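The retry idea proposed above can be sketched without any Parquet dependency. This is a minimal illustration only, not code from the PR: `Filter`, `FilteringReader`, and `FallbackDemo.openReader` are hypothetical stand-ins for the parquet-mr types, and the exception type is an assumption about how a reader might reject a filter on a column the file does not contain.

```scala
// Hypothetical stand-in for a pushed-down filter on a single column.
final case class Filter(column: String)

// Hypothetical stand-in for a record reader; NOT the parquet-mr API.
// initialize() fails when the filter refers to a column the file lacks.
final class FilteringReader(filter: Option[Filter]) {
  var initialized: Boolean = false

  def hasFilter: Boolean = filter.isDefined

  def initialize(fileColumns: Set[String]): Unit = {
    filter.foreach { f =>
      if (!fileColumns.contains(f.column)) {
        throw new IllegalArgumentException(s"unknown column: ${f.column}")
      }
    }
    initialized = true
  }
}

object FallbackDemo {
  // Try the filtered reader first; if initialization rejects the filter,
  // recreate the reader without it and initialize again.
  def openReader(pushed: Option[Filter], fileColumns: Set[String]): FilteringReader = {
    val reader = new FilteringReader(pushed)
    try {
      reader.initialize(fileColumns)
      reader
    } catch {
      case _: IllegalArgumentException if pushed.isDefined =>
        val unfiltered = new FilteringReader(None)
        unfiltered.initialize(fileColumns)
        unfiltered
    }
  }
}
```

With this shape, `FallbackDemo.openReader(Some(Filter("b")), Set("a"))` would come back as an initialized reader with no filter attached. The sketch also makes the cost visible: a failure that is expected on an ordinary read is handled through an exception, and the caller can no longer tell whether the filter was applied.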
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user xwu0226 commented on the issue: https://github.com/apache/spark/pull/16156

@liancheng I see. In the normal Parquet reader path, ParquetFileFormat uses Hadoop's `ParquetRecordReader`, to which we cannot add such toleration code.
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16156

@xwu0226 Just tested that this issue also affects the normal Parquet reader (by setting `spark.sql.parquet.enableVectorizedReader` to `false`). That's also why #9940 couldn't take an approach similar to this one: `ParquetRecordReader` is a third-party class provided by parquet-mr.
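For anyone wanting to reproduce the non-vectorized case, the setting named above can be toggled on a session as in the following minimal sketch. It assumes Spark SQL is on the classpath; the local master and the application name are illustrative choices, not details from this thread.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative local session; master and appName are assumptions.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("parquet-non-vectorized")
  .getOrCreate()

// Disable the vectorized reader so Parquet scans go through
// the parquet-mr ParquetRecordReader path instead.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
```

In `spark-shell` the session already exists as `spark`, so only the final `spark.conf.set` line would be needed.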
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16156

BTW, I think this PR is a cleaner fix than #9940, which introduces temporary metadata while merging two `StructType`s and erases it in a later phase. We may want to remove the hack done in #9940 later if possible. But for now, let's make the fix as surgical as possible to lower the risk for the 2.1 release.
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user liancheng commented on the issue: https://github.com/apache/spark/pull/16156

Actually, PR #9940 should have already fixed this issue. I'm checking why it doesn't work under 2.0.1 for 2.0.2.
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user xwu0226 commented on the issue: https://github.com/apache/spark/pull/16156

@gatorsmile @liancheng Thanks!
[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16156

**[Test build #69688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69688/consoleFull)** for PR 16156 at commit [`096ab18`](https://github.com/apache/spark/commit/096ab18887c40761eb7ba79e9c406fe8ca6ce7c0).