[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2017-02-03 Thread xwu0226
Github user xwu0226 commented on the issue:

https://github.com/apache/spark/pull/16156
  
https://issues.apache.org/jira/browse/SPARK-19409 has been resolved by 
upgrading to parquet-1.8.2, which fixes this issue. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2017-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16156
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2017-02-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16156
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72253/





[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2017-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16156
  
**[Test build #72253 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72253/testReport)**
 for PR 16156 at commit 
[`096ab18`](https://github.com/apache/spark/commit/096ab18887c40761eb7ba79e9c406fe8ca6ce7c0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2017-02-01 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16156
  
**[Test build #72253 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72253/testReport)**
 for PR 16156 at commit 
[`096ab18`](https://github.com/apache/spark/commit/096ab18887c40761eb7ba79e9c406fe8ca6ce7c0).





[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16156
  
Merged build finished. Test PASSed.





[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16156
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/69688/





[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16156
  
**[Test build #69688 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69688/consoleFull)**
 for PR 16156 at commit 
[`096ab18`](https://github.com/apache/spark/commit/096ab18887c40761eb7ba79e9c406fe8ca6ce7c0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16156
  
@liancheng Ah, thank you. I should have tested this first.





[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16156
  
Is there another way to do this that avoids the try-catch? This is the normal 
read path, and relying on exception handling there might not be safe. 





[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/16156
  
Hey @xwu0226 @gatorsmile, I did some investigation, and I now don't think 
this is a bug. Please refer to [my JIRA comment][1] for more details.

[1]: https://issues.apache.org/jira/browse/SPARK-18539?focusedCommentId=15723747&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15723747





[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread xwu0226
Github user xwu0226 commented on the issue:

https://github.com/apache/spark/pull/16156
  
For the normal Parquet reader case, we have the following code:
```Scala
} else {
  logDebug(s"Falling back to parquet-mr")
  // ParquetRecordReader returns UnsafeRow
  val reader = pushed match {
    case Some(filter) =>
      // A filter was pushed down: hand it to parquet-mr
      new ParquetRecordReader[UnsafeRow](
        new ParquetReadSupport,
        FilterCompat.get(filter, null))
    case _ =>
      // No pushed-down filter: plain reader
      new ParquetRecordReader[UnsafeRow](new ParquetReadSupport)
  }
  reader.initialize(split, hadoopAttemptContext)
  reader
}
```
I am wondering whether we could wrap `reader.initialize` in a try-catch and, 
on failure, recreate the ParquetRecordReader without the filter and 
initialize it again. What do you think?
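
The idea above could be sketched roughly as follows. This is only an illustration of the retry-without-filter pattern, not the PR's code: `Reader`, `FakeReader`, and `initWithFallback` are hypothetical stand-ins rather than parquet-mr or Spark classes, and the sketch assumes the filter is re-applied downstream so that dropping it does not change results.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for a record reader; NOT the parquet-mr API.
trait Reader {
  def initialize(): Unit
  def hasFilter: Boolean
}

object FallbackSketch {
  // Try the filtered reader first; if its initialization throws,
  // build an unfiltered reader and initialize that one instead.
  def initWithFallback[F](pushed: Option[F])(mkReader: Option[F] => Reader): Reader =
    pushed match {
      case Some(_) =>
        val filtered = mkReader(pushed)
        try {
          filtered.initialize()
          filtered
        } catch {
          // Deliberately broad for the sketch; real code would want to
          // catch only the specific "column not found" failure.
          case NonFatal(_) =>
            val plain = mkReader(None)
            plain.initialize()
            plain
        }
      case None =>
        val plain = mkReader(None)
        plain.initialize()
        plain
    }
}

// Fake reader for the demo: fails when the filter names a missing column.
final class FakeReader(filter: Option[String]) extends Reader {
  def hasFilter: Boolean = filter.isDefined
  def initialize(): Unit =
    if (filter.contains("missing_col"))
      throw new IllegalArgumentException("column not found in file schema")
}
```

For example, `FallbackSketch.initWithFallback(Some("missing_col"))(f => new FakeReader(f))` ends up with an unfiltered reader, while a filter on an existing column is kept. The broad catch is the weak point of this pattern, since it can mask unrelated initialization failures.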





[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread xwu0226
Github user xwu0226 commented on the issue:

https://github.com/apache/spark/pull/16156
  
@liancheng I see. In the normal Parquet reader, ParquetFileFormat uses 
Hadoop's `ParquetRecordReader`, to which we cannot add such toleration code. 





[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/16156
  
@xwu0226 I just tested, and this issue also affects the normal Parquet reader 
(with `spark.sql.parquet.enableVectorizedReader` set to `false`). That's also 
why #9940 couldn't take an approach similar to this one: 
`ParquetRecordReader` is a third-party class provided by parquet-mr.
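
For anyone wanting to exercise the non-vectorized path mentioned above, the setting can be changed per session. A minimal sketch, assuming a local Spark 2.x session on the classpath; the master and application name are placeholders chosen for illustration:

```scala
import org.apache.spark.sql.SparkSession

// Placeholder local session; adjust master/appName for a real cluster.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("parquet-mr-fallback")
  .getOrCreate()

// Disable the vectorized reader so Parquet scans go through parquet-mr.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
```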


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/16156
  
BTW, I think this PR is a cleaner fix than #9940, which introduces 
temporary metadata while merging two `StructType`s and erases it in a later 
phase. We may want to remove the hack done in #9940 later if possible. But for 
now, let's make the fix as surgical as possible to lower the risk for the 2.1 
release.





[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread liancheng
Github user liancheng commented on the issue:

https://github.com/apache/spark/pull/16156
  
Actually, PR #9940 should have already fixed this issue. I'm checking why 
it doesn't work under 2.0.1 or 2.0.2.





[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread xwu0226
Github user xwu0226 commented on the issue:

https://github.com/apache/spark/pull/16156
  
@gatorsmile @liancheng Thanks!





[GitHub] spark issue #16156: [SPARK-18539][SQL]: tolerate pushed-down filter on non-e...

2016-12-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16156
  
**[Test build #69688 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/69688/consoleFull)**
 for PR 16156 at commit 
[`096ab18`](https://github.com/apache/spark/commit/096ab18887c40761eb7ba79e9c406fe8ca6ce7c0).

