[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...

2018-05-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/21224#discussion_r185988219
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
 ---
@@ -342,6 +342,7 @@ class ParquetFileFormat
   sparkSession.sessionState.conf.parquetFilterPushDown
 // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
 val returningBatch = supportBatch(sparkSession, resultSchema)
+val pushDownDate = sqlConf.parquetFilterPushDownDate
--- End diff --

Ah, I see. Thank you, @cloud-fan!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...

2018-05-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21224





[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...

2018-05-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21224#discussion_r185975883
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
 ---
@@ -342,6 +342,7 @@ class ParquetFileFormat
   sparkSession.sessionState.conf.parquetFilterPushDown
 // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
 val returningBatch = supportBatch(sparkSession, resultSchema)
+val pushDownDate = sqlConf.parquetFilterPushDownDate
--- End diff --

No, we can't; see https://github.com/apache/spark/pull/21086





[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...

2018-05-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/21224#discussion_r185876764
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
 ---
@@ -342,6 +342,7 @@ class ParquetFileFormat
   sparkSession.sessionState.conf.parquetFilterPushDown
 // Whole stage codegen (PhysicalRDD) is able to deal with batches directly
 val returningBatch = supportBatch(sparkSession, resultSchema)
+val pushDownDate = sqlConf.parquetFilterPushDownDate
--- End diff --

Can we pass `pushed` instead of declaring a new `pushDownDate`?
The following can be handled at line 345 here.

```scala
    // Try to push down filters when filter push-down is enabled.
    val pushed = if (enableParquetFilterPushDown) {
      filters
        // Collects all converted Parquet filter predicates. Notice that not all predicates can be
        // converted (`ParquetFilters.createFilter` returns an `Option`). That's why a `flatMap`
        // is used here.
        .flatMap(new ParquetFilters(pushDownDate).createFilter(requiredSchema, _))
        .reduceOption(FilterApi.and)
    } else {
      None
    }
```
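The `flatMap`/`reduceOption` idiom in the snippet above can be sketched in isolation. This is a simplified illustration under assumptions: `Pred`, `Eq`, `And`, `SourceFilter`, `EqualTo`, `StringContains`, `createFilter`, and `pushDown` are hypothetical stand-ins, not Spark's or Parquet's actual types (in Spark, conversion is done by `ParquetFilters.createFilter` and predicates are combined with `FilterApi.and`).

```scala
// Hypothetical, simplified predicate tree standing in for Parquet's FilterPredicate.
sealed trait Pred
final case class Eq(column: String, value: Any) extends Pred
final case class And(left: Pred, right: Pred) extends Pred

// Hypothetical source-level filters standing in for Spark's data source filters.
sealed trait SourceFilter
final case class EqualTo(attribute: String, value: Any) extends SourceFilter
final case class StringContains(attribute: String, value: String) extends SourceFilter

// Conversion may fail, so it returns an Option (the same role that
// `ParquetFilters.createFilter` returning `Option` plays in the snippet above).
def createFilter(supportedColumns: Set[String], f: SourceFilter): Option[Pred] = f match {
  case EqualTo(a, v) if supportedColumns(a) => Some(Eq(a, v))
  case _                                    => None
}

def pushDown(enabled: Boolean, supportedColumns: Set[String], filters: Seq[SourceFilter]): Option[Pred] =
  if (enabled) {
    filters
      .flatMap(createFilter(supportedColumns, _)) // keeps only filters that could be converted
      .reduceOption[Pred](And(_, _))              // None when no filter could be converted
  } else {
    None
  }
```

Because unconvertible filters are dropped rather than rejected, the combined predicate can be weaker than the full filter list; in this sketch `None` simply means nothing is pushed down.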





[GitHub] spark pull request #21224: [SPARK-24167][SQL] ParquetFilters should not acce...

2018-05-02 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/21224

[SPARK-24167][SQL] ParquetFilters should not access SQLConf at executor side

## What changes were proposed in this pull request?

This PR is extracted from #21190 to make it easier to backport.

`ParquetFilters` is used in the file scan function, which is executed on the
executor side, so we can't call `conf.parquetFilterPushDownDate` there.
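To illustrate the general pattern, here is a minimal, self-contained sketch (not Spark code): the setting is read once on the driver, and only the resulting `Boolean` is captured by the closure that would run on executors. `DriverOnlyConf`, `Filters`, `canPushDown`, and the thread-local "driver" flag are all hypothetical; in real Spark the closure is serialized and shipped to a separate executor JVM, which this sketch only simulates.

```scala
// Hypothetical stand-in for a driver-only config holder: reading a key
// fails unless the caller is inside `runOnDriver`.
object DriverOnlyConf {
  // Simulates "are we on the driver?" with a per-thread flag.
  private val onDriver = new ThreadLocal[Boolean] {
    override def initialValue(): Boolean = false
  }
  private val settings = Map("pushDownDate" -> true)

  def runOnDriver[T](body: => T): T = {
    onDriver.set(true)
    try body finally onDriver.set(false)
  }

  def getBoolean(key: String): Boolean = {
    require(onDriver.get, s"config '$key' read outside the driver")
    settings(key)
  }
}

// Hypothetical filter builder: receives the flag as a constructor argument
// instead of reading the config itself.
final class Filters(pushDownDate: Boolean) {
  def canPushDown(columnType: String): Boolean = columnType match {
    case "int" | "long" | "string" => true
    case "date"                    => pushDownDate
    case _                         => false
  }
}

// Driver side: read the setting once; the closure captures only a Boolean.
val scanFunction: String => Boolean = DriverOnlyConf.runOnDriver {
  val pushDownDate = DriverOnlyConf.getBoolean("pushDownDate")
  (columnType: String) => new Filters(pushDownDate).canPushDown(columnType)
}

// Anti-pattern: this closure reads the config when invoked, i.e. "on the executor".
val badScanFunction: String => Boolean =
  (columnType: String) => new Filters(DriverOnlyConf.getBoolean("pushDownDate")).canPushDown(columnType)
```

Calling `scanFunction` outside `runOnDriver` works because it never touches the config, while `badScanFunction` fails with an `IllegalArgumentException`.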

## How was this patch tested?

It's tested in #21190.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark minor2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21224.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21224


commit c58baad051259d7d2d54f1eb5e84c4bdac0867a6
Author: Wenchen Fan 
Date:   2018-05-03T05:20:06Z

ParquetFilters should not access SQLConf at executor side



