huaxingao commented on PR #36781:
URL: https://github.com/apache/spark/pull/36781#issuecomment-1149518253

   The fix looks good, but the explain output bothers me. Here is what I got:
   ```
   spark.read.parquet(dir.getCanonicalPath).filter("isnotnull(f)").explain(true)
   
   == Physical Plan ==
   *(1) Filter isnotnull(f#0)
   +- *(1) ColumnarToRow
      +- FileScan parquet [f#0] Batched: true, DataFilters: [isnotnull(f#0)], Format: Parquet, Location: InMemoryFileIndex(1 paths)[file:/private/var/folders/pt/_5f4sxy56x70dv9zpz032f0m0000gn/T/spark-42..., PartitionFilters: [], PushedFilters: [IsNotNull(f)], ReadSchema: struct<f:array<int>>
   ```
   
   The explain output shows `PushedFilters: [IsNotNull(f)]`, but the filter is actually not pushed down.
   
   The problem is that the pushed-down filter information in explain comes from [here](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L368). As long as the data filters don't include any metadata column filters and can be translated successfully, Spark assumes they have been pushed down.
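   To illustrate the gap, here is a minimal sketch (not Spark's actual code; `translateFilter`, `SourceFilter`, and the expression strings are invented for this example) of how reporting "pushed" based only on translatability can over-report:

   ```scala
   // Hedged sketch (hypothetical, not Spark internals): a filter is reported
   // as "pushed" whenever it translates into a data-source filter, regardless
   // of whether the reader later evaluates it (e.g. on repeated primitive types).
   sealed trait SourceFilter
   case class IsNotNull(col: String) extends SourceFilter

   object PushedFiltersSketch {
     // Hypothetical translator: succeeds for any isnotnull(...) expression,
     // even when the column is array-typed and the reader cannot filter on it.
     def translateFilter(expr: String): Option[SourceFilter] =
       if (expr.startsWith("isnotnull(") && expr.endsWith(")"))
         Some(IsNotNull(expr.stripPrefix("isnotnull(").stripSuffix(")")))
       else None

     def main(args: Array[String]): Unit = {
       val dataFilters = Seq("isnotnull(f)")
       val pushed = dataFilters.flatMap(translateFilter)
       // Renders as "PushedFilters: [IsNotNull(f)]" even if nothing is pushed.
       println(s"PushedFilters: [${pushed.mkString(", ")}]")
     }
   }
   ```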
   
   I am wondering whether we should just fix the repeated primitive types for now and fix the explain output in another PR, or fix the explain problem in this PR too.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
