Datasource v2 can not prune file source partitions when readDataSchema is empty

Heng Su Mon, 13 Sep 2021 23:00:36 -0700

Hi, community:

We use Spark 3.1.2.


In the PruneFileSourcePartitions rule, FileScan::withFilters is called to push 
down the partition pruning filters (and this is the only place where this 
function is called), but the rule only applies when the condition 
"scan.readDataSchema.nonEmpty" holds 
(https://github.com/apache/spark/blob/de351e30a90dd988b133b3d00fa6218bfcaba8b8/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala#L114)

We use Spark SQL with a custom catalog and run a count query like:

    select count(*) from catalog.db.tbl where dt = '0812'

where dt is a partition key.

In this case scan.readDataSchema is indeed empty, so no partition pruning is 
applied to the scan, and all partitions end up being scanned.
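
To make the behavior described above concrete, here is a toy model in plain 
Scala. It is only a sketch: ToyFileScan, pruneWithGuard, and pruneWithoutGuard 
are hypothetical stand-ins invented for illustration, not Spark's real classes 
or the actual rule. It only mirrors the shape of the condition (filters are 
pushed only when readDataSchema is non-empty) and shows that a count(*) scan, 
which reads no data columns, keeps an empty list of partition filters. The 
unguarded variant is there for comparison and is not a proposed patch.

```scala
// Toy model only: simplified stand-ins, not Spark's real classes or rule.
object PruneGuardSketch {

  // Hypothetical stand-in for a file scan: the data columns and partition
  // columns it reads, plus the partition filters pushed into it.
  // Empty partitionFilters means "list every partition".
  final case class ToyFileScan(
      readDataSchema: Seq[String],
      readPartitionSchema: Seq[String],
      partitionFilters: Seq[String] = Nil) {

    def withFilters(filters: Seq[String]): ToyFileScan =
      copy(partitionFilters = filters)
  }

  // Mirrors the shape of the condition discussed above: filters are pushed
  // only when the scan reads at least one data column.
  def pruneWithGuard(scan: ToyFileScan, filters: Seq[String]): ToyFileScan =
    if (filters.nonEmpty && scan.readDataSchema.nonEmpty) scan.withFilters(filters)
    else scan

  // Same logic without the readDataSchema condition, for comparison only.
  def pruneWithoutGuard(scan: ToyFileScan, filters: Seq[String]): ToyFileScan =
    if (filters.nonEmpty) scan.withFilters(filters)
    else scan

  // select count(*) ... where dt = '0812': no data columns are required,
  // only the partition column dt.
  val countStarScan: ToyFileScan =
    ToyFileScan(readDataSchema = Nil, readPartitionSchema = Seq("dt"))

  val filters: Seq[String] = Seq("dt = '0812'")

  def main(args: Array[String]): Unit = {
    // With the guard, the partition filter never reaches the scan.
    println(pruneWithGuard(countStarScan, filters).partitionFilters)
    // Without it, the filter is attached and could be used for pruning.
    println(pruneWithoutGuard(countStarScan, filters).partitionFilters)
  }
}
```

In a real session, one way to check the same thing is to look at the physical 
plan from explain and see whether the scan node lists any partition filters.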

Did I misunderstand something? Any help is appreciated.

Thank you.



