[GitHub] [spark] steven-aerts commented on a change in pull request #33191: [SPARK-35985][SQL] push partitionFilters for empty readDataSchema

GitBox Wed, 07 Jul 2021 10:22:33 -0700


steven-aerts commented on a change in pull request #33191:
URL: https://github.com/apache/spark/pull/33191#discussion_r665567882




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PruneFileSourcePartitions.scala
##########
@@ -120,7 +120,7 @@ private[sql] object PruneFileSourcePartitions
 
     case op @ PhysicalOperation(projects, filters,
         v2Relation @ DataSourceV2ScanRelation(_, scan: FileScan, output))
-        if filters.nonEmpty && scan.readDataSchema.nonEmpty =>

Review comment:
   @cloud-fan I tried what you proposed and added `&& scan.readPartitionSchema.nonEmpty`.
   The problem is that this prevents any data filter from being pushed down when there is no partition filter, because the right-hand side of the condition at line 128, `|| (dataFilters.nonEmpty && scan.dataFilters.isEmpty)`, can then never be true.
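   To make the interaction between the two conditions concrete, here is a minimal, self-contained sketch. It is not the code from `PruneFileSourcePartitions`: `ScanModel`, `GuardSketch`, and all helper names are hypothetical, filters are modelled as plain strings, and both the exact shape of the guard and the left operand of the inner condition (`partitionKeyFilters.nonEmpty`) are assumptions. Only the right operand is quoted from the condition above.

```scala
// Simplified model; ScanModel and the helper names are hypothetical, not Spark APIs.
final case class ScanModel(
    readPartitionSchema: Seq[String], // partition columns the scan reads
    dataFilters: Seq[String])         // data filters already pushed into the scan

object GuardSketch {
  // Assumed shape of the pattern guard with the extra readPartitionSchema check.
  def guardWithPartitionCheck(filters: Seq[String], scan: ScanModel): Boolean =
    filters.nonEmpty && scan.readPartitionSchema.nonEmpty

  // Assumed shape of the pattern guard without that check.
  def guardFiltersOnly(filters: Seq[String]): Boolean =
    filters.nonEmpty

  // Inner condition: the left operand is an assumption, the right operand
  // mirrors `dataFilters.nonEmpty && scan.dataFilters.isEmpty`.
  def innerCondition(
      partitionKeyFilters: Seq[String],
      dataFilters: Seq[String],
      scan: ScanModel): Boolean =
    partitionKeyFilters.nonEmpty || (dataFilters.nonEmpty && scan.dataFilters.isEmpty)

  // Filters are pushed only if the guard matches and the inner condition holds.
  def pushes(
      guard: Boolean,
      partitionKeyFilters: Seq[String],
      dataFilters: Seq[String],
      scan: ScanModel): Boolean =
    guard && innerCondition(partitionKeyFilters, dataFilters, scan)

  def main(args: Array[String]): Unit = {
    // A scan that reads no partition columns and has no data filters pushed yet.
    val scan = ScanModel(readPartitionSchema = Seq.empty, dataFilters = Seq.empty)
    val filters = Seq("value > 10") // a single data filter, no partition filter

    // With the extra check the guard is false, so the inner condition is never reached.
    println(pushes(guardWithPartitionCheck(filters, scan), Seq.empty, filters, scan))
    // Without it the data filter is pushed via the right operand.
    println(pushes(guardFiltersOnly(filters), Seq.empty, filters, scan))
  }
}
```

   Under these assumptions, the guard with the extra check rejects the scan before the inner condition is evaluated, so a data-only filter is never pushed, whereas the guard without the check lets it through.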
   
   This also causes [some regression tests in the AvroSuite to fail](https://github.com/steven-aerts/spark/runs/3007716757?check_suite_focus=true).
 
   
   So I rolled back to the original proposal.
   
   Is this ok for you?




