viirya commented on a change in pull request #28761:
URL: https://github.com/apache/spark/pull/28761#discussion_r466542348
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcScanBuilder.scala
##########
@@ -60,10 +61,8 @@ case class OrcScanBuilder(
// changed `hadoopConf` in executors.
OrcInputFormat.setSearchArgument(hadoopConf, f, schema.fieldNames)
}
-    val dataTypeMap = schema.map(f => quoteIfNeeded(f.name) -> f.dataType).toMap
-    // TODO (SPARK-25557): ORC doesn't support nested predicate pushdown, so they are removed.
-    val newFilters = filters.filter(!_.containsNestedColumn)
-    _pushedFilters = OrcFilters.convertibleFilters(schema, dataTypeMap, newFilters).toArray
Review comment:
The config exists for DSv1 compatibility, so it only controls DSv1 file-based data sources. For DSv2, nested predicate pushdown is up to the data source implementation: once a source implements the interface, we assume it supports nested filter pushdown. Our ORC filter tests cover both v1 and v2.
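
For reference, a hedged sketch of the DSv1-side knob I'm describing. The config key below matches the SPARK-25557-era work as far as I know, but treat it and the example path as assumptions and verify the key against SQLConf; DSv2 readers such as OrcScanBuilder are not governed by it.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Assumed config key (verify against SQLConf): enables nested predicate
// pushdown only for the listed DSv1 file sources.
spark.conf.set(
  "spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources",
  "parquet,orc")

// Hypothetical data path. With a DSv1 ORC read, the predicate on the
// nested field is pushed down only when "orc" is in the list above; a
// DSv2 read goes through OrcScanBuilder, which decides on its own via
// OrcFilters.convertibleFilters.
val df = spark.read.format("orc").load("/path/to/people.orc")
df.filter(col("person.name") === "alice").show()
```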