dbtsai commented on a change in pull request #27728:
[SPARK-25556][SPARK-17636][SPARK-31026][SPARK-31060][SQL][test-hive1.2] Nested
Column Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#discussion_r398314267
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2049,6 +2049,17 @@ object SQLConf {
.booleanConf
.createWithDefault(true)
+  val NESTED_PREDICATE_PUSHDOWN_ENABLED =
+    buildConf("spark.sql.optimizer.nestedPredicatePushdown.enabled")
+      .internal()
+      .doc("When true, Spark tries to push down predicates for nested columns and/or names " +
+        "containing `dots` to data sources. Currently, Parquet implements both optimizations " +
+        "while ORC only supports predicates for names containing `dots`. The other data " +
+        "sources don't support this feature yet.")
+      .version("3.0.0")
+      .booleanConf
+      .createWithDefault(true)
Review comment:
Since the filter APIs will be enhanced to support nested columns and column names
containing `dots`, it would be nice to introduce this in a major release. It's a
good idea! We can turn the feature on for specific data sources in a separate PR,
since this PR has already grown too big. Thanks!
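
For illustration, here is a minimal sketch (not part of this PR) of how the new flag
could be exercised against a Parquet file with a nested struct column. The dataset,
output path, and object name below are assumptions made purely for the example.

import org.apache.spark.sql.SparkSession

object NestedPushdownSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nested-predicate-pushdown-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Write a tiny Parquet file with a nested struct column (illustrative data).
    val path = "/tmp/nested_pushdown_example"
    Seq((1, "a", 10), (2, "b", 20))
      .toDF("id", "name", "value")
      .selectExpr("id", "struct(name, value) AS nested")
      .write.mode("overwrite").parquet(path)

    // With the flag enabled (the default introduced by this change), a predicate on a
    // nested field can be pushed to the Parquet reader; check the PushedFilters entry
    // of the FileScan node in the printed plan.
    spark.conf.set("spark.sql.optimizer.nestedPredicatePushdown.enabled", "true")
    spark.read.parquet(path).filter($"nested.value" > 15).explain()

    // With the flag disabled, the same predicate is evaluated by Spark after the rows
    // are read, so it should no longer appear in PushedFilters.
    spark.conf.set("spark.sql.optimizer.nestedPredicatePushdown.enabled", "false")
    spark.read.parquet(path).filter($"nested.value" > 15).explain()

    spark.stop()
  }
}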