HyukjinKwon commented on a change in pull request #27728: 
[SPARK-25556][SPARK-17636][SPARK-31026][SPARK-31060][SQL][test-hive1.2] Nested 
Column Predicate Pushdown for Parquet
URL: https://github.com/apache/spark/pull/27728#discussion_r397745115
 
 

 ##########
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##########
 @@ -2049,6 +2049,17 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
+  val NESTED_PREDICATE_PUSHDOWN_ENABLED =
+    buildConf("spark.sql.optimizer.nestedPredicatePushdown.enabled")
+      .internal()
+      .doc("When true, Spark tries to push down predicates for nested columns and/or " +
+        "names containing `dots` to data sources. Currently, Parquet implements both " +
+        "optimizations while ORC only supports predicates for names containing `dots`. " +
+        "The other data sources don't support this feature yet.")
+      .version("3.0.0")
+      .booleanConf
+      .createWithDefault(true)
 
 Review comment:
   Besides https://github.com/apache/spark/pull/27728#discussion_r397742247, one more 
concern about enabling this by default: once enabled, we will push down `a.b` as 
field `b` within `a` by default.
   
   I don't think DSv1 pushed down such predicates before; now, however, DSv1 
implementations need to handle a non-existent column named `a.b`.
   
   I think we shouldn't assume that downstream DSv1 sources handle non-existent 
columns by default. Think about sources that construct query strings from filters, 
like JDBC: the query will fail, and either the implementations have to be fixed or 
this configuration has to be disabled. However, this configuration is all-or-nothing.
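
   The JDBC-style failure mode can be sketched with a toy filter compiler. This is 
illustrative only: `EqualTo` here is a stand-in modeled on 
`org.apache.spark.sql.sources.EqualTo`, and `compileFilter` is a hypothetical 
simplification of what a DSv1 source that builds query strings might do, not 
Spark's actual JDBC code.

```scala
object NestedPushdownDemo {
  // Illustrative stand-in for org.apache.spark.sql.sources.EqualTo;
  // not Spark's actual class.
  case class EqualTo(attribute: String, value: Any)

  // Compile a pushed filter into a SQL predicate the way a naive
  // JDBC-like DSv1 source might: quote the attribute name verbatim.
  def compileFilter(f: EqualTo): String =
    s""""${f.attribute}" = '${f.value}'"""

  def main(args: Array[String]): Unit = {
    // A top-level column compiles to a predicate on a real column:
    println(compileFilter(EqualTo("b", 1)))    // "b" = '1'
    // With nested pushdown on by default, the source may now receive
    // `a.b` and emit a predicate on a literal column named "a.b",
    // which does not exist in the remote table, so the generated
    // query fails at the database.
    println(compileFilter(EqualTo("a.b", 1)))  // "a.b" = '1'
  }
}
```

Such a source cannot distinguish a nested field reference from a flat column whose 
name happens to contain a dot, which is why it would need to be fixed or have the 
configuration disabled.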

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
