[GitHub] [spark] minyyy commented on a change in pull request #35850: [SPARK-38529][SQL] Fix a bug that GeneratorNestedColumnAliasing is incorrectly applied to non-Explode generators.

GitBox Tue, 15 Mar 2022 09:52:34 -0700


minyyy commented on a change in pull request #35850:
URL: https://github.com/apache/spark/pull/35850#discussion_r827204043




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##########
@@ -321,6 +321,40 @@ object GeneratorNestedColumnAliasing {
     // need to prune nested columns through Project and under Generate. The difference is
     // when `nestedSchemaPruningEnabled` is on, nested columns will be pruned further at
     // file format readers if it is supported.
+
+    // There are [[ExtractValue]] expressions on or not on the output of the generator. Generator
+    // can also have different types:
+    // 1. For [[ExtractValue]]s not on the output of the generator, theoretically speaking, there
+    //    lots of expressions that we can push down, including non ExtractValues and GetArrayItem
+    //    and GetMapValue. But to be safe, we only handle GetStructField and GetArrayStructFields.
+    // 2. For [[ExtractValue]]s on the output of the generator, the situation depends on the type
+    //    of the generator expression.
+    //   2.1 Inline

Review comment:
       The reason I didn't remove `Inline` from `canPruneGenerator` is that I don't want to change the existing pushdown behavior for expressions that are not on the generator output. If I change `canPruneGenerator`, fewer expressions enter this branch, so we call NestedColumnAliasing for fewer expressions. But if you are fine with that, I can simply make that change.
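
The trade-off in this comment can be sketched with a toy model. This is only an illustration under stated assumptions. The types `Gen`, `Explode`, `Inline` and `OtherGen`, and every function in `GuardSketch`, are hypothetical stand-ins and not Spark's Catalyst classes or the actual `canPruneGenerator` implementation. The sketch assumes an outer guard decides whether a generator enters the branch at all, and an inner check decides whether ExtractValues on the generator output are pruned.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
sealed trait Gen
case object Explode extends Gen
case object Inline extends Gen
case object OtherGen extends Gen

object GuardSketch {
  // Assumed outer guard that still admits Inline (the behavior the comment wants to keep).
  def outerGuardKeepingInline(g: Gen): Boolean = g match {
    case Explode | Inline => true
    case OtherGen         => false
  }

  // Assumed alternative: Inline removed from the outer guard.
  def outerGuardWithoutInline(g: Gen): Boolean = g == Explode

  // Assumed inner check: only Explode has its output pruned in this toy model.
  def prunesGeneratorOutput(g: Gen): Boolean = g == Explode

  // Returns (pushes down expressions not on the generator, prunes generator output).
  def behavior(outer: Gen => Boolean, g: Gen): (Boolean, Boolean) = {
    val entersBranch = outer(g)
    (entersBranch, entersBranch && prunesGeneratorOutput(g))
  }
}
```

In this model, keeping `Inline` in the outer guard still allows pushdown of expressions that are not on the generator output, while the inner check skips pruning of the `Inline` output. Removing `Inline` from the outer guard disables both for `Inline`.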



