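[GitHub] [spark] viirya commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans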

GitBox Wed, 01 Jul 2020 23:27:17 -0700


viirya commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r448774792




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##########
@@ -172,13 +188,23 @@ object NestedColumnAliasing {
           (f, Alias(f, s"_gen_alias_${exprId.id}")(exprId, Seq.empty, None))
         }
 
+
+        // Do deduplication based on semanticEquals, and then sum.
+        val nestedFieldNum = nestedFieldToAlias
+          .foldLeft(Seq[ExtractValue]()) {
+            (unique, curr) => if (!unique.exists(curr._1.semanticEquals(_))) {
+              curr._1 +: unique
+            } else {
+              unique
+            }
+          }
+          .map { t => totalFieldNum(t.dataType)  }
+          .sum

Review comment:
       No, I mean this comment thread.
   
   I am not sure if you are aware of it. The reason you need to deduplicate 
here is that semantically equal `ExtractValue`s are applied to attributes with 
different qualifiers, e.g. there are two `name.first`, but one refers to `name` 
with qualifier `a` and the other refers to `name` with qualifier `b`.
   
   I ran a test using your query with all qualifiers cleaned up, as I showed, 
and it works well.
   
   What I said in the comment above is that you select an arbitrary 
`ExtractValue` from these `ExtractValue`s with different qualifiers, but later 
we look up the map using a given `ExtractValue`. This can fail when you select 
the `name.first` with qualifier `a`, but later look up the map using 
`name.first` with qualifier `b`.
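
   The lookup failure described above can be illustrated with a minimal, 
self-contained sketch. It does not use the real Catalyst classes: `Attr`, 
`FieldAccess`, `QualifierLookupDemo` and the alias name `_gen_alias_0` are 
hypothetical stand-ins, and the only assumption carried over is that plain 
equality sees the qualifier while `semanticEquals` ignores it. Keying the map 
by a canonical form is shown as one possible way around the problem, not as 
what the PR does.

```scala
// Toy stand-ins for Catalyst expressions; these are NOT the real Spark classes.
final case class Attr(name: String, exprId: Long, qualifier: Seq[String]) {
  // The canonical form drops the qualifier (assumed behaviour for this sketch).
  def canonicalized: Attr = copy(qualifier = Seq.empty)
}

final case class FieldAccess(child: Attr, field: String) {
  def canonicalized: FieldAccess = copy(child = child.canonicalized)
  // Semantic equality compares canonical forms, so qualifiers are ignored.
  def semanticEquals(other: FieldAccess): Boolean =
    canonicalized == other.canonicalized
}

object QualifierLookupDemo {
  // The same `name.first`, reached through qualifier `a` and qualifier `b`.
  val viaA: FieldAccess = FieldAccess(Attr("name", 1L, Seq("a")), "first")
  val viaB: FieldAccess = FieldAccess(Attr("name", 1L, Seq("b")), "first")

  // Keep the first expression of each semantically equal group.
  val deduped: Seq[FieldAccess] =
    Seq(viaA, viaB).foldLeft(Seq.empty[FieldAccess]) { (unique, curr) =>
      if (unique.exists(_.semanticEquals(curr))) unique else unique :+ curr
    }

  // A map keyed by plain equality only knows the surviving representative.
  val aliases: Map[FieldAccess, String] =
    deduped.map(f => f -> "_gen_alias_0").toMap

  // Keying by the canonical form makes the lookup qualifier-insensitive.
  val canonicalAliases: Map[FieldAccess, String] =
    deduped.map(f => f.canonicalized -> "_gen_alias_0").toMap

  def main(args: Array[String]): Unit = {
    println(aliases.get(viaA))                         // Some(_gen_alias_0)
    println(aliases.get(viaB))                         // None
    println(canonicalAliases.get(viaB.canonicalized))  // Some(_gen_alias_0)
  }
}
```

   In this sketch the deduplication keeps only the expression with qualifier 
`a`, so a later lookup with the qualifier-`b` expression misses the map even 
though the two are semantically equal.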
   
   




[GitHub] [spark] viirya commented on a change in pull request #28898: [SPARK-32059][SQL] Allow nested schema pruning thru window/sort/filter plans
