viirya commented on a change in pull request #28898:
URL: https://github.com/apache/spark/pull/28898#discussion_r448774792
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NestedColumnAliasing.scala
##########
@@ -172,13 +188,23 @@ object NestedColumnAliasing {
(f, Alias(f, s"_gen_alias_${exprId.id}")(exprId, Seq.empty, None))
}
+
+ // Do deduplication based on semanticEquals, and then sum.
+ val nestedFieldNum = nestedFieldToAlias
+ .foldLeft(Seq[ExtractValue]()) {
+ (unique, curr) => if (!unique.exists(curr._1.semanticEquals(_))) {
+ curr._1 +: unique
+ } else {
+ unique
+ }
+ }
+ .map { t => totalFieldNum(t.dataType) }
+ .sum
Review comment:
No, I mean this comment thread.
I am not sure whether you are aware of it: the reason you need to deduplicate
here is that semantically equal `ExtractValue`s are applied to attributes with
different qualifiers. For example, there are two `name.first`, but one refers
to `name` with qualifier `a` and the other refers to `name` with qualifier `b`.
I ran a test using your query with all qualifiers cleaned up as I showed, and
it works well.
What I said in the comment above is that you select an arbitrary `ExtractValue`
from these `ExtractValue`s with different qualifiers, but later we look
into the map using a given `ExtractValue`. This might fail in the case where
you select the `name.first` with qualifier `a`, but later look up the map using
`name.first` with qualifier `b`.
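To make the concern concrete, here is a minimal sketch. It does not use Spark:
`Attr`, `GetField`, `QualifierSketch`, and `stripQualifier` are hypothetical
stand-ins, and the sketch assumes the alias map is keyed by plain equality
(which includes the qualifier) while deduplication uses a qualifier-blind
`semanticEquals`. Whether the real lookup behaves this way depends on how the
map in `NestedColumnAliasing` is actually keyed.

```scala
// Toy stand-ins for Catalyst expressions; hypothetical, NOT the Spark classes.
final case class Attr(name: String, exprId: Long, qualifier: Seq[String])

final case class GetField(child: Attr, field: String) {
  // Mimics semanticEquals: ignores the qualifier, compares exprId and field.
  def semanticEquals(other: GetField): Boolean =
    child.exprId == other.child.exprId && field == other.field
}

object QualifierSketch {
  // Same shape as the fold in the diff: keeps the first element of each
  // semantically equal group and drops the rest.
  def dedup(fields: Seq[GetField]): Seq[GetField] =
    fields.foldLeft(Seq.empty[GetField]) { (unique, curr) =>
      if (unique.exists(curr.semanticEquals(_))) unique else curr +: unique
    }

  // Drops the qualifier so that keys differing only by qualifier compare equal.
  def stripQualifier(f: GetField): GetField =
    f.copy(child = f.child.copy(qualifier = Seq.empty))

  def main(args: Array[String]): Unit = {
    val viaA = GetField(Attr("name", 1L, Seq("a")), "first")
    val viaB = GetField(Attr("name", 1L, Seq("b")), "first")

    // Only the `a`-qualified field survives deduplication.
    val kept = dedup(Seq(viaA, viaB))
    val aliases = kept.map(f => f -> s"_gen_alias_${f.child.exprId}").toMap

    // Looking up with the `b`-qualified field misses the entry.
    println(aliases.get(viaB)) // None

    // With qualifiers stripped on both sides, the lookup succeeds.
    val normalized = aliases.map { case (k, v) => stripQualifier(k) -> v }
    println(normalized.get(stripQualifier(viaB))) // Some(_gen_alias_1)
  }
}
```

In this toy model, normalizing the qualifiers before building and querying the
map removes the need to pick an arbitrary representative at all.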