[GitHub] [spark] rdblue commented on a change in pull request #28027: [SPARK-31255][SQL] Add SupportsMetadataColumns to DSv2

GitBox Tue, 27 Oct 2020 12:18:27 -0700


rdblue commented on a change in pull request #28027:
URL: https://github.com/apache/spark/pull/28027#discussion_r512961848




##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala
##########
@@ -880,6 +880,12 @@ case class SubqueryAlias(
     val qualifierList = identifier.qualifier :+ alias
     child.output.map(_.withQualifier(qualifierList))
   }
+
+  override def metadataOutput: Seq[Attribute] = {
+    val qualifierList = identifier.qualifier :+ alias
+    child.metadataOutput.map(_.withQualifier(qualifierList))
+  }

Review comment:
       They are _eventually_ part of the output, but they can't be at first 
because `*` expansion uses all of `output`. If we added them immediately, we 
would get metadata columns in a `select *`.
   
   Instead, we add the metadata columns to `metadataOutput` and then update 
column resolution to look up columns there as well. The result is that we can 
resolve everything just like normal, including `*`, but the metadata columns 
are still missing from `output`. Then the new analyzer rule adds the columns to 
the output if they are resolved but missing. Since the parent node is already 
resolved, we know that this is safe and happens after `*` expansion.
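
   As a rough illustration of that ordering, here is a toy sketch in plain 
Scala. It does not use Catalyst: `Attr`, `Plan`, `Relation`, `Project`, and the 
helper names are all hypothetical stand-ins, and the real analyzer rule may 
look quite different. It only shows that `*` expansion reads `output`, that 
name resolution can fall back to `metadataOutput`, and that a later step adds 
referenced metadata columns to the output.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
final case class Attr(name: String)

sealed trait Plan {
  def output: Seq[Attr]
  def metadataOutput: Seq[Attr]
}

// A leaf relation with regular columns and separately tracked metadata columns.
final case class Relation(output: Seq[Attr], metadataOutput: Seq[Attr]) extends Plan

// A projection over a child plan; it exposes no metadata columns of its own.
final case class Project(columns: Seq[Attr], child: Plan) extends Plan {
  def output: Seq[Attr] = columns
  def metadataOutput: Seq[Attr] = Nil
}

object MetadataColumnsSketch {
  // `*` expansion only reads `output`, so metadata columns never appear in it.
  def expandStar(child: Plan): Seq[Attr] = child.output

  // Name resolution checks `output` first, then falls back to `metadataOutput`.
  def resolve(name: String, child: Plan): Option[Attr] =
    child.output.find(_.name == name)
      .orElse(child.metadataOutput.find(_.name == name))

  // Stand-in for the later rule: columns that the parent references, that are
  // resolved, but that are absent from the child's output get added to it.
  def addMissingMetadata(project: Project): Project = project.child match {
    case r: Relation =>
      val missing = project.columns.filter { c =>
        !r.output.contains(c) && r.metadataOutput.contains(c)
      }
      if (missing.isEmpty) project
      else project.copy(child = r.copy(output = r.output ++ missing))
    case _ => project
  }
}
```

   In this sketch, a projection that names a metadata column resolves against 
`metadataOutput`, and only afterwards does the relation's `output` grow to 
include it, so an earlier `*` expansion never sees the metadata column.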




