ryan-johnson-databricks commented on code in PR #40321:
URL: https://github.com/apache/spark/pull/40321#discussion_r1131931758


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -1033,9 +1033,12 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
         requiredAttrIds.contains(a.exprId)) =>
         s.withMetadataColumns()
      case p: Project if p.metadataOutput.exists(a => requiredAttrIds.contains(a.exprId)) =>
+        // Inject the requested metadata columns into the project's output, if not already present.

Review Comment:
   I hit a weird endless loop here while debugging this `SubqueryAlias` issue. Basically, if the plan root already has a metadata attribute (perhaps added manually by a query rewrite) that is unavailable because the `SubqueryAlias` blocked it, this rule kept endlessly (re)appending the metadata column to the projections below the `SubqueryAlias`. Once the rule had run 100 times (leaving 100 copies of `_metadata` in the `Project` output), the endless-loop detector kicked in and killed it.
   
   I don't think filtering by `inputAttrs` helps, since the problem is what's already in the `output` we're appending to.
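   To illustrate the failure mode (a minimal, self-contained sketch with hypothetical names, not the actual Analyzer code): a rule driven to a fixed point must be idempotent, so the injection has to check the node's existing *output*, not just its input. Otherwise every pass appends another copy and the plan grows until the loop detector fires.

   ```scala
   // Toy model of a fixpoint-driven rewrite rule. `Project` stands in for a
   // plan node whose output we may extend with a metadata column.
   object MetadataInjectionSketch {
     case class Project(output: List[String])

     // Buggy variant: appends unconditionally, so each fixpoint pass changes
     // the node again and the driver never converges on its own.
     def injectUnconditionally(p: Project, col: String): Project =
       Project(p.output :+ col)

     // Fixed variant: a no-op when the column is already in the output,
     // so the fixpoint converges after a single effective pass.
     def injectIfMissing(p: Project, col: String): Project =
       if (p.output.contains(col)) p else Project(p.output :+ col)

     // Apply `rule` until nothing changes or `maxIters` is hit, mimicking a
     // fixpoint executor with an iteration cap (like Catalyst's 100-run limit).
     def runToFixpoint(p: Project, rule: Project => Project, maxIters: Int = 100): Project = {
       var cur = p
       var changed = true
       var i = 0
       while (changed && i < maxIters) {
         val next = rule(cur)
         changed = next != cur
         cur = next
         i += 1
       }
       cur
     }
   }
   ```

   With the idempotent variant the output ends up with exactly one `_metadata` column; with the unconditional variant the toy driver stops only at the iteration cap, holding 100 copies, which mirrors the endless loop described above.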



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

