ryan-johnson-databricks commented on code in PR #40321: URL: https://github.com/apache/spark/pull/40321#discussion_r1131931758
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########

@@ -1033,9 +1033,12 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
         requiredAttrIds.contains(a.exprId)) => s.withMetadataColumns()
       case p: Project if p.metadataOutput.exists(a => requiredAttrIds.contains(a.exprId)) =>
+        // Inject the requested metadata columns into the project's output, if not already present.

Review Comment:
   I hit a weird endless loop with this while debugging this `SubqueryAlias` issue. Basically, if the plan root already has a metadata attribute, but it's not available because the `SubqueryAlias` blocked it, this rule kept endlessly (re)appending the metadata column to the projections below the `SubqueryAlias`. Once the rule had run 100 times (leaving 100 copies of `_metadata` in the `Project` output), the endless-loop detector kicked in and killed it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
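To make the failure mode concrete: a rewrite rule that appends a column without first checking whether it is already present can never reach a fixpoint, so the rule executor re-fires it until the iteration limit trips. The following is a minimal, self-contained Scala sketch of that hazard; it is not Spark code, and `appendUnconditionally`, `appendIfMissing`, and the column names are hypothetical stand-ins for a `Project`'s output list.

```scala
// Minimal sketch (assumed names, not Spark internals): why an unconditional
// append never converges, while a presence check reaches a fixpoint.
object MetadataRuleSketch {
  // Buggy variant: re-appends the metadata column on every rule invocation,
  // so the plan keeps changing and the fixpoint loop never terminates.
  def appendUnconditionally(output: List[String], metadataCol: String): List[String] =
    output :+ metadataCol

  // Fixed variant: a no-op once the column is present, so the rule
  // executor sees an unchanged plan and stops iterating.
  def appendIfMissing(output: List[String], metadataCol: String): List[String] =
    if (output.contains(metadataCol)) output
    else output :+ metadataCol

  def main(args: Array[String]): Unit = {
    var out = List("id", "value")
    // Simulate 100 rule invocations, mirroring the 100-iteration limit
    // described in the review comment above.
    for (_ <- 1 to 100) out = appendIfMissing(out, "_metadata")
    println(out.mkString(","))           // id,value,_metadata
    println(out.count(_ == "_metadata")) // 1 copy, not 100
  }
}
```

Under this model, swapping in `appendUnconditionally` leaves 100 copies of `_metadata` after 100 iterations, which is exactly the symptom the loop detector reported.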