[GitHub] [spark] cloud-fan commented on a diff in pull request #39925: [SPARK-41812][SPARK-41823][CONNECT][SQL][PYTHON] Resolve ambiguous columns issue in `Join`

via GitHub Thu, 09 Feb 2023 23:28:45 -0800


cloud-fan commented on code in PR #39925:
URL: https://github.com/apache/spark/pull/39925#discussion_r1102376289



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/ColumnResolutionHelper.scala:
##########
@@ -369,4 +382,45 @@ trait ColumnResolutionHelper extends Logging {
       throws = true,
       allowOuter = allowOuter)
   }
+
+  private def resolveExpressionByPlanId(
+      e: Expression,
+      q: LogicalPlan): Expression = {
+    if (!e.exists(_.getTagValue(LogicalPlan.PLAN_ID_TAG).nonEmpty)) {
+      return e
+    }
+
+    e match {
+      case u: UnresolvedAttribute =>
+        resolveUnresolvedAttributeByPlanId(u, q).getOrElse(u)
+      case l: LeafExpression => l
+      case _ =>
+        e.mapChildren(c => resolveExpressionByPlanId(c, q))
+    }
+  }
+
+  private def resolveUnresolvedAttributeByPlanId(
+      u: UnresolvedAttribute,
+      q: LogicalPlan): Option[AttributeReference] = {
+    val planIdOpt = u.getTagValue(LogicalPlan.PLAN_ID_TAG)
+    if (planIdOpt.isEmpty) return None
+    val planId = planIdOpt.get
+    logDebug(s"Extract plan_id $planId from $u")
+
+    val planOpt = q.find(_.getTagValue(LogicalPlan.PLAN_ID_TAG).contains(planId))
+    if (planOpt.isEmpty) return None
+    val plan = planOpt.get
+    logDebug(s"Find child node $plan with plan_id==$planId")
+
+    try {
+      plan.resolve(u.nameParts, conf.resolver) match {
+        case Some(attr: AttributeReference) if plan.outputSet.contains(attr) => Some(attr)

Review Comment:
   We don't need this check. Even if the attribute reference is dangling, we should still use it and let analysis fail later. You can check the behavior of a normal DataFrame using `df1.select(df2.col)`.
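To make the suggestion concrete, here is a minimal toy model in Scala. `Attr`, `Plan`, `resolveByPlanId`, and `checkReference` are hypothetical names, not Spark's Catalyst classes, and this is a sketch under those assumptions rather than the PR's implementation. Resolution looks up the node carrying the matching plan id and returns the attribute without consulting any output set; a separate, later check rejects the reference if the consuming operator's input does not provide it.

```scala
// Hypothetical toy model, NOT Spark's Catalyst classes: an attribute is a
// name plus a unique expression id, and each plan node may carry a plan id.
final case class Attr(name: String, exprId: Long)

final case class Plan(planId: Option[Long], output: Seq[Attr], children: Seq[Plan] = Nil) {
  // Pre-order search over the tree, similar in spirit to TreeNode.find.
  def find(p: Plan => Boolean): Option[Plan] =
    if (p(this)) Some(this)
    else children.iterator.map(_.find(p)).collectFirst { case Some(found) => found }

  def outputSet: Set[Attr] = output.toSet
}

object PlanIdResolution {
  // Resolve `name` against the node tagged with `planId`. Following the review
  // comment, there is deliberately no outputSet check here: a dangling
  // attribute is still returned.
  def resolveByPlanId(name: String, planId: Long, root: Plan): Option[Attr] =
    root.find(_.planId.contains(planId)).flatMap(_.output.find(_.name == name))

  // A separate, later check (standing in for the analyzer's validation)
  // reports an error if the operator's input does not provide the attribute.
  def checkReference(attr: Attr, input: Plan): Either[String, Attr] =
    if (input.outputSet.contains(attr)) Right(attr)
    else Left(s"attribute ${attr.name}#${attr.exprId} is missing from input " +
      input.output.map(a => s"${a.name}#${a.exprId}").mkString("[", ", ", "]"))
}

object PlanIdResolutionDemo {
  def main(args: Array[String]): Unit = {
    // A scan (plan id 2) producing a#1 and b#2, under a projection (plan id 1)
    // that keeps only a#1.
    val scan = Plan(Some(2L), Seq(Attr("a", 1L), Attr("b", 2L)))
    val project = Plan(Some(1L), Seq(Attr("a", 1L)), Seq(scan))

    // Resolving b by plan id 2 succeeds even though b#2 is not in project's output...
    val resolved = PlanIdResolution.resolveByPlanId("b", 2L, project)
    println(resolved)
    // ...and the dangling reference is rejected later, when an operator on
    // top of `project` tries to consume it.
    resolved.foreach(attr => println(PlanIdResolution.checkReference(attr, project)))
  }
}
```

Returning the attribute instead of `None` keeps the failure visible. With `None`, the call site in the diff falls back to `getOrElse(u)` and leaves the attribute unresolved for other resolution paths to handle.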


