Github user dilipbiswal commented on a diff in the pull request:
https://github.com/apache/spark/pull/17330#discussion_r107343122
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/subquery.scala
---
@@ -61,6 +63,37 @@ abstract class SubqueryExpression(
}
}
+/**
+ * This expression is used to represent any form of subquery expression namely
+ * ListQuery, Exists and ScalarSubquery. This is only used to make sure the
+ * expression equality works properly when LogicalPlan.sameResult is called
+ * on plans containing SubqueryExpression(s). This is only a transient expression
+ * that only lives in the scope of sameResult function call. In other words, analyzer,
+ * optimizer or planner never sees this expression type during transformation of
+ * plans.
+ */
+case class CanonicalizedSubqueryExpr(expr: SubqueryExpression)
--- End diff --
@viirya Thank you for your suggestion. I thought about this and went
through the notes I had prepared while debugging. The reason I opted
for comparing subquery expressions, as opposed to just the plans, is that I
wanted to take advantage of expr.canonicalized, which re-orders expressions
to maximize cache hits.
Example:
plan1 - Filter In || Exists || Scalar
plan2 - Filter Scalar || In || Exists
Comparing the above two plans, all other things being equal, should
result in a cache hit. I have now added a test case to make sure. Another
consideration is that in the future the subquery expression may evolve to hold
more attributes, and not considering them didn't feel safe. Finally, I suspect
we may still have to deal with the outer references in the same way I am
handling them now.
Please let me know your thoughts.
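
To make the re-ordering point concrete, here is a minimal sketch of the idea. It is a toy model only: Pred, Or, Canonicalize and Example are hypothetical names made up for illustration, not Catalyst classes, and the real expr.canonicalized does considerably more than this. The sketch assumes that ordering the operands of an OR chain deterministically is enough for two differently ordered filters to compare equal.

```scala
// Toy model only: these types are hypothetical and are NOT Spark's Catalyst classes.
sealed trait Expr
case class Pred(name: String) extends Expr           // stands in for In / Exists / Scalar
case class Or(left: Expr, right: Expr) extends Expr

object Canonicalize {
  // Collect the operands of a (possibly nested) chain of Or nodes.
  private def flattenOr(e: Expr): Seq[Expr] = e match {
    case Or(l, r) => flattenOr(l) ++ flattenOr(r)
    case other    => Seq(other)
  }

  // Rebuild an Or chain with its operands in a deterministic order.
  // Sorting by toString is an assumption made for this sketch.
  def canonicalize(e: Expr): Expr = e match {
    case or: Or =>
      flattenOr(or)
        .map(canonicalize)
        .sortBy(_.toString)
        .reduceLeft[Expr]((l, r) => Or(l, r))
    case leaf => leaf
  }

  // Two expressions are treated as equivalent if their canonical forms match.
  def sameResult(a: Expr, b: Expr): Boolean =
    canonicalize(a) == canonicalize(b)
}

object Example {
  // plan1 - Filter In || Exists || Scalar
  val plan1: Expr = Or(Or(Pred("In"), Pred("Exists")), Pred("Scalar"))
  // plan2 - Filter Scalar || In || Exists
  val plan2: Expr = Or(Or(Pred("Scalar"), Pred("In")), Pred("Exists"))
}
```

In this sketch Example.plan1 and Example.plan2 are not structurally equal, but Canonicalize.sameResult(Example.plan1, Example.plan2) holds because both canonicalize to the same operand order. Comparing only the underlying plans would skip this step for the expression itself.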