viirya commented on a change in pull request #32586:
URL: https://github.com/apache/spark/pull/32586#discussion_r636514764
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##########
@@ -167,7 +167,28 @@ class EquivalentExpressions {
* Returns all the equivalent sets of expressions.
*/
def getAllEquivalentExprs: Seq[Seq[Expression]] = {
- equivalenceMap.values.map(_.toSeq).toSeq
+ equivalenceMap.values.map(_.toSeq).toSeq.sortBy(_.head)(new
ExpressionContainmentOrdering)
+ }
+
+ /**
+ * Orders [Expression] by parent/child relations. The child expression is
smaller
+ * than parent expression. If there is child-parent relationships among the
subexpressions,
+ * we want the child expressions come first than parent expressions, so we
can replace
+ * child expressions in parent expressions with subexpression evaluation.
Note that
+ * this is not for general expression ordering. For example, two irrelevant
expressions
+ * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the
usage here,
+ * the order of irrelevant expressions does not matter.
+ */
+ class ExpressionContainmentOrdering extends Ordering[Expression] {
+ override def compare(x: Expression, y: Expression): Int = {
+ if (x.semanticEquals(y)) {
+ 0
+ } else if (x.find(_.semanticEquals(y)).isDefined) {
Review comment:
BTW, I think better approach is to sort after filter (e.g. size > 1 in
most use-case), because the number of sub-exprs should be smaller.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]