viirya commented on a change in pull request #32586:
URL: https://github.com/apache/spark/pull/32586#discussion_r636531199
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala
##########
@@ -167,7 +167,28 @@ class EquivalentExpressions {
* Returns all the equivalent sets of expressions.
*/
def getAllEquivalentExprs: Seq[Seq[Expression]] = {
- equivalenceMap.values.map(_.toSeq).toSeq
+ equivalenceMap.values.map(_.toSeq).toSeq.sortBy(_.head)(new
ExpressionContainmentOrdering)
+ }
+
+ /**
+ * Orders [Expression] by parent/child relations. The child expression is
smaller
+ * than parent expression. If there is child-parent relationships among the
subexpressions,
+ * we want the child expressions come first than parent expressions, so we
can replace
+ * child expressions in parent expressions with subexpression evaluation.
Note that
+ * this is not for general expression ordering. For example, two irrelevant
expressions
+ * will be considered as e1 < e2 and e2 < e1 by this ordering. But for the
usage here,
+ * the order of irrelevant expressions does not matter.
+ */
+ class ExpressionContainmentOrdering extends Ordering[Expression] {
+ override def compare(x: Expression, y: Expression): Int = {
+ if (x.semanticEquals(y)) {
+ 0
+ } else if (x.find(_.semanticEquals(y)).isDefined) {
Review comment:
Ran `TPCDSQuerySuite`.
Before (master):
```
23.233160578 seconds
22.501728011 seconds
23.547332524 seconds
```
After:
```
23.995751468 seconds
22.262832936 seconds
21.503776059 seconds
```
I don't see significant difference there.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]